Channels DVR stops responding randomly on macOS

Interesting. Only two minutes? Some of my prior hangs definitely outlasted two minutes, though. I'll test it out. Maybe I'll update my health check to not restart until after 3 minutes, just to see if it catches any.
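
A health check that waits out short stalls before restarting could look roughly like the sketch below. The function name record_check and the state file path are assumptions made for illustration; nothing here is part of Channels DVR.

```shell
#!/bin/bash
# Sketch only: restart after failures have persisted for a grace period.
# STATE_FILE is a hypothetical path that stores the epoch time of the first failed check.
STATE_FILE="${STATE_FILE:-/tmp/dvr_first_failure}"

# record_check <ok|fail> <now_epoch> <grace_seconds>
# Prints "healthy", "wait", or "restart".
record_check() {
	local result="$1" now="$2" grace="$3" first
	if [ "$result" = ok ]; then
		# Any successful check clears the failure streak.
		rm -f "$STATE_FILE"
		echo healthy
		return
	fi
	if [ -f "$STATE_FILE" ]; then
		first="$(cat "$STATE_FILE")"
	else
		# First failure in this streak: remember when it started.
		first="$now"
		echo "$now" >"$STATE_FILE"
	fi
	if [ $((now - first)) -ge "$grace" ]; then
		echo restart
	else
		echo wait
	fi
}
```

Run from a once-a-minute job with a grace period of 180, this would only print "restart" once the server has been failing checks for three minutes straight, so a two-minute stall that clears on its own would be left alone.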

It's possible it would get repeatedly stuck for two minutes at a time. We are not fully certain. The channels-dvr process would have shown 100% cpu usage in the Activity Monitor at the time if this was the same issue.

I've tested repeatedly, with continuous requests for well over 2 minutes, and the web server stays non-responsive. Hung. In fact, I've had it in that state for multiple days. At least in my case, restarting the DVR server was the only way to regain normal function. On a brighter note, updating my server to the M1 (or latest?) Channels DVR build to match my M1 Mac seemed to solve my issue. So maybe the Channels auto-updater wasn't working, or a manual download/installation of a special and/or latest build of Channels DVR server was needed. I still haven't had that clarified.

This seems to imply that previously you were running the x86_64 build on arm64 hardware; but when you "updated" to the arm64 build the issues went away?

(Perhaps this is a hardware/golang issue, and has nothing to do with Channels, per se. …)

It does seem that this is still happening with the recent update - away from home and can’t reach the server now after it’s been up for a few days. My DVR server is an x86 Mac mini, though I won’t be home to reset it for a couple weeks.

My DVR server updated to 2022.08.04 and restarted itself, and now I can access it remotely again. Is there any logging information that would help? Or has this issue been fixed in the new version?

Submitting diagnostics could give us some insights into what happened.

Had another extended http service unresponsive event this afternoon. Logs have been submitted as 4d6d75dd-9ba6-40ea-9ac7-2f196e1c3d17 -- It lasted about 55 minutes before I restarted it. The logs received no entries during the outage. All connections to the server timed out, and all clients gave up. But, it did continue recording in the background during the outage.

I restarted the service with a kill -QUIT, so stack traces on all threads appear to have been recorded, if that's helpful.

UPDATE: I'm actually not sure how long the outage was -- it may have been up to 13h 50min. I'd unfortunately disabled my monitoring "cron" job. But I did have a script that would curl /dvr/jobs every 10 seconds or so, and it stopped appearing in channels http logs around 1am.

Mine went unresponsive again and hasn't resumed yet - is there any logging I can get before I restart? When this happens it typically stays unresponsive until I restart it or it auto-updates, but recordings continue to happen. Obviously I can't use the web interface, though I'm not sure if there's a command I can run.

I ran a /usr/bin/pkill -QUIT [c]hannels-dvr to cause the restart. @tmm said that that records additional information -- it seems to record stack traces before shutdown. Channels seems to always be recording verbose logs, so the main thing is submitting them after you restart from the WebUI under Support > Troubleshooting, which you probably already knew and have been doing. @tmm also mentioned running curl -v http://127.0.0.1:8089/status before restarting. The output of that, in my case, just shows that curl received no data until it timed out.

In the next build (v2022.08.16.2108), I've added a second http listener on port 58089 that we can use for debugging.

It would be interesting to know if both ports stop working at the same time, or if this new port keeps working when the old one dies. The new port only listens on localhost.

So please upgrade to this build when it comes out, and next time you experience the stall see if this command still works:

curl http://127.0.0.1:58089/status

and if so, you could use this to restart the DVR:

curl -XPUT http://127.0.0.1:58089/updater/force/restart

Based on these findings, perhaps we can make the DVR detect when this is happening and restart itself.
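
To answer the question of whether both ports stop at the same time, the two probes can be reduced to a single diagnosis. This is a rough sketch: probe, classify_stall, and check_listeners are hypothetical helper names, and the 20-second timeout is an arbitrary choice.

```shell
#!/bin/bash
# Sketch only: helper names and the timeout are assumptions, not part of Channels DVR.

# probe <url>: succeed if the server returns any HTTP response within 20 seconds.
probe() {
	curl -s -o /dev/null --max-time 20 "$1"
}

# classify_stall <main_rc> <debug_rc>: turn the two exit codes into a diagnosis.
classify_stall() {
	local main_rc="$1" debug_rc="$2"
	if [ "$main_rc" -eq 0 ]; then
		echo "main-ok"
	elif [ "$debug_rc" -eq 0 ]; then
		echo "main-hung-debug-ok"
	else
		echo "both-hung"
	fi
}

# check_listeners: probe the main and debug ports and print the diagnosis.
check_listeners() {
	local main_rc debug_rc
	probe http://127.0.0.1:8089/status; main_rc=$?
	probe http://127.0.0.1:58089/status; debug_rc=$?
	classify_stall "$main_rc" "$debug_rc"
}
```

Calling check_listeners during a stall prints one of the three labels. Only the "main-hung-debug-ok" case leaves the restart via port 58089 available.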

Update: Even with the latest Channels DVR server update, the web server end still freezes occasionally. Not a big deal, since my solution of killing (quitting and relaunching) the server works like a charm. Nevertheless, it's annoying that this issue has not been officially fixed yet. Until then, kill it!!! :smiley:

Okay, the diagnostics show the outage lasted 711 minutes, so it's helpful to know it wasn't just the 55 minutes you had mentioned.

I have a few more requests for your debug script:

find the pid of the channels-dvr process and run lsof -nPp <pid>

also grab the output of netstat -an before restarting

These as well:

netstat -nt | grep 8089
netstat -anL
netstat -s

cc @sejmann

This is... probably unrelated, but I've been experiencing hangs from time to time on macOS. Today I had to force kill the process after Channels hung.

The logs don't show anything relevant to the crash, but it was associated w/ adding a show with 100+ episodes into a Virtual Channel, which seems to have maxed out the RAM on my machine. In Activity Monitor, it showed channels-dvr as using 425 threads and I think most of those were ffmpeg processes.

Update: the only error associated w/ the crash today was this line:

2022/08/21 11:36:21.298335 [TNR] Cancelling stream TVE-YouTubeTV ch6053 after no data was received for 2m0s

I'm wondering if the show that was added to the virtual channel, because it was on an SMB share, overloaded my network connection and caused recording and indexing to both hang, and that somehow triggered a loop that took up all the RAM.

Okay, no recent hangs, but am running the below script every minute. I'll post or send the results when it catches anything. Let me know if there's anything else I might capture or change.

#!/bin/bash

export HOME="/Users/sejmann"
export PATH="/opt/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin"

{ date;netstat -anL | grep 8089;echo; } >>"${HOME}/Desktop/netstat.log"

if ! nc -z -G 20 127.0.0.1 8089; then 
	DATE="$(date +%F_%H_%M_%S)"
	LOG="${HOME}/Desktop/DVR_HUNG_${DATE}.txt"

	exec &> >(sed 's/^\+/\n\n###/g' >>"${LOG}")

	echo $DATE

	set -x

	curl -v http://127.0.0.1:8089/status
	tail -n15 "${HOME}/Library/Application Support/ChannelsDVR/data/channels-dvr-http.log"
	tail -n15 "${HOME}/Library/Application Support/ChannelsDVR/data/channels-dvr.log"

	# -d, joins multiple PIDs with commas, the list format lsof -p expects
	lsof -nPp "$(pgrep -d, channels-dvr)"
	netstat -an
	netstat -nt | grep 8089
	netstat -anL
	netstat -s
	sysctl kern.ipc.somaxconn
	netstat -anL | grep 8089

	# Restart through the debug listener if it still answers; otherwise fall back to SIGQUIT.
	if curl http://127.0.0.1:58089/status; then
		curl -XPUT http://127.0.0.1:58089/updater/force/restart
	else 
		pkill -QUIT channels-dvr
	fi
fi

Just had the Channels DVR server hang again. It was responding on port 58089, but not 8089. I ran @sejmann's script after it stopped responding - here is the output from that:

Also submitted diagnostics once the server was accessible again - the ID of the submission is 20662653-4f8c-4935-a019-79bd48c81a00

Could you grab the output of these two commands at the current moment:

sysctl kern.ipc.somaxconn
netstat -anL | grep 8089

On my Channels Mac mini right now:

sysctl kern.ipc.somaxconn

kern.ipc.somaxconn: 128

netstat -anL | grep 8089

0/0/128 127.0.0.1.58089
15/15/128 *.8089

Very interesting!

For comparison, here is my mac:

Current listen queue sizes (qlen/incqlen/maxqlen)
Listen         Local Address
0/0/128        127.0.0.1.58089
0/0/128        *.8089

Here's how netstat -L describes those columns:

     -L    Show the size of the various listen queues.  The first count shows the number of unaccepted
           connections.  The second count shows the amount of unaccepted incomplete connections.  The third
           count is the maximum number of queued connections.

So something is causing those to be incremented on your systems. Eventually it hits the limit of 128 and then the server stops responding altogether.
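
To watch those counters climb, the qlen/incqlen/maxqlen column can be pulled apart with a small filter. This is a sketch that assumes the netstat -anL row format shown above and assumes the stall coincides with qlen reaching maxqlen; listen_queue_usage and listen_queue_is_full are hypothetical helper names.

```shell
#!/bin/bash
# Sketch only: both helpers read `netstat -anL` output on stdin.

# listen_queue_usage <port>: print the queue counters for listeners on that port.
listen_queue_usage() {
	local port="$1"
	# Rows look like "15/15/128 *.8089": qlen/incqlen/maxqlen, then the local address.
	awk -v port="$port" '
		$2 ~ ("\\." port "$") {
			split($1, q, "/")
			printf "%s qlen=%d incqlen=%d maxqlen=%d\n", $2, q[1], q[2], q[3]
		}'
}

# listen_queue_is_full <port>: exit 0 if any matching listener has qlen >= maxqlen.
listen_queue_is_full() {
	local port="$1"
	awk -v port="$port" '
		$2 ~ ("\\." port "$") {
			split($1, q, "/")
			if (q[1] + 0 >= q[3] + 0) full = 1
		}
		END { exit full ? 0 : 1 }'
}
```

Piping netstat -anL into listen_queue_usage 8089 once a minute and appending the result to a log would show how quickly the queue fills. The match is anchored on ".8089" at the end of the address, so the debug listener on 58089 is not picked up by mistake.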

If you do a simple curl -s localhost:8089/status I assume it does nothing to change the output. Neither increment nor reset it?

What home routers are you guys using?