Channels DVR stops responding randomly on macOS

Submitting diagnostics could give us some insights into what happened.

Had another extended HTTP-service-unresponsive event this afternoon. Logs have been submitted as 4d6d75dd-9ba6-40ea-9ac7-2f196e1c3d17 -- it lasted about 55 minutes before I restarted it. The logs show no entries during the outage. All connections to the server timed out, and all clients gave up. But it did continue recording in the background throughout.

I restarted the service with a kill -QUIT, so stack traces on all threads appear to have been recorded, if that's helpful.

UPDATE: I'm actually not sure how long the outage was -- it may have been up to 13h 50min. I'd unfortunately disabled my monitoring "cron" job. But I did have a script that would curl /dvr/jobs every 10 seconds or so, and it stopped appearing in channels http logs around 1am.

Mine went unresponsive again and hasn't resumed yet -- is there any logging I can gather before I restart? When this happens it typically stays unresponsive until I restart it or it auto-updates, but recordings continue to happen. Obviously I can't use the web interface, though I'm not sure if there's a command I can run.

I ran a /usr/bin/pkill -QUIT [c]hannels-dvr to cause the restart. @tmm said that doing so records additional information -- it seems to record stack traces before shutdown. Channels seems to always record verbose logs, so the main thing is submitting them after you restart, from the WebUI under Support > Troubleshooting, which you probably already knew and have been doing. @tmm also mentioned running curl -v http://127.0.0.1:8089/status before restarting. In my case, the output of that just shows that curl received no data until it timed out.
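Put together, the pre-restart sequence described above looks something like this sketch (the 30-second curl timeout is my own addition so the probe doesn't hang forever):

```shell
#!/bin/sh
# Sketch of the pre-restart sequence: probe the status endpoint with
# verbose output first, then send SIGQUIT so stack traces are recorded.
# The 30-second curl timeout is an assumption, not from the thread.
curl -v -m 30 http://127.0.0.1:8089/status || true
/usr/bin/pkill -QUIT channels-dvr || true
```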

In the next build (v2022.08.16.2108), I've added a second http listener on port 58089 that we can use for debugging.

It would be interesting to know if both ports stop working at the same time, or if this new port keeps working when the old one dies. The new port only listens on localhost.

So please upgrade to this build when it comes out, and next time you experience the stall see if this command still works:

curl http://127.0.0.1:58089/status

and if so, you could use this to restart the DVR:

curl -XPUT http://127.0.0.1:58089/updater/force/restart

Based on these findings, perhaps we can make the DVR detect when this is happening and restart itself.
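Until then, those two probes could be combined into a small self-healing check, something like this sketch (assuming the default ports and that the restart endpoint behaves as described above):

```shell
#!/bin/sh
# Hypothetical watchdog sketch, assuming the default port 8089 and the
# new localhost-only debug listener on 58089 described above.
MAIN="http://127.0.0.1:8089/status"
DEBUG="http://127.0.0.1:58089/status"

# -f: fail on HTTP errors, -m 10: give up after 10 seconds.
if ! curl -fsS -m 10 "$MAIN" >/dev/null 2>&1; then
	# Main listener looks wedged; if the debug listener still
	# answers, ask the DVR to restart itself through it.
	if curl -fsS -m 10 "$DEBUG" >/dev/null 2>&1; then
		curl -XPUT -m 10 "http://127.0.0.1:58089/updater/force/restart"
	fi
fi
```

Run once a minute from cron or launchd; if both listeners are dead it does nothing, so a SIGQUIT fallback could be added.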

Update: even with the latest Channels DVR server update, the web server end still freezes occasionally. Not a big deal, since my workaround of killing (quit/relaunch) the server works a charm. Still, it's annoying that this issue hasn't been officially fixed yet. Until then, kill it!!! :smiley:

Okay, the diagnostics show the outage lasted 711 minutes, so it's helpful to know it wasn't just the 55 minutes you had mentioned.

I have a few more requests for your debug script:

find the pid of the channels-dvr process and run lsof -nPp <pid>

also grab the output of netstat -an before restarting

These as well:

netstat -nt | grep 8089
netstat -anL
netstat -s
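A one-shot way to capture everything requested in this post before restarting -- the process name and output path here are assumptions, so adjust to taste:

```shell
#!/bin/sh
# One-shot capture of the requested diagnostics, run before restarting.
# Process name "channels-dvr" and the output path are assumptions.
PID="$(pgrep -x channels-dvr || true)"
OUT="${HOME}/Desktop/dvr-debug.txt"
mkdir -p "$(dirname "$OUT")"

{
	date
	# Open files and sockets held by the DVR process, if it's running.
	[ -n "$PID" ] && lsof -nPp "$PID"
	netstat -an
	netstat -nt | grep 8089
	netstat -anL
	netstat -s
} > "$OUT" 2>&1 || true
```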

cc @sejmann

This is... probably unrelated, but I've been experiencing hangs from time to time on macOS. Today I had to force kill the process after Channels hung.

The logs don't show anything relevant to the crash, but it was associated w/ adding a show with 100+ episodes into a Virtual Channel, which seems to have maxed out the RAM on my machine. In Activity Monitor, it showed channels-dvr as using 425 threads and I think most of those were ffmpeg processes.

Update: the only error associated w/ the crash today was this line:

2022/08/21 11:36:21.298335 [TNR] Cancelling stream TVE-YouTubeTV ch6053 after no data was received for 2m0s

I'm wondering if the show added to the virtual channel, because it was on an SMB share, overloaded my network connection, causing both recording and indexing to hang, and that somehow triggered a loop that took up all the RAM.

Okay, no recent hangs, but am running the below script every minute. I'll post or send the results when it catches anything. Let me know if there's anything else I might capture or change.

#!/bin/bash

export HOME="/Users/sejmann"
export PATH="/opt/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin"

# Always log the 8089 listen-queue state, hung or not, for a baseline.
{ date; netstat -anL | grep 8089; echo; } >>"${HOME}/Desktop/netstat.log"

# -z: just probe the port; -G 20: connect timeout of 20 seconds (macOS nc).
if ! nc -z -G 20 127.0.0.1 8089; then
	DATE="$(date +%F_%H_%M_%S)"
	LOG="${HOME}/Desktop/DVR_HUNG_${DATE}.txt"

	# Redirect all further output into the log, with a separator ahead
	# of each traced command (set -x prefixes them with "+").
	exec &> >(sed 's/^\+/\n\n###/g' >>"${LOG}")

	echo "${DATE}"

	set -x

	curl -v http://127.0.0.1:8089/status
	tail -n15 "${HOME}/Library/Application Support/ChannelsDVR/data/channels-dvr-http.log"
	tail -n15 "${HOME}/Library/Application Support/ChannelsDVR/data/channels-dvr.log"

	# Open files/sockets and network state requested earlier in the thread.
	lsof -nPp "$(pgrep channels-dvr)"
	netstat -an
	netstat -nt | grep 8089
	netstat -anL
	netstat -s
	sysctl kern.ipc.somaxconn
	netstat -anL | grep 8089

	# If the debug listener still answers, let the DVR restart itself;
	# otherwise fall back to SIGQUIT to capture stack traces.
	if curl http://127.0.0.1:58089/status; then
		curl -XPUT http://127.0.0.1:58089/updater/force/restart
	else
		pkill -QUIT channels-dvr
	fi
fi
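For reference, running a script like this every minute from cron would look something like the following (the script path is hypothetical -- point it at wherever the script above is saved):

```
# Edit with `crontab -e`; adjust the path to wherever the script lives.
* * * * * /Users/sejmann/bin/dvr-watchdog.sh
```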

Just had the Channels DVR server hang again. It was responding on port 58089, but not 8089. I ran @sejmann's script after it stopped responding - here is the output from that:

Also submitted diagnostics once the server was accessible again - the ID of the submission is 20662653-4f8c-4935-a019-79bd48c81a00

Could you grab the output of these two commands at the current moment:

sysctl kern.ipc.somaxconn
netstat -anL | grep 8089

On my Channels Mac mini right now:

sysctl kern.ipc.somaxconn

kern.ipc.somaxconn: 128

netstat -anL | grep 8089

0/0/128 127.0.0.1.58089
15/15/128 *.8089

Very interesting!

For comparison, here is my mac:

Current listen queue sizes (qlen/incqlen/maxqlen)
Listen         Local Address
0/0/128        127.0.0.1.58089
0/0/128        *.8089

Here's how netstat -L describes those columns:

     -L    Show the size of the various listen queues.  The first count shows the number of unaccepted
           connections.  The second count shows the amount of unaccepted incomplete connections.  The third
           count is the maximum number of queued connections.

So something is causing those to be incremented on your systems. Eventually it hits the limit of 128 and then the server stops responding altogether.

If you do a simple curl -s localhost:8089/status, I assume it does nothing to change the output -- neither increments nor resets it?

What home routers are you guys using?

Interesting. The curl neither incremented nor reset the netstat unaccepted-connection count -- still at 15.

I'm using a Ubiquiti EdgeMAX ER-12 router with the most recent (v2.0.9-hotfix.4) firmware, with no firewall rules on the LAN side, and the Mac connected directly via gigabit Ethernet.

I'm also using Little Snitch on the Mac mini, which for macOS 12 changed the way it monitors and blocks traffic, moving from a KEXT to Apple's new NetworkExtension API, although I had the network filter turned off entirely until recently and only used it to monitor. Also, it has always had "Allow any incoming connections" and "Allow any outgoing connections" rules for channels-dvr.

I'll update my script to include that netstat.

Interesting... I wonder if the others affected here are also using Little Snitch.

So your output is still sitting at 15/15/128?

I was able to make those numbers go up on my own M1 mini using hping3, but I observe them resetting back to zero within a couple seconds.

$ brew install hping
$ echo; date; sudo /opt/homebrew/Cellar/hping/3.20051105/sbin/hping3 -q -c 256 -S -p 8089 --faster localhost; echo; date; netstat -anL | grep 8089; netstat -nt | grep 8089 | head -6; sleep 2; echo; date; netstat -anL | grep 8089

Fri Aug 26 14:48:05 PDT 2022
HPING localhost (utun13 127.0.0.1): S set, 40 headers + 0 data bytes
--- localhost hping statistic ---
256 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms

Fri Aug 26 14:48:05 PDT 2022
0/0/128        127.0.0.1.58089
128/128/128    *.8089
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2722         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2721         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2720         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2719         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2718         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2717         SYN_RCVD

Fri Aug 26 14:48:07 PDT 2022
0/0/128        127.0.0.1.58089
0/0/128        *.8089

Yes, the output of netstat -anL | grep 8089 remained 15/15/128 *.8089 after running curl -s localhost:8089/status -- repeated runs of curl to localhost don't seem to increment it.

However, I just ran the netstat again, and now it's 17/17/128 *.8089 -- I'm not sure what incremented it, but another household member is now using Channels in the other room via an (Ethernet-connected) Apple TV. And now it's 18/18/128, before even attempting your hping script.

Now I'll run: sudo hping3 -q -c 256 -S -p 8089 --faster localhost; echo; date; netstat -anL | grep 8089; netstat -nt | grep 8089 | head -6; sleep 2; echo; date; netstat -anL | grep 8089 -- it returns:

HPING localhost (utun4 127.0.0.1): S set, 40 headers + 0 data bytes

--- localhost hping statistic ---
256 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms

Fri Aug 26 15:05:12 PDT 2022
0/0/128        127.0.0.1.58089
128/128/128    *.8089
tcp4       0 2622136  10.0.0.11.8089         10.0.0.30.57783        ESTABLISHED
tcp4       0      0  10.0.0.11.8089         10.0.0.30.50283        ESTABLISHED
tcp4       0      0  10.0.0.11.8089         10.0.0.30.50259        ESTABLISHED
tcp4       0      0  10.0.0.11.8089         10.0.0.30.50250        ESTABLISHED
tcp4       0      0  10.0.0.11.52639        10.0.0.11.8089         TIME_WAIT
tcp4       0      0  10.0.0.11.52640        10.0.0.11.8089         TIME_WAIT

Fri Aug 26 15:05:14 PDT 2022
0/0/128        127.0.0.1.58089
18/18/128      *.8089

So, that's weird. It goes up to 128, but then resets back to 18, instead of 0.

Very interesting indeed! Sounds like we're onto something, although I'm not sure what. It sounds like a bug in the macOS kernel to me.

Does systemextensionsctl list show anything besides little snitch?

Nope, just Little Snitch. (The endpoint_security one is their tap into the Berkeley Packet Filter, which I only turned on last week.)

$ systemextensionsctl list
2 extension(s)
--- com.apple.system_extension.network_extension
enabled	active	teamID	bundleID (version)	name	[state]
*	*	MLZF7K7B5R	at.obdev.littlesnitch.networkextension (5.4.1/6256)	Little Snitch Network Extension	[activated enabled]
--- com.apple.system_extension.endpoint_security
enabled	active	teamID	bundleID (version)	name	[state]
*	*	MLZF7K7B5R	at.obdev.littlesnitch.endpointsecurity (5.4.1/6256)	Little Snitch Endpoint Security	[activated enabled]

I wonder if @thully is also using Little Snitch. I can remove Little Snitch entirely shortly to help rule it out or in, at least to see whether it affects the unaccepted-connection counts.

Now that we have a signal to look at, here are a few things I would try:

  1. restart channels-dvr so the numbers reset to 0/0/128
  2. turn off remote port forwarding so no traffic from the internet is coming through
  3. wait a day and see if the qlen is still 0 or not

then try the same but with

  1. disable little snitch
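For step 3 of either run, a minimal once-a-day snapshot would be enough to see whether the qlen stays at 0 (the log path is an assumption):

```shell
#!/bin/sh
# Minimal daily check for step 3: append a timestamped snapshot of the
# 8089 listen-queue counters. Log path is an assumption.
LOG="${HOME}/qlen-history.log"
{ date; netstat -anL | grep 8089; echo; } >> "$LOG" || true
```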