Channels DVR stops responding randomly on macOS

In the next build (v2022.08.16.2108), I've added a second http listener on port 58089 that we can use for debugging.

It would be interesting to know if both ports stop working at the same time, or if this new port keeps working when the old one dies. The new port only listens on localhost.

So please upgrade to this build when it comes out, and next time you experience the stall see if this command still works:

curl http://127.0.0.1:58089/status

and if so, you could use this to restart the DVR:

curl -XPUT http://127.0.0.1:58089/updater/force/restart

Based on these findings, perhaps we can make the DVR detect when this is happening and restart itself.

1 Like

*Update. Even with the latest channelsdvr server update web server end still freezes occasionally. Not a big deal since my solution to kill (quit/relaunch) server works a charm. Nevertheless annoying this issue has not been officially fixed yet. Until then, Kill it!!! :smiley:

Okay it appeared to be 711 minutes in the diagnostics so that's helpful to know it wasn't just 55 minutes like you had mentioned.

I have a few more requests for your debug script:

find the pid of the channels-dvr process and run lsof -nPp <pid>

also grab the output of netstat -an before restarting

These as well:

netstat -nt | grep 8089
netstat -anL
netstat -s

cc @sejmann

1 Like

This is... probably unrelated, but I've been experiencing hangs from time to time on MacOS. Today had to force kill the process today after Channels hung.

The logs don't show anything relevant to the crash, but it was associated w/ adding a show with 100+ episodes into a Virtual Channel, which seems to have maxed out the RAM on my machine. In Activity Monitor, it showed channels-dvr as using 425 threads and I think most of those were ffmpeg processes.

Update: the only error associated w/ the crash today was this line:

2022/08/21 11:36:21.298335 [TNR] Cancelling stream TVE-YouTubeTV ch6053 after no data was received for 2m0s

I'm wondering if the show that was added to the virtual channel, because it was on an SMB share, overloaded my network connection and it caused recording and indexing to both hang, and that somehow triggered a loop that look up all the RAM.

Okay, no recent hangs, but am running the below script every minute. I'll post or send the results when it catches anything. Let me know if there's anything else I might capture or change.

#!/bin/bash

export HOME="/Users/sejmann"
export PATH="/opt/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin"

{ date;netstat -anL | grep 8089;echo; } >>"${HOME}/Desktop/netstat.log"

if ! nc -z -G 20 127.0.0.1 8089; then 
	DATE="$(date +%F_%H_%M_%S)"
	LOG="${HOME}/Desktop/DVR_HUNG_${DATE}.txt"

	exec &> >(sed 's/^\+/\n\n###/g' >>"${LOG}")

	echo $DATE

	set -x

	curl -v http://127.0.0.1:8089/status
	tail -n15 "${HOME}/Library/Application Support/ChannelsDVR/data/channels-dvr-http.log"
	tail -n15 "${HOME}/Library/Application Support/ChannelsDVR/data/channels-dvr.log"

	lsof -nPp $(pgrep channels-dvr)
	netstat -an
	netstat -nt | grep 8089
	netstat -anL
	netstat -s
	sysctl kern.ipc.somaxconn
	netstat -anL | grep 8089

	curl http://127.0.0.1:58089/status
	if [[ $? -eq 0 ]]; then 
		curl -XPUT http://127.0.0.1:58089/updater/force/restart
	else 
		pkill -QUIT channels-dvr
	fi
fi
1 Like

Just had the Channels DVR server hang again. It was responding on port 58089, but not 8089. I ran @sejmann's script after it stopped responding - here is the output from that:

Also submitted diagnostics once the server was accessible again - the ID of the submission is 20662653-4f8c-4935-a019-79bd48c81a00

2 Likes

Could you grab the output of these two commands at the current moment:

sysctl kern.ipc.somaxconn
netstat -anL | grep 8089

On my channel's mac mini right now:

sysctl kern.ipc.somaxconn

kern.ipc.somaxconn: 128

netstat -anL | grep 8089

0/0/128 127.0.0.1.58089
15/15/128 *.8089

1 Like

Very interesting!

For comparison, here is my mac:

Current listen queue sizes (qlen/incqlen/maxqlen)
Listen         Local Address
0/0/128        127.0.0.1.58089
0/0/128        *.8089

Here's how netstat -L describes those columns:

     -L    Show the size of the various listen queues.  The first count shows the number of unaccepted
           connections.  The second count shows the amount of unaccepted incomplete connections.  The third
           count is the maximum number of queued connections.

So something is causing those to be incremented on your systems. Eventually it hits the limit of 128 and then the server stops responding altogether.

If you do a simple curl -s localhost:8089/status I assume it does nothing to change the output. Neither increment nor reset it?

What home routers are you guys using?

Interesting. The curl didn't increment, nor reset, the netstat unaccepted connections -- still at 15.

I'm using a Ubiquity EdgeMax ER-12 router with most recent (v2.0.9-hotfix.4) firmware, with no firewall rules on the LAN side, the Mac connected directly via 1GB ethernet.

I'm also using Little Snitch on the Mac mini, which for MacOS 12 changed the way they monitor and block traffic, moving from KEXT to Apple's new NetworkExtension API, although I had the network filter turned entirely off until recently, and only used it to monitor. Also, it's always had "Allow any incoming connections " and "Allow any outgoing connections" rules for channels-dvr.

I'll update my script to include that netstat.

1 Like

Interesting.. I wonder if the others here affected also are using Little Snitch.

So your output is still sitting at 15/15/128?

I was able to make those numbers go up on my own M1 mini using hping3, but I observe them resetting back to zero within a couple seconds.

$ brew install hping
$ echo; date; sudo /opt/homebrew/Cellar/hping/3.20051105/sbin/hping3 -q -c 256 -S -p 8089 --faster localhost; echo; date; netstat -anL | grep 8089; netstat -nt | grep 8089 | head -6; sleep 2; echo; date; netstat -anL | grep 8089

Fri Aug 26 14:48:05 PDT 2022
HPING localhost (utun13 127.0.0.1): S set, 40 headers + 0 data bytes
--- localhost hping statistic ---
256 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms

Fri Aug 26 14:48:05 PDT 2022
0/0/128        127.0.0.1.58089
128/128/128    *.8089
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2722         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2721         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2720         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2719         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2718         SYN_RCVD
tcp4       0      0  127.0.0.1.8089         127.0.0.1.2717         SYN_RCVD

Fri Aug 26 14:48:07 PDT 2022
0/0/128        127.0.0.1.58089
0/0/128        *.8089
1 Like

Yes, the output of netstat -anL | grep 8089 remained 15/15/128 *.8089 after running curl -s localhost:8089/status -- repeated runs of curl to localhost don't seem to increment it.

However, I just ran the netstat again, and now it's 17/17/128 *.8089 -- I'm not sure what incremented it, but another household member is now using channels in the other room via (an ethernet connected) Apple TV. And now it's 18/18/128, before even attempting your hping script.

Now I'll run: sudo hping3 -q -c 256 -S -p 8089 --faster localhost; echo; date; netstat -anL | grep 8089; netstat -nt | grep 8089 | head -6; sleep 2; echo; date; netstat -anL | grep 8089 -- it returns:

HPING localhost (utun4 127.0.0.1): S set, 40 headers + 0 data bytes

--- localhost hping statistic ---
256 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms

Fri Aug 26 15:05:12 PDT 2022
0/0/128        127.0.0.1.58089
128/128/128    *.8089
tcp4       0 2622136  10.0.0.11.8089         10.0.0.30.57783        ESTABLISHED
tcp4       0      0  10.0.0.11.8089         10.0.0.30.50283        ESTABLISHED
tcp4       0      0  10.0.0.11.8089         10.0.0.30.50259        ESTABLISHED
tcp4       0      0  10.0.0.11.8089         10.0.0.30.50250        ESTABLISHED
tcp4       0      0  10.0.0.11.52639        10.0.0.11.8089         TIME_WAIT
tcp4       0      0  10.0.0.11.52640        10.0.0.11.8089         TIME_WAIT

Fri Aug 26 15:05:14 PDT 2022
0/0/128        127.0.0.1.58089
18/18/128      *.8089

So, that's weird. It goes up to 128, but then resets back to 18, instead of 0.

Very interesting indeed! Sounds like we're onto something, although I'm not sure what. It sounds like a bug in the macOS kernel to me.

Does systemextensionsctl list show anything besides little snitch?

Nope, just little snitch. (endpoint_security one is their tap into the Berkeley packet filter, which I only turned on last week.)

$ systemextensionsctl list
2 extension(s)
--- com.apple.system_extension.network_extension
enabled	active	teamID	bundleID (version)	name	[state]
*	*	MLZF7K7B5R	at.obdev.littlesnitch.networkextension (5.4.1/6256)	Little Snitch Network Extension	[activated enabled]
--- com.apple.system_extension.endpoint_security
enabled	active	teamID	bundleID (version)	name	[state]
*	*	MLZF7K7B5R	at.obdev.littlesnitch.endpointsecurity (5.4.1/6256)	Little Snitch Endpoint Security	[activated enabled]

I wonder if @thully is also using Little Snitch. I can remove Little Snitch entirely shortly, to help rule it out or in, I guess, at least to see if it affects unaccepted packets.

Now that we have a signal to look at, here's a few things I would try:

  1. restart channels-dvr so the numbers reset to 0/0/128
  2. turn off remote port forwarding so no traffic from the internet is coming through
  3. wait a day and see if the qlen is still 0 or not

then try the same but with

  1. disable little snitch

Here's my output:
sysctl kern.ipc.somaxconn
kern.ipc.somaxconn: 128

netstat -anL | grep 8089

0/0/128        127.0.0.1.58089        
19/19/128      *.8089

I'm using the Arris BGW210 provided with my AT&T fiber connection, as well as an old Time Capsule with routing disabled (behaving as a switch) which my Mac is connected to by Ethernet. I'm not using Little Snitch.

When you say disable remote port forwarding, do you mean on my router or in Channels (i.e. disabling remote DVR)?

Okay, I'll be recording the netstat -anL minutely. Before your reply, I'd jumped the gun and uninstalled Little Snitch, so I guess my first testing period will be without Little Snitch in the mix. I've turned off Remote DVR. Unfortunately, we won't be watching any TV this weekend, so we won't be exercising channels in the usual way until Monday, although I can leave an Apple TV client playing continuously. Also installed v2022.08.26.2245 with improved macOS diagnostics.

Okay, @thully's at 19 -- definitely on to something with this! And no Little Snitch, so that's probably not a root cause.

I can maybe add a WAN port forward from :8089 to :58089 and see if any external traffic is causing the unaccepted packet increments -- I noticed @thully had some external Censys traffic in the hours before his last hang -- I wonder if they're doing some kind of error response fingerprinting involving partial packets. -- oh, 58089 is localhost only. I can't do that. Oh well.

@thully : Yes, you're exactly right, turning off "Remote DVR" shuts off the NATPMP port forwarding in your Airport Time Machine to channels, so external traffic won't reach Channel's DVR. Although, you won't be able to connect remotely, of course, with that turned off, in case you're traveling.

1 Like

The :58089 listener is only on localhost, so it won't accept packets from elsewhere. It's not setup the same way with SSL support.

You could do a simple socat listener on a random port though, expose that and see if the qlen ticks up.

Another thing that may make a difference is bumping the global somaxconn limit. Although when I tried, netstat -anL still reported a maxqlen of 128.

sudo sysctl kern.ipc.somaxconn=8192

Same here