My DVR server updated to 2022.08.04 and restarted itself, and now I can access it remotely again. Is there any logging information that would help? Or has this issue been fixed in the new version?
Submitting diagnostics could give us some insights into what happened.
Had another extended http service unresponsive event this afternoon. Logs have been submitted as 4d6d75dd-9ba6-40ea-9ac7-2f196e1c3d17 -- It lasted about 55 minutes before I restarted it. The logs received no entries during the outage. All connections to the server timed out, and all clients gave up. But, it did continue recording in the background during the outage.
I restarted the service with a kill -QUIT, so stack traces on all threads appear to have been recorded, if that's helpful.
UPDATE: I'm actually not sure how long the outage was -- it may have been up to 13h 50min. I'd unfortunately disabled my monitoring "cron" job. But I did have a script that would curl /dvr/jobs every 10 seconds or so, and it stopped appearing in channels http logs around 1am.
Mine went unresponsive again and hasn't resumed yet - is there any logging I can get before I restart? When this happens it typically stays unresponsive until I restart it or it auto-updates, but recordings continue to happen. Obviously can't use the web interface, though I'm not sure if there' a command I can run.
I ran a /usr/bin/pkill -QUIT [c]hannels-dvr to cause the restart. @tmm said that that records additional information -- it seems to record stack traces before shutdown. Channels seems to always be recording verbose logs, so the main thing is submitting them after you restart from the WebUI under Support > Troubleshooting, which you probably already knew and have been doing. @tmm also mentioned running curl -v http://127.0.0.1:8089/status before restarting. The output of that, in my case, just shows that curl received no data until it timed out.
In the next build (v2022.08.16.2108), I've added a second http listener on port 58089 that we can use for debugging.
It would be interesting to know if both ports stop working at the same time, or if this new port keeps working when the old one dies. The new port only listens on localhost.
So please upgrade to this build when it comes out, and next time you experience the stall see if this command still works:
and if so, you could use this to restart the DVR:
curl -XPUT http://127.0.0.1:58089/updater/force/restart
Based on these findings, perhaps we can make the DVR detect when this is happening and restart itself.
*Update. Even with the latest channelsdvr server update web server end still freezes occasionally. Not a big deal since my solution to kill (quit/relaunch) server works a charm. Nevertheless annoying this issue has not been officially fixed yet. Until then, Kill it!!! 
Okay it appeared to be 711 minutes in the diagnostics so that's helpful to know it wasn't just 55 minutes like you had mentioned.
I have a few more requests for your debug script:
find the pid of the channels-dvr process and run lsof -nPp <pid>
also grab the output of netstat -an before restarting
These as well:
netstat -nt | grep 8089
netstat -anL
netstat -s
cc @sejmann
This is... probably unrelated, but I've been experiencing hangs from time to time on MacOS. Today had to force kill the process today after Channels hung.
The logs don't show anything relevant to the crash, but it was associated w/ adding a show with 100+ episodes into a Virtual Channel, which seems to have maxed out the RAM on my machine. In Activity Monitor, it showed channels-dvr as using 425 threads and I think most of those were ffmpeg processes.
Update: the only error associated w/ the crash today was this line:
2022/08/21 11:36:21.298335 [TNR] Cancelling stream TVE-YouTubeTV ch6053 after no data was received for 2m0s
I'm wondering if the show that was added to the virtual channel, because it was on an SMB share, overloaded my network connection and it caused recording and indexing to both hang, and that somehow triggered a loop that look up all the RAM.
Okay, no recent hangs, but am running the below script every minute. I'll post or send the results when it catches anything. Let me know if there's anything else I might capture or change.
#!/bin/bash
export HOME="/Users/sejmann"
export PATH="/opt/local/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin"
{ date;netstat -anL | grep 8089;echo; } >>"${HOME}/Desktop/netstat.log"
if ! nc -z -G 20 127.0.0.1 8089; then
DATE="$(date +%F_%H_%M_%S)"
LOG="${HOME}/Desktop/DVR_HUNG_${DATE}.txt"
exec &> >(sed 's/^\+/\n\n###/g' >>"${LOG}")
echo $DATE
set -x
curl -v http://127.0.0.1:8089/status
tail -n15 "${HOME}/Library/Application Support/ChannelsDVR/data/channels-dvr-http.log"
tail -n15 "${HOME}/Library/Application Support/ChannelsDVR/data/channels-dvr.log"
lsof -nPp $(pgrep channels-dvr)
netstat -an
netstat -nt | grep 8089
netstat -anL
netstat -s
sysctl kern.ipc.somaxconn
netstat -anL | grep 8089
curl http://127.0.0.1:58089/status
if [[ $? -eq 0 ]]; then
curl -XPUT http://127.0.0.1:58089/updater/force/restart
else
pkill -QUIT channels-dvr
fi
fi
Just had the Channels DVR server hang again. It was responding on port 58089, but not 8089. I ran @sejmann's script after it stopped responding - here is the output from that:
Also submitted diagnostics once the server was accessible again - the ID of the submission is 20662653-4f8c-4935-a019-79bd48c81a00
Could you grab the output of these two commands at the current moment:
sysctl kern.ipc.somaxconn
netstat -anL | grep 8089
On my channel's mac mini right now:
sysctl kern.ipc.somaxconn
kern.ipc.somaxconn: 128
netstat -anL | grep 8089
0/0/128 127.0.0.1.58089
15/15/128 *.8089
Very interesting!
For comparison, here is my mac:
Current listen queue sizes (qlen/incqlen/maxqlen)
Listen Local Address
0/0/128 127.0.0.1.58089
0/0/128 *.8089
Here's how netstat -L describes those columns:
-L Show the size of the various listen queues. The first count shows the number of unaccepted
connections. The second count shows the amount of unaccepted incomplete connections. The third
count is the maximum number of queued connections.
So something is causing those to be incremented on your systems. Eventually it hits the limit of 128 and then the server stops responding altogether.
If you do a simple curl -s localhost:8089/status I assume it does nothing to change the output. Neither increment nor reset it?
What home routers are you guys using?
Interesting. The curl didn't increment, nor reset, the netstat unaccepted connections -- still at 15.
I'm using a Ubiquity EdgeMax ER-12 router with most recent (v2.0.9-hotfix.4) firmware, with no firewall rules on the LAN side, the Mac connected directly via 1GB ethernet.
I'm also using Little Snitch on the Mac mini, which for MacOS 12 changed the way they monitor and block traffic, moving from KEXT to Apple's new NetworkExtension API, although I had the network filter turned entirely off until recently, and only used it to monitor. Also, it's always had "Allow any incoming connections " and "Allow any outgoing connections" rules for channels-dvr.
I'll update my script to include that netstat.
Interesting.. I wonder if the others here affected also are using Little Snitch.
So your output is still sitting at 15/15/128?
I was able to make those numbers go up on my own M1 mini using hping3, but I observe them resetting back to zero within a couple seconds.
$ brew install hping
$ echo; date; sudo /opt/homebrew/Cellar/hping/3.20051105/sbin/hping3 -q -c 256 -S -p 8089 --faster localhost; echo; date; netstat -anL | grep 8089; netstat -nt | grep 8089 | head -6; sleep 2; echo; date; netstat -anL | grep 8089
Fri Aug 26 14:48:05 PDT 2022
HPING localhost (utun13 127.0.0.1): S set, 40 headers + 0 data bytes
--- localhost hping statistic ---
256 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms
Fri Aug 26 14:48:05 PDT 2022
0/0/128 127.0.0.1.58089
128/128/128 *.8089
tcp4 0 0 127.0.0.1.8089 127.0.0.1.2722 SYN_RCVD
tcp4 0 0 127.0.0.1.8089 127.0.0.1.2721 SYN_RCVD
tcp4 0 0 127.0.0.1.8089 127.0.0.1.2720 SYN_RCVD
tcp4 0 0 127.0.0.1.8089 127.0.0.1.2719 SYN_RCVD
tcp4 0 0 127.0.0.1.8089 127.0.0.1.2718 SYN_RCVD
tcp4 0 0 127.0.0.1.8089 127.0.0.1.2717 SYN_RCVD
Fri Aug 26 14:48:07 PDT 2022
0/0/128 127.0.0.1.58089
0/0/128 *.8089
Yes, the output of netstat -anL | grep 8089 remained 15/15/128 *.8089 after running curl -s localhost:8089/status -- repeated runs of curl to localhost don't seem to increment it.
However, I just ran the netstat again, and now it's 17/17/128 *.8089 -- I'm not sure what incremented it, but another household member is now using channels in the other room via (an ethernet connected) Apple TV. And now it's 18/18/128, before even attempting your hping script.
Now I'll run: sudo hping3 -q -c 256 -S -p 8089 --faster localhost; echo; date; netstat -anL | grep 8089; netstat -nt | grep 8089 | head -6; sleep 2; echo; date; netstat -anL | grep 8089 -- it returns:
HPING localhost (utun4 127.0.0.1): S set, 40 headers + 0 data bytes
--- localhost hping statistic ---
256 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms
Fri Aug 26 15:05:12 PDT 2022
0/0/128 127.0.0.1.58089
128/128/128 *.8089
tcp4 0 2622136 10.0.0.11.8089 10.0.0.30.57783 ESTABLISHED
tcp4 0 0 10.0.0.11.8089 10.0.0.30.50283 ESTABLISHED
tcp4 0 0 10.0.0.11.8089 10.0.0.30.50259 ESTABLISHED
tcp4 0 0 10.0.0.11.8089 10.0.0.30.50250 ESTABLISHED
tcp4 0 0 10.0.0.11.52639 10.0.0.11.8089 TIME_WAIT
tcp4 0 0 10.0.0.11.52640 10.0.0.11.8089 TIME_WAIT
Fri Aug 26 15:05:14 PDT 2022
0/0/128 127.0.0.1.58089
18/18/128 *.8089
So, that's weird. It goes up to 128, but then resets back to 18, instead of 0.
Very interesting indeed! Sounds like we're onto something, although I'm not sure what. It sounds like a bug in the macOS kernel to me.
Does systemextensionsctl list show anything besides little snitch?
Nope, just little snitch. (endpoint_security one is their tap into the Berkeley packet filter, which I only turned on last week.)
$ systemextensionsctl list
2 extension(s)
--- com.apple.system_extension.network_extension
enabled active teamID bundleID (version) name [state]
* * MLZF7K7B5R at.obdev.littlesnitch.networkextension (5.4.1/6256) Little Snitch Network Extension [activated enabled]
--- com.apple.system_extension.endpoint_security
enabled active teamID bundleID (version) name [state]
* * MLZF7K7B5R at.obdev.littlesnitch.endpointsecurity (5.4.1/6256) Little Snitch Endpoint Security [activated enabled]
I wonder if @thully is also using Little Snitch. I can remove Little Snitch entirely shortly, to help rule it out or in, I guess, at least to see if it affects unaccepted packets.