Channels DVR stops responding randomly on macOS

When it's hung, what does curl -v http://127.0.0.1:8089/status report?

You can also send kill -QUIT when it's stuck, which will print some debugging information to the log that you can send to us.


@justinmk3 FYI you appear to be running the Intel emulated version of the DVR on your M1 mini, instead of the native ARM64 version. I don't know if that's related to the issues you're having.

Sejmon, awesome! I can confirm that so far there has been no data corruption or any other issue with the method I posted previously, despite the warning from Aman. I can also confirm that the server exits and restarts gracefully when terminated via kill. Lastly, though less important and more of a curiosity: why did Aman and others warn that restarting the server this way would “leave databases in a corrupted state”, and that launching the binary directly “won’t setup logging”? I have tested both claims and so far found neither to be accurate. Were they theoretical, or based on actually trying it in the past and running into issues?

There are many issues with this command.

I think you meant killall ... || open not |. The | will pipe the output of kill into the open command, which makes no sense. That also means the open will be executed before the kill finishes, which is not ideal and could cause two processes accessing the database at the same time.
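As a quick illustration of how these operators differ, here is a minimal sketch using harmless stand-in functions instead of killall and open. The names stop_ok, stop_fail and start are made up for the example.

```shell
# Harmless stand-ins for killall and open (hypothetical names).
stop_ok()   { return 0; }       # a "kill" that succeeds
stop_fail() { return 1; }       # a "kill" that fails
start()     { echo "start"; }   # the "open" step

pipe_result=$(stop_fail | start)  # start runs regardless, alongside stop_fail
or_ok=$(stop_ok || start)         # start is skipped because stop_ok succeeded
or_fail=$(stop_fail || start)     # start runs because stop_fail failed
seq_result=$(stop_ok; start)      # start runs after stop_ok returns, regardless

printf 'pipe=[%s] or_ok=[%s] or_fail=[%s] seq=[%s]\n' \
  "$pipe_result" "$or_ok" "$or_fail" "$seq_result"
# prints: pipe=[start] or_ok=[] or_fail=[start] seq=[start]
```

Note that with || the second command only runs when the first one fails, so which operator fits depends on whether the intent is "start only if the kill failed" or "stop, then start".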

~/Users/Library is not the correct path. You meant ~/Library.

The database should generally be safe unless you force kill (i.e. killall -9). So I was mistaken before: using kill or killall is generally safe, because the server will catch the signal and shut down gracefully. The initial description sounded like the process was fully stuck, but if it's able to catch the signal then I guess it's still working fine apart from the web UI.

The curl command I suggested earlier will help narrow down whether the HTTP server is working at all, vs some problem with the web UI not loading fully.

As far as logging, the channels-dvr binary requires two things to run correctly: a proper CWD and output redirection to the log file. The proper invocation is:

cd ~/"Library/Application Support/ChannelsDVR/data" && ../latest/channels-dvr >> channels-dvr.log 2>&1

This ensures that your data is saved in the data folder, and the output is saved in the correct log file in that directory.
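For anyone launching it from a script, the same invocation could be wrapped roughly like this. This is only a sketch: the function name start_dvr and the DVR_DATA_DIR / DVR_BINARY variables are made up here, with defaults copied from the command above.

```shell
# Defaults copied from the invocation above; both can be overridden.
# (Variable and function names are hypothetical, for illustration only.)
DVR_DATA_DIR=${DVR_DATA_DIR:-"$HOME/Library/Application Support/ChannelsDVR/data"}
DVR_BINARY=${DVR_BINARY:-../latest/channels-dvr}

# Run the server in the foreground with the data folder as CWD,
# appending stdout and stderr to channels-dvr.log in that folder.
start_dvr() {
  (
    cd "$DVR_DATA_DIR" || exit 1   # refuse to start from the wrong directory
    "$DVR_BINARY" >> channels-dvr.log 2>&1
  )
}
```

The subshell keeps the cd from changing the caller's working directory, and the cd failure check avoids writing a log (or databases) into whatever directory cron or launchd happened to start in.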

If you're running it from cron, I'm not sure what directory your settings.db and recorder.db are ending up in.


Thanks again. Yes, as sejmon and I stated before, the web server is halted but the other DVR functions (recording, etc.) still work, so curl commands do not work. Yes, my first post got the semantics wrong (don’t drink and post, kids), but the gist is that using kill works, and so far it does not appear to cause any corruption or launch issues. Also, thanks for the heads-up about the M1-native version of Channels DVR Server. I would have thought the auto-update would have handled that, but I guess not? In conclusion, it would be great if you could add an official restart method to the CLI that actually works in this case, and maybe in all cases (using both methods as a fail-safe).

Alright, I've updated my script to run curl -v to the status URL and run pkill -QUIT and capture the channels-dvr.log and channels-dvr-http.log before and after the kill, just in case.

I'll update here when it happens next, presuming my convoluted one-liner works properly.

/bin/sh -c '/opt/local/bin/timeout 2 /usr/bin/nc -z 127.0.0.1 8089 || /usr/local/bin/pushover -t "DVR Hung" "$({ /bin/date; /opt/local/bin/curl --connect-timeout 50 -v http://127.0.0.1:8089/status; echo; tail -n15 ~/Library/Application\ Support/ChannelsDVR/data/channels-dvr{,-http}.log; /usr/bin/pkill -QUIT [c]hannels-dvr; /bin/sleep 5; /usr/bin/tail -n15 ~/Library/Application\ Support/ChannelsDVR/data/channels-dvr{,-http}.log; } 2>&1 | /usr/bin/tee -a ~/dvr_hung.txt)"'

I'm pretty sure the curl will hang indefinitely, so I've added a timeout. (I actually run a script that curls 127.0.0.1:8089/dvr/jobs regularly, so I can see the upcoming schedule graphically. That script appears to hang on the curl, too, when the web service stops accepting connections.)
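On the timeout: curl's --connect-timeout only bounds the connection phase, while --max-time bounds the whole transfer, so a server that accepts the connection but never answers needs --max-time. A rough sketch of a probe that also reports why it failed is below. The function names and the 5 and 25 second values are assumptions for illustration; exit codes 7 (failed to connect) and 28 (operation timed out) are curl's documented codes.

```shell
STATUS_URL=${STATUS_URL:-http://127.0.0.1:8089/status}

# Translate a curl exit status into a short diagnosis.
describe_curl_exit() {
  case "$1" in
    0)  echo "responding" ;;
    7)  echo "connect failed" ;;   # nothing accepted the connection
    28) echo "timed out" ;;        # connect or response exceeded a time limit
    *)  echo "curl error $1" ;;
  esac
}

# Probe the status URL with both a connect limit and an overall limit.
probe_status() {
  curl --silent --output /dev/null \
       --connect-timeout 5 --max-time 25 "$STATUS_URL"
  describe_curl_exit "$?"
}
```

Calling probe_status prints one of the short labels, which makes it easy to tell a refused connection apart from a hang in a log or notification.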

Anyway... fingers crossed


Fantastic, Sejman! Thanks for your post. I will give your script a try. Aman, it’s just the HTTP server that hangs and stops accepting any commands. The error is just "failed to connect". So yes, I experience exactly what sejman is posting: a dead web service. Haha.

Channels' HTTP service hung this morning, and my health-check script (posted here) caught it.

While Channels appeared to have been idle, it looks like an external IP 167.94.138.117 (hostname canner-27.ch1.censys-scanner.com) tried to query the server for "*" with http method "PRI" right before channels hung. (I don't know if that triggered it, or if whatever they did next, if anything, wasn't recorded to the logs.)

Here's the log capture from just before and after Channels was restarted, likely within a minute of the hang. If you can think of anything else I can capture when it happens next, let me know.

Thanks!

Could you submit diagnostics so we can see the full details?

Logs have been submitted as 6e91117d-3e30-491c-82d3-e168b8bf84b9


This may or may not mean anything, but I have no idea how censys-scanner even requested "*" from the server, shown in the http logs as:

2022/07/14 03:32:41.164657 [HTTP] | 200 | 79.875µs | 167.94.138.117 | PRI "*"

I can't replicate. When I try:
curl -v -X PRI http://127.0.0.1:8089/*
it understandably returns 404 and ends up being logged as:
2022/07/14 17:00:16.057352 [HTTP] | 404 | 318µs | 127.0.0.1 | PRI "/*"

Whatever resource "*" is and however they requested it, they got a successful 200 response.

Oh -- I think it's an HTTP/2 thing that's supposed to be ignored by the server. Why is it ending up in the logs? I tried to get curl to do that with -k --http2, but it had no effect (or it's handled properly). It does seem like censys-scanner did something weird.

Okay, I figured how to get that "*" in the logs, with a
printf 'PRI * HTTP/2.0\r\nUser-Agent: nc/0.0.1\r\nHost: 127.0.0.1\r\nAccept: */*\r\n\r\n' |nc 127.0.0.1 8089
but even that shows up as a 404:
2022/07/14 17:33:56.739828 [HTTP] | 404 | 456.375µs | 127.0.0.1 | PRI "*"

So I still can't replicate the effect.
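For context, the request line PRI * HTTP/2.0 is the start of the HTTP/2 client connection preface defined in RFC 9113 (section 3.4). The real preface has no headers: it is exactly 24 octets, the request line followed by a blank line, "SM", and another blank line. Below is a small sketch that emits it, in case sending the exact bytes behaves differently from the header-laden version above. The function name h2_preface is made up, and whether this is byte-for-byte what the scanner sent is not known.

```shell
# Print the 24-octet HTTP/2 client connection preface (RFC 9113, section 3.4).
h2_preface() {
  printf 'PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n'
}

# Example, mirroring the nc test above:
#   h2_preface | nc 127.0.0.1 8089
```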

Anyway, so if it really is censys breaking the channels http service, I can at least block them at the firewall. https://support.censys.io/hc/en-us/articles/360043177092-from-faq. But, I should wait until my next hang and see if I received probes from censys, again, just beforehand. Maybe I can leave tcpdump sniffing port 8089 until it happens again and see what final censys request locked things up?
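A capture along those lines could look roughly like the sketch below. The function name, the interface and the output path are placeholders to adjust; -i, -n and -w are standard tcpdump options, and the capture normally needs root.

```shell
# Capture all TCP traffic on the DVR port to a pcap file for later inspection.
# $1 = network interface (e.g. en0, an assumption), $2 = output file.
capture_dvr_traffic() {
  iface=$1
  outfile=$2
  # -n: no name resolution, -w: write raw packets to a file
  tcpdump -i "$iface" -n -w "$outfile" 'tcp port 8089'
}

# Example (run as root, stop with Ctrl-C after the next hang):
#   capture_dvr_traffic en0 /tmp/dvr_8089.pcap
```

The resulting file can be read back with tcpdump -r or opened in Wireshark to see the last requests before the web service stopped answering.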

I guess the Censys requests were a red herring and that it was just a coincidence they happened to send requests right before channels' web service last hung. From the tcpdump and channels http logs, it looks like channels was probed again several times by Censys over the last few days, but no corresponding hang occurred.

Okay, I guess I'm back to just waiting for the next hang and see if I can learn anything new when it happens.


We have some leads from the last diagnostic, but another will help narrow down the issue.

Alright, it happened again this morning at 4:02am PDT. I submitted logs: 5c2d196f-6a37-4919-961f-e2eb93740814 -- the same curl operation timed out after 25 seconds when querying http://127.0.0.1:8089/status -- and I restarted with a kill -3.

I’ve been having the same issue for the past couple of months on my Intel Mac mini. The Channels server becomes unresponsive every few days and requires a restart, though recordings still happen during this time despite the web server/clients not working. Restarting makes it work again, but it keeps freezing every few days, which is particularly annoying when I want to access it remotely. I contacted support and provided logs from when this happened, but haven’t heard back in a while…


@thully as a hacky workaround on your Intel Mac mini, just until the channels guys sort this out, you could run something like this in an open terminal window:

while true; do /usr/bin/nc -z 127.0.0.1 8089 || /usr/bin/pkill '[c]hannels-dvr'; sleep 60; done

It will check every minute to make sure channels is responding, and if it's not, it'll kill it, triggering channels to restart. Several of us have done this, but with launchd so we don't need a terminal window always open.

Latest prerelease has a fix which may be related to this issue.


Interesting. Only two minutes? Some of my prior hangs definitely outlasted two minutes, though. I'll test it out. Maybe I'll update my health check to not restart until after 3 minutes, just to see if it catches any.
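A health check that waits for several failed probes in a row before restarting might look something like this sketch. It reuses the nc and pkill commands from the loop above; the function names and the THRESHOLD of 3 (roughly three minutes at one probe per minute) are assumptions.

```shell
THRESHOLD=${THRESHOLD:-3}   # consecutive failed probes before restarting

check_dvr()   { /usr/bin/nc -z 127.0.0.1 8089; }
restart_dvr() { /usr/bin/pkill '[c]hannels-dvr'; }

# One probe. Takes the current failure count and prints the updated count;
# restarts the server and resets the count once THRESHOLD is reached.
watchdog_step() {
  failures=$1
  if check_dvr >/dev/null 2>&1; then
    failures=0
  else
    failures=$((failures + 1))
    if [ "$failures" -ge "$THRESHOLD" ]; then
      restart_dvr >/dev/null 2>&1
      failures=0
    fi
  fi
  echo "$failures"
}

# Example loop, one probe per minute:
#   n=0; while true; do n=$(watchdog_step "$n"); sleep 60; done
```

A single successful probe resets the counter, so a stall that clears on its own within the window would not trigger a restart.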

It's possible it would get repeatedly stuck for two minutes at a time. We are not fully certain. The channels-dvr process would have shown 100% cpu usage in the Activity Monitor at the time if this was the same issue.

I’ve tried repeated requests over a period longer than 2 minutes, as well as constant requests for more than 2 minutes, and the web server is non-responsive. Hung. In fact, I’ve had it in that state for multiple days. At least in my case, restarting the DVR server was the only way to regain normal function. On a brighter note, updating my server to the M1 (or latest?) Channels DVR build to match my M1 Mac seems to have solved my issue. So maybe the Channels auto-updater wasn’t working, or a manual download/installation of a special and/or latest build of Channels DVR Server was needed. I still haven't had that clarified.