Channels DVR stops responding randomly on macOS

You need to concentrate on getting rid of the smoke first. Open your case and keep a spray bottle handy. When it smokes, just give it a couple of squirts. Problem solved.

The smoke scent is dependent on the model of house cat you have. If you swap in a different model in about a week, you should produce a different smoke scent.

I've had channels-dvr on my M1 Mac mini become unresponsive about once a week or so, apparently overnight. I have been running pre-releases lately, so I don't know if that's contributing.

The methods that require talking to the server to trigger a restart don't work, because localhost port 8089, while not exactly closed, hangs with no response. Even running "nc -z localhost 8089" from Terminal hangs indefinitely, as if it can't tell whether the port is really open or not. What I find most interesting is that channels-dvr continues recording and doing maintenance tasks while the web server is hung, as seen by activity in the logs. No Channels clients can connect, of course, and they act as if there's no installation running anywhere.

I have created a macOS Launch Control "cron" job that runs the following every minute:
/opt/local/bin/bash -c "/opt/local/bin/timeout 2 /usr/bin/nc -z 127.0.0.1 8089 || /usr/bin/pkill '[c]hannels-dvr'"

It essentially just checks that the API responds in under 2 seconds. (The timeout command, which runs another command but enforces a time limit, is not standard on the Mac, so I installed it with MacPorts.) If the server does respond quickly enough, which is the normal case, the job exits without doing anything. If the server doesn't respond in that time, the job issues a kill -15 (SIGTERM), which is a request to exit. The channels-dvr process seems to handle it gracefully, with no database corruption observed. (This is also how the OS stops running apps when the host computer is being shut down, so it's generally safe and well-handled by properly developed processes.) In any case, the com.getchannels.dvr global launch daemon's keep-alive-no-matter-what setting automatically (re)starts channels-dvr, and everything resumes working properly.
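As a rough sketch of how that check could tell a hang apart from a server that simply isn't running: GNU timeout exits with status 124 when the time limit expires, while nc reports its own non-zero status when the connection fails outright. The helper name classify_probe is made up for illustration and is not part of Channels DVR.

```shell
#!/bin/sh
# classify_probe is a hypothetical helper, not part of Channels DVR.
# It interprets the exit status of: timeout 2 nc -z 127.0.0.1 8089
classify_probe() {
    case "$1" in
        0)   echo "responding" ;;  # nc connected within the limit
        124) echo "hung" ;;        # GNU timeout's status when the limit expires
        *)   echo "failed" ;;      # anything else, e.g. connection refused
    esac
}

# Example wiring (paths as in the post above), left commented out so that
# loading this file does not signal anything:
#   /opt/local/bin/timeout 2 /usr/bin/nc -z 127.0.0.1 8089
#   [ "$(classify_probe $?)" = "hung" ] && /usr/bin/pkill '[c]hannels-dvr'
```

Restarting only on "hung" would avoid signalling the server during a normal restart, at the cost of not reacting to other failures.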

(My main concern is that I may restart the dvr while it's recording, causing an interruption, but since the api's not responding, I can't tell if it's idle or not. I guess I could scan the logs before restarting, to potentially delay the restart until everything's finished recording, but I'd rather cause an interruption than have clients not work at all.)

I understand this is not the recommended restart method, but I can see in the logs that when it's triggered (by an unresponsive channels web server) that channels-dvr shuts down gracefully and issues a final goodbye log, so it seems benign, at least so far.

I am happy to research why channels-dvr's web service becomes unresponsive, and would love to fix the root cause, but this thread was no help in that regard. Everything else on the machine is running fine: available memory aplenty, CPU utilization low. It seems like it's just the channels-dvr web service "thread" hanging for its own internal reasons; there are no clues in Channels' logs or the Mac's system logs.

I would love to be able to turn on a more debug oriented level of logging in the channels web service to better troubleshoot -- is that possible? I'm open to the idea that there's something wrong with my Mac, but oddly, whatever it might be, it only affects channels-dvr. Multiple other distinct web services continue operating just fine.

When it's hung, what does curl -v http://127.0.0.1:8089/status report?

You can also send kill -QUIT when it's stuck, which will print some debugging information to the log that you can send to us.

@justinmk3 FYI you appear to be running the Intel emulated version of the DVR on your M1 mini, instead of the native ARM64 version. I don't know if that's related to the issues you're having.

Sejmon. Awesome! I can confirm that so far there has been no data corruption or any other issue with the method I posted previously, despite the warning from Aman. I can also confirm that it does indeed exit and restart gracefully despite being terminated via kill. Lastly, and less important, I'm curious why Aman and others warned that restarting the server via this method would "leave databases in a corrupted state", and furthermore that launching the binary directly "won't setup logging". I have tested both and so far found them NOT to be accurate. I wonder whether the warnings were theoretical, or based on actually trying it in the past and having issues?

There are many issues with this command.

I think you meant killall ... || open not |. The | will pipe the output of kill into the open command, which makes no sense. That also means the open will be executed before the kill finishes, which is not ideal and could cause two processes accessing the database at the same time.
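To make the difference between the operators concrete, here is a small stand-alone illustration using true, false and echo in place of killall and open. The function names are made up for the example.

```shell
#!/bin/sh
# a | b   both sides start together; b reads a's output; a's exit status is ignored
# a || b  b runs only if a fails (non-zero exit status)
# a && b  b runs only if a succeeds
# a ; b   b runs after a finishes, whatever a's exit status was

pipe_demo()   { false | echo "ran"; }   # echo runs even though false failed
or_fail()     { false || echo "ran"; }  # left side failed, so echo runs
or_success()  { true  || echo "ran"; }  # left side succeeded, so echo is skipped
and_success() { true  && echo "ran"; }  # left side succeeded, so echo runs
```

Only the pipe form starts the right-hand command before the left-hand one has finished.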

~/Users/Library is not the correct path. You meant ~/Library.

The database should generally be safe unless you force kill (i.e. killall -9). So I was mistaken before: using kill or killall is generally safe, because the server will catch the signal and shut down gracefully. The initial description sounded like the process was fully stuck, but if it's able to catch the signal then I guess it's still working fine apart from the web UI.
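The general mechanism can be seen with a toy process that has nothing to do with Channels itself: a child shell that traps TERM gets to run its cleanup, while KILL (signal 9) cannot be caught, so no cleanup runs. This is only a sketch; signal_demo is a made-up name.

```shell
#!/bin/sh
# signal_demo SIGNAL: start a child that traps TERM, send it SIGNAL,
# then print whatever the child managed to write before exiting.
signal_demo() {
    sig=$1
    log=$(mktemp)
    # The child prints "goodbye" from its TERM handler, then exits cleanly.
    sh -c 'trap "echo goodbye; exit 0" TERM; while :; do sleep 1; done' \
        >"$log" 2>/dev/null &
    pid=$!
    sleep 1                        # give the child time to install its trap
    kill -s "$sig" "$pid"
    wait "$pid" 2>/dev/null || true
    cat "$log"
    rm -f "$log"
}
```

Calling signal_demo TERM prints the child's goodbye line, while signal_demo KILL prints nothing because the handler never gets a chance to run.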

The curl command I suggested earlier will help narrow down whether the HTTP server is working at all, vs some problem with the web UI not loading fully.

As far as logging, the channels-dvr binary requires two things to run correctly: a proper CWD and output redirection to the log file. The proper invocation is:

cd ~/"Library/Application Support/ChannelsDVR/data" && ../latest/channels-dvr >> channels-dvr.log 2>&1

This ensures that your data is saved in the data folder, and the output is saved in the correct log file in that directory.

If you're running it from a cron job, I'm not sure what directory your settings.db and recorder.db are ending up in.

Thanks again. Yes, as sejmon and I stated before, the web server is halted but the other functions of the DVR (recording etc.) still work... thus curl commands do not work. Yes, the semantics in my first post were off (don't drink and post, kids) 😜 ... but the gist is that the solution of using kill works, and so far does not appear to cause any corruption or launching issues. Also, thanks for the heads up about the M1 native version of the Channels DVR server. I would have thought the auto update would have done that, but I guess not? In conclusion, it would be great if you could add an official restart method to the CLI that actually works in this case, and maybe in all cases (using both methods as a fail-safe).

Alright, I've updated my script to run curl -v against the status URL, run pkill -QUIT, and capture channels-dvr.log and channels-dvr-http.log before and after the kill, just in case.

I'll update here when it happens next, presuming my convoluted one-liner works properly.

/bin/sh -c '/opt/local/bin/timeout 2 /usr/bin/nc -z 127.0.0.1 8089 || /usr/local/bin/pushover -t "DVR Hung" "$({ /bin/date; /opt/local/bin/curl --connect-timeout 50 -v http://127.0.0.1:8089/status; echo; tail -n15 ~/Library/Application\ Support/ChannelsDVR/data/channels-dvr{,-http}.log; /usr/bin/pkill -QUIT [c]hannels-dvr; /bin/sleep 5; /usr/bin/tail -n15 ~/Library/Application\ Support/ChannelsDVR/data/channels-dvr{,-http}.log; } 2>&1 | /usr/bin/tee -a ~/dvr_hung.txt)"'

I'm pretty sure the curl will hang indefinitely, so I've added a timeout. (I actually run a script that curls 127.0.0.1:8089/dvr/jobs regularly, so I can see the upcoming schedule graphically. That script appears to hang on the curl, too, when the web service stops accepting connections.)
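One thing worth noting about curl's own limits: --connect-timeout only bounds the connection phase, whereas --max-time bounds the whole operation, including waiting for a response after the connection is made. A minimal sketch that sets both is below. The function name status_probe and the CURL override (handy for a dry run) are assumptions for illustration.

```shell
#!/bin/sh
# status_probe URL [SECONDS]: fetch URL, giving up after SECONDS overall.
# CURL can be overridden (e.g. CURL=echo) to print the arguments instead.
status_probe() {
    url=$1
    limit=${2:-5}
    "${CURL:-curl}" --silent --show-error \
        --connect-timeout "$limit" --max-time "$limit" "$url"
}

# Example (not run here):
#   status_probe http://127.0.0.1:8089/status 5 || echo "status check failed"
```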

Anyway... fingers crossed

Fantastic, Sejman! Thanks for your post. I will give your script a try. Aman, it's just the HTTP server that hangs, not accepting any commands. The error is just "failed to connect". So yes, I experience exactly what sejman is posting: dead web service. Haha.

Channels' HTTP service hung this morning, and my health-check script (posted here) caught it.

While Channels appeared to have been idle, it looks like an external IP 167.94.138.117 (hostname canner-27.ch1.censys-scanner.com) tried to query the server for "*" with http method "PRI" right before channels hung. (I don't know if that triggered it, or if whatever they did next, if anything, wasn't recorded to the logs.)

Here's the log capture from the moments before and after Channels was restarted, likely within a minute of the hang. If you can think of anything else I can capture when it happens next, let me know.

Thanks!

Could you submit diagnostics so we can see the full details?

Logs have been submitted as 6e91117d-3e30-491c-82d3-e168b8bf84b9

This may or may not mean anything, but I have no idea how censys-scanner even requested "*" from the server, shown in the HTTP logs as:

2022/07/14 03:32:41.164657 [HTTP] | 200 | 79.875µs | 167.94.138.117 | PRI "*"

I can't replicate. When I try:
curl -v -X PRI http://127.0.0.1:8089/*
it understandably returns 404 and ends up being logged as:
2022/07/14 17:00:16.057352 [HTTP] | 404 | 318µs | 127.0.0.1 | PRI "/*"

Whatever resource "*" is and however they requested it, they got a successful 200 response.

Oh -- I think it's an HTTP/2 thing that's supposed to be ignored by the server. Why is it ending up in the logs? I tried to get curl to do that with -k --http2, but it had no effect (or it's handled properly). It does seem like censys-scanner did something weird.

Okay, I figured out how to get that "*" in the logs, with a
printf 'PRI * HTTP/2.0\r\nUser-Agent: nc/0.0.1\r\nHost: 127.0.0.1\r\nAccept: */*\r\n\r\n' |nc 127.0.0.1 8089
but even that shows up as a 404:
2022/07/14 17:33:56.739828 [HTTP] | 404 | 456.375µs | 127.0.0.1 | PRI "*"

So I still can't replicate the effect.

Anyway, if it really is Censys breaking the Channels HTTP service, I can at least block them at the firewall. https://support.censys.io/hc/en-us/articles/360043177092-from-faq. But I should wait until my next hang and see if I received probes from Censys again just beforehand. Maybe I can leave tcpdump sniffing port 8089 until it happens again and see what final Censys request locked things up?

I guess the Censys requests were a red herring and that it was just a coincidence they happened to send requests right before channels' web service last hung. From the tcpdump and channels http logs, it looks like channels was probed again several times by Censys over the last few days, but no corresponding hang occurred.

Okay, I guess I'm back to just waiting for the next hang and see if I can learn anything new when it happens.

We have some leads from the last diagnostic, but another will help narrow down the issue.

Alright, it happened again this morning 4:02am PDT. I submitted logs: 5c2d196f-6a37-4919-961f-e2eb93740814 -- same curl operation timed out after 25 seconds when querying http://127.0.0.1:8089/status -- restarted with a kill -3.

I've been having the same issue for the past couple of months on my Intel Mac mini. The Channels server becomes unresponsive every few days, requiring a restart, though recordings still happen during this time despite the web server/clients not working. Restarting makes it work again, but it keeps freezing every few days, which is particularly annoying when I want to access it remotely. I contacted support and provided logs from when this happened, but haven't heard back in a while...

@thully as a hacky workaround on your Intel Mac mini, just until the channels guys sort this out, you could run something like this in an open terminal window:

while true; do /usr/bin/nc -z 127.0.0.1 8089 || /usr/bin/pkill '[c]hannels-dvr'; sleep 60; done

It will check every minute to make sure channels is responding, and if it's not, it'll kill it, triggering channels to restart. Several of us have done this, but with launchd so we don't need a terminal window always open.
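One caveat: the loop calls nc without a time limit, and an earlier post in this thread reports nc -z itself hanging while the web service is stuck. On a Mac without the MacPorts timeout command, a plain-sh stand-in might look like the sketch below. The name run_with_limit is made up, and this has not been tested against a hung Channels server.

```shell
#!/bin/sh
# run_with_limit SECONDS CMD...: run CMD in the background and kill it
# if it is still running after SECONDS. Returns CMD's exit status, which
# is non-zero when the watchdog had to kill it.
run_with_limit() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: sleeps, then terminates CMD if it is still around.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog=$!
    wait "$cmd_pid"
    status=$?
    kill "$watchdog" 2>/dev/null   # CMD finished first; stop the watchdog
    return "$status"
}

# Example loop (not run here):
#   while true; do
#       run_with_limit 2 /usr/bin/nc -z 127.0.0.1 8089 || /usr/bin/pkill '[c]hannels-dvr'
#       sleep 60
#   done
```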

Latest prerelease has a fix which may be related to this issue.
