DVR server repeatedly crashing with `panic: runtime error: index out of range`

Last night at 4:33am EST, while Channels DVR was fetching guide data, the
server panicked with an index-out-of-range error:

2026/02/21 09:33:31.056257 [DVR] Fetching guide data for 99 stations in X-TVE @ 2026-03-02 3:30AM
2026/02/21 09:33:32.014317 [DVR]   indexed 713 airings (99 channels) [734ms fetch, 224ms index]
2026/02/21 09:33:32.136126 stderr:  panic: runtime error: index out of range [278] with length 29
2026/02/21 09:33:32.136152 stderr:  
2026/02/21 09:33:32.136154 stderr:  goroutine 109 [running]:

Because I run the Channels DVR container as a systemd service, the container
was automatically restarted a few seconds later, but the process crashes again
almost immediately after every restart.
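
For reference, systemd's restart-on-failure behavior comes from `Restart=`
settings like these in the service unit (a generic sketch, not my exact unit;
the start command and interval will vary):

```ini
[Service]
# Container start command elided; any docker/podman wrapper goes here.
Restart=always
RestartSec=5
```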

Some of the crashes don't include obvious error logs, while others include a
stack trace from the zapx library.

For example, here's that same first crash from above, followed by the second
crash, which does not include an error message, and the third crash, which
includes the panic message as well as a stack trace:

2026/02/21 09:33:31.056257 [DVR] Fetching guide data for 99 stations in X-TVE @ 2026-03-02 3:30AM
2026/02/21 09:33:32.014317 [DVR]   indexed 713 airings (99 channels) [734ms fetch, 224ms index]
2026/02/21 09:33:32.136126 stderr:  panic: runtime error: index out of range [278] with length 29
2026/02/21 09:33:32.136152 stderr:  
2026/02/21 09:33:32.136154 stderr:  goroutine 109 [running]:
2026/02/21 09:33:38.721216 [SYS] Starting Channels DVR v2026.02.09.1530 (linux-x86_64 pid:2) in /channels-dvr/data
2026/02/21 09:33:38.749702 [SYS] Started HTTP Server on 8089
2026/02/21 09:33:39.210877 [HDR] Found 1 devices
2026/02/21 09:33:39.477589 [DVR] Waiting 15h25m21s until next job 1771721940-16 Primetime in Milan: The Olympics
2026/02/21 09:33:39.560530 [DVR] Recording engine started in /shares/DVR
2026/02/21 09:33:39.564320 [SYS] Bonjour service running for dvr-de3b4f611c08.local. [192.168.1.36]
2026/02/21 09:33:39.616047 [SYS] Created database snapshot: backup-20260221.093339
2026/02/21 09:33:39.616137 [SYS] Removing old backup backup-20260124.220433
2026/02/21 09:33:46.006589 [SYS] Starting Channels DVR v2026.02.09.1530 (linux-x86_64 pid:2) in /channels-dvr/data
2026/02/21 09:33:46.018534 [SYS] Started HTTP Server on 8089
2026/02/21 09:33:46.477100 [HDR] Found 1 devices
2026/02/21 09:33:46.655620 [DVR] Waiting 15h25m13s until next job 1771721940-16 Primetime in Milan: The Olympics
2026/02/21 09:33:46.753923 [DVR] Recording engine started in /shares/DVR
2026/02/21 09:33:46.757385 [SYS] Bonjour service running for dvr-6c466991cfc0.local. [192.168.1.234]
2026/02/21 09:33:46.800077 [SYS] Created database snapshot: backup-20260221.093346
2026/02/21 09:33:46.800200 [SYS] Removing old backup backup-20260125.220436
2026/02/21 09:33:46.848043 stderr:  panic: runtime error: index out of range [278] with length 29
2026/02/21 09:33:46.848098 stderr:  
2026/02/21 09:33:46.848103 stderr:  goroutine 47 [running]:
2026/02/21 09:33:46.848107 stderr:  github.com/blevesearch/zapx/v15.(*PostingsIterator).readLocation(0xc000a8e8c0, 0xc000a5b280)
2026/02/21 09:33:46.848110 stderr:      github.com/blevesearch/zapx/[email protected]/posting.go:439 +0x359
2026/02/21 09:33:46.848113 stderr:  github.com/blevesearch/zapx/v15.(*PostingsIterator).nextAtOrAfter(0xc000a8e8c0, 0xc000315aa0?)
2026/02/21 09:33:46.848116 stderr:      github.com/blevesearch/zapx/[email protected]/posting.go:531 +0x345

There are also crashes that include the same stack trace with one additional
frame listed:

2026/02/21 09:34:49.870711 [SYS] Starting Channels DVR v2026.02.09.1530 (linux-x86_64 pid:2) in /channels-dvr/data
2026/02/21 09:34:49.886119 [SYS] Started HTTP Server on 8089
2026/02/21 09:34:50.348296 [HDR] Found 1 devices
2026/02/21 09:34:50.552029 [DVR] Waiting 15h24m9s until next job 1771721940-16 Primetime in Milan: The Olympics
2026/02/21 09:34:50.631022 [DVR] Recording engine started in /shares/DVR
2026/02/21 09:34:50.634457 [SYS] Bonjour service running for dvr-b286abacdb7a.local. [192.168.1.218]
2026/02/21 09:34:50.681223 [SYS] Created database snapshot: backup-20260221.093450
2026/02/21 09:34:50.681340 [SYS] Removing old backup backup-20260203.220511
2026/02/21 09:34:50.748091 stderr:  panic: runtime error: index out of range [278] with length 29
2026/02/21 09:34:50.748119 stderr:  
2026/02/21 09:34:50.748122 stderr:  goroutine 134 [running]:
2026/02/21 09:34:50.748125 stderr:  github.com/blevesearch/zapx/v15.(*PostingsIterator).readLocation(0xc0008949a0, 0xc000f75e80)
2026/02/21 09:34:50.748128 stderr:      github.com/blevesearch/zapx/[email protected]/posting.go:439 +0x359
2026/02/21 09:34:50.748131 stderr:  github.com/blevesearch/zapx/v15.(*PostingsIterator).nextAtOrAfter(0xc0008949a0, 0xc00003e7b0?)
2026/02/21 09:34:50.748135 stderr:      github.com/blevesearch/zapx/[email protected]/posting.go:531 +0x345
2026/02/21 09:34:50.748138 stderr:  github.com/blevesearch/zapx/v15.(*PostingsIterator).Next(...)
2026/02/21 09:34:50.748143 stderr:      github.com/blevesearch/zapx/[email protected]/posting.go:465

This thread appears to be a similar issue and suggests that the problem is due
to a corrupted database.

Because the server crashes immediately, I'm unable to do anything through the
web UI, including the /restore HTTP request target mentioned in the other
thread.

Based on the files included in the database backups, I tried removing (well,
moving/renaming) the `data/recorder.db` and `data/settings.db` files and
starting the server again.
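
For anyone following along, the move was roughly this (a sketch; run from the
DVR data directory with the server stopped first, and the `.bak` names are
just my own convention):

```shell
# Move (rather than delete) the two databases so they can be restored later.
# Assumes the server/container is already stopped.
for f in data/recorder.db data/settings.db; do
  if [ -e "$f" ]; then mv "$f" "$f.bak"; fi
done
```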

This did allow the server to start without crashing, though obviously without
any of the existing data or configuration. With the server up and running, I
was able to connect to the web UI, which prompted me to restore from a backup.

Unfortunately, because Channels appears to create a database snapshot and
remove an old backup every time it starts (even if the database is corrupt,
apparently...), the hundreds of automatic restarts that occurred before I
noticed the issue replaced all of the non-corrupt backup files with
newly generated corrupt ones.

When I try to restore from one of these backups, the server crashes again.

So I have a few questions:

  • Am I even correct in my assessment that the issue here is a corrupt database
    file?

  • If so, is it the recorder.db or settings.db file (or both) that is
    corrupt?

  • Is there anything that can be done to fix the corrupted file?

    Some sort of debugging tool to inspect the database (or even just a
    specification of the file format) would let me try to identify exactly
    where and what the corruption is and repair it manually. I'd welcome any
    help the devs can offer here.

  • Otherwise, is my only path forward to start over with empty databases?

  • If I do have to start over with empty db files, is there any way to (even
    partially) reconstruct the recordings database from the recordings themselves?
    I would really prefer not to have to start from scratch.

  • Can the software please be updated to be more robust, so that a corrupt
    file does not lead to an immediate and recurring crash? For example, by
    validating that externally sourced data structures (e.g. read from a file)
    are internally consistent (e.g. bounds checking indices) before accessing
    them.

  • Can the software please be updated to only generate a database snapshot
    (and remove an old one) after verifying that the database is not corrupt?
    The current behavior largely defeats the purpose of generating backups,
    since the known-good backups get automatically and repeatedly replaced
    with corrupt ones.

Please let me know if there's any additional information I can provide or any
additional debugging steps I should take to be of assistance.

Thank you.

The issue is in the guide data, not the .db files

That makes sense given the circumstances of the first panic. I guess I was
assuming that the guide data was stored in the .db files without knowing the
internals of the project or having done a deep dive into the directory
structure.

I now see that there are .zap files in <source>.airings/store/ and
<source>.groups/store/ directories, so presumably it's one or more of these
that is corrupt.

  • Do you need me to send any of these files to you for debugging?

  • Is there a subset of these files that I could safely delete/move in order to
    avoid the panic in the short term and force the guide data to be re-fetched?

    This would obviously only work if the corruption were transient, and not
    simply garbage from the source or the result of a bug in the process that
    writes these files.

    I see that these .zap files are referenced throughout the .bolt file in
    the same directory, so I'm guessing it may not be as straightforward as
    just removing one or more of the .zap files (although I'd hope the code
    can gracefully handle and recover from ENOENT errors if a .zap file
    listed in the .bolt file is not found).

Thank you again for any help you can provide.

Delete *.airings and *.groups

Thank you! Moving the *.airings and *.groups directories aside allowed the
server to start, and it automatically refetched the guide data without issue.
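
For anyone else hitting this, the fix was roughly the following (a sketch; run
from the DVR data directory with the server stopped, and the
corrupt-guide-data directory name is my own):

```shell
# Move the guide-data index directories aside; the server rebuilds them
# by refetching guide data on the next start.
mkdir -p corrupt-guide-data
for d in *.airings *.groups; do
  if [ -e "$d" ]; then mv "$d" corrupt-guide-data/; fi
done
```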

I retained all of the moved files in case they could be useful to you or the
other developers for debugging how the server might be made more robust in
these cases (e.g. detecting inconsistent/corrupt guide data and automatically
discarding and refetching it instead of crashing).

Let me know if you would like any or all of these files and, if so, where to
send them.

Thank you again.