Subtitle generation via ASR (Automatic Speech Recognition)

Could we add an option to generate subtitles for recordings, or even for live programs on the fly?

An eighth-gen quad-core i5 can achieve 5x speed when using the tiny.en model in Whisper.
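
To put that figure in perspective, here is a rough back-of-the-envelope sketch. It assumes "5x" means five times faster than real time, and the function names are made up for illustration only.

```python
def processing_minutes(recording_minutes: float, speed_factor: float) -> float:
    """Estimate how long transcription takes for a recording.

    speed_factor is assumed to be a multiple of real time (5.0 = 5x).
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return recording_minutes / speed_factor


def keeps_up_with_live(speed_factor: float) -> bool:
    """A live stream can only be captioned on the fly if the model
    runs at least as fast as real time (ignoring startup latency)."""
    return speed_factor >= 1.0


print(processing_minutes(60, 5))
print(keeps_up_with_live(5))
```

Under that assumption, a one-hour recording would take roughly 12 minutes to transcribe, and there would be headroom left for live use.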

Just like with hardware transcoding, having a GPU would speed things up quite a bit and allow the larger, more accurate models to be used.

If anybody is interested in playing with it, just install it via pip and follow the Command-line usage section.
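
For anyone who would rather script it than use the command line, here is a minimal sketch of turning Whisper output into an .srt file. It assumes the openai-whisper package and ffmpeg are installed. `whisper.load_model` and `model.transcribe` come from that package; the helper names (`srt_timestamp`, `segments_to_srt`, `transcribe_to_srt`) are my own and not part of Whisper.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end' and 'text' keys,
    which is the shape of the 'segments' list Whisper returns."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_to_srt(media_path: str, model_name: str = "tiny.en") -> str:
    """Run Whisper on a file and return the subtitles as SRT text."""
    import whisper  # imported here so the helpers above work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path)
    return segments_to_srt(result["segments"])
```

Something like `open("recording.srt", "w").write(transcribe_to_srt("recording.mpg"))` would then write the subtitles next to a recording; the file names here are placeholders.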

@lsudduth

EDIT: There is also a standalone executable

Here are some benchmarks

What have you tried it on and how accurate is it for you?

Something like this would be the final piece for things like ah4c and ADBTuner. Closed captions are the only thing I really miss.

It just works. Even random foreign videos can have their subtitles generated. I always use the largest model available: medium.en for English, or large-v3 for other languages.

Given the system requirements needed to run this, I imagine it won't be making it into Channels DVR any time soon.

Even without a GPU the speed is quite acceptable, and with the smaller models it exceeds real time:

Size    Parameters  English-only model  Multilingual model  Required RAM  Relative speed
tiny    39 M        tiny.en             tiny                ~1 GB         ~32x
base    74 M        base.en             base                ~1 GB         ~16x
small   244 M       small.en            small               ~2 GB         ~6x
medium  769 M       medium.en           medium              ~5 GB         ~2x
large   1550 M      N/A                 large               ~10 GB        1x
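
As a rough illustration of how the table could drive model selection, here is a small sketch that picks the largest model fitting a memory budget. The numbers are copied from the table above; the `MODELS` list and `largest_model_for` function are hypothetical names, not anything Whisper provides.

```python
# (size, approximate required memory in GB, relative speed), from the table
MODELS = [
    ("tiny", 1, 32),
    ("base", 1, 16),
    ("small", 2, 6),
    ("medium", 5, 2),
    ("large", 10, 1),
]


def largest_model_for(memory_gb: float, min_relative_speed: float = 1):
    """Return the largest model size that fits in memory_gb and is at
    least min_relative_speed times as fast as 'large', or None."""
    candidates = [
        name
        for name, required_gb, speed in MODELS
        if required_gb <= memory_gb and speed >= min_relative_speed
    ]
    # MODELS is ordered smallest to largest, so the last match is the largest
    return candidates[-1] if candidates else None


print(largest_model_for(4))
print(largest_model_for(16, min_relative_speed=6))
```

Note that the relative speed column compares models to each other, not to real time, so whether a given model keeps up with live TV still depends on the hardware.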