Subtitle generation via ASR (Automatic Speech Recognition)

Could we add the possibility of generating subtitles for recordings, or even for live programs on the fly?

An eighth-gen quad-core i5 can achieve 5x speed when using the tiny.en model in Whisper.

Just like with hardware transcoding, having a GPU would speed things up quite a bit and allow the larger, more accurate models to be used.

If somebody is interested in playing with it, just install it via pip and follow the Command-line usage section.
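If you'd rather drive it from Python than the command line, a minimal sketch (the file name and model choice here are just placeholders):

```python
# pip install -U openai-whisper   (ffmpeg must also be available on PATH)
import whisper

# tiny.en is the fastest English-only model; swap in medium.en or large-v3 if you have the hardware
model = whisper.load_model("tiny.en")

# transcribe() accepts any audio/video file that ffmpeg can decode
result = model.transcribe("recording.mp3")
print(result["text"])
```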

@lsudduth

EDIT: There is also a standalone executable

Here are some benchmarks

What have you tried it on and how accurate is it for you?

Something like this would be the final piece for things like ah4c and ADBTuner. Closed captions are the only thing I really miss.

It just works. Even random foreign videos can have their subtitles generated. I always use the largest model available: medium.en for English, or large-v3 for international content.

I imagine that, with the system requirements needed to run this, it won't be making it into Channels DVR any time soon.

Even without a GPU the speed is quite acceptable, and with the smaller models it exceeds real time:

| Size | Parameters | English-only model | Multilingual model | Required RAM | Relative speed |
|--------|------------|--------------------|--------------------|--------------|----------------|
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
| base | 74 M | base.en | base | ~1 GB | ~16x |
| small | 244 M | small.en | small | ~2 GB | ~6x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |

Something like EzDubs could work

Does it output something like an SRT, or is it designed to be integrated into another application?

I would love to try it on some of my PlayOn recordings. The timing on the subtitles gets a bit off for some reason.

By default all supported subtitle formats are produced:

$ whisper --help

  --output_format {txt,vtt,srt,tsv,json,all}, -f {txt,vtt,srt,tsv,json,all}
                        format of the output file; if not specified, all
                        available formats will be produced (default: all)
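If you're scripting it from Python instead of using the CLI, the segments returned by transcribe() can also be written out as SRT cues by hand. A rough sketch (file names are placeholders, and the CLI's --output_format srt already does this for you):

```python
import whisper

def to_timestamp(seconds: float) -> str:
    # SRT timestamps look like 00:01:02,345
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = whisper.load_model("small.en")
result = model.transcribe("recording.mp4")

# Each segment carries start/end times in seconds plus the recognized text
with open("recording.srt", "w", encoding="utf-8") as srt:
    for i, seg in enumerate(result["segments"], start=1):
        srt.write(f"{i}\n")
        srt.write(f"{to_timestamp(seg['start'])} --> {to_timestamp(seg['end'])}\n")
        srt.write(f"{seg['text'].strip()}\n\n")
```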

Check out this project: GitHub - collabora/WhisperLive: A nearly-live implementation of OpenAI's Whisper.

It can already transcribe HLS streams live:

client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8")

It outputs text, and if I'm understanding the documentation correctly, there's a way to return a string of two lines of text, each with a maximum of 50 characters.
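For context, the call above is made on WhisperLive's client object. A rough sketch of a full invocation against a locally running WhisperLive server follows; the constructor arguments are from my reading of the project README and may differ between versions:

```python
# pip install whisper-live   -- the WhisperLive server must be running separately, e.g. on localhost:9090
from whisper_live.client import TranscriptionClient

# Point the client at the transcription server, then feed it a live HLS URL
client = TranscriptionClient("localhost", 9090, lang="en", model="small")
client(hls_url="http://as-hls-ww-live.akamaized.net/pool_904/live/ww/bbc_1xtra/bbc_1xtra.isml/bbc_1xtra-audio%3d96000.norewind.m3u8")
```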


EzDubs is making a lot of progress with live translation. Maybe somebody could reach out to them and ask about an integration with Channels?

@babsonnexus Would the WhisperLive program I posted above be a good candidate for your project, similar to how you integrated the MPD-to-HLS feature?

I can't say I'm totally up-to-date with all this. Can you give me a summary of what you are requesting and how you imagine it might work? Like, are you thinking PLM would intercept the stream, use Whisper Live to add subtitles, and then serve that stream up to Channels?

I've started working on a project to do this. I already have a bash script version that works, and I'm working on a Python version to share. It uses Whisper to create SRT files, then ffmpeg to embed the subtitles into an mp4 file. The mp4 file is renamed to *.mpg so that the database doesn't need to be updated. The mp4 masquerading as an mpg file doesn't seem to bother the Fire TV client or the web interface; it just works.

I have this running on an i7-7700K system with an Nvidia 2040 GPU, so older hardware, but it works well! I'm using a medium model for transcription and it works fine, only occasionally transcribing the wrong word; it's actually about as accurate as live news broadcasts manage. This system works after the file is recorded, and it takes only a couple of minutes to run.
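This isn't the actual script, but a rough sketch of how that kind of pipeline could look, assuming the Whisper CLI and ffmpeg are installed; the paths, model choice, and the mov_text soft-subtitle codec are my assumptions:

```python
import subprocess
from pathlib import Path

def add_captions(recording: Path, model: str = "medium.en") -> None:
    """Transcribe a recording with Whisper, embed the SRT, and keep the original .mpg name."""
    workdir = recording.parent
    srt = recording.with_suffix(".srt")
    muxed = recording.with_suffix(".captioned.mp4")

    # 1. Whisper CLI writes <basename>.srt next to the recording
    subprocess.run(
        ["whisper", str(recording), "--model", model,
         "--output_format", "srt", "--output_dir", str(workdir)],
        check=True,
    )

    # 2. Remux into mp4 with the SRT embedded as a soft subtitle track (no re-encode)
    subprocess.run(
        ["ffmpeg", "-y", "-i", str(recording), "-i", str(srt),
         "-map", "0", "-map", "1", "-c", "copy", "-c:s", "mov_text", str(muxed)],
        check=True,
    )

    # 3. Swap the captioned file in under the original name so the Channels database is untouched
    recording.unlink()
    muxed.rename(recording)

add_captions(Path("/dvr/TV/Some Show 2024-01-01.mpg"))
```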

One way to incrementally and seamlessly support this may be to support .srt files for recordings. They are currently supported only for imported personal content. It doesn't seem like it would be difficult to add this capability for recordings.

It might even be possible to create and update an .srt file in real time, providing captions for live events such as news broadcasts that are only a few seconds behind. That would make it possible to simply wait until the captions catch up to see them fully synchronized. My idea for this is to rely on a captions service co-located with the Channels server that takes the audio and spits out captions as fast as it can.

Adding the captions after the recording completes is also possible. I'm doing it now, but since .srt files are not supported for recordings, I'm transcoding the recording to embed the subtitles directly. It takes between 2 and 10 minutes to fully process a 1-hour recording. It's still in early development, but it's actually working!

Better late than never. :wink:

I think there would be a lot of interest in trying this out in Docker/OliveTin. I watch TV delayed anyway, but even post-processing recordings would be awesome.