English audio and subtitles are actually in Spanish

The file I ran MediaInfo on reports both CC1 and CC3, but every player I tried only lets me select CC1, as if CC3 doesn't exist.

The comskip log for that recording shows the English closed captions (I assume CC1).
I would check the link @tmm1 gave to see if you can extract them with ffmpeg if ccextractor doesn't work.

I use MediaInfo a lot, but I've often found that much of its report relies on header information. It wasn't always this way. A number of years ago, one of its developers (Jerome Martinez) said that MediaInfo by default parses the first few hundred frames, typically up to a maximum of 64 MB, to compose its report. It certainly doesn't seem to have done that for the files mentioned in this thread.

To help me try to use ffmpeg to extract the subtitle stream, I referenced the thread,

because it was heavy with explanation. I'll take another look at the other one to compare and contrast.

There is a difference between subtitle extraction and closed caption extraction in ffmpeg.

It worked! I renamed the file to input.ts and ran:

ffmpeg -f lavfi -i input.ts[out+subcc] -map 0:1 output.srt

and the file was parsed to a 92 KB .srt file in about 9 minutes. Because of the MediaInfo report, I also tried other stream numbers (-map 0:0, -map 0:2, -map 0:3), but nothing was found for those streams.

I'm disappointed to find that there was only one stream and that it was not in English - but now I at least have one. Now I wonder if there's a simple way to convert the Spanish to English.

I clearly didn't understand the difference between subtitles and closed captions, probably because from the viewer's standpoint they seem interchangeable. Now I know firsthand that closed captions are not handled the same way as subtitles: they're embedded in individual video frames and must be extracted frame by frame to build an independent stream.

Subtitles are a separate track.

Closed captions are embedded inside video frames.

So only -map 0:1 will work, because only the video track has caption info.
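
To make the distinction above concrete, here is a small dry-run sketch that only prints the two command shapes and does not run ffmpeg. The helper names (subtitle_cmd, caption_cmd) and the file names are made up for illustration, and the first form assumes the file actually carries a text-based subtitle track, which the recording in this thread apparently does not.

```shell
#!/bin/sh
# Dry-run sketch: prints the two ffmpeg command shapes; nothing is executed.
# subtitle_cmd and caption_cmd are hypothetical helper names.

# A real subtitle track is its own stream, so it can be mapped directly.
subtitle_cmd() {
  printf 'ffmpeg -i %s -map 0:s:0 %s\n' "$1" "$2"
}

# Closed captions ride inside the video frames, so the lavfi "movie" source
# is asked to expose them as an extra "subcc" output, which is then mapped.
caption_cmd() {
  printf 'ffmpeg -f lavfi -i "movie=%s[out0+subcc]" -map s %s\n' "$1" "$2"
}

subtitle_cmd input.ts subs.srt
caption_cmd input.ts captions.srt
```

The second printed command has the same shape as the "movie=input.ts[out0+subcc]" command posted later in this thread.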

To extract CC3, you can add -data_field second before the -i

Oops - I misunderstood the application of the map option in this case.

So I tried:

ffmpeg -f lavfi -data_field second -i input.ts[out+subcc] -map 0:1 output3.srt

and got back:

Unrecognized option 'data_field'
Error splitting the argument list: Option not found

your ffmpeg might be too old then
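
One way to check whether a given build is new enough is to ask ffmpeg to list the closed caption decoder's options and look for data_field. This is only a sketch: it assumes the decoder is registered under the name cc_dec in your build, and has_data_field is a made-up helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the decoder help text read from stdin
# mentions the data_field option.
has_data_field() {
  grep -q 'data_field'
}

if command -v ffmpeg >/dev/null 2>&1; then
  # Assumption: the closed caption decoder is named cc_dec in this build.
  if ffmpeg -hide_banner -h decoder=cc_dec 2>&1 | has_data_field; then
    echo "this ffmpeg lists data_field for cc_dec"
  else
    echo "data_field not listed; this ffmpeg is probably too old"
  fi
else
  echo "ffmpeg not found on PATH"
fi
```

If the option is not listed, updating ffmpeg is the likely fix for the "Unrecognized option 'data_field'" error.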

I got the latest ffmpeg 5.1.1 and now I get:

No such filter: 'input.ts'
input.ts[out+subcc]: Invalid argument

What is the full command you used?

Try this one
ffmpeg -f lavfi -i "movie=input.ts[out0+subcc]" -map s output.srt

Okay,

ffmpeg -f lavfi -i "movie=input.ts[out0+subcc]" -map s output.srt

was much faster than:

ffmpeg -f lavfi -i input.ts[out+subcc] -map 0:1 output.srt

and it gave me a 101 KB file. When I compare them side by side they look identical, and both are in Spanish.

For CC3 you need to add the -data_field second

When I tried:
ffmpeg -f lavfi -data_field second -i input.ts[out+subcc] -map 0:1 output3.srt

a while back, I got:

No such filter: 'input.ts'
input.ts[out+subcc]: Invalid argument

So then I ran a mash-up:

ffmpeg -f lavfi -data_field second -i "movie=input.ts[out0+subcc]" -map s output2.srt

and got thousands of errors saying,

"[Closed caption Decoder @ 000001f73386af00] Data Ignored since exceeding screen width"

plus a 20 KB file consisting of:

1
00:00:00,000 --> 00:00:13,881
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

2
00:00:13,881 --> 00:00:29,830
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

3
00:00:29,830 --> 00:00:45,779
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

4
00:00:45,779 --> 00:01:01,728
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>
...

etc. all the way up to line 197.

Upload the video and I will take a look.

Closed Captions are Spanish only on their website. Not that it means anything.

Looks like -data_field in ffmpeg doesn't work as expected. Someone made another patch to pull out CC3: [FFmpeg-devel] ccaption_dec: add support for multiple channels

EDIT: Actually, it says -data_field 1 should work. That sample had three captions instead of just one or two, so it seems like a different case altogether.

Regarding "La Herencia, Un Legado de Amor" - I noticed that in this video header as well. That is a completely different show so maybe the header has some remnants of another program that was used as a template.

I've gotta step away for a couple hours. I'll try -data_field 1 when I get back.

I downloaded the latest CCExtractor 0.94 and ran a report against my recording.
It says it has both CC1 and CC3, but when I asked it to extract CC3 I got an empty .srt file.

REPORT

> ccextractorwinfull -12 -out=report "T:\recording.mpg"

File: T:\recording.mpg
Stream Mode: Transport Stream
Program Count: 1
Program Numbers: 5 
PID: 2058, Program: 5, AC3 audio
PID: 2059, Program: 5, AC3 audio
PID: 2060, Program: 5, H.264 video
//////// Program #5: ////////
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: Yes
CC1: Yes
CC2: No
CC3: Yes
CC4: No
CEA-708: Yes
Services: 1 2 
Primary Language Present: Yes
Secondary Language Present: Yes

MPEG-4 Timed Text: No

EXTRACT

> ccextractorwinfull -12 "T:\recording.mpg"
T:\recording_1.srt has captions
T:\recording_2.srt is an empty file

I'll try for the CEA-708 captions tomorrow.

I got CCExtractor 0.94 as well, but it was the GUI version. It's not clear to me how to get a report like yours.

I've uploaded the original file (before renaming to input.ts) to the Drop Box linked above.

I tried using -data_field 1 in a few different contexts:

ffmpeg -f lavfi -data_field 1 -i input.ts[out+subcc] -map 0:1 output.srt
ffmpeg -f lavfi -data_field 1 -i "movie=input.ts[out+subcc]" -map 0:1 output.srt
ffmpeg -f lavfi -data_field 1 -i "movie=input.ts[out0+subcc]" -map s output.srt

The first one still says,
No such filter: 'input.ts'
input.ts[out+subcc]: Invalid argument

The second one produced the same issue I mentioned in post #25 above (English audio and subtitles are actually in Spanish - #25 by tluxon)

The third one resulted in essentially the same output as the second.