English audio and subtitles are actually in Spanish

I got the latest ffmpeg 5.1.1 and now I get:

No such filter: 'input.ts'
input.ts[out+subcc]: Invalid argument

What is the full command you used?

Try this one
ffmpeg -f lavfi -i "movie=input.ts[out0+subcc]" -map s output.srt
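Why the bare form fails: with -f lavfi the -i argument is a filtergraph, not a file name. That means input.ts[out+subcc] is parsed as a filter named input.ts, which is where "No such filter" comes from. In the working command, the movie source filter opens the file and [out0] labels its output. The +subcc suffix asks lavfi to expose the embedded closed captions as an extra subtitle stream, and -map s then selects that stream. The Python sketch below only assembles that argument list; it does not run ffmpeg. The helper name build_subcc_command is hypothetical. It assumes a plain file name, because characters such as : or , would need filtergraph escaping.

```python
from typing import List, Optional


def build_subcc_command(
    input_path: str,
    output_path: str,
    data_field: Optional[str] = None,
) -> List[str]:
    """Build (not run) an ffmpeg argv that extracts closed captions to SRT.

    Assumes input_path needs no filtergraph escaping (no ':', ',', '[', ']',
    quotes or backslashes).
    """
    # With -f lavfi, the "input" is a filtergraph: the movie source opens the
    # file, [out0] labels its output, and +subcc adds the captions as an
    # extra subtitle stream.
    graph = f"movie={input_path}[out0+subcc]"
    argv = ["ffmpeg", "-f", "lavfi"]
    if data_field is not None:
        # Placed before -i as an input option, as in the commands in this
        # thread; it is intended for the closed-caption decoder.
        argv += ["-data_field", data_field]
    # -map s keeps only subtitle streams, i.e. the +subcc stream.
    argv += ["-i", graph, "-map", "s", output_path]
    return argv


if __name__ == "__main__":
    print(build_subcc_command("input.ts", "output.srt"))
```

To run it, pass the list to subprocess.run(build_subcc_command("input.ts", "output.srt"), check=True). Passing a list also avoids the shell quoting that the typed command needs.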

Okay,

ffmpeg -f lavfi -i "movie=input.ts[out0+subcc]" -map s output.srt

was much faster than:

ffmpeg -f lavfi -i input.ts[out+subcc] -map 0:1 output.srt

and it gave me a 101 KB file. When I compare the two outputs side by side they look identical: both in Spanish.

For CC3 you need to add -data_field second.
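Some background on why a field option is needed at all: CEA-608 (line 21) captions carry two data channels in each of two fields. CC1 and CC2 are in field 1, and CC3 and CC4 are in field 2. ffmpeg's caption decoder chooses the field through its data_field option (auto, first or second). As far as I can tell from the decoder's option table, first and second correspond to the integers 0 and 1, so -data_field second and -data_field 1 should be equivalent. On the CCExtractor side, -1, -2 and -12 pick field 1, field 2 or both. To my understanding, -cc2 switches to the second channel within a field. Here is a Python sketch of that mapping. It summarizes the standard layout and is not output from either tool:

```python
from typing import Dict, List, NamedTuple


class CaptionChannel(NamedTuple):
    field: int             # line-21 field: 1 or 2
    channel_in_field: int  # data channel within that field: 1 or 2


# Standard CEA-608 layout: CC1/CC2 share field 1, CC3/CC4 share field 2.
CEA608_CHANNELS: Dict[str, CaptionChannel] = {
    "CC1": CaptionChannel(1, 1),
    "CC2": CaptionChannel(1, 2),
    "CC3": CaptionChannel(2, 1),
    "CC4": CaptionChannel(2, 2),
}

# Assumed integer equivalents of ffmpeg's data_field names ("auto" is -1).
DATA_FIELD_VALUES: Dict[str, int] = {"first": 0, "second": 1}


def ffmpeg_data_field(cc: str) -> str:
    """Name of the data_field value that selects the field carrying `cc`."""
    return "first" if CEA608_CHANNELS[cc].field == 1 else "second"


def ccextractor_flags(cc: str) -> List[str]:
    """CCExtractor flags for `cc` (assumed: -1/-2 pick the field,
    -cc2 picks the second channel within it)."""
    ch = CEA608_CHANNELS[cc]
    flags = [f"-{ch.field}"]
    if ch.channel_in_field == 2:
        flags.append("-cc2")
    return flags


if __name__ == "__main__":
    for name in CEA608_CHANNELS:
        field_name = ffmpeg_data_field(name)
        print(name, field_name, DATA_FIELD_VALUES[field_name], ccextractor_flags(name))
```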

When I tried:
ffmpeg -f lavfi -data_field second -i input.ts[out+subcc] -map 0:1 output3.srt

a while back, I got:

No such filter: 'input.ts'
input.ts[out+subcc]: Invalid argument

So then I ran a mash-up:

ffmpeg -f lavfi -data_field second -i "movie=input.ts[out0+subcc]" -map s output2.srt

and got thousands of errors saying,

"[Closed caption Decoder @ 000001f73386af00] Data Ignored since exceeding screen width"

plus a 20 KB file consisting of:

1
00:00:00,000 --> 00:00:13,881
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

2
00:00:13,881 --> 00:00:29,830
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

3
00:00:29,830 --> 00:00:45,779
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

4
00:00:45,779 --> 00:01:01,728
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>
...

etc. all the way up to line 197.
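The excerpt is a single cue repeated back to back, with each copy lasting about 16 seconds. To see only the distinct caption text in a file like this, you can merge consecutive cues that have identical text and touching timestamps. Here is a minimal Python sketch, assuming well-formed SRT (index line, start --> end line, text, blank line). The names parse_srt and merge_repeats are hypothetical helpers:

```python
import re
from typing import List, Tuple

Cue = Tuple[str, str, str]  # (start, end, text)

TIMING = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")


def parse_srt(data: str) -> List[Cue]:
    """Split SRT text into (start, end, text) cues; assumes well-formed blocks."""
    cues: List[Cue] = []
    for block in re.split(r"\n\s*\n", data.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue
        match = TIMING.match(lines[1].strip())
        if match:
            cues.append((match.group(1), match.group(2), "\n".join(lines[2:])))
    return cues


def merge_repeats(cues: List[Cue]) -> List[Cue]:
    """Merge back-to-back cues with identical text into one longer cue."""
    merged: List[Cue] = []
    for start, end, text in cues:
        if merged and merged[-1][2] == text and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end, text)
        else:
            merged.append((start, end, text))
    return merged


SAMPLE = r"""1
00:00:00,000 --> 00:00:13,881
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

2
00:00:13,881 --> 00:00:29,830
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

3
00:00:29,830 --> 00:00:45,779
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>

4
00:00:45,779 --> 00:01:01,728
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>
"""

if __name__ == "__main__":
    for cue in merge_repeats(parse_srt(SAMPLE)):
        print(cue)
```

Run on the four cues quoted above, it returns a single cue spanning 00:00:00,000 to 00:01:01,728.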

Upload the video and I will take a look.

Closed captions are Spanish-only on their website. Not that it means anything.

Looks like -data_field in ffmpeg doesn't work as expected. Someone made another patch to pull out CC3: [FFmpeg-devel] ccaption_dec: add support for multiple channels

EDIT: Actually, it says -data_field 1 should work. That sample had three captions instead of just one or two, so this seems like a different case altogether.

Regarding "La Herencia, Un Legado de Amor" - I noticed that in this video header as well. That is a completely different show so maybe the header has some remnants of another program that was used as a template.

I've gotta step away for a couple hours. I'll try -data_field 1 when I get back.

I downloaded the latest CCExtractor 0.94 and ran a report against my recording.
It says it has both CC1 and CC3, but when I asked it to extract CC3 I got an empty .srt file.

REPORT

> ccextractorwinfull -12 -out=report "T:\recording.mpg"

File: T:\recording.mpg
Stream Mode: Transport Stream
Program Count: 1
Program Numbers: 5 
PID: 2058, Program: 5, AC3 audio
PID: 2059, Program: 5, AC3 audio
PID: 2060, Program: 5, H.264 video
//////// Program #5: ////////
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: Yes
CC1: Yes
CC2: No
CC3: Yes
CC4: No
CEA-708: Yes
Services: 1 2 
Primary Language Present: Yes
Secondary Language Present: Yes

MPEG-4 Timed Text: No
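The report is a list of Key: Value lines, so you can pull out the caption flags with a few lines of Python, for example to compare several recordings. This sketch assumes the layout shown above: one key per line, Yes/No values, and Services as a space-separated list. It skips the PID and File lines. As the rest of this thread shows, a Yes here did not guarantee that CC3 or service 2 actually contained extractable text.

```python
from typing import Dict, List

REPORT_EXCERPT = """\
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: Yes
CC1: Yes
CC2: No
CC3: Yes
CC4: No
CEA-708: Yes
Services: 1 2
Primary Language Present: Yes
Secondary Language Present: Yes
"""


def parse_report(text: str) -> Dict[str, str]:
    """Collect 'Key: Value' pairs from CCExtractor's -out=report listing."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Skip lines without a colon, per-PID lines and the file path
        # (which contains its own colon, e.g. T:\...).
        if not sep or key in ("PID", "File"):
            continue
        fields[key] = value.strip()
    return fields


def present_captions(fields: Dict[str, str]) -> List[str]:
    """608 channels the report marks 'Yes', followed by listed 708 services."""
    found = [cc for cc in ("CC1", "CC2", "CC3", "CC4") if fields.get(cc) == "Yes"]
    found += [f"708 service {s}" for s in fields.get("Services", "").split()]
    return found


if __name__ == "__main__":
    print(present_captions(parse_report(REPORT_EXCERPT)))
```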

EXTRACT

> ccextractorwinfull -12 "T:\recording.mpg"
T:\recording_1.srt has captions
T:\recording_2.srt is an empty file
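With -12, CCExtractor writes field 1 to a _1.srt file and field 2 to a _2.srt file, as the output above shows. When checking several recordings, a quick scan for missing, zero-byte or cue-less outputs saves opening each one. Here is a Python sketch that uses only the standard library. The name count_cues is hypothetical, and counting lines that contain --> as cues is a simplifying assumption:

```python
import sys
from pathlib import Path
from typing import Dict, Iterable, Optional


def count_cues(paths: Iterable[str]) -> Dict[str, Optional[int]]:
    """Map each file name to its number of cues (0 = empty, None = missing).

    A cue is counted for every line containing '-->', which is a
    simplification that assumes well-formed SRT.
    """
    result: Dict[str, Optional[int]] = {}
    for path in map(Path, paths):
        if not path.exists():
            result[path.name] = None
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        result[path.name] = sum(1 for line in text.splitlines() if "-->" in line)
    return result


if __name__ == "__main__":
    # e.g. python count_cues.py T:\recording_1.srt T:\recording_2.srt
    print(count_cues(sys.argv[1:]))
```

For the two files above, this should report a positive count for recording_1.srt and 0 for recording_2.srt.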

I'll try for the CEA-708 captions tomorrow.


I got CCExtractor 0.94 as well, but it was the GUI version. It's not clear to me how to get a report like yours.

I've uploaded the original file (before renaming it to input.ts) to the Dropbox linked above.

I tried using -data_field 1 in a few different contexts:

ffmpeg -f lavfi -data_field 1 -i input.ts[out+subcc] -map 0:1 output.srt
ffmpeg -f lavfi -data_field 1 -i "movie=input.ts[out+subcc]" -map 0:1 output.srt
ffmpeg -f lavfi -data_field 1 -i "movie=input.ts[out0+subcc]" -map s output.srt

The first one still says,
No such filter: 'input.ts'
input.ts[out+subcc]: Invalid argument

The second one produced the same issue I mentioned in post #25 above.

The third one resulted in essentially the same output as the second.

I downloaded the portable Windows version zip file.

Running ccextractorwinfull.exe without parameters shows the help text.

So either CCExtractor and ffmpeg have long outstanding bugs, or our recordings say they have EIA-608 CC3 but really don't.

So, trying to extract the CEA-708 captions:

ccextractorwinfull -svc 1 T:\recording.mpg
and
ccextractorwinfull -svc 1,2 T:\recording.mpg
and
ccextractorwinfull -svc all T:\recording.mpg
All result in closed caption output in the file T:\recording.p5.svc01.srt
So there are CEA-708 captions in Program 5, Service 1
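CCExtractor named the CEA-708 output after the program and service: recording.p5.svc01.srt here, and output.p718.svc01.srt later in the thread. If you script -svc runs, you can read the program and service numbers back from the file name. The pattern in this Python sketch is inferred from those two names only, not from CCExtractor's documentation, and parse_service_file is a hypothetical helper:

```python
import re
from typing import Optional, Tuple

# Pattern inferred from names seen in this thread, e.g. recording.p5.svc01.srt:
# <base>.p<program number>.svc<zero-padded service number>.srt
SERVICE_FILE = re.compile(r"^(?P<base>.+)\.p(?P<program>\d+)\.svc(?P<service>\d+)\.srt$")


def parse_service_file(name: str) -> Optional[Tuple[str, int, int]]:
    """Return (base, program, service) for a CEA-708 output name, else None."""
    match = SERVICE_FILE.match(name)
    if match is None:
        return None
    return match.group("base"), int(match.group("program")), int(match.group("service"))


if __name__ == "__main__":
    for name in ("recording.p5.svc01.srt", "output.p718.svc01.srt", "recording_1.srt"):
        print(name, parse_service_file(name))
```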

ccextractorwinfull -svc 2 T:\recording.mpg
Results in no output
So I would assume there are no CEA-708 captions in Program 5, Service 2.

CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: T:\recording.mpg
[Extract: 0] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 1 decoders active]
[CEA-708: using charset "none" for service 2]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: T:\recording.mpg
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Changed fps using NAL to: 59.940060

Found large gap(6118306) in PTS! Trying to recover ...

Found large gap(6118322) in PTS! Trying to recover ...

Found large gap(6118314) in PTS! Trying to recover ...

Found large gap(6118310) in PTS! Trying to recover ...

Found large gap(6118308) in PTS! Trying to recover ...

Found large gap(6118312) in PTS! Trying to recover ...

Found large gap(6118318) in PTS! Trying to recover ...

Found large gap(6118316) in PTS! Trying to recover ...

Found large gap(6118320) in PTS! Trying to recover ...
Premature end of file - Transport Stream packet is incomplete (expected 188 bytes, got 112).
100%  |  59:59
Number of NAL_type_7: 2766
Number of VCL_HRD: 0
Number of NAL HRD: 2766
Number of jump-in-frames: 1798
Number of num_unexpected_sei_length: 0

Total frames time:        00:59:59:028  (215726 frames at 59.94fps)

Min PTS:                                14:10:36:717
Max PTS:                                15:10:35:937
Length:                          00:59:59:220
Done, processing time = 4 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
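You can cross-check the timing figures in this log with a little arithmetic. The log reports a 90000 Hz clock, so PTS values count 90 kHz ticks. Assuming the large-gap numbers use the same ticks, a gap of 6118306 is roughly 68 seconds. The Length line is simply Max PTS minus Min PTS. Here is a Python sketch of both calculations, parsing the HH:MM:SS:mmm format printed above:

```python
CLOCK_HZ = 90_000  # from "[Clock frequency: 90000]" in the log


def ticks_to_seconds(ticks: int) -> float:
    """Convert 90 kHz clock ticks to seconds."""
    return ticks / CLOCK_HZ


def parse_hmsm(stamp: str) -> int:
    """Convert 'HH:MM:SS:mmm' (CCExtractor's PTS format) to milliseconds."""
    hours, minutes, seconds, millis = (int(part) for part in stamp.split(":"))
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def format_hmsm(ms: int) -> str:
    """Inverse of parse_hmsm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{millis:03d}"


if __name__ == "__main__":
    # Assumes the "large gap" figure is in the same 90 kHz ticks as PTS.
    print(f"gap of 6118306 ticks ~ {ticks_to_seconds(6118306):.1f} s")
    length = parse_hmsm("15:10:35:937") - parse_hmsm("14:10:36:717")
    print("length:", format_hmsm(length))
```

For this log the subtraction gives 00:59:59:220, which matches the Length line. The same check on the input.ts log below (23:57:03:288 to 24:58:02:676) gives 01:00:59:388.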

Interesting. So did you see the text added to the T:\recording.p5.svc01.srt file? What language(s)?

English, but that's to be expected as it's a PBS recording of Nature.
I also used command-line curl to record 30 minutes of PBS directly from my HDHR Prime tuner, and I'm seeing the same thing. The report shows EIA-608 CC1 and CC3 and CEA-708 services 1 and 2, but I can only extract EIA-608 CC1 and CEA-708 service 1.

I ran a CCExtractor report on my file, input.ts:

> ccextractorwinfull -12 -out=report input.ts
File: input.ts
Stream Mode: Transport Stream
Program Count: 1
Program Numbers: 718
PID: 5393, Program: 718, H.264 video
PID: 5395, Program: 718, AC3 audio
PID: 5396, Program: 718, AC3 audio
PID: 5403, Program: 718, MPEG-2 User Private
//////// Program #718: ////////
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: Yes
CC1: Yes
CC2: No
CC3: Yes
CC4: No
CEA-708: Yes
Services: 1 2 3 4 5 6
Primary Language Present: Yes
Secondary Language Present: Yes

MPEG-4 Timed Text: No

Next I ran it with the default output settings:

> ccextractorwinfull input.ts -o output.srt
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: input.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: input.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Changed fps using NAL to: 59.940060

Found large gap(10336508) in PTS! Trying to recover ...

Found large gap(10336506) in PTS! Trying to recover ...

Found large gap(10336516) in PTS! Trying to recover ...

Found large gap(10336512) in PTS! Trying to recover ...

Found large gap(10336510) in PTS! Trying to recover ...

Found large gap(10336514) in PTS! Trying to recover ...
XDS Notice: Program is now La Herencia Un Legado de Amor
XDS Notice: Network is now Univision
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
Premature end of file - Transport Stream packet is incomplete (expected 188 bytes, got 84).
100%  |  60:59
Number of NAL_type_7: 2925
Number of VCL_HRD: 0
Number of NAL HRD: 2925
Number of jump-in-frames: 1827
Number of num_unexpected_sei_length: 0

Total frames time:        01:00:59:022  (219322 frames at 59.94fps)

Min PTS:                                23:57:03:288
Max PTS:                                24:58:02:676
Length:                          01:00:59:388
Done, processing time = 7 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues

Two files were produced: output.srt and output.p718.svc01.srt.
Output.srt had the subtitles in Spanish with no font information.
Output.p718.svc01.srt had the same Spanish subtitles but with font information.
Based on the naming, I assumed the second file contained all the CEA-708 services combined into one file, which was confirmed by running:

> ccextractorwinfull -svc 1,2,3,4,5,6 input.ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: input.ts
[Extract: 0] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 6 decoders active]
[CEA-708: using charset "none" for service 1]
[CEA-708: using charset "none" for service 2]
[CEA-708: using charset "none" for service 3]
[CEA-708: using charset "none" for service 4]
[CEA-708: using charset "none" for service 5]
[CEA-708: using charset "none" for service 6]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]

-----------------------------------------------------------------
Opening file: input.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Changed fps using NAL to: 59.940060

Found large gap(10336508) in PTS! Trying to recover ...

Found large gap(10336506) in PTS! Trying to recover ...

Found large gap(10336516) in PTS! Trying to recover ...

Found large gap(10336512) in PTS! Trying to recover ...

Found large gap(10336510) in PTS! Trying to recover ...

Found large gap(10336514) in PTS! Trying to recover ...
Premature end of file - Transport Stream packet is incomplete (expected 188 bytes, got 84).
100%  |  60:59
Number of NAL_type_7: 2925
Number of VCL_HRD: 0
Number of NAL HRD: 2925
Number of jump-in-frames: 1827
Number of num_unexpected_sei_length: 0

Total frames time:        01:00:59:022  (219322 frames at 59.94fps)

Min PTS:                                23:57:03:288
Max PTS:                                24:58:02:676
Length:                          01:00:59:388
Done, processing time = 6 seconds

This resulted in another file named Output.p718.svc01.srt, which was identical to the file of the same name produced by running CCExtractor in default mode.
So after all that, it seems that the video only has one language in it - Spanish.
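To confirm that output.srt and output.p718.svc01.srt carry the same words and differ only in markup, strip the <font ...> tags and any {\...} override codes before comparing. An example of such a code is the {\an7} seen in the ffmpeg output earlier in the thread. Here is a Python sketch. The names caption_text and same_text are hypothetical, and the two regular expressions cover only those kinds of markup. Timing lines are ignored, so cues that differ only in timing will compare equal:

```python
import re
from typing import List

FONT_TAG = re.compile(r"</?font[^>]*>", re.IGNORECASE)
OVERRIDE = re.compile(r"\{\\[^}]*\}")  # positioning codes such as {\an7}
TIMING_LINE = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> ")


def caption_text(srt: str) -> List[str]:
    """Caption text lines of an SRT, with indexes, timings and markup removed.

    Caveat: a caption line made only of digits is mistaken for a cue index.
    """
    lines: List[str] = []
    for raw in srt.splitlines():
        line = raw.strip()
        if not line or line.isdigit() or TIMING_LINE.match(line):
            continue
        cleaned = OVERRIDE.sub("", FONT_TAG.sub("", line)).strip()
        if cleaned:
            lines.append(cleaned)
    return lines


def same_text(a: str, b: str) -> bool:
    """True if both SRT documents contain the same caption text in order."""
    return caption_text(a) == caption_text(b)


if __name__ == "__main__":
    plain = "1\n00:00:01,000 --> 00:00:02,000\nHola\n"
    styled = '1\n00:00:01,000 --> 00:00:02,000\n<font face="Monospace">{\\an7}Hola</font>\n'
    print(same_text(plain, styled))
```

With real files, read each one first (for example with pathlib's Path(...).read_text(encoding="utf-8")) and pass the two strings to same_text.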


At least now you know CCExtractor is lightning fast compared to using ffmpeg!

That's for sure, and I also agree with your previous assessment:

"So either CCExtractor and ffmpeg have long outstanding bugs, or our recordings say they have EIA-608 CC3 but really don't."