What is the full command you used?
Try this one
ffmpeg -f lavfi -i "movie=input.ts[out0+subcc]" -map s output.srt
Okay,
ffmpeg -f lavfi -i "movie=input.ts[out0+subcc]" -map s output.srt
was much faster than:
ffmpeg -f lavfi -i input.ts[out+subcc] -map 0:1 output.srt
and it gave me a 101 KB file. When I compare the two outputs side by side, they look identical; both are in Spanish.
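A side-by-side check like this can be scripted. The sketch below is only an illustration, assuming both files are ordinary SRT with a blank line between cues; the helper names cue_texts and same_captions are made up for this example. It strips <font> tags and {\an7}-style overrides, so two files that differ only in font information still compare equal.

```python
import re
from pathlib import Path

# Matches <font ...>/</font> style tags and {\an7}-style override blocks.
TAG = re.compile(r"<[^>]+>|\{\\[^}]*\}")
TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def cue_texts(srt_text):
    """Return the text of each cue, in order, with markup removed."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        for i, ln in enumerate(lines):
            if TIMING.match(ln):
                # Everything after the timing line is caption text.
                text = " ".join(TAG.sub("", t) for t in lines[i + 1:])
                cues.append(" ".join(text.split()))
                break
    return cues


def same_captions(path_a, path_b):
    """True if both SRT files carry the same cue text, ignoring timing and markup."""
    def read(p):
        return Path(p).read_text(encoding="utf-8", errors="replace")
    return cue_texts(read(path_a)) == cue_texts(read(path_b))
```

For example, same_captions("output.srt", "output2.srt") would compare the text of two extractions. Timings are ignored on purpose, since different tools may time the same cue slightly differently.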
For CC3 you need to add -data_field second.
When I tried:
ffmpeg -f lavfi -data_field second -i input.ts[out+subcc] -map 0:1 output3.srt
a while back, I got:
No such filter: 'input.ts'
input.ts[out+subcc]: Invalid argument
So then I ran a mash-up:
ffmpeg -f lavfi -data_field second -i "movie=input.ts[out0+subcc]" -map s output2.srt
and got thousands of errors saying,
"[Closed caption Decoder @ 000001f73386af00] Data Ignored since exceeding screen width"
plus a 20 KB file consisting of:
1
00:00:00,000 --> 00:00:13,881
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>
2
00:00:13,881 --> 00:00:29,830
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>
3
00:00:29,830 --> 00:00:45,779
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>
4
00:00:45,779 --> 00:01:01,728
<font face="Monospace">{\an7}nHELa Herencia Un Legado de Amor</font>
...
etc. all the way up to line 197.
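To confirm that a file like this is one line repeated rather than real dialogue, the distinct text lines can be counted. This is a rough sketch, not a full SRT parser; text_line_counts is a hypothetical helper, and it treats a digits-only line as a cue index only when a timing line follows it.

```python
import re
from collections import Counter

TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def text_line_counts(srt_text):
    """Count how often each caption text line occurs in an SRT file."""
    lines = [ln.strip() for ln in srt_text.splitlines()]
    counts = Counter()
    for i, line in enumerate(lines):
        if not line or TIMING.match(line):
            continue  # blank separator or timing line
        nxt = lines[i + 1] if i + 1 < len(lines) else ""
        if line.isdigit() and TIMING.match(nxt):
            continue  # cue index directly above a timing line
        counts[line] += 1
    return counts
```

If len(text_line_counts(text)) comes back as 1 for a file with nearly two hundred cues, the decoder is most likely repeating a single string rather than decoding a caption channel.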
Upload the video and I will take a look.
Closed Captions are Spanish only on their website. Not that it means anything.
Looks like -data_field in ffmpeg doesn't work as expected. Someone made another patch to pull out CC3: [FFmpeg-devel] ccaption_dec: add support for multiple channels
EDIT: Actually it says -data_field 1 should work. That sample had three captions instead of just one or two, so it seems like a different case altogether.
Regarding "La Herencia, Un Legado de Amor" - I noticed that in this video header as well. That is a completely different show so maybe the header has some remnants of another program that was used as a template.
I've gotta step away for a couple hours. I'll try -data_field 1 when I get back.
I downloaded the latest CCExtractor 0.94 and ran a report against my recording.
It says it has both CC1 and CC3, but when I asked it to extract CC3 I got an empty .srt file.
REPORT
> ccextractorwinfull -12 -out=report "T:\recording.mpg"
File: T:\recording.mpg
Stream Mode: Transport Stream
Program Count: 1
Program Numbers: 5
PID: 2058, Program: 5, AC3 audio
PID: 2059, Program: 5, AC3 audio
PID: 2060, Program: 5, H.264 video
//////// Program #5: ////////
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: Yes
CC1: Yes
CC2: No
CC3: Yes
CC4: No
CEA-708: Yes
Services: 1 2
Primary Language Present: Yes
Secondary Language Present: Yes
MPEG-4 Timed Text: No
EXTRACT
> ccextractorwinfull -12 "T:\recording.mpg"
T:\recording_1.srt has captions
T:\recording_2.srt is an empty file
I'll try for the CEA-708 captions tomorrow.
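When comparing several recordings, it can help to pull the declared channels out of the report text. The sketch below assumes the report has the same "CC1: Yes" and "Services: 1 2" lines as the one pasted above; parse_report is a made-up name, and it reads only those lines and nothing else.

```python
import re


def parse_report(report):
    """Extract declared EIA-608 channels and CEA-708 services from a report dump."""
    channels = {}
    services = []
    for raw in report.splitlines():
        line = raw.strip()
        m = re.fullmatch(r"(CC[1-4]):\s*(Yes|No)", line)
        if m:
            channels[m.group(1)] = m.group(2) == "Yes"
            continue
        m = re.fullmatch(r"Services:\s*([\d\s]*)", line)
        if m:
            services = [int(s) for s in m.group(1).split()]
    return {"channels": channels, "services": services}
```

Note that this only reports what the stream claims to carry. As the extraction above shows, a channel flagged "Yes" can still produce an empty .srt file.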
I got CCExtractor 0.94 as well, but it was the GUI version. It's not clear to me how to get a report like yours.
I've uploaded the original file (before renaming to input.ts) to the Drop Box linked above.
I tried using -data_field 1 in a few different contexts:
ffmpeg -f lavfi -data_field 1 -i input.ts[out+subcc] -map 0:1 output.srt
ffmpeg -f lavfi -data_field 1 -i "movie=input.ts[out+subcc]" -map 0:1 output.srt
ffmpeg -f lavfi -data_field 1 -i "movie=input.ts[out0+subcc]" -map s output.srt
The first one still says,
No such filter: 'input.ts'
input.ts[out+subcc]: Invalid argument
The second one produced the same issue I mentioned in post #25 above (English audio and subtitles are actually in Spanish - #25 by tluxon)
The third one resulted in essentially the same output as the second.
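For what it's worth, the "No such filter: 'input.ts'" error makes sense: with -f lavfi the -i argument is parsed as a filtergraph, so a bare file name gets read as a filter name. The file has to be wrapped in the movie source, as in the commands that did run. Here is a small sketch that builds that argument list; subcc_command is a hypothetical helper, and the -data_field placement simply mirrors the commands in this thread (which, as noted, did not seem to select CC3).

```python
def subcc_command(src, dst, data_field=None):
    """Build an ffmpeg argument list that decodes closed captions via lavfi.

    src and dst are assumed to be simple relative file names. Paths that
    contain characters such as ':' or '\\' need extra escaping inside a
    filtergraph, which this sketch does not attempt.
    """
    args = ["ffmpeg", "-f", "lavfi"]
    if data_field is not None:
        # Same option and position as used in the thread.
        args += ["-data_field", str(data_field)]
    # movie= opens the file; +subcc exposes the caption stream as a subtitle output.
    args += ["-i", f"movie={src}[out0+subcc]", "-map", "s", dst]
    return args
```

The list can be handed to subprocess.run(subcc_command("input.ts", "output.srt"), check=True), which avoids shell quoting problems around the brackets.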
I downloaded the portable windows version zip file.
Running ccextractorwinfull.exe without parameters shows the help text.
So either CCExtractor and ffmpeg have long outstanding bugs, or our recordings say they have EIA-608 CC3 but really don't.
So, trying to extract the CEA-708 captions:
ccextractorwinfull -svc 1 T:\recording.mpg
and
ccextractorwinfull -svc 1,2 T:\recording.mpg
and
ccextractorwinfull -svc all T:\recording.mpg
All result in closed caption output in the file T:\recording.p5.svc01.srt
So there are CEA-708 captions in Program 5, Service 1
ccextractorwinfull -svc 2 T:\recording.mpg
Results in no output
So I would assume there are no CEA-708 captions in Program 5, Service 2
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: T:\recording.mpg
[Extract: 0] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 1 decoders active]
[CEA-708: using charset "none" for service 2]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]
-----------------------------------------------------------------
Opening file: T:\recording.mpg
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Changed fps using NAL to: 59.940060
Found large gap(6118306) in PTS! Trying to recover ...
Found large gap(6118322) in PTS! Trying to recover ...
Found large gap(6118314) in PTS! Trying to recover ...
Found large gap(6118310) in PTS! Trying to recover ...
Found large gap(6118308) in PTS! Trying to recover ...
Found large gap(6118312) in PTS! Trying to recover ...
Found large gap(6118318) in PTS! Trying to recover ...
Found large gap(6118316) in PTS! Trying to recover ...
Found large gap(6118320) in PTS! Trying to recover ...
Premature end of file - Transport Stream packet is incomplete (expected 188 bytes, got 112).
100% | 59:59
Number of NAL_type_7: 2766
Number of VCL_HRD: 0
Number of NAL HRD: 2766
Number of jump-in-frames: 1798
Number of num_unexpected_sei_length: 0
Total frames time: 00:59:59:028 (215726 frames at 59.94fps)
Min PTS: 14:10:36:717
Max PTS: 15:10:35:937
Length: 00:59:59:220
Done, processing time = 4 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
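The timing summary in that log can be sanity-checked by hand: Length should be Max PTS minus Min PTS, and the total frames time should follow from the frame count at 60000/1001 fps. A small sketch, assuming the fourth field is milliseconds and that the frame time is truncated rather than rounded (both are guesses from the log format, and the function names are made up):

```python
def parse_ts(ts):
    """Convert 'HH:MM:SS:mmm' to milliseconds."""
    h, m, s, ms = (int(part) for part in ts.split(":"))
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def format_ts(total_ms):
    """Convert milliseconds back to 'HH:MM:SS:mmm'."""
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}:{ms:03d}"


def frames_to_ms(frames):
    """Duration of a frame count at 60000/1001 fps (59.940060), truncated to ms."""
    return frames * 1001 * 1000 // 60000


length = format_ts(parse_ts("15:10:35:937") - parse_ts("14:10:36:717"))
frame_time = format_ts(frames_to_ms(215726))
```

With the numbers from the log, length works out to 00:59:59:220 and frame_time to 00:59:59:028, matching the reported values, so the recording itself looks consistent despite the PTS gap warnings.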
Interesting. So did you see the text added to the T:\recording.p5.svc01.srt file? What language(s)?
English, but that's to be expected as it's a PBS recording of Nature.
I also used command line curl to record 30 minutes of PBS direct from my HDHR Prime tuner and I'm seeing the same thing. Shows it has EIA-608 CC1 & CC3, CEA-708 svc1 & svc2, but I can only extract EIA-608 CC1 and CEA-708 svc1.
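To see at a glance which extractions actually produced captions, the output files can be checked for content. This is just a sketch: caption_outputs is a hypothetical helper, and it treats a file holding only whitespace or a UTF-8 byte order mark as empty.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def caption_outputs(directory, pattern="*.srt"):
    """Map each matching file name to True if the file has real content."""
    result = {}
    for path in sorted(Path(directory).glob(pattern)):
        data = path.read_bytes()
        if data.startswith(UTF8_BOM):
            data = data[len(UTF8_BOM):]  # ignore a leading byte order mark
        result[path.name] = bool(data.strip())
    return result
```

Running it on the folder holding the extracted files would give something like a name-to-bool mapping, which can then be compared against the channels the report claims are present.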
I ran a ccextractor report on my file, input.ts:
> ccextractorwinfull -12 -out=report input.ts
File: input.ts
Stream Mode: Transport Stream
Program Count: 1
Program Numbers: 718
PID: 5393, Program: 718, H.264 video
PID: 5395, Program: 718, AC3 audio
PID: 5396, Program: 718, AC3 audio
PID: 5403, Program: 718, MPEG-2 User Private
//////// Program #718: ////////
DVB Subtitles: No
Teletext: No
ATSC Closed Caption: Yes
EIA-608: Yes
XDS: Yes
CC1: Yes
CC2: No
CC3: Yes
CC4: No
CEA-708: Yes
Services: 1 2 3 4 5 6
Primary Language Present: Yes
Secondary Language Present: Yes
MPEG-4 Timed Text: No
Next I ran it with the default output:
> ccextractorwinfull input.ts -o output.srt
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: input.ts
[Extract: 1] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 63 decoders active]
[CEA-708: using charset "none" for all services]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]
-----------------------------------------------------------------
Opening file: input.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Changed fps using NAL to: 59.940060
Found large gap(10336508) in PTS! Trying to recover ...
Found large gap(10336506) in PTS! Trying to recover ...
Found large gap(10336516) in PTS! Trying to recover ...
Found large gap(10336512) in PTS! Trying to recover ...
Found large gap(10336510) in PTS! Trying to recover ...
Found large gap(10336514) in PTS! Trying to recover ...
XDS Notice: Program is now La Herencia Un Legado de Amor
XDS Notice: Network is now Univision
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-14 (Parents Strongly Cautioned)
XDS:
XDS Notice: Program is now Los Ricos Tambien Lloran
XDS Notice: Program is now
XDS: ContentAdvisory: US TV Parental Guidelines. Age Rating: TV-G (General Audience)
XDS:
Premature end of file - Transport Stream packet is incomplete (expected 188 bytes, got 84).
100% | 60:59
Number of NAL_type_7: 2925
Number of VCL_HRD: 0
Number of NAL HRD: 2925
Number of jump-in-frames: 1827
Number of num_unexpected_sei_length: 0
Total frames time: 01:00:59:022 (219322 frames at 59.94fps)
Min PTS: 23:57:03:288
Max PTS: 24:58:02:676
Length: 01:00:59:388
Done, processing time = 7 seconds
Issues? Open a ticket here
https://github.com/CCExtractor/ccextractor/issues
Two files were produced - output.srt and output.p718.svc01.srt.
output.srt had the subtitles in Spanish with no font information.
output.p718.svc01.srt had the same Spanish subtitles but with font information.
Based on the naming, I assumed the second file contained the CEA-708 services all combined into one file, which was confirmed by running:
> ccextractorwinfull -svc 1,2,3,4,5,6 input.ts
CCExtractor 0.94, Carlos Fernandez Sanz, Volker Quetschke.
Teletext portions taken from Petr Kutalek's telxcc
--------------------------------------------------------------------------
Input: input.ts
[Extract: 0] [Stream mode: Autodetect]
[Program : Auto ] [Hauppage mode: No] [Use MythTV code: Auto]
[CEA-708: 6 decoders active]
[CEA-708: using charset "none" for service 1]
[CEA-708: using charset "none" for service 2]
[CEA-708: using charset "none" for service 3]
[CEA-708: using charset "none" for service 4]
[CEA-708: using charset "none" for service 5]
[CEA-708: using charset "none" for service 6]
[Timing mode: Auto] [Debug: No] [Buffer input: Yes]
[Use pic_order_cnt_lsb for H.264: No] [Print CC decoder traces: No]
[Target format: .srt] [Encoding: UTF-8] [Delay: 0] [Trim lines: No]
[Add font color data: Yes] [Add font typesetting: Yes]
[Convert case: No][Filter profanity: No] [Video-edit join: No]
[Extraction start time: not set (from start)]
[Extraction end time: not set (to end)]
[Live stream: No] [Clock frequency: 90000]
[Teletext page: Autodetect]
[Start credits text: None]
[Quantisation-mode: CCExtractor's internal function]
-----------------------------------------------------------------
Opening file: input.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Changed fps using NAL to: 59.940060
Found large gap(10336508) in PTS! Trying to recover ...
Found large gap(10336506) in PTS! Trying to recover ...
Found large gap(10336516) in PTS! Trying to recover ...
Found large gap(10336512) in PTS! Trying to recover ...
Found large gap(10336510) in PTS! Trying to recover ...
Found large gap(10336514) in PTS! Trying to recover ...
Premature end of file - Transport Stream packet is incomplete (expected 188 bytes, got 84).
100% | 60:59
Number of NAL_type_7: 2925
Number of VCL_HRD: 0
Number of NAL HRD: 2925
Number of jump-in-frames: 1827
Number of num_unexpected_sei_length: 0
Total frames time: 01:00:59:022 (219322 frames at 59.94fps)
Min PTS: 23:57:03:288
Max PTS: 24:58:02:676
Length: 01:00:59:388
Done, processing time = 6 seconds
This resulted in another file named output.p718.svc01.srt, which was identical to the file of the same name produced by running CCExtractor in default mode.
So after all that, it seems that the video only has one language in it - Spanish.
At least now you know CCExtractor is lightning fast compared to using ffmpeg!
That's for sure, and I also agree with your previous assessment:
"So either CCExtractor and ffmpeg have long outstanding bugs, or our recordings say they have EIA-608 CC3 but really don't."
It appears that KUNS stopped including the English captions on CC3 in the past few years, so it looks like the program I was referring to in the original post never held English captions on CC3 even though Spanish captions are now in both CC1 and CC3.
I have a number of episodes from this station that were recorded in 2015 and they DO have English captions that work as expected, with Spanish on CC1 and English on CC3. I am now struggling to preserve them when I re-encode the video files to about 1/10 of their original size of over 6 GB/hour. I haven't been able to get CCExtractor to extract anything from CC3 even though the captions play great in VLC.