One of our favorite ways to travel is on a river cruise – the ships are small, you’re always within sight of land, the food is good, and the ports are interesting. Our favorite river cruises have been on AmaWaterways in Europe and Southeast Asia, and we’ve got three trips planned with them through 2023. And we’ve got a favorite travel agent for Ama cruises – Dave Natale, aka River Cruise King.
Dave sends out a lot of information, which is a good thing. Lately, he’s branched out and is producing video talks on YouTube, like this one – they’re interesting and informative, but they overlap with his other communications, and it’s hard to skim a video to find the new information.
YouTube automatically subtitles videos, and it’s easy to download the subtitles along with the video using a program like Downie or youtube-dl. I wanted to extract the text from the subtitle file so I could read it. The subtitle file is in XML, but I’d already had to figure out how to convert the subtitles from their native form into SRT (SubRip Subtitle) so I could use them on Plex – an SRT file looks like this:
3
00:00:13,200 --> 00:00:14,559
[Music]
hi everybody

4
00:00:14,559 --> 00:00:17,199
hi everybody
it's friday so that means it's time for

5
00:00:17,199 --> 00:00:17,840
it's friday so that means it's time for
another
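If you want to try something similar, here's a minimal Python sketch (not the program described in this post) that reads an SRT file like the one above into a list of cues. It assumes a well-formed file where each block is a numeric index line, a timing line, and one or more lines of text, with blank lines between blocks. The names `Cue`, `parse_srt`, and `SAMPLE` are made up for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# An SRT timing line, e.g. "00:00:13,200 --> 00:00:14,559".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    lines: list[str]


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the pieces of an HH:MM:SS,mmm timestamp to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text: str) -> list[Cue]:
    """Split an SRT document into cues.

    Assumes a well-formed file: blocks separated by blank lines, each one
    an index line, a timing line, then the caption text.
    """
    # Drop a leading byte-order mark and normalize Windows line endings.
    text = text.lstrip("\ufeff").replace("\r\n", "\n")
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = [row.strip() for row in block.splitlines()]
        # Skip anything that doesn't start with a numeric index line.
        if len(rows) < 2 or not rows[0].isdigit():
            continue
        match = TIMING.match(rows[1])
        if match is None:
            continue
        parts = match.groups()
        cues.append(
            Cue(int(rows[0]), _seconds(*parts[:4]), _seconds(*parts[4:]), rows[2:])
        )
    return cues


# The excerpt above, with each caption on its own line as it is in the file.
SAMPLE = """\
3
00:00:13,200 --> 00:00:14,559
[Music]
hi everybody

4
00:00:14,559 --> 00:00:17,199
hi everybody
it's friday so that means it's time for

5
00:00:17,199 --> 00:00:17,840
it's friday so that means it's time for
another
"""

for cue in parse_srt(SAMPLE):
    print(cue.index, cue.start, cue.lines)
```

Printing the cues this way makes the overlap easy to see: the first line of each cue is the last line of the one before it.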
It’s easy to extract just the text (it’s every fourth line), but there’s a lot of overlap from line to line, which made it hard to read. So I wrote a very simple, stupid, and probably inefficient program to eliminate the overlap and create a spreadsheet with two columns, like this:
| Time | Text |
|---|---|
| 00:00:14 | it’s friday so that means it’s time for |
| 00:00:17 | round table talk with the river cruise |
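Here's one possible way to remove the overlap, as a minimal Python sketch rather than the program described in this post. It assumes rolling captions, where each cue repeats the line from the previous cue before adding a new one, so it drops any line that already appeared in the immediately preceding cue. Cue start times are truncated to whole seconds to match the time column above, and `write_csv` saves the rows as a two-column CSV file that any spreadsheet program can open. The function names (`hms`, `dedupe`, `write_csv`) are hypothetical.

```python
from __future__ import annotations

import csv
from typing import Iterable, Sequence


def hms(seconds: float) -> str:
    """Format a start time as HH:MM:SS, dropping (not rounding) the fraction."""
    whole = int(seconds)
    return f"{whole // 3600:02d}:{whole % 3600 // 60:02d}:{whole % 60:02d}"


def dedupe(cues: Iterable[tuple[float, Sequence[str]]]) -> list[tuple[str, str]]:
    """Keep only the caption lines that each cue adds to the one before it.

    Assumes rolling captions: each cue repeats the previous cue's text
    before adding new words, so a line already seen in the preceding cue
    is skipped.
    """
    rows: list[tuple[str, str]] = []
    previous: set[str] = set()
    for start, lines in cues:
        current = [line.strip() for line in lines if line.strip()]
        for line in current:
            if line not in previous:
                rows.append((hms(start), line))
        previous = set(current)
    return rows


def write_csv(rows: list[tuple[str, str]], path: str) -> None:
    """Write (time, text) rows to a two-column CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


# The cues from the SRT excerpt above, as (start seconds, caption lines) pairs.
sample_cues = [
    (13.200, ["[Music]", "hi everybody"]),
    (14.559, ["hi everybody", "it's friday so that means it's time for"]),
    (17.199, ["it's friday so that means it's time for", "another"]),
]
rows = dedupe(sample_cues)
for time, text in rows:
    print(time, text)
# 00:00:13 [Music]
# 00:00:13 hi everybody
# 00:00:14 it's friday so that means it's time for
# 00:00:17 another
```

Calling `write_csv(rows, "subtitles.csv")` (any file name will do) produces the spreadsheet. This version keeps "another" as its own row, so it won't necessarily reproduce the table above line for line.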
Then it’s easy to open up the spreadsheet, read the text, and know exactly where to start watching the video for the parts that are the most interesting to us.
Dave’s latest video is 42 minutes long – it took less time than that to write and debug the program!
I’ve been told that patience is a virtue, but I’m still waiting to have that proven to me.