Listen to almost any syndicated talk radio program and you'll hear about
its companion website. Usually a handful of things on the site are free,
but many of these sites also have a pay-to-play members' area where the
really good content is: MP3 downloads of the shows, access to live audio
and/or video streams, special behind-the-scenes content, forums, desktop
backgrounds, etc.
The MP3 downloads are very convenient for people who don't have
the luxury of sitting in front of a radio (or driving a car) for a solid
three hours while a program is broadcast (with advertisements).
They're also a boon for people who find radio advertisements annoying.
The only problem with the MP3 downloads is that theme music and produced
portions of the program cannot legally be included in them; distributing
those parts as a downloadable MP3 would be a copyright violation.
Live streams, on the other hand, are not subject to this restriction,
because they are essentially broadcasts rather than a time-shifted copy of
the original program. So whether you listen to the live stream or to a
pre-recorded program played as a stream, music and produced segments may
be included.
I listen to the Glenn
Beck radio program quite often. I used to download the MP3 files to
listen to in the car, but it got annoying every time Glenn and his producers
would put together a segment like "Sportscasters at the 2031
animal-human hybrid baseball games", or "The History Of the
Democratic Superdelegates" and I would hear Glenn say,
"Listen to this... [pause] Oh man! That was great! Wasn't that
great, Stu? Oh yeah! Alright! Dan? Wasn't that just the best? Yeah.
Oh yeah."
I decided I needed to figure out how to save a stream.
I knew it was possible. Lots of software applications exist, for just about
any operating system, that will convert audio from a live stream into a static
WAV file or similar. The open source program mplayer is one such
example.
Breaking it down
First of all, I needed to figure out how the stream content made its way
to my computer.
After I've logged into the Glenn Beck website as an
Insider, I can click a link to listen to a stream of a particular
hour of the program (or the whole program) in Windows Media format or
RealAudio format. I figured I'd have better luck extracting the audio
from the Windows Media format, so I went that route. Instead of just
clicking the link and letting my web browser find some program that could
handle the content, I saved the content to a file and then looked at the
file.
The file it saved was a fairly straightforward XML file (an ASX
playlist, as its root element shows) that looked something like this:
<ASX VERSION="3.0">
<TITLE>Glenn Beck</TITLE>
<AUTHOR>Premiere Radio Networks</AUTHOR>
<COPYRIGHT>Copyright 2008</COPYRIGHT>
<ENTRY>
<TITLE>Glenn Beck 1</TITLE>
<AUTHOR>Premiere Radio Networks</AUTHOR>
<COPYRIGHT>Copyright 2008</COPYRIGHT>
<REF HREF="mms://a0011.v67134.c6713.g.vm.akamaistream.net/7/0011/6713/v08060322/glennbeck.download.akamai.com/6713/_!/shows/2008/06/03/GLENNBECKWIN20080603.WMA?auth=blahblahblahblahblah" />
<REF HREF="http://a0011.v67134.c6713.g.vm.akamaistream.net/7/0011/6713/v08060322/glennbeck.download.akamai.com/6713/_!/shows/2008/06/03/GLENNBECKWIN20080603.WMA?auth=blahblahblahblahblahblah" />
</ENTRY>
<ENTRY>
<TITLE>Glenn Beck 2</TITLE>
<AUTHOR>Premiere Radio Networks</AUTHOR>
<COPYRIGHT>Copyright 2008</COPYRIGHT>
<REF HREF="mms://a0011.v67134.c6713.g.vm.akamaistream.net/7/0011/6713/v08060322/glennbeck.download.akamai.com/6713/_!/shows/2008/06/03/GLENNBECKWIN20080603_CLIP01.WMA?auth=blahblahblahblahblahblah" />
<REF HREF="http://a0011.v67134.c6713.g.vm.akamaistream.net/7/0011/6713/v08060322/glennbeck.download.akamai.com/6713/_!/shows/2008/06/03/GLENNBECKWIN20080603_CLIP01.WMA?auth=blahblahblahblahandblah" />
</ENTRY>
...and so on.
This XML defines the MMS URLs for each segment of the show. There are several
segments each hour. These individual MMS URLs are what I needed to feed to
the application that was going to convert the audio stream to a file. In my
case, I decided to use mplayer because it's just so
good at everything it does!
The command line for doing the stream-to-file conversion looks like
this:
mplayer -vc null -vo null -ao pcm:fast:file=dumpfile.wav \
'mms://a0011.v67134.c6713.g.vm.akamaistream.net/blahblahblah...'
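The real magic in the above command is -ao pcm, which tells mplayer to
use the PCM file writer audio output driver (instead of sending the audio
to my speakers). Its file= suboption names the output file, and fast asks
mplayer to dump faster than realtime. The -vc null and -vo null options
turn off video decoding and output, since only the audio is wanted.
This gives me a WAV file which I'll want to convert to an MP3 or
Ogg-Vorbis file.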
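Because the joining step later in this article assumes the dumped WAV files are 16-bit mono at 22050 Hz, it may be worth checking what mplayer actually wrote. Below is a minimal core-Perl sketch that reads the format fields from a WAV header. wav_format is a hypothetical helper, and it assumes a standard PCM-style "fmt " chunk; to stay self-contained, it builds a header in memory instead of reading a real dump.

```perl
use strict;
use warnings;

# Hypothetical helper: return the audio format fields from the "fmt " chunk
# of a RIFF/WAVE file whose bytes are in $data. Assumes a PCM-style fmt chunk.
sub wav_format {
    my ($data) = @_;
    my ($riff, undef, $wave) = unpack('a4 V a4', $data);
    die "not a RIFF/WAVE file\n"
        unless defined $wave && $riff eq 'RIFF' && $wave eq 'WAVE';
    my $pos = 12;                                  # first chunk follows the RIFF header
    while ($pos + 8 <= length($data)) {
        my ($id, $size) = unpack('a4 V', substr($data, $pos, 8));
        if ($id eq 'fmt ') {
            my ($tag, $channels, $rate, undef, undef, $bits) =
                unpack('v v V V v v', substr($data, $pos + 8, 16));
            return { format => $tag, channels => $channels,
                     rate => $rate, bits => $bits };
        }
        $pos += 8 + $size + ($size % 2);           # chunks are padded to even length
    }
    die "no fmt chunk found\n";
}

# To check a real dump (not run here):
#   open my $in, '<:raw', 'dumpfile.wav' or die "dumpfile.wav: $!\n";
#   read($in, my $buf, 4096);
#   my $info = wav_format($buf);

# Build a minimal 16-bit mono 22050 Hz header in memory to exercise the parser.
my $fmt = pack('v v V V v v', 1, 1, 22050, 22050 * 2, 2, 16);
my $wav = 'RIFF' . pack('V', 4 + 8 + length($fmt) + 8) . 'WAVE'
        . 'fmt ' . pack('V', length($fmt)) . $fmt
        . 'data' . pack('V', 0);
my $f = wav_format($wav);
print "$f->{channels} ch, $f->{rate} Hz, $f->{bits}-bit\n";
```

The parser walks chunks instead of assuming a fixed 44-byte header, so it still works if an encoder places another chunk before "fmt ".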
To convert a WAV file generated by the mplayer command above to
an MP3 file, I use the open source lame tool:
lame -mf -q2 dumpfile.wav GlennBeck.mp3
Or, convert it to Ogg-Vorbis (the completely open lossy audio codec,
which to my ears sounds better than MP3):
oggenc -q2 --downmix -o GlennBeck.ogg dumpfile.wav
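The Perl script later in this article only generates the lame command. If you want to switch between the two formats, one option is to build either command as a Perl list. This is a minimal sketch: encode_cmd is a hypothetical helper, and the options are exactly the ones used in the two commands above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the encoder command (as a list) for turning
# a WAV file into MP3 (lame) or Ogg Vorbis (oggenc), with the options above.
sub encode_cmd {
    my ($format, $wav, $out) = @_;
    if ($format eq 'mp3') {
        return ('lame', '-mf', '-q2', $wav, $out);
    }
    if ($format eq 'ogg') {
        # --downmix folds stereo input down to mono
        return ('oggenc', '-q2', '--downmix', '-o', $out, $wav);
    }
    die "Unknown format '$format' (expected mp3 or ogg)\n";
}

for my $fmt (qw(mp3 ogg)) {
    my @cmd = encode_cmd($fmt, 'dumpfile.wav', "GlennBeck.$fmt");
    print "@cmd\n";
    # To actually encode: system(@cmd) == 0 or warn "$cmd[0] failed: $?\n";
}
```

Passing a list to system() runs the encoder directly without a shell, so file names containing spaces need no extra quoting.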
I've now covered the basic mechanical components of converting an
audio stream into an MP3 or Ogg-Vorbis file. Next I automate it all.
Automation
Because I'm a long-time Perl junkie, I investigated
how I could use a Perl script as the glue between the components and
automate the whole process of capturing a stream and converting it to MP3 or
Ogg-Vorbis.
In the above walk-through, I manually logged into the Glenn Beck website
with my web browser. To completely automate this puppy, I wanted the
script to log in for me. It didn't take me long to figure out that
the CPAN module WWW::Mechanize was what I needed.
WWW::Mechanize does several handy things for the programmer.
It loads and parses web pages, follows links, fills in and submits forms,
and handles other basic kinds of interaction. It keeps track of its own
cookies and session data, too.
To get into the Insider area of the Glenn Beck website, members must
enter their username and password on the Insider login
page.
Looking at the HTML source for this page, I learned the form was named
"aform", the username field was named
"iUName", and the password field was named
"iPassword".
I now had all the information I needed for WWW::Mechanize to
log in:
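use strict;
use warnings;
use WWW::Mechanize;

# Keep session cookies in memory so the login persists across requests
my $agent = WWW::Mechanize->new(
    cookie_jar => {},
);
my $resp = $agent->get('http://www.glennbeck.com/content/insider');
if ($resp->is_success) {
    # Fill in the Insider login form and submit it
    $resp = $agent->submit_form(
        form_name => 'aform',
        fields    => { 'iUName'    => 'myusername',
                       'iPassword' => 'shhhhhhhh!', },
        button    => 'submit');
}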
Walking through the code above: First, I create the
WWW::Mechanize object with an in-memory cookie jar (cookie_jar
=> {}). Next, I use the object to get() the log-in page. If
everything works well so far, I tell the object to find the form named
"aform", fill in the username and password fields, and
submit the form.
One thing I realized as I was debugging my script was that after I logged
in on the Insider page, I was immediately redirected to another page. In
order for my script to work, it needed to follow the redirect. This was an
easy fix:
my $agent = WWW::Mechanize->new(
    cookie_jar  => {},
    redirect_ok => 1,
);
The page I get redirected to has the links for the streaming
audio, so I'm exactly where I need to be to capture and
convert the latest and greatest Glenn Beck Program audio stream.
WWW::Mechanize's find_link() method can locate links on the page by a
variety of criteria. Several of them (url_regex, text_regex, and friends)
leverage Perl's excellent support for regular expressions, and you can
also pick a link by the order in which it appears (n). The link I'm
looking for looks like this:
<a href="http://www.premiereinteractive.com/cgi-bin/members.cgi?stream=shows/GLENNBECKWIN20080604&site=glennbeck&type=win_show"><img src="http://media.glennbeck.com/images/common/header_media5off.jpg" name="icon5" width="26" height="34" border="0" id="icon5" onMouseOver="MM_swapImage('icon5','','http://media.glennbeck.com/images/common/header_media5on.jpg',1)" onMouseOut="MM_swapImgRestore()" /></a>
So, my script has the following:
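# Locate the Windows Media stream link for the show date in $datestr
my $link = $agent->find_link( url_regex => qr/${datestr}.*win_show$/ )
    or die "No stream link found for $datestr\n";
$resp = $agent->get($link);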
This assumes I have a scalar variable $datestr that contains the date of
the show I want to capture, formatted as YYYYMMDD (e.g. 20080604) to match
the stream link.
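The article doesn't show how $datestr gets its value. One minimal way, assuming the YYYYMMDD form seen in the stream link (GLENNBECKWIN20080604), is POSIX's strftime. The sketch also checks that the find_link pattern matches the sample link shown above and does not match a different date.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# $datestr in YYYYMMDD form, as in GLENNBECKWIN20080604; defaults to today.
my $datestr = strftime('%Y%m%d', localtime);

# For a particular show date, pass the broken-down time explicitly:
# (sec, min, hour, day of month, month 0-11, year - 1900)
my $show = strftime('%Y%m%d', 0, 0, 0, 4, 5, 108);   # 4 June 2008
print "$show\n";

# The stream link shown above, and the pattern handed to find_link()
my $url = 'http://www.premiereinteractive.com/cgi-bin/members.cgi'
        . '?stream=shows/GLENNBECKWIN20080604&site=glennbeck&type=win_show';
print "link matches $show\n" if $url =~ qr/${show}.*win_show$/;
```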
Originally, I was going to use one of Perl's several XML-parsing
modules to make sense of the XML in the stream link, but in the end all I
needed was a regular expression to extract the mms: URLs.
my $xml = $resp->decoded_content;
my (@urls) = $xml =~ m/HREF="(mms:[^"]+)"/msg;
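To see what that regular expression returns without logging in anywhere, here is a self-contained sketch that runs it against a trimmed ASX sample with the same structure as the one shown earlier. The host names and auth tokens are placeholders, not real stream addresses. The /m and /s modifiers in the original have no effect on this pattern (it uses no ^, $, or .), so only /g is kept.

```perl
use strict;
use warnings;

# A trimmed-down ASX playlist shaped like the one shown earlier.
# Host names and auth tokens are placeholders.
my $xml = <<'ASX';
<ASX VERSION="3.0">
<ENTRY>
<TITLE>Glenn Beck 1</TITLE>
<REF HREF="mms://stream.example.invalid/shows/SHOW.WMA?auth=aaa" />
<REF HREF="http://stream.example.invalid/shows/SHOW.WMA?auth=aaa" />
</ENTRY>
<ENTRY>
<TITLE>Glenn Beck 2</TITLE>
<REF HREF="mms://stream.example.invalid/shows/SHOW_CLIP01.WMA?auth=bbb" />
<REF HREF="http://stream.example.invalid/shows/SHOW_CLIP01.WMA?auth=bbb" />
</ENTRY>
</ASX>
ASX

# Same regex as the script: capture every HREF that starts with mms:.
# With /g in list context, all captures come back in document order.
my @urls = $xml =~ m/HREF="(mms:[^"]+)"/g;
print "$_\n" for @urls;
```

Only the mms: references are captured, one per ENTRY, so each segment is fetched once even though every entry also lists an http: fallback.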
This gives me a list of URLs stored in @urls. Now I just need
to feed them to mplayer:
my $i = 1;    # segment counter; also used after the loop
foreach my $u (@urls) {
    my $seq = sprintf("%02d", $i);
    # Dump this segment's audio to e.g. 20080604-01.wav
    my @cmd = ( 'mplayer',
                '-vc', 'null',
                '-vo', 'null',
                '-ao', "pcm:fast:file=${datestr}-${seq}.wav",
                $u);
    system(@cmd);
    # Report how mplayer finished (see perldoc -f system)
    if ($? == -1) {
        print "failed to execute: $!\n";
    }
    elsif ($? & 127) {
        printf "child died with signal %d, %s coredump\n",
            ($? & 127), ($? & 128) ? 'with' : 'without';
    }
    else {
        printf "child exited with value %d\n", $? >> 8;
    }
    $i++;
}
This little ditty creates an output file for each of the segment
streams. These are named something like 20080604-05.wav.
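The exit-status checks inside the capture loop follow the pattern documented under perldoc -f system. If you want to reuse them elsewhere (for example, after running the generated sox and lame script), they could be factored into a small helper. describe_status is a hypothetical name, and its messages are copied from the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the value of $? after system() into a message,
# using the same checks as the capture loop (see perldoc -f system).
sub describe_status {
    my ($status, $errno) = @_;
    return "failed to execute: $errno" if $status == -1;
    if ($status & 127) {
        return sprintf "child died with signal %d, %s coredump",
            ($status & 127), ($status & 128) ? 'with' : 'without';
    }
    return sprintf "child exited with value %d", $status >> 8;
}

# Inside the loop this would be: print describe_status($?, "$!"), "\n";
print describe_status(0), "\n";
print describe_status(1 << 8), "\n";
print describe_status(9), "\n";
```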
When the loop is finished, I have several WAV files sitting on the
disk. Now I need to somehow sew them all together into one big WAV file so
I can convert it to an MP3 or Ogg-Vorbis file. For this, I turn to
sox. I decided to have the Perl script generate a shell script to
run all the sox and lame commands needed.
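# Generate a shell script that joins the segment WAVs and encodes the result
open(my $fh, '>', "/tmp/${datestr}.sh")
    or die "Cannot write /tmp/${datestr}.sh: $!\n";
foreach my $j (1 .. ($i - 1)) {
    my $seq = sprintf("%02d", $j);
    # Strip each WAV header and append the raw samples to one file
    print $fh "sox ${datestr}-${seq}.wav -t raw - | cat >> /tmp/${datestr}.raw\n";
}
# Re-wrap the raw samples as a mono 22050 Hz WAV (-w = 16-bit words, -s = signed)
print $fh "sox -w -s -c 1 -r 22050 /tmp/${datestr}.raw ${datestr}.wav\n";
print $fh "lame -mf -q2 ${datestr}.wav ${datestr}.mp3 ";
print $fh "--tt \"Glenn Beck Show - $datestr\" ";
print $fh "--ta \"Glenn Beck\" --add-id3v2\n";
close $fh;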
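With a SoX 14.x release, the raw-sample round trip in the generated script may be unnecessary, since SoX concatenates multiple input files into one output file by default. This is a minimal sketch, assuming SoX 14.x and that every segment WAV has the same sample rate and channel count (concatenation expects matching formats). sox_concat_cmd is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: build a SoX 14 command that concatenates the
# per-segment WAV files (named like 20080604-01.wav) into one WAV.
# Assumes every segment has the same sample rate and channel count.
sub sox_concat_cmd {
    my ($datestr, $count) = @_;
    my @inputs = map { sprintf('%s-%02d.wav', $datestr, $_) } 1 .. $count;
    return ('sox', @inputs, "${datestr}.wav");
}

my @cmd = sox_concat_cmd('20080604', 3);
print "@cmd\n";
# To run it: system(@cmd) == 0 or die "sox failed: $?\n";
```

In the article's script, the segment count would be $i - 1 once the capture loop finishes.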
Then, I run the shell script:
system('sh', "/tmp/${datestr}.sh");
Finally, I do a little cleanup:
unlink "/tmp/${datestr}.sh", "/tmp/${datestr}.raw",
    map { sprintf("%s-%02d.wav", $datestr, $_) } 1 .. ($i - 1);   # zero-padded, like 20080604-01.wav
And I'm done. There are many other ways I could have gone about
doing this, but I found a way that worked and ran with it. I'd love
to hear from people who have done something similar and how they did it.