15 January 2013

Ripping redux

A while back I wrote about double-blind testing various MP3 bitrates in order to decide what format to rip CDs to. The short summary of my testing was that I could easily hear the difference between 320kbps MP3 and lower bitrates, but that the difference between 320kbps and lossless was tougher to hear, at least under the circumstances of the test.

However, as a result of what I learned, I decided to rip everything to lossless FLAC files. I’ve also been playing FLAC files rather than MP3s, and I’m pretty sure that I can distinguish even 320kbps from lossless—but only with particular sections of particular pieces of music.

I reached this conclusion because when listening, I had moments where I heard familiar tracks with unfamiliar clarity. They were pieces of music which I had listened to many times as MP3, and now listened to for the first time in years as lossless audio.

The first time I noticed the phenomenon, the moment of clarity was from Sophie Trudeau’s violin, as featured on a Godspeed You! Black Emperor album. In retrospect this makes sense, as violins have a very jagged waveform, a comparatively high pitch, and subtle harmonics. The fact that the MP3 encoder was also having to deal with everything else that was going on underneath the violin must have made the task even harder for it. So if you want to do your own double-blind tests, I’d suggest some music featuring violins.

[Photo: Sophie Trudeau]

However, most of the times when I suddenly heard the music anew, it was because the stereo image was much clearer with the lossless files. Instruments were clearly separated, and the separation didn’t drift. I suddenly remembered an interview with Laurie Anderson in 2011, in which she talked about how MP3 damaged the stereo sound field. At the time I was skeptical, but now I think she’s right. My listening tests had even hinted at this, with the stereo image being one of the things that had distinguished my 320kbps sample.

The problem is that two of the tricks MP3 uses to compress the audio relate specifically to the stereo image. The first trick is to combine the left and right signals in certain frequency ranges into a single mono signal, on the grounds that the human ear can’t determine direction at those frequencies. This is the same logic used to justify having a single subwoofer on your home audio system. I’ve never been entirely convinced by it; I have a dual-subwoofer setup in the living room, and on tracks that have lots of low frequency stereo effects, you can definitely tell. Try the start of Orbital’s “In Sides”, for example.

The second MP3 trick is to convert the separate left and right channels into a combined L+R channel which records most of the music, and a smaller L-R channel which records the differences between left and right that determine the stereo image. This is an old trick used in FM radio.
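To make the sum-and-difference idea concrete, here is a minimal sketch in Python. It is a simplification: it works on a handful of made-up time-domain samples, whereas a real MP3 encoder applies the transform to frequency-domain data with its own scaling. The function names and sample values are purely illustrative.

```python
def encode_mid_side(left, right):
    """Turn left/right samples into a sum (mid) and a difference (side) channel."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def decode_mid_side(mid, side):
    """Recover the original channels: L = mid + side, R = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Hypothetical samples: identical in both channels except for the last one.
left = [0.5, 0.25, -0.5, 0.75]
right = [0.5, 0.25, -0.5, 0.25]

mid, side = encode_mid_side(left, right)
# The side channel is zero wherever left and right agree.
restored_left, restored_right = decode_mid_side(mid, side)
```

On its own the transform loses nothing, since decoding gives back exactly the samples that went in. Any damage to the stereo image comes afterwards, from how coarsely the side channel gets stored.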

So MP3 is basically filtering out a big chunk of the stereo image, then taking what’s left, separating it out from the music, and squeezing it to be fairly low resolution. It’s not hard to see how this could do bad things to the stereo image of a piece of music. It’s also clear that for many people, it’s not important—people still listen to mono, and plenty of people listen to stereo on equipment that has inadequate spatial separation between the left and right speakers.

I’ve also pretty much gotten CD ripping down to a science.

Choice of software

For fast and accurate ripping, Exact Audio Copy on Windows is your best bet. On Linux, I’m sorry to say that your best bet is Exact Audio Copy running under Wine. The closest platform-native open source alternative is Morituri, but it’s painfully slow. It can take an hour to rip a CD that EAC will rip in a few minutes. I don’t know why it’s so slow, as cdparanoia is pretty much as fast as EAC; but for some reason, Morituri keeps speeding up and slowing down the drive during the ripping process, rather than pulling in the entire track in one go.

One place where Morituri wins over EAC is metadata. EAC pulls all its info from freedb, which is full of fairly inaccurate data entered by any anonymous yahoo who uses software like EAC. I’ve found that feeding it something obscure like The Hafler Trio will often result in it substituting the metadata of some other, unrelated release. Morituri uses the MusicBrainz database, which is much more accurate, but also a fair bit less complete.

MusicBrainz have their own application, Picard, which can be used to look up CDs and to add richer metadata to ripped audio files. I’ve generally found that the best approach is to rip with EAC, ignoring minor metadata crappiness, and then use Picard to correct the information later.

Stupid CD tricks

There’s another thing Morituri does better than EAC, but it’s of rather limited importance: when a CD includes audio hidden in the track 1 pre-gap, Morituri automatically rips it as track zero.

This pre-gap audio trick is used to put hidden tracks on some CDs. If you play or rip the disc normally, you don’t hear the track. If, however, you start playing track 1 in a CD player and then immediately rewind back past time 0:00, you find the hidden audio.

Clever, but ultimately kinda annoying. Wikipedia has a useful list of albums with hidden pre-gap tracks.

Often, later releases of a CD got rid of the pre-gap offset and had the extra material as a regular track. Sometimes it depends on country. The original German release of Rammstein’s “Reise, Reise” apparently has a recording from the black box recorder of a plane crash as a pre-gap track, but the US release includes it as normal audio at the start of track 1.

Hardware

It helps to have more than one CD drive to rip with. Sometimes a scratched disc will rip better on a particular drive.

There’s also really no telling which discs will have errors during ripping. My copy of Negativland’s “Free” had one tiny scratch which somehow made the last track fail to rip—but a CD single by Propellerheads that was covered in literally dozens of scratches ripped perfectly.

A lot depends on the CD’s manufacture. Sometimes you’ll get a disc where the hole is punched off-center; this is often apparent from the louder noise it makes in the drive. In that case, turning down the drive speed can help extract troublesome tracks. Also, CDs play from the middle outwards, so the later tracks are more likely to suffer errors if the hole is slightly off.

The worst problem is pinholes. If you hold a CD up to the light and see a lot of holes in the aluminium surface, you’re probably going to have trouble getting a good rip. My copy of “Force Majeure” by Tangerine Dream has no scratches at all, but pinholes make track 3 rip with errors.

[Photo: Scratched CD]

Dealing with errors

Not every error is a problem. Different pressings of CDs have slightly different mastering, so it’s not uncommon to rip a CD and find that every track is flagged as not matching what’s in the AccurateRip database. The time to beware is when only one or two tracks fail to match, and the rest are as expected—that means you probably have an actual error.

First, try a visual inspection. See if you can find the likely dirt or scratch, bearing in mind that (as mentioned earlier) CDs play outwards from the middle. Focus your cleaning attempts on that specific area, rather than doing anything to the whole disc that might make the situation worse.

Try a microfiber cloth, and rip the track again. If that doesn’t improve the rip, the toothpaste method is worth a try. I’ve got one of those fancy rotary disc resurfacing things, and it has never fixed a problem that toothpaste couldn’t fix.

If there are no scratches and the error is due to pinholes, no amount of cleaning or polishing will help. However, the errors won’t necessarily be audible. Try ripping the track a few times, ideally with different drives, and look for two rips that have the same MD5 checksum. Listen to that rip, and decide if it’s good enough.
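Comparing checksums across several attempts is easy to automate. The sketch below assumes each attempt was saved as an uncompressed WAV file, and that the ripper has already corrected for each drive’s read offset. Without that correction, two clean rips from different drives can still differ. The function names are my own illustration, not part of any ripping tool.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, read in chunks to keep memory use low."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_matching_rips(paths):
    """Group rip attempts by checksum; keep only groups of two or more files."""
    groups = defaultdict(list)
    for path in paths:
        groups[md5_of_file(path)].append(Path(path))
    return [files for files in groups.values() if len(files) >= 2]
```

Calling find_matching_rips on, say, three attempts at the same track returns a list of groups of identical files. An empty list means no two attempts agreed, so there is no obvious candidate to listen to.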

If all else fails, the good news is that everyone’s getting rid of their CDs right now, so there are bargains to be had on Amazon and in used CD stores.

Backing up

First of all, you might be tempted to set up an expensive RAID array. Don’t bother. RAID does two things: it helps ensure high availability, and it helps performance. Neither of these is likely to be a concern for your music collection. Streaming and decompressing lossless audio from disk is trivial for any modern computer, and it won’t kill you if you have to take your music files offline for an afternoon to restore a backup.

So no, you don’t need RAID, just a backup on another disk somewhere. And RAID is not a backup. Putting your files on a RAID array will not prevent them from being accidentally deleted, or from becoming corrupted due to human error or software error.

To guard against corruption, you could use a fancy filesystem that has integrity checks, like ZFS. However, FLAC files have their own checksum internally, an MD5 signature computed on just the uncompressed audio data. You can use the command-line flac tool’s ‘--test’ mode (or ‘-t’ for short) to verify that the audio data is still the same as when the file was encoded.
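The stored checksum can also be read directly, which is handy if you want to record it somewhere. Here is a minimal sketch, assuming the file starts with the ‘fLaC’ marker followed immediately by the STREAMINFO block, as the format specifies. A file with an ID3 tag stuck on the front would be rejected. Note that this only reads the stored signature; it does not decode the audio and check it the way the flac tool’s test mode does. The function name is my own.

```python
def read_flac_audio_md5(path):
    """Return the audio MD5 stored in a FLAC file's STREAMINFO block, as hex."""
    with open(path, "rb") as handle:
        # 4-byte marker + 4-byte metadata block header + 34-byte STREAMINFO body
        header = handle.read(42)
    if len(header) < 42 or header[:4] != b"fLaC":
        raise ValueError("not a FLAC file (missing fLaC marker)")
    block_type = header[4] & 0x7F  # the top bit is the 'last metadata block' flag
    block_length = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_length != 34:
        raise ValueError("STREAMINFO block not found where expected")
    # The MD5 signature is the final 16 bytes of the 34-byte STREAMINFO body.
    return header[26:42].hex()
```

A result of all zeroes means the encoder did not store a checksum, so there is nothing to compare against.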

That just leaves the other problem: detecting whether entire files go missing. You could use a tool like Tripwire, but I ended up writing my own script in Ruby to periodically compare the contents of a music directory against a file listing filenames and FLAC checksums. It outputs a list of files which have been added since it was last run, files which have moved, and files which have been deleted. The moved files are detected by comparing checksums, thereby allowing me to shuffle and reorganize files and improve metadata without causing the code to think lots of files have been deleted. If there’s a better tool for the task, I’d be interested to hear about it.
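The comparison logic is simple enough to sketch. What follows is not the Ruby script itself, just a rough Python illustration of the same idea. It assumes each manifest is a dictionary mapping a relative filename to that file’s audio checksum, and that no two files share a checksum. The function name and the sample paths are hypothetical.

```python
def compare_manifests(previous, current):
    """Compare two {filename: checksum} manifests.

    Returns (added, moved, deleted); moved is a list of (old_name, new_name).
    """
    # Old files whose names have vanished are the candidates for having moved.
    vanished = {}
    for name, checksum in previous.items():
        if name not in current:
            vanished.setdefault(checksum, name)

    added, moved, moved_from = [], [], set()
    for name, checksum in sorted(current.items()):
        if name in previous:
            continue  # still where it was last time
        old_name = vanished.get(checksum)
        if old_name is not None and old_name not in moved_from:
            moved.append((old_name, name))
            moved_from.add(old_name)
        else:
            added.append(name)

    deleted = sorted(
        name for name in previous
        if name not in current and name not in moved_from
    )
    return added, moved, deleted


# Hypothetical manifests from two runs.
last_run = {"a/01.flac": "aaa", "a/02.flac": "bbb", "b/01.flac": "ccc"}
this_run = {"a/01.flac": "aaa", "renamed/02.flac": "bbb", "new/01.flac": "ddd"}
added, moved, deleted = compare_manifests(last_run, this_run)
```

Because the checksum covers only the audio, retagging a file leaves its checksum alone, so a renamed and retagged file still shows up as moved rather than as one deletion plus one addition.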

Photo credits: jaswooduk, the girl who owns the world.

© mathew 2017