5 November 2012

MP3, AAC, ABCDE

The problem

I love music, so I’ve got a lot of CDs. I don’t want to have a lot of CDs, though, because they take up space. While I appreciate cover art and read liner notes, I find that I don’t ever hunt down a CD to gaze at the artwork or read the notes; I’m more likely to look for information about it on the Internet. So, for a while now I’ve been considering ditching the CDs and going digital-only. The question, of course, was what format to use.

The “I don’t want to risk losing anything” choice for ripping CDs is FLAC. There’s no audio loss, but you only compress the data to about half the size. That leaves you with a terabyte or two of data to deal with — and since it represents your entire music collection, you’d better keep it backed up, ideally outside your home. Backing up that much data is still a painful and expensive proposition.
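To put rough numbers on that, here is a back-of-envelope sketch. CD audio is 44,100 samples per second, 16 bits per sample, two channels; the 50% FLAC ratio is the "about half" figure above. The collection size and average disc length are purely illustrative assumptions, not measurements.

```ruby
# CD audio (Red Book): 44.1 kHz, 16 bits per sample, 2 channels.
PCM_BITS_PER_SECOND = 44_100 * 16 * 2   # 1,411,200 bits per second

# Assumption: FLAC compresses to roughly half the PCM size.
FLAC_RATIO = 0.5

# Estimated size in gigabytes (10^9 bytes) of `discs` CDs of
# `minutes_per_disc` minutes each, at a given bit rate.
def collection_gb(discs, minutes_per_disc, bits_per_second)
  seconds = discs * minutes_per_disc * 60
  seconds * bits_per_second / 8.0 / 1e9
end

discs   = 3000  # hypothetical collection size
minutes = 60    # hypothetical average disc length

flac_gb = collection_gb(discs, minutes, PCM_BITS_PER_SECOND * FLAC_RATIO)
mp3_gb  = collection_gb(discs, minutes, 320_000)

puts format("FLAC:        about %.0f GB", flac_gb)
puts format("320kbps MP3: about %.0f GB", mp3_gb)
```

Change `discs` and `minutes` to match your own shelves; the point is only that lossless storage scales with the full PCM rate, while a lossy format scales with whatever bit rate you pick.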

So then we come to the lossy compression formats. I think only two are worth considering: MPEG-1 Audio Layer III (aka MP3), and MPEG-4 AAC. Sorry, Ogg proponents, but not all my devices will play Ogg files, and the last MP3 patent expires in 5 years anyway.

A somewhat related question is where to buy music. Buying more CDs is only making the problem worse. So the main options are the iTunes store, Amazon MP3, and Google Play. I decided to test their chosen lossy compression formats.

  • The iTunes store uses 256kbps AAC.
  • Amazon uses 256kbps MP3.
  • Google Play uses 320kbps CBR MP3.

Clearly 320kbps MP3 is likely to be better than 256kbps MP3, but will it be better than the allegedly superior AAC at 256kbps? Is it worth buying from Amazon rather than Google if they’re cheaper? Just how much better is AAC really?

For ripping CDs, I had an extra format to consider: variable bit rate (VBR), where the encoder tries to use only as many bits per second as are needed to compress each given second of music. (Well, technically MP3 is chunked into frames of 1,152 samples, which is about a 38th of a second at CD sample rates, but you get the general idea.) Quiet sections of music that don’t have much going on result in not many bits per second being spent; complicated sections will ramp up until the maximum 320kbps is used.
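For the curious, the frame arithmetic can be sketched as follows. An MPEG-1 Layer III frame always holds 1,152 samples per channel, so its duration is fixed by the sample rate; what the bit rate changes is how many bytes each frame is allowed. The byte formula is the usual one for Layer III, ignoring the optional padding byte, and the method names are just illustrative.

```ruby
SAMPLES_PER_FRAME = 1152    # fixed for MPEG-1 Layer III
SAMPLE_RATE       = 44_100  # CD audio

# Duration of one frame in seconds.
def frame_seconds
  SAMPLES_PER_FRAME.to_f / SAMPLE_RATE
end

# Size of one frame in bytes at the given bit rate (kbps),
# ignoring the optional one-byte padding slot.
def frame_bytes(kbps)
  (SAMPLES_PER_FRAME / 8) * kbps * 1000 / SAMPLE_RATE
end

[32, 128, 256, 320].each do |kbps|
  puts "#{kbps}kbps: #{frame_bytes(kbps)} bytes per frame"
end
puts format("frame length: %.1f ms", frame_seconds * 1000)
```

A VBR encoder is effectively choosing one of these frame sizes afresh for every frame, which is why a quiet passage can cost a tenth of what a busy one does.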

It’s possible that Amazon use VBR, and that their quoted 256kbps is an average or target rate. Unfortunately, they don’t say, and it apparently varies between albums. So, I took the approach of comparing to constant bit rate 256kbps MP3.

The procedure

I should start by saying that I’m not a ‘golden ears’ hi-fi nut. I obviously don’t believe in the inherent superiority of analog sources, or else I wouldn’t be considering going digital-only. I also don’t believe in bullshit like unidirectional current accelerators, special $1,000 digital cables, or CD demagnetizers. If you do, you should just stop reading now, because you’ll have so many issues with my testing methods that you’ll dismiss my results anyway.

My audio equipment is midrange; I don’t have a $5,000 amplifier and power-amp combo or anything like that. If you have a setup like that and listen on electrostatic headphones, your results may well differ from mine.

I started out by ripping a CD audio track to a raw wave file (WAV). I then encoded that raw audio file into each of my lossy test formats. I did it that way rather than ripping the CD with different settings, because I wanted to be sure that the encoders all had exactly the same data going in.

For the AAC encoding, I used iTunes with the “iTunes Plus” setting, because Apple’s music store offerings will be encoded with Apple’s encoder, and part of what I want to do is evaluate where I should buy music. I’ve read that Apple has a ‘pro’ encoder that they use for their content, but I don’t have access to that, so iTunes it is.

For the MP3 encoding, I used LAME. Why not iTunes? Well, mostly because in past tests, I found that LAME did a better job than iTunes at the same bit rate. Before iTunes 5 or so, iTunes had a terrible MP3 encoder; my guess is that Apple don’t see it as a big priority compared to AAC. I used the -h switch with LAME for all files; I used --vbr-new --preset standard for the VBR file, and --preset cbr for the CBR files.

I’ve seen it argued that the Fraunhofer encoder these days is better than LAME for constant bit rate encoding. At the risk of spoiling the surprise, that turned out not to be important to my conclusions.

Once I had my four encodings of the music, the next step was to decode them all into wave files again, using iTunes. Obviously I wanted to make sure that all the audio went through the same player, one that’s generally considered adequate and is used by most people.

More importantly, though, I wanted to make sure that my five files were all the same size, ready for the next step.

The thing is, when you’re comparing audio, it’s really easy to convince yourself that you can hear something, if you know what you’re supposed to be hearing. So I wanted to do a true blind test—I would make sure that I didn’t know which files were which until after I had finished comparing them.

I did this by writing a quick Ruby program which took the 5 input files, randomly shuffled the array containing their names, and then ran through the names in sequence copying them to A.wav, B.wav, and so on—while simultaneously writing a log file listing which source file had become A, which had become B, and so on.

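require 'fileutils'

# Shuffle the input file names so the letter assignment is random.
shuffled = ARGV.shuffle

# Copy each file to A.wav, B.wav, ... and record the mapping in log.txt.
File.open("log.txt", "w") do |log|
  shuffled.each_with_index do |file, i|
    outfile = "#{('A'.ord + i).chr}.wav"
    FileUtils.cp(file, outfile)
    log.puts "#{outfile} = #{file}"
  end
end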

I could now compare the files A thru E, make notes and pick my winner, and only then look at the log file to find out which I had picked.

The music

For my test audio, I wanted something that fulfilled some key criteria.

  1. I wanted it to be from a digitally mastered album, so that limitations of old analog recording technology wouldn’t mask areas of encoder weakness.
  2. It should be some kind of electronica, as that’s mostly what I listen to.
  3. It should have lots of retro synth sounds, with square, pulse and triangle waves. The reason for this is that MP3 encoding works in the frequency domain, using Fourier-related transforms. The Fourier series for a true square wave requires infinite bandwidth, and a sawtooth is nearly as bad, whereas a sine wave can be encoded much more easily. Basically, the more near-vertical lines the waveform has, the harder it is to MP3 encode.
  4. It should have a wide range of volume levels, to check encoder performance for quiet as well as loud music.
  5. It should have significant stereo effects, as MP3 is typically encoded in joint stereo mode.
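To make the square wave point in item 3 a little more concrete, here is a small sketch. An ideal square wave contains only odd harmonics, with harmonic n at 1/n of the fundamental's amplitude; a triangle wave also has only odd harmonics, but they fall off as 1/n². The 440 Hz note is an arbitrary example, and the cutoff is simply half the CD sample rate.

```ruby
NYQUIST = 22_050.0  # half of the 44.1kHz CD sample rate

# Odd harmonic numbers of `fundamental_hz` that fit below Nyquist.
def odd_harmonics(fundamental_hz)
  max_n = (NYQUIST / fundamental_hz).floor
  (1..max_n).select(&:odd?)
end

# Relative amplitude of harmonic n (fundamental = 1.0).
def square_amplitude(n)
  1.0 / n
end

def triangle_amplitude(n)
  1.0 / (n * n)
end

# Convert an amplitude ratio to decibels.
def to_db(ratio)
  20 * Math.log10(ratio)
end

harmonics = odd_harmonics(440)  # 440 Hz is an arbitrary example note
top = harmonics.last

puts "#{harmonics.size} odd harmonics below Nyquist, highest is number #{top}"
puts format("square:   harmonic %d is %.1f dB below the fundamental",
            top, -to_db(square_amplitude(top)))
puts format("triangle: harmonic %d is %.1f dB below the fundamental",
            top, -to_db(triangle_amplitude(top)))
```

The square wave's harmonics die away slowly, so there is still meaningful energy right up to the top of the audible band, and the encoder has to spend bits on all of it.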

Joint stereo is a trick where, instead of encoding the entire left and right channels separately, the encoder stores what the two channels have in common plus the difference between them (LAME calls this MS, for mid/side), and the left/right image is restored at playback. It often results in noticeable compression artifacts.
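One common form of joint stereo, and the one LAME reports as MS in its statistics, is mid/side coding. Here is a toy sketch of the idea; the scaling is simplified (a real encoder works on frequency-domain data and uses a different normalization), and the sample values are made up.

```ruby
# Convert a left/right sample pair to mid/side.
def to_mid_side(left, right)
  [(left + right) / 2.0, (left - right) / 2.0]
end

# Convert a mid/side pair back to left/right.
def to_left_right(mid, side)
  [mid + side, mid - side]
end

# Toy samples: a centred sound (identical channels) and a hard-panned one.
centred = to_mid_side(0.8, 0.8)
panned  = to_mid_side(0.8, 0.0)

puts "centred: mid=#{centred[0]} side=#{centred[1]}"
puts "panned:  mid=#{panned[0]} side=#{panned[1]}"
puts "panned, restored: #{to_left_right(*panned).inspect}"
```

When both channels carry much the same signal, the side channel is close to silence and costs almost nothing to encode. With wide stereo effects the side channel is as busy as the mid, the saving disappears, and the encoder may as well store plain left and right.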

Basically, I was trying to pick something that was maximally likely to cause the lossy encoders to give an unacceptable result. My choice was the same as last time I did some MP3 testing: Emerge by Fischerspooner.

To see why I picked it, here’s the output from LAME’s VBR encoder:

LAME 3.99.5 64bits (http://lame.sf.net)
Using polyphase lowpass filter, transition band: 18671 Hz - 19205 Hz
Encoding 03 Emerge (lossless).wav to 03 Emerge (lossless).mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III VBR(q=2)
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA 
 11038/11038 (100%)|    0:10/    0:10|    0:23/    0:23|   27.796x|    0:00 
 32 [   71] %*
 40 [    2] %
 48 [    0] 
 56 [    1] %
 64 [    0] 
 80 [    1] %
 96 [    3] %
112 [    3] %
128 [  300] %****
160 [ 2447] %%%%%%********************************
192 [ 1729] %%%%%%%%%******************
224 [  666] %%%********
256 [ 1457] %%%%%%%%***************
320 [ 4358] %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*****************************
-------------------------------------------------------------------------------
   kbps        LR    MS  %     long switch short %
  243.0       36.3  63.7        39.1  16.7  44.2
Writing LAME Tag...done
ReplayGain: -9.2dB

LAME judged that nearly 40% of the track’s frames needed the full 320kbps to encode, and over a third of the frames forced LAME to fall back to the less space-efficient LR encoding for the stereo image.
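If you want to check those proportions yourself, the histogram is easy to crunch. This sketch just copies the frame counts from the LAME output above and recomputes the totals; the 1,152 samples per frame figure is the standard MPEG-1 Layer III frame length.

```ruby
# Bit rate (kbps) => number of frames, copied from the LAME output above.
HISTOGRAM = {
  32 => 71, 40 => 2, 48 => 0, 56 => 1, 64 => 0, 80 => 1, 96 => 3,
  112 => 3, 128 => 300, 160 => 2447, 192 => 1729, 224 => 666,
  256 => 1457, 320 => 4358
}.freeze

total_frames = HISTOGRAM.values.sum
average_kbps = HISTOGRAM.sum { |kbps, frames| kbps * frames }.to_f / total_frames
share_at_max = 100.0 * HISTOGRAM[320] / total_frames

# Each MPEG-1 Layer III frame holds 1152 samples; the track is 44.1kHz.
duration_s = total_frames * 1152 / 44_100.0
mins, secs = duration_s.floor.divmod(60)

puts "frames: #{total_frames}"
puts format("average bit rate: %.1f kbps", average_kbps)
puts format("frames at 320kbps: %.1f%%", share_at_max)
puts format("duration: %d:%02d", mins, secs)
```

The frame total and the average bit rate should agree with the figures LAME prints, which is a handy sanity check that the histogram was copied correctly.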

The listening

After some initial listening, I picked three key sections of the track for closer comparison. Section 1 was a quiet bit near the start (around 0:39 in, if you have the CD). Section 2 was a medium intensity chunk with vocals, lots of synths, and some guitar fuzzbox very quietly in the background (around 2:22). Section 3 was the crescendo (around 3:33 onward) featuring two separate sets of vocals at different stereo pan points, synths, effects, the works. There’s a bit of clipping at the end of the track, as you can see if you open the raw file in Audacity, but nothing too bad.

I used the command-line utility SoX to extract my chosen sections, producing files 1A, 2A, 3A, 1B, 2B, 3B, all the way to 1E, 2E, 3E. File 2A was section 2 of the track, encoded via encoder A, and so on.
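The extraction step might look something like the sketch below, which builds one SoX `trim` command per file and section. The start times come from the section descriptions above; the 20 second clip length is an assumption for illustration, since the exact lengths aren't recorded here, and the commands are only printed rather than run.

```ruby
require 'shellwords'

# Start offsets in seconds, from the section times given above.
SECTION_STARTS = { 1 => 39, 2 => 2 * 60 + 22, 3 => 3 * 60 + 33 }.freeze

CLIP_SECONDS = 20  # assumption: the real clip lengths aren't stated

# Build one `sox INPUT OUTPUT trim START LENGTH` command per file and section.
def sox_commands(letters)
  letters.flat_map do |letter|
    SECTION_STARTS.map do |section, start|
      ["sox", "#{letter}.wav", "#{section}#{letter}.wav",
       "trim", start.to_s, CLIP_SECONDS.to_s].shelljoin
    end
  end
end

commands = sox_commands(%w[A B C D E])
puts commands

# To actually run them (requires SoX on the PATH):
# commands.each { |cmd| system(cmd) or abort("failed: #{cmd}") }
```

Generating the commands from the same table for every file guarantees that each encoder's clips cover exactly the same stretch of music.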

I took my 15 wave files, put them on a flash drive, and copied them over to my PlayStation 3. The PS3 is connected via all-digital connections; it would send the 44.1kHz 16-bit audio data to the Denon receiver, which would do all the analog stuff. Remember, all the files were pre-decoded raw audio data, so the PS3’s MP3 and AAC decoding wouldn’t be an issue. I also set the Denon to DIRECT mode, bypassing the DSP, so that wouldn’t affect the results.

I plugged in the best headphones I currently own, a set of Shure SRH-440 studio headphones. (I used to be a big Sennheiser fan, but the 440s are better than my previous midrange Sennheisers.)

I sat down with some scrap paper and listened, using the PS3 controller to jump effortlessly between my audio clips.

The results

I compared the clips in pairs, with the aim of continuing until I found a winner. I’ll now run through the comparisons in the order I did them. I’ll reveal the encodings as I go, but remember, I had no idea which audio formats matched which filenames until after I had finished.

A vs B (lossless vs AAC)

This was a walkover. Sample A won by a landslide for sections 2 and 3, where I noted that the sound was “confused”—when there was a lot going on, AAC seemed to blend it all into a bit of an unintelligible mess. Sorry, Apple, but 256kbps AAC is not CD quality.

A vs C (lossless vs 320kbps CBR MP3)

This was the comparison I had the hardest time with. I rated C slightly worse on the quieter bits, but gave it a tiny nod for the stereo imaging at the end, which I thought sounded a bit clearer. I’d have called it a draw if I hadn’t been forcing myself to pick a winner.

C vs D (320kbps CBR vs LAME --preset standard -h --vbr-new)

This was another easy one. I thought C was quite clearly better, especially when the music got raucous in the third section.

C vs E (320kbps MP3 vs 256kbps MP3)

This was another tricky one, and in this case I need to ‘cheat’ a little and interpret my notes in the light of the eventual reveal.

I preferred encoder C for the quieter parts of the music, but I wrote in my notes that in sections 2 and 3, the synth lead was darting back and forth across the stereo image. Since I didn’t know which file was the original lossless rip, for all I knew that might have been an intentional studio effect such as a stereo flanger.

Well, it turns out that no, that was an encoding artifact in file E, not something present in the source. So 256kbps CBR fails badly compared to 320kbps and pretty much everything else.

So the final winner was: A and C in pretty much a dead heat.

Round two

Not everyone listens to synth bleeps all the time. I wanted to see if I could get a similar ranking using encodings of orchestral music. I went to find something that was digitally recorded and mastered in a modern studio, but using acoustic instruments you’d find in an orchestra. It needed to be a manageable length, and have wide variations in volume level. I picked “The Object Is A Hungry Wolf” by Andrew Poppy, from the remastered “Andrew Poppy on ZTT” set.

I followed the same procedure as before. However, I won’t go through the pairwise comparisons, because the sad truth was that I couldn’t pick out any kind of winner when comparing any of the samples. LAME’s VBR encoder said it hardly needed more than 256kbps for any of the frames of audio, and I guess it was right.

Conclusions

I was pleasantly surprised to find that in some cases, I could hear the difference between encodings, and that the perceived quality was in line with what you’d expect given the bit rates.

The Hydrogenaudio wiki says that LAME’s standard preset is transparent, and that encoding with other settings will have no noticeable effect on quality. My results seem to contradict this; I had no hesitation ranking 320kbps LAME-encoded MP3 as better than the standard preset, when using the Fischerspooner track as source material. This is a big disappointment for me, for one simple reason: I have all my music ripped and encoded via LAME’s standard preset.

Apple’s iTunes Plus AAC files are not good enough to be considered ‘near CD quality’ in my view. The difference was subtle, but noticeable.

However, my Sansa Clip Zip MP3 player has noticeably better audio quality than either of my iPods, even after switching to better earbuds; and my MacBook Pro has rather poor quality audio output compared to my Lenovo music server machine and my external USB audio interface. Apple aren’t the only ones who cut corners on audio either: the Google/Samsung Galaxy Nexus has sadly inferior audio compared to the Sansa, and I don’t use it for music as a result.

So I’m guessing that if you’re listening on a Mac or an iPod or your mobile phone, you probably won’t be able to detect a difference between iTunes AAC files and CD audio. For that matter, you probably won’t notice the difference between 256kbps MP3 files and CDs.

But the difference is there. So if you care about making sure you aren’t giving up sound quality even when using good equipment, my personal conclusion is that 320kbps MP3 is the way to go. As far as I can tell, 320kbps MP3 is utterly indistinguishable from lossless audio, given the best equipment at my disposal, and using music that’s hard to encode.

That means that for buying music, my preference is probably going to be Google Play, bleep.com, and any other online store offering maximum bitrate MP3 files. And I think that realistically, there’s no call for FLAC encoding everything; I might as well use MP3 and make my life a lot easier.

Shortcomings of my approach

There are some obvious limitations of the procedures I followed.

I used iTunes for all the compressed audio decoding. There may be better MP3 decoders; there are certainly worse ones. It’s possible that by using a better decoder, you could get acceptable performance at lower bit rates than I did.

I didn’t test Fraunhofer’s MP3 encoder against LAME. Maybe 256kbps would be perfectly fine if encoded using a different encoder.

I didn’t test Apple’s internal AAC encoder that they use for the iTunes store. Maybe iTunes purchases are better quality than CD rips to the same format; I’d be interested to know if anyone has compared them.

I didn’t test AAC against MP3 at a similar bit rate. That wasn’t intentional; I just eliminated AAC from consideration, not knowing that that was what I was doing, before I got to the MP3. It doesn’t matter to my conclusion, but if you want to know whether you should be buying from iTunes or Amazon, I can’t answer that.

I didn’t consider better-than-CD source material. Maybe Neil Young is right, and we need more than 16 bits at 44.1kHz; or maybe Monty is right and CD is good enough. I’m reserving judgement on that until I have the chance to try a proper A/B test.

The obvious counterpoint to going with just 320kbps MP3 is that I should keep FLAC so I can encode into other bit rates, depending on device. That suggestion probably made sense a few years ago, but now that I have an MP3 player with 40GB of storage via a MicroSD slot, I can’t see myself ever caring that I’m wasting space on 320kbps MP3 when I could have gotten away with (say) 256kbps.

You might similarly say that eventually FLAC will make sense. Well, yeah, before too long we’ll have affordable terabyte flash drives for backup. But right now, FLAC is a disk-hogging pain, and since it’s unnecessary in audio terms and there’s no longer a need to keep it around for making low bitrate MP3s on demand, why bother?

Yes, MP3 is lossy. It isn’t perfect. But hey, I lived with vinyl LPs and audio cassettes for a decade or so. I don’t need perfection, and the effects of bad mastering and the loudness wars are a much bigger problem than any tiny artifact introduced by 320kbps MP3.

© mathew 2017