Last issue, we took a look at some big claims that Neil Young made back in February about the sound quality of contemporary music files. Within 48 hours, several other journals rekindled this dusty old story as well.
By April 4th, Rolling Stone had run a much-cited piece detailing how Neil Young filed trademarks for terms including “Studio Quality Sound” late last year. Meanwhile, engineer Allen Farmello wrote a heartfelt editorial for the Tape Op blog, unambiguously titled “The Problem with A-B’ing and Why Neil Young is Right about Sound Quality.”
I was excited to see this story get a little extra attention in some major outlets. It was also interesting to realize just how eagerly this topic is still debated, despite how long the debate has been raging, and how much great science has been done on audio perception over the last several decades.
In the past month, Allen’s Op-Ed inspired almost 20,000 words’ worth of comments, which ran the gamut of classic arguments from both sides of the age-old Subjectivist v. Objectivist debate, thanks to a visit from Ethan Winer, author of the new book The Audio Expert.
Allen took the “subjectivist” side, bringing up critiques of AB-style tests like the one we presented last issue. Ethan took the more empirical “objectivist” stance, reminding readers that “blind testing is the gold standard in all branches of science, especially for things that relate to perception,” and questioning why audio should be “exempt for some reason”.
They both raise fair points, and it was heartening to see an online debate that remained so friendly and intellectually honest. Although these debates can become heated in some quarters, there’s no real reason for the two sides to be at odds. These two camps have even informed one another on issues like brickwall filtering and harmonic distortion, and in the end, we can take the best from both perspectives to find even better ways to learn which sonic distinctions are meaningful, and which ones are a whole bunch of noise.
More on that in a minute. First, the results of our audio poll.
Can trained listeners hear the difference?
Last month we presented you with a couple of high-definition WAV files and a couple of contemporary AAC files, and asked if you could hear the difference in a blind test.
There’s no better audience for this kind of thing than TMimaS readers. You come from a special breed of audio nuts and music lovers. You’re the kinds of people who like to read 3,000 word love-sonnets to classic condenser microphones, histories of esoteric albums, and profiles of seminal producers.
The majority of you are audio engineers, professional musicians, and ambitious hobbyists, and I figured that if anyone would be able to tell these file types apart, it would be you guys.
So, how did you do?
Well… please accept my warm congratulations to the 49% of you who guessed right.
That’s right: even among our readers, the results came out no better than a coin flip. And we didn’t even need a huge sample size to get a result that’s consistent with the tremendous mountains of research already done in this field.
Although thousands read the article, those of you with the confidence to actually vote in the official poll barely numbered in the hundreds. What’s especially telling is that those of you who considered yourselves “trained listeners” in our quiz did no better than those who self-reported as “untrained listeners”. If a subtle audible distinction did exist, this shouldn’t be the case.
Editor’s Note: If you’d like to find out how you did, the high-definition WAV file was choice A for song #1, and choice B for song #2. If you voted, you should have an email with your original answer in it.
But please remember: this simple test can’t be used to prove whether or not you can tell the files apart. Only a blind ABX test could hope to suss that out for you.
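For readers who want to check the "coin flip" reasoning themselves, here is a minimal sketch of the arithmetic. The poll numbers in it (200 voters, 98 correct) are hypothetical stand-ins chosen only to match the 49% figure, since the exact vote count isn't given here, and the function names are my own. It asks how likely a given score is from guessing alone, and how many correct answers a 16-trial ABX run would need before guessing becomes an unlikely explanation.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float = 0.5) -> float:
    """Probability of getting k or more answers right out of n by guessing alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_correct(n: int, alpha: float = 0.05) -> int:
    """Smallest score out of n that pure guessing reaches with probability below alpha.

    Returns n + 1 when even a perfect score would not be convincing.
    """
    for k in range(n + 1):
        if binomial_tail(n, k) < alpha:
            return k
    return n + 1


# HYPOTHETICAL poll: 200 voters, 98 of them correct (49%).
poll_p = binomial_tail(200, 98)

# A 16-trial ABX session for a single listener.
abx_threshold = min_correct(16)

print(f"Chance of 98+ correct out of 200 by guessing: {poll_p:.2f}")
print(f"Correct answers needed out of 16 ABX trials: {abx_threshold}")
```

Under these assumptions, a 49% score is exactly what guessing tends to produce, while a single listener would need 12 of 16 ABX trials right before chance drops below the conventional 5% level.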
What does this prove?
High-definition proponents argue that simple AB tests like this one aren’t entirely conclusive, and I agree. An AB test can’t completely rule out unconscious preferences, if they do exist. We need other studies to address that.
With that said, there are a few things that tests like these do show beyond reasonable doubt. Foremost, that Neil Young’s most radical claims about contemporary sound quality are exaggerated at best.
Worst consumer format ever… or among the best?
Although I love Neil Young more than may be healthy, all the available evidence runs counter to statements like this one:
“We’re in the 21st century and we have the worst sound that we’ve ever had. It’s worse than a 78 [rpm record]”
10 years ago, I would have been on Neil Young’s side all the way, but in 2012, a high-resolution AAC file can deliver measurably greater fidelity in the audible spectrum than the majority of consumer formats ever did. This includes the 78 rpm record, not to mention things like AM and FM Radio, wax cylinder, 8 track cassette and phonautograph.
Don’t get me wrong. I love vinyl records. I love them for a whole slew of reasons, despite the fact that they have their own sonic limitations and that the majority of systems are equipped with needles and cartridges that cause significant distortion and fidelity loss.
Even accounting for all that, vinyl records are satisfyingly tangible, they sound damn cool, and – perhaps most important to the listeners’ subconscious – they sound familiar.
As Nobel-winning cognitive scientist Daniel Kahneman writes in his masterwork of perception, Thinking, Fast and Slow, “Familiarity breeds liking.”
So go ahead, please, keep buying vinyl. It’s a great format. It’s just not great for the reasons Neil Young has been telling you.
“Everything is amazing and nobody is happy.” (Louis CK)
It’s strange that I should find myself coming across as a champion for digital technologies. I’m not.
I still like reading books made out of paper, and I still like mixing on an analog console, to tape if possible. I love the sound of cassettes and 78rpm jukeboxes, and I love the look of The Empire Strikes Back on the VHS tape I dubbed off Channel 11 when I was a kid. (It comes complete with vintage adverts for claymation dinosaur specials).
But this doesn’t mean any of these formats sound “closer to the source”. Although professional tape recorders are pretty hi-fi compared to many historical mediums, I enjoy mixing to a 1/2” Studer tape deck at 15ips, not because it sounds exactly like what I feed into it – but because it doesn’t.
The effect of a Studer tape deck on a mix is very subtle and untrained listeners may miss it on a conscious level. However, it’s still a meaningful difference, and any good engineer should be able to identify the sound of a Studer in a blind ABX test, at least at 15ips, if not at 30ips, where it becomes a little more difficult.
When you take into account that very subtle differences like these are apparent in blind ABX tests, the fact that trained listeners can’t hear the difference between contemporary AACs and higher resolution formats becomes even more telling.
To date, no trained listener has ever reliably picked out a properly encoded 320kbps AAC file from any higher-resolution file in a blind test. (But I’d still love to try, if someone wants to quiz me!)
While we can’t yet rule out the possibility that there may be subconscious differences we have yet to pick up on, it’s clear that if any difference does exist, it’s more subtle than the sound of some of the best tape machines ever made, and subtle enough to have escaped the ears of our best listeners.
5% of the sound quality… or nearly 100%?
Even some of the Neil Young remarks that seem true enough on the surface turn out to be suspect when you look at them closely.
Young says that we now have “5% of the information” and “5% of the sound quality” of the source master, alluding to the fact that data-compressed files can be anywhere from 5% to 60% the size of an original master.
Not only does this 5% figure suggest that most commercial masters are recorded and mixed at the super-high resolution of 24-bit/192kHz (they’re not), it also ignores the fact that we can just as easily say AACs use 0% of the original data to recreate the sound, much as a vinyl record uses 0% of the electrons found on the original tape master.
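The size arithmetic is easy to check. The sketch below is only an illustration: it assumes uncompressed stereo PCM masters and a constant lossy bitrate, and the function names are mine. It compares a lossy file's bitrate with the raw bitrate of masters at a few common resolutions.

```python
def pcm_kbps(bit_depth: int, sample_rate_hz: int, channels: int = 2) -> float:
    """Raw bitrate of uncompressed PCM audio, in kilobits per second."""
    return bit_depth * sample_rate_hz * channels / 1000


def size_ratio(lossy_kbps: float, bit_depth: int, sample_rate_hz: int) -> float:
    """Lossy file size as a fraction of the uncompressed stereo master's size."""
    return lossy_kbps / pcm_kbps(bit_depth, sample_rate_hz)


# Assumed master resolutions, as (bit depth, sample rate in Hz).
masters = {
    "16-bit/44.1kHz": (16, 44_100),
    "24-bit/96kHz": (24, 96_000),
    "24-bit/192kHz": (24, 192_000),
}

for name, (bits, rate) in masters.items():
    for lossy in (128, 256, 320):
        ratio = size_ratio(lossy, bits, rate)
        print(f"{lossy} kbps vs {name}: {ratio:.1%} of the master's size")
```

By this reckoning, a 256 kbps file is about 18% the size of a CD-resolution master and under 3% the size of a 24-bit/192kHz one, so single-digit percentages only show up against very high-resolution masters or at much lower lossy bitrates.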
In “lossy” compressed formats, the data coding is so different that it isn’t even “reduced” as much as it’s “completely rewritten in a way that takes up less space.”
While it’s true that MP3s can contain as little as 5% of the bits in extreme examples, when we listen to the output of a high-resolution AAC against a 24-bit WAV, all tests reveal that good AACs certainly retain 98% or more of the original sound quality. (And that’s being generous to the naysayers.)
This is quite a feat when you think about it. When a tape machine falls slightly out of calibration, or as a vinyl disc or record needle wears out, it’s impossible to approach this level of fidelity. Any good engineer would likely hear those differences in a blind AB test immediately.
This isn’t to say that higher resolutions don’t have their benefits. They certainly do – particularly at the recording, mixing, mastering, and archival stages. All that I want to do is ask whether we’re focusing on the improvements that make a difference, or chasing red herrings.
The front of the donkey… or the back?
Thankfully, simple listening tests easily dispel Young’s most dangerous piece of advice: That to improve the experience of music fans, we should focus on what he calls “the front end of the donkey” (file formats) rather than “the back end of the donkey” (playback systems).
The truth is that digital audio engineers have been focusing on file formats, working hard to improve them for the past decade. In 2002, I would have been crusading right alongside Neil, but today the weakest link in the average listener’s experience isn’t his high-resolution AAC file – it’s his playback system.
Listeners today often experience their music through laptop speakers and earbuds, and they hear it through D/A converters and analog signal paths that leave a lot to be desired. Up to a point, even small improvements in those areas can make a tremendous difference. The kind of difference that music fans can hear in an instant that makes them feel like we aren’t lying to them.
If Neil Young does eventually release a high-end audio player, this is where the real benefits are to be had. Prior high-def formats like DVD-Audio and SACD failed not because of format wars, but because – independent of more meaningful improvements – consumers just didn’t care about the unnoticed increase in bitrate.
Whenever I read backasswards remarks like “Neil gets it right. We’ve got good listening devices, but the underlying product is crap,” as Bob Lefsetz wrote in February, it makes me want to pull my hair out until I look like, well, Bob Lefsetz.
The strengths and limitations of A/B tests
As much as AB tests have their strengths, I’ll be the first to admit they have their limitations as well. Before we explore those, let’s make sure we understand how they work.
In his Op Ed, Allen argues that AB tests can encourage poor listening habits by enticing us to rapidly flip between two sources in a vain attempt to hear subtle differences.
While I agree that rapid flipping is a terrible listening strategy, I’m not certain that Allen has identified a testing limitation with that point.
Whenever I’m able to hear subtle differences between nearly identical sounds in AB tests, it’s because I stop thinking and listening on a conscious level. It’s much better to treat the sound like a 3D poster or Zen koan, allowing your mind to float off, and letting each sound wash over you for 15, 30, 45 seconds or more.
For those of you who think that 15 or 30 seconds could still be considered “rapid flipping”, I dare you to pause for a moment and slowly count to 30 right now. Go ahead. I’ll wait.
(Didn’t finish? That’s okay. Just remember that this is not the mental state you want to be in when you’re listening critically.)
The Strengths of blind A/B tests
Using techniques like this one, all sorts of listeners are able to distinguish extremely subtle differences between sounds.
There are even studies that show untrained listeners can unconsciously pick out 1 dB volume increases in blind A/B tests. For those of you who aren’t great with the dB scale, this is roughly equivalent to the sound level of a flea farting.
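For a less entomological sense of scale, here is a small sketch using the standard decibel definitions, which convert a level change in dB into the ratio it represents. The function names are my own.

```python
def db_to_amplitude_ratio(db: float) -> float:
    """Ratio of signal amplitudes (e.g. voltage) for a level change in dB."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Ratio of signal powers for a level change in dB."""
    return 10 ** (db / 10)


one_db_amplitude = db_to_amplitude_ratio(1.0)
one_db_power = db_to_power_ratio(1.0)

print(f"+1 dB is about {one_db_amplitude - 1:.0%} more amplitude")
print(f"+1 dB is about {one_db_power - 1:.0%} more power")
```

A 1 dB bump works out to roughly 12% more amplitude, or about 26% more power. This is also why level-matching matters so much before any blind comparison: a small mismatch in volume is easy to mistake for a difference in quality.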
Because of their ability to help us overcome the placebo effect and confirmation bias, blind AB tests have the power to help us make important decisions – and to keep us from making bad ones.
When a new set of Burl converters arrived at Strange Weather, owner Marc Goodman, engineer Daniel James Schlett and I level-matched them with the old converters and blind-tested each other repeatedly to discover if we could hear a genuine improvement. You’d have to be crazy to spend $20,000 on a set of converters without being sure it’s worth it.
Thanks to blind AB testing, we were able to discover that this new investment wasn’t snake-oil at all. If we hadn’t been able to do this, the Burl would have gone right back to the dealer, and the money would have been spent on something else that did make a noticeable difference while listening blind.
So, while Allen is right to say that quickly flipping between sounds is an unbelievably dumb listening strategy, it doesn’t negate the usefulness of blind AB tests in helping us to separate the differences that matter from the ones that we invent in our highly suggestible minds.
(For more on the topic of just how powerful self-suggestion can be, definitely read Nobel winner Daniel Kahneman’s Thinking, Fast and Slow. It’s so good it gets two plugs in this article.)
The Limitations
Where Allen does bring up a great critique is in his reminder about the dangers of the audio “sip test”, which I’ve written about before.
He suggests that conclusions we draw about the differences between two sounds in the short run may not always hold in the long run. When it comes to this, I couldn’t agree more.
That’s why last month, I asked readers to think of a type of test that would adhere to the rules of logic while taking these entirely valid criticisms into account. In the discussion that followed Allen’s rebuttal on TapeOp.com, a commenter named Colin G put it as well as anyone else did or could:
“I take the author’s point about A/B testing, but any meaningful comparison has to be blind; tell listeners one is better and they’ll hear a difference even if both are identical.
If you want to prove hi-res formats are better, send test subjects away and make them live with each for a month, but, crucially, don’t tell them which is which.
That is the minimum you need to say there’s a genuine benefit to hi-res audio.”
That’s the one! A good old-fashioned blind randomized trial, just like the ones used in psychology and in medicine. I made this same suggestion to Ethan Winer in an email conversation last February.
“It would be interesting to see a test where a large group of listeners are randomly assigned either a [high-definition] music library or a [AAC] library. The listeners would have their usage tracked and answer questionnaires about their enjoyment over the course of several months. (Blind, of course) …
There’s a real danger that lies in the potential for us scientifically-minded folk to make a mess of things by becoming too reductionist in our conclusions.
Food scientists in the 19th and 20th centuries did some truly great things, and they helped feed billions more human beings than we once thought the earth could support. But at the same time, they led us backwards in a few ways, and helped bring about the prevalence of foods that were less nutritious than their predecessors, due to the limitations of the tests of the day.
Only after newer, better-designed studies did the bulk of mainstream science turn back in the favor of whole grains over refined, dietary nutrients over vitamin supplements, and pro-biotic treatments over certain drugs.
That’s why I’d love to see studies that look at the long-term effects of greater file resolutions on listening pleasure and quality of life. Even if they still come back with the same results we already have, it would be the scientifically rigorous thing to do!”
A study like this may seem frivolous at first glance, but when you take into account the immense cost of creating and marketing a new technology, the cost is negligible.
(Not to mention the drain in resources, batteries and bandwidth that would come from millions of listeners switching needlessly to a file format 2 to 20 times larger.)
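The assignment step of a blind randomized trial like the one described above is simple to set up. What follows is a rough sketch under my own assumptions, not a study protocol: the participant names, the "X" and "Y" library codes, and the function name are all hypothetical. Listeners are shuffled into two equal groups, and the key saying which code is which format is kept apart from the people running the listening sessions.

```python
import random
from typing import Dict, List, Optional, Tuple


def assign_blind(
    participants: List[str], seed: Optional[int] = None
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Randomly split participants between two coded music libraries.

    Returns (assignment, key). `assignment` maps each participant to the
    neutral code "X" or "Y". `key` maps each code to the real format and
    should stay sealed until the questionnaires are collected.
    """
    rng = random.Random(seed)

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(participants)
    rng.shuffle(shuffled)

    # Randomize which code stands for which format.
    formats = ["high-resolution", "AAC"]
    rng.shuffle(formats)
    key = {"X": formats[0], "Y": formats[1]}

    # First half of the shuffled list gets library X, the rest get Y.
    half = len(shuffled) // 2
    assignment = {
        name: ("X" if index < half else "Y")
        for index, name in enumerate(shuffled)
    }
    return assignment, key


# HYPOTHETICAL participant list, for illustration only.
listeners = [f"listener_{i:02d}" for i in range(20)]
assignment, key = assign_blind(listeners, seed=2012)

group_x = sorted(name for name, code in assignment.items() if code == "X")
print(f"{len(group_x)} listeners received library X")
```

Fixing the seed makes the split reproducible for auditing later. In a real study, the usage tracking and questionnaires would then be analyzed by code, with the key opened only at the end.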
Focus on the things that matter.
In the meantime, let’s face it: If you’re a trained listener and you find yourself rapidly flipping back and forth between two sources, desperate to identify some kind of difference, maybe it’s because that difference isn’t very meaningful at all.
While any passionate listener is wise to push the limits of her listening through non-blind practice and blind testing, when she comes up against a difference that seems inconsequential, isn’t it best to focus on the big wins instead?
If you move a mic 4” along the radius of any speaker cab, you’ll have identified a difference that any concerned listener could hear in a blind A/B test. The same is true for swapping a condenser mic for a dynamic mic, adjusting the release settings on a compressor, or finding the pocket of the groove. These are the choices that matter in the recording studio.
When it comes to listening for pleasure, isn’t it better to invest money in upgrades that are sure to be an improvement, rather than in one that’s “maybe better in theory… kinda… but, oh wait a minute, science says that even in theory it’s not much better at all”?
Buy yourself some great headphones or speakers instead and you’ll have a tangible connection to your music that will smoke the competition in any A/B test.
The hidden danger of the hi-def crusade
While it’s fantastic that Neil Young is trying to make recorded music feel more valuable to consumers by fighting to improve the listening experience, his current strategy may be having the opposite effect.
In the days following Young’s unsubstantiated trash-talk, blogs all over the world were abuzz with sycophants eager to denigrate the quality of the playback formats that they rely on every day. Based on the results of listening tests, it’s unlikely any of them took even a few minutes to compare formats for themselves first.
If we mislead listeners into believing that they’re buying a product that’s “the worst we’ve ever had”, how do we expect them to be excited about spending their hard-earned money on new music that they’ll love?
All Neil Young has done so far is help devalue a remarkably transparent-sounding technology that will continue to be the dominant format for years to come.
Editor’s Note:
As a reminder of just how long it takes consumers to adopt new technologies, in 2011, CD albums outsold download albums by more than 150%.
This is despite the fact that today’s downloads are more convenient, cost-effective, eco-friendly, and (at long last) essentially indistinguishable from their higher-resolution sources.
As much as we love the unique sound and feel of vinyl and tape cassette, you can’t say those things about either.
As far as recording formats go, I’d happily listen to Neil Young’s Harvest on anything from a wire recorder to a laser disc and love it. On that album, a group of people came together and made meaningful choices. They knew how to go for the big wins and it shows, whether you’re listening on a 3” AM radio or on real-deal studio monitors in a controlled environment.
I, for one, am glad that they didn’t wait around for the invention of a consumer format better than the vinyl disc to convince people it was music worth buying. At 256kbps, it’s worth every penny of its $9.99 price tag. Probably more. Just don’t tell me it’s trash until you close your eyes, listen, and compare.
Justin Colletti is an audio engineer and journalist. He is a staff writer for SonicScoop and managing editor of Trust Me, I’m a Scientist.