Sound in the cloud: streaming music presents new challenges to audio quality.
The shift from downloading to streaming is tectonic by digital standards, and, like the transition from optical media to downloads before it, it will be accompanied by much teeth-gnashing and lawyer-wrangling as royalty payment standards are worked out as much in the courts as in the smoke-free back rooms of Palo Alto, Calif. In fact, London's Financial Times asserts that attempts by Spotify and others to launch in the U.S. have been delayed mainly by the complexity of negotiating licensing terms with record companies and music publishers, not by the underlying technology.
The Sound of Streaming Music
But what the transition means for the audio quality of music is harder to parse. Just as download files were finding an audiophile channel, with bulked-up 256Kbps versions and even lossless WAV, FLAC, and ALAC files making their way around the internet, streaming has come along and presented music with an even broader landscape.
"Wouldn't it be nice if one of these days people who write music and people who write code could actually get together with each other?" asks Robert Reams, a tech entrepreneur who sold his most recent company, Neural Audio, to DTS, Inc. 2 years ago. He says that the processing of music is going to have to change to accommodate streaming. Reams says that spatial (imaging) and temporal (timing) differentials between the left and right axis of music are to blame for most of the artifacts that lossy codecs such as MP3 and AAC create when they process music.
"Dynamic spatial and energy offsets within the content affect the coding efficiency of modern lossy codecs," Reams explains. "Any time the content is mixed [in such a way as to create these conflicts], the sound may be rendered needlessly flawed. Linear formats [i.e., magnetic media] are forgiving in regards to sloppy spatial mapping, perceptually irrelevant spectra, and noise. Lossy coding is not."
The offsets Reams refers to are created as part of the music mix, which is inherently unbalanced by nature. Differences of as little as 1 dB between the same information in each stereo channel can cause the codec to view it as an error rather than a work of art. Ideally, the effect of a codec should be addressed at the point the music is mixed (and MP3 developer Fraunhofer-Gesellschaft debuted something exactly like that at the NAMM Show in January, which we'll get to in a minute). But, Reams acknowledges, with the biggest news of last year being the arrival of The Beatles on iTunes, the vast majority of music that will be streamed will be legacy content, anything from a year old to 5 decades old.
"If the mastering [deck's] dead azimuth was not perfectly aligned, the audio on the left channel may not be perfectly synchronized with the audio on the right, a little before or a little after. An offset of even one sample can generate enough energy to induce a problem with the [codec] processing," he explains.
Tim Carroll, founder and president of Linear Acoustic, Inc., which makes processing equipment used in streaming and file delivery, emphasizes that starting with a clean track is crucial to music surviving the streaming process with as much integrity as possible. "Most codecs are pretty benign these days--AAC at 224Kbps compared to what we've been used to is pretty spectacular, and the HE-AAC data rate can be lower but still maintain the same quality," he says.
However, the data rates needed to stream fully lossless music files are high and are not going to be the norm in the streaming environment anytime soon, especially to mobile users. Thus, Carroll stresses, the music-mastering stage of the record-making process will be critical, and it will have to work even harder to avoid migrating the problems caused by the ongoing "loudness war" to the streaming environment. The phrase refers to an unspoken, but very real, competition on the part of music content distributors (i.e., record labels) to digitally master and release recordings with higher real, and perceived, levels of loudness. Labels and artists are looking for records that stick out aurally, simply by being louder, whether on the radio or in a club. The practice goes back decades, to the days when Motown Records founder Berry Gordy Jr. would have his staff engineers analyze the top 10 records every week, including how relatively loud they were, so he could have a benchmark for the loudness of Motown's singles. The result is a loud record, perceptually speaking, but it is often sonic mush, artistically speaking, with the dynamic range of each individual instrument blunted and subsumed by the force of the track's overall level.
"When you master a [track] to the point of clipping, every transient, like the kick drum, becomes a square wave, and the processing engine is going to have the hardest time trying to code that, because [clipping] creates harmonics that distract the codec," Carroll explains. "That's just bitrate wasting, because the codec is having to work harder. Codecs choke on clipping artifacts. In a low bitrate environment like streaming, the name of the game is cleanliness from the beginning, free of clipping and square waves and overloads; stuff you can't recover from."
Reams agrees that too much heat in the mix can overwhelm the codec's processing ability. "When people are trying to get that big FM radio sound on their tracks, they're applying something like an Orban or dbx [audio] compressor, but applying too much [audio] compression"--"brickwalling" the mix, in the jargon, using very fast attack and release parameter settings--"and what you get instead is overmodulation as a result of overall aggressive energy compression. It's nice and loud, but it creates sideband points in the spectrum--you put in one tone but you get three or more points on the spectrum that the codec needs to encode. The more frequencies it has to process, the more noise gets added, or it starts overquantizing. That reduces the individual presence of each instrument."
Goran Tomas is a broadcast and streaming specialist working in Zagreb, Croatia. He says that while there is an array of potential codecs that streaming music can use, the consensus has settled on AAC for high-bitrate streaming and AACplus for lower-bitrate environments. AAC supports up to 48 full-bandwidth audio channels at sample rates of up to 96 kHz in one stream, plus 16 low-frequency effects (LFE, limited to 120Hz) channels, up to 16 "coupling" or dialog channels, and up to 16 data streams. Its effectiveness increases as the variable bitrate goes up. AACplus, aka HE-AAC (high-efficiency AAC), is optimized for streaming rates down to below 64Kbps and is often used for digital and internet radio streaming.
"HE-AAC uses spectral band replication techniques to artificially regenerate higher frequencies," Tomas explains. "It's not quite the same level of audio quality as you would get from MP3 or AAC at higher [throughput] levels, but in a low-bandwidth environment, below 64Kbps, AACplus is more efficient, provides full-frequency response, and performs quite exceptionally well."
As the bitrate environment goes up, so does codec performance. AAC-encoded music at 96Kbps compares favorably to FM radio broadcasts, and according to listening tests by the EBU (European Broadcasting Union), at 128Kbps it is nearly as transparent as a compact disc.
Preconditioning the music file prior to encoding will become a more important step as music moves into the commercial streaming model. Even if all content intended for streaming is subjected to the steps outlined previously for mixing and mastering, there will be inevitable differences between individual song files. Tomas recommends the use of a broadcast-type multipurpose processor, such as the Telos Omnia Audio ONE, A/XE, or Orban Optimod-PC, which can apply dynamic compression, limiting, automatic gain control (AGC), and equalization. "You want to give the codec as high a level as you can before distortion and as consistent an audio file as possible, without enormous dynamic changes," he says. "Also, as the [input] level gets lower, the rate of artifacts seems to increase. The level should be consistent and near the upper end of the meter, but not over-limited. Otherwise, the codec will create what we call 'overshoots'--exaggerating the existing transient peaks in the program material, because codecs filter in narrow frequency bands. So you don't want to feed the codec near 0 dB full scale; the overshoots it generates will have no headroom left, and they will clip, causing distortion and degrading the audio quality." Tomas recommends a codec input level of no more than -3 dB full scale.
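The headroom step Tomas describes amounts to simple peak normalization. The sketch below is a minimal illustration (the function name and signal are hypothetical, not from any product mentioned in the article) of scaling a track so its peak sits at -3 dB full scale before it hits the encoder:

```python
import numpy as np

def normalize_peak(audio, target_dbfs=-3.0):
    """Scale a float audio buffer so its peak lands at target_dbfs."""
    peak = np.max(np.abs(audio))
    if peak == 0:
        return audio                           # silence: nothing to scale
    target_linear = 10 ** (target_dbfs / 20)   # -3 dBFS is ~0.708 linear
    return audio * (target_linear / peak)

# A "hot" master peaking near 0 dBFS, conditioned to leave 3 dB of headroom
hot_master = np.sin(2 * np.pi * 440 * np.arange(1000) / 44100)
conditioned = normalize_peak(hot_master)
new_peak_db = 20 * np.log10(np.max(np.abs(conditioned)))
```

The 3 dB of headroom left above the new peak is exactly what absorbs the codec's overshoots instead of letting them clip.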
Maximizing the efficiency of the codec's processing is a key concern. Tomas recommends using a look-ahead limiter for final peak level control. Any clipping generated by even minor overshoots creates harmonics that the codec will interpret as information, wasting processing power and encoding time. He also suggests applying a low-pass filter ahead of the codec, a process codecs already perform as part of their own algorithms. But filtering slightly below the codec's programmed cutoff can ease the load on the codec, letting it concentrate its processing on the part of the spectrum where most of the information is located and minimizing artifact generation. For instance, low-pass filter an AAC codec, which will automatically cut off at 16 kHz at 96Kbps, at slightly below 16 kHz. "Artifacts are more likely to occur at higher frequencies," Tomas says. "Filtering reduces the burden on the codec."
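That pre-filtering step can be sketched with a basic windowed-sinc FIR filter. The cutoff (15.5 kHz, just under AAC's 16 kHz cutoff at 96Kbps) and tap count below are illustrative assumptions, not values from the article; a real mastering chain would use a dedicated tool:

```python
import numpy as np

def lowpass_fir(audio, fs, cutoff_hz, num_taps=255):
    """Apply a Hamming-windowed sinc low-pass filter."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = np.sinc(2 * cutoff_hz / fs * n)        # ideal low-pass impulse response
    h *= np.hamming(num_taps)                  # window to tame ripple
    h /= np.sum(h)                             # unity gain at DC
    return np.convolve(audio, h, mode="same")

fs = 48000
t = np.arange(fs // 10) / fs
# Music stand-in: audible 1 kHz content plus 18 kHz content the codec
# would discard anyway at this bitrate.
signal = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 18000 * t)
filtered = lowpass_fir(signal, fs, cutoff_hz=15500)
```

After filtering, the 18 kHz component is gone before the encoder ever sees it, so no bits are wasted on spectrum the codec would truncate itself.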
Kyle Wesloh, manager of broadcast operations and production for Minnesota Public Radio and American Public Media, streams both prerecorded and live music content. When the same content goes to the internet and to air, the streaming signal is sent through the Telos Omnia Audio ONE multiband system and processed as though it were an FM signal, Wesloh says. That's because they've discovered that streaming music audiences are often looking for an experience similar to what they get from FM radio. One outcome has been to raise the output of their streamed content to bring it closer to that of the broadcast signal. "We started comparing the signals about a year ago and realized that the streaming signal was quiet compared with the broadcasts, and with other stations streaming online," he explains. "The original thinking was, the stream was going to be the place where the music was safe, where it could have the dynamics that get compressed for FM broadcast. But the reality is, the stream was being listened to like you listen to radio--in the car, in the office, surrounded by background noise--and our streaming signal wasn't performing well in that environment."
Of course, that sets up the potential for the kind of issues we've already discussed: codecs want to see a decent amount of level, but not so much as to create overshoots and artifact-inducing harmonics. Wesloh says they were aware of that and tried to steer a course that would optimize the codec processing while still delivering a louder and somewhat dynamic outcome. They apply a light touch of AGC and compression via the Telos unit to smooth the dynamics and encode for an outcome normalized to -3 dB full scale. For live music, such as the station's broadcasts of the Minnesota Orchestra, they take an even lighter approach, running the signal through an audiophile-grade Crane Song stereo compressor. This smooths the signal just enough that it can avoid the more aggressive processing it would otherwise get from the Telos or Orban professional-grade processors.
The result sounds more like broadcast audio, says Wesloh. "And that's what audiences want." And meeting audience expectations is going to be critical in the increasingly competitive streaming environment. "Ubiquitous connectivity to the car is happening now," says Wesloh, who had just returned from the CES show in Las Vegas, where the automobile was evident as the new front in the media wars. "When you're in the car, streaming content has to sound like you're used to it sounding from the car radio," he says.
Optimizing music for streaming might have just become a bit easier, thanks to a new plug-in from Sonnox Ltd. using codecs from Fraunhofer-Gesellschaft, the folks who brought us MP3. Introduced at the NAMM Show in January, the new Sonnox Fraunhofer Pro-Codec plug-in enables content producers and mastering engineers to audition codecs in real time, instead of having to encode each mix to MP3 or AAC, preview it, tweak it, and then re-render it. The entire process is accomplished on the fly, and it is intended to let the engineer focus on producing a precompensated, optimized mix. The Pro-Codec plug-in enables mix and mastering engineers to audition up to five codecs in real time within a DAW (digital audio workstation) environment and batch encode to multiple formats simultaneously. All major codecs, including MP3, MP3 Surround, AAC-LC, and HE-AAC, are supported, as are lossless codecs such as MP3-HD and HD-AAC.
Sonnox--the U.K. pro audio spinoff that originally developed the Sony Oxford digital audio console--created an intuitive FFT display to illustrate the input signal, output difference signal, and a graphical indication of the audibility of codec-induced noise. Bitstream integrity meters indicate potential decoding overloads. Instant A-B auditioning enables engineers to switch artifact-free between codecs. A blind listening mode (ABX) augments codec comparisons. They seem to have thought of everything--the $499 Pro-Codec plug-in is compatible with Pro Tools, Logic Studio, Cubase, Nuendo, SONAR, Sequoia and WaveLab, and both Mac and Windows platforms are supported. In the same month, Fraunhofer announced it partnered with Texas Instruments, Inc. (TI) to allow MPEG Surround audio streaming on TI's DSP platform. Using the technology, internet broadcasters and other streaming music services will be able to provide a surround audio experience at bitrates as low as 64Kbps for 5.1 channels. Surround audio's six tracks are downmixed in the codec to a stereo pair plus a metadata track that provides the decoder with the proper time, coherence, spatial, and other parameters to re-create the surround field for the listener.
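The downmix half of that MPEG Surround scheme can be illustrated in a few lines. The function below is a hypothetical sketch, not Fraunhofer's implementation: it folds six channels to a stereo pair using the common ITU-style -3 dB coefficients, standing in for the codec's downmix stage (the spatial metadata that lets the decoder rebuild the surround field is not shown):

```python
import numpy as np

def downmix_51_to_stereo(l, r, c, lfe, ls, rs):
    """Fold 5.1 channels to stereo with conventional -3 dB coefficients."""
    g = 1 / np.sqrt(2)                 # -3 dB gain for center and surrounds
    left = l + g * c + g * ls
    right = r + g * c + g * rs
    return left, right                 # LFE is commonly dropped in downmixes

# Center-channel-only content lands equally in both stereo channels
n = 4
center = np.ones(n)
silence = np.zeros(n)
lo, ro = downmix_51_to_stereo(silence, silence, center, silence, silence, silence)
```

In the real codec, the decoder uses the transmitted time, coherence, and spatial parameters to reverse this fold and re-create the six-channel field.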
This looks like it will be streaming's big year. Music companies are hoping to hear the jingle of profits from their streaming subscriptions. And subscribers should be listening to a sonically improved streaming experience.
Dan Daley is an experienced journalist and author who has covered the business and technology of the entertainment industry for more than 17 years. His work has appeared in numerous publications including Billboard, The New York Daily News, Mix magazine, GRAMMY magazine, American Way, Spin, USA TODAY, and many others.
Comments? Check the masthead for ways to contact us.