DV Info Net

Does the resolution of an audio recorder really matter? (https://www.dvinfo.net/forum/all-things-audio/105493-does-resolution-audio-recorder-really-matter.html)

Martin Pauly October 14th, 2007 08:30 PM

Quote:

Originally Posted by Petri Kaipiainen (Post 758647)
Waveform "detail" which somebody here mentioned in this thread is nothing more than higher frequencies.

Ah, now that makes a lot of sense - this was the piece in the puzzle that I was missing. Thanks a lot, Petri!

- Martin

Peter Moretti October 15th, 2007 04:35 AM

Quote:

Originally Posted by A. J. deLange (Post 758100)
... You have, therefore, n - 1 - 2 - 2 = n - 5 bits of dynamic range, which is 6*(n-5) dB, and if you know the dynamic range of the sounds to be recorded you can figure out the required n. If 6*(n-5) is greater than the dynamic range of the source, adding extra bits doesn't do a thing for you. This is, apparently, a hard concept to grasp but nevertheless true. Another thing to consider is that a 24 bit A/D converter is most unlikely to have 6*(24 - 16) dB more dynamic range than a 16 bit A/D converter.
...

Thank you very much for your detailed explanation. I do have a few questions.

Using your formula, a 16-bit recording should have approximately 66 dB of dynamic range. A 24-bit recording should have 114 dB. That's a 48 dB increase.
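For anyone who wants to check that arithmetic, here is a minimal Python sketch of the 6*(n-5) rule of thumb quoted above. The function name and the default arguments are illustrative assumptions; only the formula itself comes from the quoted post.

```python
def usable_dynamic_range_db(bits, lost_bits=5, db_per_bit=6.0):
    """Rule-of-thumb usable dynamic range: 6 dB for each bit left after
    subtracting the bits given up to sign, headroom and noise margin."""
    return db_per_bit * (bits - lost_bits)


for bits in (16, 24):
    print(f"{bits} bit: {usable_dynamic_range_db(bits):.0f} dB")

# Difference between the two word lengths under this rule of thumb
print(usable_dynamic_range_db(24) - usable_dynamic_range_db(16), "dB more")
```

Because the same 5 bits are subtracted in both cases, the difference is always 6 dB per extra bit, whatever value is assumed for the lost bits.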

But you also state that realizing all of the 24-bit increase is unlikely because of the A/D converter.

1) What is the issue with 24-bit versus 16-bit A/D converters?

2) Is there a "real world" estimate for how much more dynamic range 24 bits actually provides?

3) Do some 24-bit recorders come close to providing the full additional 48 dB? I will be using a Sound Devices 702T which is supposed to have:

A/D Dynamic Range
114 dB, A-weighted bandwidth
110 dB, 20 Hz – 22 kHz bandwidth

While I don't know what "A-weighted bandwidth" is, do these specs mean the SD 702T can provide up to 110 dB of "real world" dynamic range?

Thanks for your help.

Petri Kaipiainen October 15th, 2007 05:25 AM

Quite frankly I do not know where A. J. deLange manages to lose his 5 bits worth of dynamic range. It is certain that the theoretical 16*6=96 dB does not hold in real practice, but already in the eighties there were some CDs with real dynamic range of more than 70 dB, and with proper dithering some audio companies have been able to squeeze more than 96 dB out of 16 bits. In a word: 6*(n-5) dB: I do not buy it.

I'll test this with SD722 and a 6dB self noise mic when I get home.

1) Modern 24 bit converters are just as good as 16 bit converters. 24 bit recorders have only a 24 bit converter anyway: everything is A/D converted at 24 bits and downconverted with dither when recording at 16 bits.

2) 8*6=48 dB more. But the reality is that no analog system is quiet enough to utilize this. The best systems have about 120 dB dynamic range, so you get about 30 dB more at best. And this depends greatly on the machine: some cheap 16/24 bit recorders are analog limited, so 24 bits actually gives almost no quality advantage.

3) SD 7-series recorders are among the best, and even they fall 30 dB short of the theoretical maximum. But no fear: hardly anything in real life has over 140 dB of dynamic range, and NOBODY has systems to reproduce such range (and good for them, as they would be deaf already).

I have a system which can put out about 115 dB peak level across the audible range (75 liter 3.5 way custom made monitors with 500W amp, Genelec 7071 sub), but even in my separate studio/music room the noise floor is about 45 dB: usable dynamic range is under 80 dB, and my conditions are quite ideal for a home setting.

Petri Kaipiainen October 15th, 2007 09:41 AM

I made a simple dynamic range test as promised with the SD722 recorder. The mic was an NT1-A, "The World's Quietest Studio Condenser Microphone", with 5 dB of self noise. The room was fairly quiet, and the instrument was a hemispherical steel bowl (stolen for the purpose from our German shepherd) struck with a potato smasher...

Here are the results from Adobe Audition analyze function (takes were about 15 sec with mostly silence, some spoken notes about the take and 4-5 blows to the bowl 8 inches from the mic).

---------
16 bit take:
                      Left         Right
Min Sample Value:     -30011       -30011
Max Sample Value:     30010        30010
Peak Amplitude:       -.76 dB      -.76 dB
Possibly Clipped:     0            0
DC Offset:            -.002        -.002
Minimum RMS Power:    -94.58 dB    -94.61 dB
Maximum RMS Power:    -4.07 dB     -4.07 dB
Average RMS Power:    -37.39 dB    -37.39 dB
Total RMS Power:      -23.84 dB    -23.84 dB
Actual Bit Depth:     16 Bits      16 Bits

Using RMS Window of 50 ms

24 bit take:
                      Left         Right
Min Sample Value:     -30048       -30048
Max Sample Value:     30048        30048
Peak Amplitude:       -.75 dB      -.75 dB
Possibly Clipped:     0            0
DC Offset:            0            0
Minimum RMS Power:    -93.31 dB    -93.31 dB
Maximum RMS Power:    -3.9 dB      -3.9 dB
Average RMS Power:    -39.78 dB    -39.78 dB
Total RMS Power:      -25.55 dB    -25.55 dB
Actual Bit Depth:     24 Bits      24 Bits

Using RMS Window of 50 ms
----------

What we see from this is that I got a dynamic range of almost 94 dB* with 16 bit sampling, and in this case a 93 dB range with 24 bit sampling. This of course has nothing to do with the sampling depth, but with the room noise floor, which was about 95 dB below the highest blows in this case and varied slightly**. I took the 24 bit take just out of curiosity. Even in this extreme case 24 bits did not give any benefits, which proves that 16 bits is plenty for final release, even if 24 bits is nice to have sometimes when recording.

I think this clearly proves that dynamic ranges of over 90 dB can be easily achieved with 16 bits contrary to what A. J. deLange claims in his post. I still can not understand where he gets his ideas, which clearly do not hold water.
----

*) from absolute peak to lowest RMS value. From absolute peak to absolute low it would have been even more, maybe 95 dB, almost the theoretical maximum.

**) the value differences really have nothing to do with 16/24 differences, but nonstandard test "procedures"...
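For reference, the peak-to-lowest-RMS figures in the first footnote can be reproduced from the Audition statistics with a few lines of Python. This is only a sketch of the subtraction described there; the dictionary layout and function name are assumptions, and the numbers are the left-channel values from the two takes.

```python
# Peak amplitude and minimum RMS power (left channel), copied from the
# Audition statistics quoted above. Values are in dB relative to full scale.
TAKES = {
    "16 bit": {"peak_db": -0.76, "min_rms_db": -94.58},
    "24 bit": {"peak_db": -0.75, "min_rms_db": -93.31},
}


def peak_to_floor_db(peak_db, min_rms_db):
    """Distance in dB from the absolute peak down to the lowest RMS window."""
    return peak_db - min_rms_db


for name, take in TAKES.items():
    print(f"{name}: {peak_to_floor_db(**take):.2f} dB")
```

Note that this mixes a peak value with an RMS value, so it is one particular definition of dynamic range among several possible ones.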

A. J. deLange October 15th, 2007 03:43 PM

Lots of questions here. Let's start with the last first and see if we can get a little water into the bowl. The basic ideas are complex enough as to be well beyond what we can reasonably discuss here. Those in search of all the details will have to consult an engineering text which deals with the theory and application of A/D converters. First and foremost we must point out that "dynamic range", the parameter under discussion here, can be, and is, defined in a variety of ways, often peculiar to the industry involved. I should point out that my experience is in telecommunications, where the best (IMO) definition is a thing called the noise power ratio (NPR). Spur free signal to noise ratio and weighted signal to noise ratio (this is what "A weighting" is about) are others. In any case the concept is the same but the numbers may vary somewhat depending on the definition. For example, if we define dynamic range to be the maximum power level which the device can represent relative to the minimum, then the dynamic range is 6*(n-1) dB (one bit gone for the sign). In a 16 bit A/D with the LSB encoding 1 mV the largest (magnitude) signal encodable (the "rail") is -2^15 = -32768, with power 6*15 = 90 dB above the LSB. If we define the dynamic range in terms of the ratio of the rail to the rms quantizing noise level, the dynamic range becomes 90 + 10.8 = 100.8 dB.

These are valid definitions for dynamic range but not terribly useful ones. Better ones are usually motivated by questions like "What are the maximum and minimum useful signal levels my system can handle?" The key here lies in the definition of "useful". It is usually defined in terms of signal to noise plus distortion ratio. A handy definition for useful is that the system should not degrade the sensor's performance by more than about 0.4 dB. This is handy because it requires that the system noise and distortion must be 10 dB below the sensor's self noise power. This is the basis for setting sensor self noise at the level of the LSB. Quantizing noise is then 10.8 dB down on the sensor self noise and the sensor's signal to self noise ratio is degraded by only 0.4 dB by the fact of A/D conversion, i.e. practically speaking, the A/D conversion has no effect on the quality of weak (quiet) signals. If a particular set of circumstances allows the sensor SNR to be degraded by more or less than 0.4 dB then the setting of the LSB relative to the sensor noise floor can be adjusted to accommodate this requirement.
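The 0.4 dB and 10.8 dB figures in this paragraph can be checked with a short Python sketch. It assumes the two noise sources are uncorrelated, so that their powers simply add, and that quantizing error is uniformly distributed over one LSB (rms value LSB/sqrt(12)). The function names are made up for illustration.

```python
import math


def floor_rise_db(added_noise_below_db):
    """Rise of a noise floor, in dB, when uncorrelated noise that is
    `added_noise_below_db` dB weaker is added to it (powers add)."""
    return 10 * math.log10(1 + 10 ** (-added_noise_below_db / 10))


def quantizing_noise_below_lsb_db():
    """rms quantizing noise relative to one LSB, assuming the error is
    uniformly distributed over one LSB (rms = LSB / sqrt(12))."""
    return 20 * math.log10(math.sqrt(12))


print(f"quantizing noise is {quantizing_noise_below_lsb_db():.1f} dB below the LSB")
print(f"noise 10 dB down raises the floor by {floor_rise_db(10):.2f} dB")
print(f"noise 10.8 dB down raises the floor by {floor_rise_db(10.8):.2f} dB")
```

Under these assumptions the degradation comes out at roughly 0.35 to 0.41 dB, which is where the "about 0.4 dB" figure comes from.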

At the loud end of the dynamic range the bad guy is distortion, and again the question is how much distortion can be tolerated. One definition is that the distortion power should not exceed the self noise power of the sensor. Now note that the distortion can come from the sensor itself or from the A/D converter. The goal in system design is to choose an A/D converter whose dynamic range is greater than that of the sensor, i.e. one whose quantizing noise can be set lower than the sensor's self noise but which does not overload at voltages appreciably higher than the output voltage at which the sensor overloads. To borrow from the r.f. engineer, just as the quantizing noise should be 10 dB or more below the self noise, the "IP3" should be 10 dB or more above the "IP3" of the microphone. This leads to yet another definition of dynamic range: 2/3 the difference between IP3 and self noise, but I have never seen this definition applied to audio equipment.

But we are interested in the A/D itself. The peak voltage in a sine wave is sqrt(2) times (3 dB) greater than the rms voltage. Thus if a sine wave is applied to an A/D converter with rms power a bit more than 3 dB below the rail it should not clip. This leads to yet another set of possible dynamic range definitions: max distortion free sine wave power to LSB or to quantizing noise power. With sine loading the quantizing noise is not noise but a series of tones, so this isn't a particularly (IMO) useful definition. This brings in "dithering", which is the addition of noise at the input whose function is to decorrelate the quantizing "noise" (which spreads the quantizing error tones into more noise like waveforms) at the cost of slightly reduced dynamic range. There are several ways to do this and we're now getting really esoteric.

Real signals are seldom sine waves but rather more complex, and the relationship between peak and rms voltages is random. In general if the signal is the sum of several sources (band, orchestra, outdoor noises etc - but notice that I have left speech off the list) the voltage is distributed as a gaussian random variable (bell shaped histogram). Extensive modeling of A/D converters has been done for gaussian loads and this has shown that the output noise plus distortion is dominated by quantizing noise until the load approaches about 13 dB below the rail, at which point overload noise rapidly takes over. Thus for gaussian loads one must not load the A/D above that level (7335 counts in a 16 bit A/D). If one does, distortion greater than the quantizing noise will be incurred, violating our rule that the A/D conversion should not degrade the sensor SNR by more than 0.4 dB or so. Note that distortion power increases by at least 3 dB for each dB increase in overload.
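The 7335 count figure follows directly from backing off 13 dB from the rail of a 16 bit converter. A minimal Python sketch, with an illustrative function name and the rail taken as 2^(n-1) counts as in the earlier paragraph:

```python
def rms_counts_at_backoff(bits, backoff_db):
    """rms level, in counts, that sits `backoff_db` dB below the rail of a
    signed converter whose rail is 2**(bits - 1) counts."""
    rail_counts = 2 ** (bits - 1)
    # Counts are an amplitude (voltage-like) quantity, hence the factor 20.
    return rail_counts * 10 ** (-backoff_db / 20)


print(int(rms_counts_at_backoff(16, 13)))  # counts for a 16 bit A/D
print(int(rms_counts_at_backoff(24, 13)))  # same backoff on a 24 bit A/D
```

The same 13 dB backoff applies at any word length; only the number of counts it corresponds to changes.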

This brings us to Petri's numbers. An rms load of 4 dB down on the rail in a 50 ms window may or may not represent overload depending on the rapidity with which the transient dies out and the purity of the tone. If the dog dish rings like fine crystal then the signal is close to sinusoidal and so no overload. My dogs' (Leonbergers) dishes clunk, and so while the signal probably isn't truly gaussian it certainly isn't sinusoidal either. And this brings up the point as to whether the dynamic range is based on an rms or peak value (it could be peak signal to rms noise for example). So before this gets totally out of hand here are some definitions of dynamic range which could be applied to a 16 bit A/D:

Definition                                            Calculation         Result      Bits
Max instantaneous signal (rail) to LSB:               15*6                90 dB       n - 1
Max instantaneous signal to rms quantizing noise:     15*6 + 10.8         100.8 dB    n + 0.8
Max rms sine wave to rms quantizing noise:            15*6 + 10.8 - 3     97.8 dB     n + 0.3
Max rms gaussian to rms quantizing noise (NPR):       15*6 + 10.8 - 13    87.8 dB     n - 1.4
Max rms gaussian to 11 dB above mic noise at LSB:     15*6 - 13 - 11      66 dB       n - 5
Max instantaneous to 11 dB above mic noise at LSB:    15*6 - 11           79 dB       n - 2.8

The last 2 values represent (to me anyway) reasonable approximations to a good system load and are the source of the 6*(n-5) dB number from yesterday. Here I define the bottom end of the dynamic range as a signal about 11 dB above the mic noise and the top end as the maximum distortion free gaussian rms (66 dB) or absolute instantaneous peak (79 dB) load.
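The list of definitions above can be evaluated for any word length with a short Python sketch. The constants are the rounded figures used in the post (6 dB per bit, 10.8 dB for quantizing noise below the LSB, 3 dB sine crest factor, 13 dB gaussian backoff, 11 dB floor margin); the names are illustrative assumptions. Evaluated as written, the expression 15*6 + 10.8 - 13 for the gaussian-to-quantizing-noise case comes to 87.8 dB.

```python
DB_PER_BIT = 6.0            # rounded from 6.02 dB
QUANT_BELOW_LSB_DB = 10.8   # rms quantizing noise relative to one LSB
SINE_CREST_DB = 3.0         # peak-to-rms ratio of a sine wave
GAUSSIAN_BACKOFF_DB = 13.0  # rms gaussian load relative to the rail
FLOOR_MARGIN_DB = 11.0      # useful floor above mic noise set at the LSB


def dynamic_range_definitions(bits):
    """Return each definition's dynamic range in dB for a `bits` wide A/D."""
    rail = DB_PER_BIT * (bits - 1)  # one bit goes to the sign
    return {
        "rail to LSB": rail,
        "rail to rms quantizing noise": rail + QUANT_BELOW_LSB_DB,
        "rms sine to rms quantizing noise": rail + QUANT_BELOW_LSB_DB - SINE_CREST_DB,
        "rms gaussian to rms quantizing noise": rail + QUANT_BELOW_LSB_DB - GAUSSIAN_BACKOFF_DB,
        "rms gaussian to useful floor": rail - GAUSSIAN_BACKOFF_DB - FLOOR_MARGIN_DB,
        "rail to useful floor": rail - FLOOR_MARGIN_DB,
    }


for name, db in dynamic_range_definitions(16).items():
    print(f"{name:40s} {db:6.1f} dB  ({db / DB_PER_BIT:.1f} bits)")
```

Calling `dynamic_range_definitions(24)` shifts every figure up by 48 dB, which is the theoretical gain only; it says nothing about what a real 24 bit converter delivers.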

Back to "A weighting". This is another definition of dynamic range based on signal to quantizing + overload noise ratio with the signal having a particular spectral distribution (there is also C weighting) derived from the spectrum of speech.

There are several reasons why adding a bit to an A/D doesn't always buy 6 dB. Among them are that the dynamic range of the analogue hardware in the A/D converter itself comes into play. Another factor is that when the quantizing noise of the A/D gets very low "phase noise" in the sample clock begins to become appreciable (the samples are not taken at precisely spaced intervals). This is not a problem in the A/D component in itself but rather the circuitry which drives it. Clocks have to be very good.

Just to be clear: I am by no means saying that 16 bits is good enough. Even the committee which defined the CD years back recognized that. With the technology of the day it was a reasonable engineering compromise. Certainly, even though 24 bits may not grant 48 dB more dynamic range (and no, I don't know how much is real but I'd love to find out - my favorite test, the NPR, is practically speaking impossible, AFAIK, for A/D's of that depth - I've generally found an n bit A/D to have n-1.5 to n-2 effective bits but these are for A/D's that clock much faster than audio A/D's), it is certainly appreciably better than 16. This is provided that one does not make the number one mistake in loading A/D's, which is setting the sensor self noise higher than the LSB so that it's "toggling a reasonable number of bits". This gains nothing and throws away dynamic range at the top.

Wow!

Ty Ford October 15th, 2007 08:44 PM

Good 24/48 is better than bad 24/48.
Good 24/48 may be better than bad 24/96.

Regards,

Ty Ford

Dan Brockett October 15th, 2007 11:07 PM

Some input
 
Hi all:

Good thread with a lot of knowledge imparted. My .02 worth from 10 years selling consumer and professional audio gear is that of course a higher bit rate is more desirable, if the quality of the A/D and D/A converters is good.

Most acoustic musicians can speak, at least in rudimentary terms, about the benefits of overtones that occur at exponential frequencies, much higher than can be heard by human ears. Many people feel that this is the main factor that makes a Stradivarius a Stradivarius and a fine piano a fine piano. All other things being equal, it makes sense to me that overtones, occurring at musical intervals, add depth and richness and DO affect the frequencies that we can hear. Sympathetic overtones are an important part of high quality audio.

Whether or not overtones (extra frequency content that occurs at musical intervals) relate directly to extended sample rates is highly debatable. Most proponents of formats like SACD are champions of the sound because they say that the formats are much smoother, more linear and more "analog sounding" than CDs or 48 kHz sampled sources. After years of working with the relatively bad audio sound quality of Beta SP, I think that it is also debatable whether most of your audience can tell any difference between high quality sound and low quality/resolution sound. Most of the cars I drive today sound hilarious when I look at the CD player/radio and realize that the "boom sizzle" coming from the speakers is coming out while the tone controls are set to the middle of the dial, a supposedly neutral setting (no cut/no boost usually). With the popularity of iPods, cheap computer speakers, awful sounding car stereos, etc., I believe that the average consumer has almost no ear for audio quality, so pursuing sound for picture to a 24/96 level is mostly underappreciated by the audiences.

Unless the project is shown in a theatrical setting with a high quality sound system, I feel that 16/48 is perfectly adequate for dialog-based material that most of your audience will hear on a badly set up, inaccurately EQ'd home theater system.

Personally, I feel that the era of the audiophile is over anyway. The CD and iPod prove that convenience trumps sound quality for 99% of people these days.

Dan

Mike Peter Reed October 16th, 2007 04:49 AM

The way I am reading all this ... there is a very very minor advantage to recording at 96kHz because even if your mic cannot resolve above 20kHz, it will be subject to zero attenuation at the top end. But - this is dependent upon whether the recorder reconstructs lower sampling rates from its highest, and how well it does it. If it reconstructs well, 48kHz should be absolutely indistinguishable from 96kHz. If it reconstructs badly (eg Fostex FR2?) then best record with the higher rate to begin with if you are looking for the best reproduction.

As to 96kHz taking up twice as much space - that's really not an issue in the 21st century where CF is available at 64GB and counting. (and 8GB is pretty cheap).
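To put rough numbers on the storage point, here is a small Python sketch of recording time per card. It assumes uncompressed stereo PCM, decimal gigabytes (10^9 bytes) and no file or filesystem overhead; the function name is hypothetical.

```python
def recording_hours(card_gb, sample_rate_hz, bits, channels=2):
    """Approximate hours of uncompressed PCM that fit on a card."""
    bytes_per_second = sample_rate_hz * (bits // 8) * channels
    return card_gb * 1e9 / bytes_per_second / 3600


for rate in (48_000, 96_000):
    print(f"24 bit / {rate} Hz stereo on 64 GB: {recording_hours(64, rate, 24):.1f} h")
```

Doubling the sample rate halves the recording time, but under these assumptions a 64 GB card still holds more than a full day of 24/96 stereo.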

As we see mics that can resolve above 20kHz (eg the Sennheiser 8000 series and no doubt others) then 96kHz (or beyond) could become the common denominator. I am skeptical about using 96kHz for dialogue, but I do it anyway because on the indie shows I work on one minute I'm doing dialogue, the next scene could be somebody playing the cello, then some wild sound fx ... so 96kHz for me is a sweet spot and means I don't hand in recordings of different sample rates each day, or forget to swap sampling rate between scenes.

I'm not advocating recording 96kHz as standard, far from it. I do what works for me, and the production (mainly the editor and his NLE).

The audiophile is dead, long live the audiophile.

Petri Kaipiainen October 16th, 2007 05:50 AM

The irony of 24/96 "super hi-fi" is the fact that you cannot get both 24 bit dynamics and 40 kHz frequency range at the same time. A mic which can resolve frequencies of over 20 kHz must have a small diaphragm, but small diaphragm mics have inherently bad s/n ratios. Even the very best large diaphragm high voltage condensers do not deliver anywhere near 24 bit resolution. Special mics reaching to 40 kHz and beyond are typically at least 20 dB worse (see Sanken and DPA sites).

Even then 24 bits has its value as a level setting safety feature, but 96 kHz sampling really adds NOTHING audible to the recording. But as it harms nobody, use it if it makes you happy (or if you are a bat).

A comment on a previous post: do those over 20 kHz high frequency components add something audible to the musical signal like purists claim? Yes, they do make lower frequency interference components, but as those are within the recording system's specifications they get recorded just as they are heard; there is no need to record the higher make-up components.

Final fact: I have not seen any scientifically valid double blind test where test subjects could even tell a 16/44.1 AD-DA conversion from the original high quality analog live signal. Or where they could distinguish 16/48 from 24/96 or SACD. I think this proves that 16/44.1 or 48 is plenty good enough as a final format for everything. Most of the time (100% actually) it is not the technical specs of the system but raw reality (mic quality, mic placement, acoustics, background noise, reproduction system, listening room acoustics & noise floor) which sets the real limits. Using time and money and intellectual capacity (?) like we do here to worry about 48 kHz sampling not being good enough is a total waste of time...

A. J. deLange October 16th, 2007 06:01 AM

It's been intimated a couple of times and said a couple of times here. Let me say it again. The ultimate test is a double blind triangle (ABC) test.

In most music we hear today the compression scheme has done lots more damage to the signal than the A/D conversion ever did.

Petri Kaipiainen October 16th, 2007 06:57 AM

And we have to remember that DV audio is pristine uncompressed WAV (this is a VIDEO discussion board...) at better than CD quality. And that while HDV vastly improves the picture quality, the audio side is severely compromised by lossy MP2 compression at about 1:5 ratio. Even then it is passable.

It is indeed an irony that while some lone souls advocate super-hifi standards like SACD because they think CD quality is not good enough (I think it is based not on ears but on "because it is there" syndrome), the buying public buys not even CDs but MP3 files, audibly inferior to CDs...

And about AD conversion doing something to the signal: it certainly does, but it also certainly makes it possible to record the signal in a vastly superior way, and cheaply, compared to ANY $$$$$$ analog system ever invented. I have no complaints at all, bless you AD converter!

Gints Klimanis October 16th, 2007 10:53 AM

Quote:

Originally Posted by Emre Safak (Post 757954)
... but doesn't the DAW process the audio internally at a higher resolution, like 32-bit float? As long as you acquire with a high enough resolution to exceed your equipment's SNR, shouldn't you be fine?

32-bit floating point is still 24-bit resolution, just with a larger range.
Your point about processing resolution exceeding source resolution is important.
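The point about 32-bit float can be illustrated with a short Python sketch that round-trips values through IEEE 754 single precision using the standard `struct` module. The helper name `to_float32` is made up for the example; the 24-bit significand (23 stored bits plus one implicit bit) is a property of the format itself.

```python
import struct


def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Every 24-bit integer survives the round trip exactly...
print(to_float32(2**24 - 1) == 2**24 - 1)

# ...but a value needing a 25th bit of precision gets rounded.
print(to_float32(2**24 + 1) == 2**24 + 1)

# The exponent keeps that same 24-bit precision available at very small scales.
tiny = (2**24 - 1) * 2.0**-60
print(to_float32(tiny) == tiny)
```

In other words, the float format gives a scalable 24-bit window rather than more than 24 bits of precision at any one level.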

Mike Peter Reed October 16th, 2007 01:28 PM

4:4:4

Because it is there?

Petri Kaipiainen October 16th, 2007 01:44 PM

Bad analogy. 4:4:4 preserves something we can see. 96 kHz sampling preserves something that is there, but not for us to hear.

Rather: 16/48 WAV, because it is there (not MP2 or MP3)...

Peter Moretti October 16th, 2007 10:10 PM

Quote:

Originally Posted by Ty Ford (Post 759499)
Good 24/48 is better than bad 24/48.
Good 24/48 may be better than bad 24/96.

Regards,

Ty Ford

Ty,

I think there is a sentiment being expressed that very good 16/48 ~= 24/48. I don't know if that's true or not.

But to get very good 16/48, it seems to me that you'd be using a 24-bit recorder just set to 16-bits.

So while the "good 16 is all you need" argument might be true, I don't know if it really matters practically. With either option, you'll need a high quality recorder, which will invariably be 24-bit capable.


All times are GMT -6.
