Compression Question: From 1080 to 720. How does it work?

David Heath · February 28th, 2016, 07:09 AM

Quote:

Originally Posted by Jon Fairhurst

But don't discount information theory when it comes to oversampling. 1-bit audio doesn't just work in some conditions. SACDs just plain work. You can in fact trade sampling frequency for bit depth.

I'm not arguing against most of the basic principles there. And maybe the best example of "1-bit sampling" is the traditional way photographs were reproduced in newspapers when the printing process was "black ink or nothing" - so a photograph which appeared to have grey scales was composed of black dots.

I've also heard a suggestion put forward for very high frame rate TV, where the obvious problem is the huge amount of raw data. Inter-frame coding is one way forward, but another suggestion is one-bit coding within each frame, with gradations of tone for a pixel conveyed by how many frames have it "white" and how many "black". We are talking about very high frame rates indeed, but in theory the principle is as you say in both those cases - oversampling spatially in the first case, temporally in the second, to trade sampling frequency for bit depth.
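A rough sketch of that temporal idea, assuming a simple error-accumulating encoder (in the spirit of first-order sigma-delta). The function names are invented for illustration, and this is not any actual broadcast proposal:

```python
def encode_one_bit(level, n_frames):
    """Encode a grey level in [0, 1] as n_frames black/white (0/1) values.

    A running accumulator adds the level every frame; whenever it reaches
    a full unit the pixel is emitted as white and the unit is subtracted.
    """
    frames = []
    acc = 0.0
    for _ in range(n_frames):
        acc += level
        if acc >= 1.0:
            frames.append(1)
            acc -= 1.0
        else:
            frames.append(0)
    return frames


def decode_one_bit(frames):
    """Recover the grey level as the fraction of white frames."""
    return sum(frames) / len(frames)


frames = encode_one_bit(0.25, 16)
print(frames)                  # one white frame in every four
print(decode_one_bit(frames))  # 0.25
```

With 16 frames per output sample this can only distinguish 17 levels, which is why the frame rate would have to be very high indeed.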

Quote:

Originally Posted by Jon Fairhurst

https://en.wikipedia.org/wiki/Oversampling
"Oversampling improves resolution..." (In this case, they mean bit depth resolution.) https://en.wikipedia.org/wiki/Oversampling#Resolution

That is really talking about audio sampling in the time domain, but yes, I'm sure a lot of the principle holds good in the spatial domain.

But note it puts some maths to it: "The number of samples required to get n bits of additional data precision is
number of samples = (2^n)^2 = 2^(2n)."

So if we want to move from 8 bit to 10 bit (2 bits of additional data precision), that formula predicts that we need 2^4 times as many samples - 16x as many! Not 4x.

And even then it qualifies it.

Quote:

This averaging is only possible if the signal contains equally distributed noise which is enough to be observed by the A/D converter.[3] If not, in the case of a stationary input signal, all 2^n samples would have the same value and the resulting average would be identical to this value; so in this case, oversampling would have made no improvement.

Which is more or less what I said earlier about "we have to think how the four pixels in your example are derived, and whether it's a good real world model".
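To illustrate that qualification, here is a small simulation sketch. It assumes a "true" level of 13.25, uniform noise, and round-to-nearest quantisation; those are illustrative choices, not a model of any particular camera or converter:

```python
import math
import random


def quantise(value):
    """Round to the nearest integer code value (halves round up)."""
    return math.floor(value + 0.5)


def average_of_samples(true_value, n_samples, noise_amplitude, rng):
    """Quantise n_samples noisy readings of true_value and average them."""
    total = 0
    for _ in range(n_samples):
        noise = rng.uniform(-noise_amplitude, noise_amplitude)
        total += quantise(true_value + noise)
    return total / n_samples


rng = random.Random(0)
# No noise: every sample quantises to 13, so the average is exactly 13.0.
print(average_of_samples(13.25, 10000, 0.0, rng))
# Noise spanning one code value: the average moves close to 13.25.
print(average_of_samples(13.25, 10000, 0.5, rng))
```

Without noise, no amount of averaging recovers the missing 0.25.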

So whilst I don't disagree with the basic principle of what you're saying - that oversampling can be traded for better bit depth - I do disagree with simply saying "downscale an 8-bit 4K signal to FHD and it can be considered as 10-bit."

The above formula predicts that with 4x as many samples the BEST that could be hoped for is one extra bit (2^(2*1) = 4), a "9-bit" signal, and even this is dependent on circumstance. You'd need to be talking about 16x as many samples to really get 10 bit, in other words, 8K.
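The arithmetic can be checked in a few lines. This sketch assumes "4K" means UHD (3840x2160) and "8K" means 7680x4320, and it applies the quoted 2^(2n) rule at face value; the function names are made up for illustration:

```python
import math


def samples_needed(extra_bits_wanted):
    """Samples to average for n extra bits, per the quoted rule 2^(2n)."""
    return 2 ** (2 * extra_bits_wanted)


def extra_bits(sample_ratio):
    """Inverse of the rule: extra bits gained from a given sample ratio."""
    return 0.5 * math.log2(sample_ratio)


FHD_PIXELS = 1920 * 1080
sources = {"4K UHD": (3840, 2160), "8K UHD": (7680, 4320)}

for name, (width, height) in sources.items():
    ratio = (width * height) // FHD_PIXELS
    print(name, "ratio:", ratio, "best-case bit depth:", 8 + extra_bits(ratio))
# 4K UHD ratio: 4 best-case bit depth: 9.0
# 8K UHD ratio: 16 best-case bit depth: 10.0
```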

Jon Fairhurst · February 29th, 2016, 12:59 PM

Good catch on the formula; however, the article first says this:

When oversampling by a factor of N, the dynamic range increases by log2(N) bits, because there are N times as many possible values for the sum.

So the basic formula says four times the samples gives two more bits of dynamic range.
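As a sketch of what that sentence counts: the sum of N samples needs log2(N) more bits to store than a single sample. Whether those extra bits carry real precision is the separate noise question. The helper name is hypothetical:

```python
def bits_to_hold_sum(n_samples, bit_depth):
    """Bits needed to store the largest possible sum of n_samples values."""
    max_sum = n_samples * (2 ** bit_depth - 1)
    return max_sum.bit_length()


for n in (4, 16, 256):
    print(n, "samples of 8 bits ->", bits_to_hold_sum(n, 8), "bits for the sum")
# 4 samples of 8 bits -> 10 bits for the sum
# 16 samples of 8 bits -> 12 bits for the sum
# 256 samples of 8 bits -> 16 bits for the sum
```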

The next statements about noise are in the context of an A/D or D/A converter. The A/D is the equivalent of a camera sensor system. They assume that for a given A/D technology, if you speed up the clock, it will have a shorter sampling time, which would increase noise. This is similar to increasing the resolution of a video sensor. That makes each pixel smaller, so the noise increases.

But the context I'm presenting is signal-only. I'm not comparing the 4K downsampled signal to what you would have gotten had the sensor been 2K with its inherent lower noise. In our case, we have a given signal with its given noise depending on camera, ISO, etc. We are just looking at the extra dynamic range without comparing it to an engineering tradeoff with a lower res, lower noise camera.

And yeah, the part about needing distributed noise is important. This is a problem in synthetic media, but not typically with real scenes as the signal varies. One real pixel might be 0.1 shy of the recorded value while the next is 0.1 too hot. Scene variation gives us that randomness, even when the noise is quantized. But yeah, don't apply a heavy handed noise reduction to create plastic faces before the downsampling. The downsampled signal would show that same, low resolution, low noise, inaccurate face tone. So, yeah, conditions need to be right.

Also keep in mind that a good digital low pass filter (used for downsampling) has many taps across many samples horizontally and vertically. So each new 2K pixel gets a small contribution from a wide range of 4K pixels. This helps ensure that the noise contribution is random as each new pixel gets fed by more than its nearest neighbor.
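A minimal one-dimensional sketch of the multi-tap idea, assuming a short 4-tap binomial kernel (1, 3, 3, 1) chosen only for illustration; real scalers typically use longer kernels, and video is filtered in both directions:

```python
def downsample_by_2(row, taps=(1, 3, 3, 1)):
    """Low-pass filter a row of pixels, keeping one output per input pair.

    Each output is a weighted average centred between two input pixels.
    Indices that fall off either end are clamped to the edge pixel.
    """
    norm = sum(taps)
    offset = (len(taps) - 2) // 2  # taps reaching left of the pixel pair
    out = []
    for centre in range(0, len(row), 2):
        acc = 0
        for k, weight in enumerate(taps):
            idx = min(max(centre + k - offset, 0), len(row) - 1)
            acc += weight * row[idx]
        out.append(acc / norm)
    return out


row = [13, 13, 13, 14, 13, 13, 14, 13]
print(downsample_by_2(row))          # [13.0, 13.375, 13.25, 13.375]
print(downsample_by_2(row, (1, 1)))  # plain pair averaging: [13.0, 13.5, 13.0, 13.5]
```

With four taps each output draws on four inputs rather than two, so it can land on more in-between values than pair averaging can.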

The bottom line is that one "can" get the equivalent of more bits of information by downsampling, but only if that additional information hasn't already been lost.

David Heath · February 29th, 2016, 03:32 PM

Quote:

Originally Posted by Jon Fairhurst

Good catch on the formula; however, the article first says this:

When oversampling by a factor of N, the dynamic range increases by log2(N) bits, because there are N times as many possible values for the sum.

So the basic formula says four times the samples gives two more bits of dynamic range.

That's not how I read it. Even before that quote, the article states that:

Quote:

For instance, to implement a 24-bit converter, it is sufficient to use a 20-bit converter that can run at 256 times the target sampling rate. Combining 256 consecutive 20-bit samples can increase the signal-to-noise ratio at the voltage level by a factor of 16 (the square root of the number of samples averaged), effectively adding 4 bits to the resolution and producing a single sample with 24-bit resolution.

In other words, to increase the bit depth by 4 bits - here from 20 to 24 bits - it's necessary to have 256x as many samples. Which is in line with the later formula I used ("The number of samples required to get n bits of additional data precision is: number of samples = (2^n)^2 = 2^(2n).")

So in that case we're talking about 4 extra bits - so n=4 - so we need 2^(2*4) times as many samples. In other words, 256. This is all consistent with needing 16x as many to get an equivalence with 2 extra bits.

What you quote above should be seen as an interim step.

The way I see it, your earlier example (4 values of 13, 13, 13, 14, averaging to give 13.25) is an idealised one, and not likely to be typical. The next block of 4 may be 13, 13, 14, 14 and give an averaged value of 13.5, the next may be 13, 13, 13, 13, and so on, on a statistical basis. It's only when you get up to 16 samples that statistically you can realistically expect three times as many "13" values as "14". (In practice I'd expect other values such as 12 and 15, but the average to become more predictably 13.25.) But this always assumes that what is really 13.25 gets digitised randomly, and not always perfectly to the nearest integer - when it would always be 13, and the average of any number of samples would then always be 13 exactly. But I think we're agreed on that...?
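The block-to-block scatter described here can be simulated. This is only a sketch, assuming uniform noise one code value wide and round-to-nearest quantisation; the names and the count of 2000 blocks are arbitrary choices:

```python
import math
import random
import statistics


def quantise_with_noise(true_value, rng):
    """Add uniform noise one code value wide, then round to nearest."""
    return math.floor(true_value + rng.uniform(-0.5, 0.5) + 0.5)


def block_averages(true_value, block_size, n_blocks, rng):
    """Average block_size noisy quantised samples, repeated n_blocks times."""
    averages = []
    for _ in range(n_blocks):
        total = sum(quantise_with_noise(true_value, rng) for _ in range(block_size))
        averages.append(total / block_size)
    return averages


rng = random.Random(0)
for block_size in (4, 16):
    means = block_averages(13.25, block_size, 2000, rng)
    # The mean of the block averages sits near 13.25 in both cases;
    # the spread between blocks is what shrinks as the block grows.
    print(block_size, statistics.mean(means), statistics.pstdev(means))
```

Under these assumptions the spread of the block averages roughly halves when going from 4 to 16 samples, which matches the square-root behaviour in the quoted article.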

Jon Fairhurst · February 29th, 2016, 09:30 PM

What's weird is that SACD (1-bit) samples at just 64 times 44.1 kHz, yet competes with DVD-A, which is up to 24 bits at 192 kHz. Of course, it's sigma-delta, which is a bit different than PCM, but still...
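For scale, the raw per-channel numbers work out as below. This is a back-of-envelope sketch that ignores any packing or lossless compression on the disc, and the last line applies the 2^(2n) averaging rule quoted earlier in the thread at face value:

```python
import math

CD_RATE_HZ = 44_100

sacd_rate_hz = 64 * CD_RATE_HZ           # 1-bit samples per second
sacd_bits_per_second = 1 * sacd_rate_hz
dvda_bits_per_second = 24 * 192_000      # top DVD-A mode mentioned above

print(sacd_rate_hz)           # 2822400
print(sacd_bits_per_second)   # 2822400
print(dvda_bits_per_second)   # 4608000

# Extra bits that plain averaging of 64 samples would give under that rule.
print(0.5 * math.log2(64))    # 3.0
```

Plain averaging would account for only about 3 extra bits at 64x, so the rest has to come from the sigma-delta noise shaping, which fits the "a bit different than PCM" remark.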

And regarding the 13.25 example, that's where the many samples in the filter help out. It's not just nearest neighbor, but point taken that it relies on a random distribution to work.
