Dithering part three – real world 2D quantization dithering

In the previous two parts of this blog post mini-series I described basic uses of dithering, mentioned the blue noise definition, and referenced/presented two techniques of generating blue noise as well as one of many general-purpose high-frequency low-discrepancy sampling sequences.

In this post, we will look at a more practical example: the use of (blue) noise in 2D image dithering for quantization!

You can find the Mathematica notebook for this post here and its pdf version here.

Bit quantization in 2D

Finally, we are getting to a practical use case. When you encode your images in 8 bits (a typical framebuffer), you are quantizing. When you encode your G-buffers, render targets and other resources, you are quantizing. Even the typical 8 bits are few enough to cause some banding on very smooth, slow gradients, like skies or lens flares.

We will cover a more extreme case here though: 3-bit quantization of a linear gradient.
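To make the setup concrete, here is a minimal Mathematica sketch of this kind of 3-bit quantization of a gradient, with and without white noise dithering (an illustration; the actual notebook code may differ):

levels = 8; (* 3 bits *)
gradient = Table[x/511., {y, 0, 63}, {x, 0, 511}];
quantize[v_] := Round[v*(levels - 1)]/(levels - 1);
quantizedNoDither = Map[quantize, gradient, {2}];
ditheredWhite = Map[quantize[# + (RandomReal[] - 0.5)/(levels - 1)] &, gradient, {2}];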

2dditheringorigsignal.png

2dditheringquantizenodither.png

We call those quantization artifacts (here, 8 visible bands) the “banding” effect.

As we learned in the previous parts, we can try to fix it by applying some noise. First, let's try regular white noise:

2dditheringquantizewhitenoise.png

Doesn’t look too good. There are 2 problems:

  • “Clumping” of similar values into areas, identical to the effect we saw in the previous parts; we will address it in this post.
  • Still visible “bands” of unchanged values around the centers of the original bands (where the banding effect was not contributing too much error).
2dditheringwhitebands.gif

Error visualized.

Those bands are quite distracting. We could try to fix them by dithering even more (beyond the quantization error):

2dditheringquantizewhitenoisex2.png

This solves this one problem! However, the image is too noisy now.

There is a much better solution that was described very well by Mikkel Gjoel.

Using triangular noise distribution fixes those bands without over-noising the image:

2dditheringquantizewhitenoisetriangle.png

Use of triangular noise distribution.

Since this is a well-covered topic and it complicates the analysis a bit (different distributions), I will not be using this fix for most of this post. So those bands will stay there, but we will still compare some triangularly remapped results at the end.
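For reference though, a minimal sketch of one common way to get such a triangularly distributed dither value in the -1 to 1 LSB range (the sum of two independent uniform values, shifted; an illustration rather than the exact remapping from the referenced presentation), reusing the definitions from the earlier sketch:

triangularNoise[] := RandomReal[] + RandomReal[] - 1.; (* triangular PDF on (-1, 1), peaking at 0 *)
ditheredTriangle = Map[quantize[# + triangularNoise[]/(levels - 1)] &, gradient, {2}];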

Fixed dithering patterns

In the previous part we looked at the golden ratio sequence. It is well defined and simple in 1D; however, it doesn't work / isn't uniquely defined in 2D.

One of the oldest, most widely known and used 2D dithering patterns is the so-called Bayer matrix, or ordered Bayer. It is defined recursively, starting from a simple pattern for level zero:

1 2

3 0

With the next levels defined as:

4*I(n-1) + 1    4*I(n-1) + 2

4*I(n-1) + 3    4*I(n-1) + 0

It can be replicated with a simple Mathematica snippet:

Bayer[x_, y_, level_] :=
 Mod[Mod[BitShiftRight[x, level], 2] + 1 + 2*Mod[BitShiftRight[y, level], 2],
   4] + If[level == 0, 0, 4*Bayer[x, y, level - 1]]
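One plausible way to use it as a dithering threshold for our 3-bit gradient, reusing the earlier definitions (a sketch; the notebook may apply it slightly differently):

bayer8[x_, y_] := (Bayer[Mod[x, 8], Mod[y, 8], 2] + 0.5)/64.; (* wrapping 8x8 tile remapped to (0, 1) *)
ditheredBayer = Table[quantize[gradient[[y, x]] + (bayer8[x, y] - 0.5)/(levels - 1)], {y, 64}, {x, 512}];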

2dditheringbayer.png

8×8 ordered Bayer matrix

What is interesting (and quite limiting) about Bayer is that, due to its recursive nature, the signal difference is maximized only within a small 2×2 neighborhood, so larger Bayer matrices add more intermediate steps / values but don't contribute much visible pattern difference. Therefore most game engines that I have seen use at most a 4×4 Bayer pattern with 16 distinct values.

If you plot a periodogram (frequency spectrum) of it, you will clearly see only 2 single, very high frequency dots!
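As an aside, such a periodogram can be approximated by a centered 2D power spectrum of the pattern, e.g. (a sketch; the notebook may normalize differently):

periodogram2D[a_] := RotateLeft[Abs[Fourier[a - Mean[Flatten[a]]]]^2, Floor[Dimensions[a]/2]];
MatrixPlot[periodogram2D[Table[bayer8[x, y], {y, 0, 31}, {x, 0, 31}]]]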

2dditheringbayerperiodogram.png

2D periodogram – low frequencies are in the middle and high frequencies to the sides.

Obviously the signal has some other frequencies, but with much lower intensities… Plotting it in log scale reveals them:

2dditheringbayerperiodogramlogscale.png

So on one hand, the Bayer matrix has lots of high-frequency content, which would seem perfect for dithering. However, the presence of strong single-frequency spikes tends to alias heavily with the image and produce an ugly, patterned look.

This is our quantized function:

2dditheringquantizebayer.png

If you have been playing with computers long enough to remember 8-bit or 16-bit color modes and palettized images, this will look very familiar, as lots of algorithms used this matrix. It is very cheap to apply (a single look-up from an array, or even a few bit-magic ALU instructions) and has optimal high-frequency content. At the same time, it produces these very visible, unpleasant patterns. They are much worse for sampling and in the temporal domain (the next two parts of this series), but for now let's have a look at a better sequence.

Interleaved gradient noise

The next sequence that I think works extremely well for many dithering-like tasks is “interleaved gradient noise” by Jorge Jimenez.

The formula is extremely simple!

InterleavedGradientNoise[x_, y_] :=
FractionalPart[52.9829189*FractionalPart[0.06711056*x + 0.00583715*y]]
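A 64×64 tile of it can be visualized directly (a quick sketch):

ArrayPlot[Table[InterleavedGradientNoise[x, y], {y, 0, 63}, {x, 0, 63}], ColorFunction -> GrayLevel, ColorFunctionScaling -> False]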

But the results look great: they contain lots of high frequencies and produce pleasant, interleaved smooth gradients (be sure to check Jorge's original presentation and his decomposition of the “gradients”):

interleavedgradientnoise.png

What is even more impressive is that such a pleasant visual pattern was invented by him as a result of optimizing some “common web knowledge” hacky noise hashing functions!

Unsurprisingly, this pattern has a periodogram containing frequencies that correspond to those interleaved gradients plus their frequency aliasing (a result of the frac, similar to the frequency aliasing of a saw wave):

2dditheringinterleavedgradientnoiseperiodogram.png

And the 3D plot (spikes corresponding to those frequencies):

2ddithering3dplotinterleaved.png

Just like with Bayer, those frequencies will be prone to aliasing and “resonating” with frequencies in the source image, but the almost-zero low-frequency content gives a nice, clean, smooth look:

2dditheringquantizeinterleavedgradient.png

Some “hatching” patterns are visible, but they are much more gradient-like (as the name of the function suggests) and therefore less distracting.

Blue noise

Finally, we get back to using a pre-generated blue noise pattern. To recall from the previous part, blue noise is loosely defined as a noise function with a small low-frequency component and uniform coverage of the remaining frequencies. I will again use a pattern that I generated with my naive implementation of “Blue-noise Dithered Sampling” by Iliyan Georgiev and Marcos Fajardo.

So I generated a simple 64×64 wrapping blue-noise-like pattern (a couple of hours on an old MacBook):

2dbluenoise.png

It has the following periodogram / frequency content:

2dditheringblueperiodogram.png

And in 3D (who doesn’t love visualizations?!😀 ):

2ddithering3dplot.png

Compared to white noise, it has a big “hole” in the middle, corresponding to low frequencies.

2dwhitenoisevsbluenoise.gif

White noise vs blue noise in 2D

At the same time, it doesn't show the linear energy increase towards higher frequencies that the audio / academic definition of blue noise implies. I am not sure whether it's because of my implementation shortcuts (only a 7×7 neighborhood is analyzed + not enough iterations) or the original paper's formulation, but it doesn't seem to impact the results for our use case in a negative way.

Without further ado, the results of dithering using 2D blue noise:
2dditheringquantizeblue.png

It is a 64×64 pattern, but it is optimized for wrapping, so the error metric for border pixels takes into account pixels on the other side of the pattern. Over this gradient, it is tiled 2×2.
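For completeness, this is roughly how such a pre-generated, wrapping tile gets applied, reusing the earlier definitions (a sketch; here blueNoise is assumed to be the 64×64 array of values in the 0-1 range loaded from the generator's output):

blueDither[v_, x_, y_] := quantize[v + (blueNoise[[Mod[y, 64] + 1, Mod[x, 64] + 1]] - 0.5)/(levels - 1)];
ditheredBlue = Table[blueDither[gradient[[y, x]], x, y], {y, 64}, {x, 512}];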

And this is how it compares to regular white noise:

2dwhitenoisevsbluenoise.gif

White noise vs blue noise

Because it contains only high frequencies, you can't see the problematic “clumping” effect.

It also means that if we oversample (like with all those new fancy 1080p -> 1440p -> 2160p displays), blur it, or accumulate temporally (one of the next parts), the result will be much closer to the original signal! So when we filter both with a 2-wide Gaussian:

2dditheringwhitevsbluefiltered.png

Left: Gaussian-filtered white noise dithering. Right: Gaussian-filtered blue noise dithering.
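The filtering step itself is just (a sketch, reusing the dithered arrays from the earlier snippets):

filteredWhite = GaussianFilter[N[ditheredWhite], 2];
filteredBlue = GaussianFilter[N[ditheredBlue], 2];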

And while I said we would not be looking at the triangular noise distribution in this post for simplicity, I couldn't resist the temptation of comparing them:

2dwhitenoisevsbluenoisetriangle.gif

White noise vs blue noise with triangular distribution remapping applied.

I hope this at least hints at the observation that with a good, well-distributed, large enough fixed blue noise pattern, you might get results that are maybe not at the quality level of error-diffusion dithering, but heading in that direction, and orders of magnitude better than standard white noise.

All four compared

Just some visual comparison of all four techniques:

2dditheringallfourcompared.png

White noise, blue noise, Bayer, interleaved gradient noise

2dditheringalltechniquescompared.gif

And with the triangular remapping:

2dditheringallfourcomparedtriangular.png

2dditheringallfourcomparedtriangular.gif

 

My personal final recommendations and conclusions here would be:

  • Whenever possible, avoid ordered Bayer! Many game engines and codebases still use it, but it produces very visible, unpleasant patterns. I still see it in some currently shipped games!
  • If you cannot spare any memory look-ups but have some ALU to spare, use the excellent interleaved gradient noise by Jorge Jimenez. It produces much more pleasant patterns and is extremely cheap with the GPU instruction set! However, its patterns are still noticeable and it can alias.
  • Blue noise is a really great noise distribution for many dithering tasks, and if you have the time to generate it, the memory to store it and the bandwidth to fetch it, it is the way to go.
  • White noise is useful for comparison / as a ground truth. With pixel index hashing it's easy to generate, so it's useful to keep around.

Summary

In this part of the series, I looked at the quantization of 2D images for the purpose of storing them at a limited bit depth. I analyzed the look and effects of white noise, ordered Bayer matrices, interleaved gradient noise and blue noise.

In the next part of the series (coming soon), we will have a look at dithering in a more complicated (but also very common) scenario: uniform sampling. It is slightly different, because the requirements are often different. For example, if you consider rotations, values of 0 and 2*pi will “wrap” and be identical, so we should adjust the error metric used when generating our noise distribution. Also, for most sampling topics we will need to consider more than one value of noise.

Blog post mini-series index.

Edit 10/31/2016: Fixed triangular noise remapping to work in the -1 to 1 range instead of -0.5 to 1.5. Special thanks to Romain Guy and Mikkel Gjoel for pointing out the error.

References

http://loopit.dk/banding_in_games.pdf “Banding in games”, Mikkel Gjoel

https://engineering.purdue.edu/~bouman/ece637/notes/pdf/Halftoning.pdf C. A. Bouman: Digital Image Processing – January 12, 2015

B. E. Bayer, “An optimum method for two-level rendition of continuous-tone pictures”, IEEE Int. Conf. Commun., Vol. 1, pp. 26-11 to 26-15 (1973).

http://www.iryoku.com/next-generation-post-processing-in-call-of-duty-advanced-warfare “Next generation post-processing in Call of Duty: Advanced Warfare”, Jorge Jimenez, Siggraph 2014 “Advances in Real Time Rendering in games”

“Blue-noise Dithered Sampling”, Iliyan Georgiev and Marcos Fajardo

https://github.com/bartwronski/BlueNoiseGenerator 

http://www.reedbeta.com/blog/quick-and-easy-gpu-random-numbers-in-d3d11/ “Quick And Easy GPU Random Numbers In D3D11”, Nathan Reed


Dithering part two – golden ratio sequence, blue noise and highpass-and-remap

In the previous part of the mini-series I covered the definition of dithering and how it changes the error characteristics of simple 1D quantization of functions.

In this part I will try to look at what blue noise is, but first I wanted to have a look at a number sequence that I used in the previous post and find very useful.

You can find a Mathematica notebook for golden sequence here and its pdf version here.

For the second part of the post you can find the notebook here and its pdf version here.

Golden ratio sequence

In the previous post I used “some” quasi-random function / sequence and mentioned that it's not perfect, but very useful. The sequence is made of the fractional parts of successive multiples of the golden ratio.

FractionalPart[n*GoldenRatio]
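For example, generating the first few elements and their consecutive (absolute) differences (a small sketch):

golden = Table[N[FractionalPart[n*GoldenRatio]], {n, 1, 9}]
(* {0.618034, 0.236068, 0.854102, 0.472136, 0.0901699, 0.708204, 0.326238, 0.944272, 0.562306} *)
Abs[Differences[golden]]
(* {0.381966, 0.618034, 0.381966, 0.381966, 0.618034, 0.381966, 0.618034, 0.381966} *)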

I found the idea of using it in the paper “Golden Ratio Sequences For Low-Discrepancy Sampling” by Colas Schretter and Leif Kobbelt.

This is an incredibly fascinating sequence, as it distributes successive values very well and quite far apart:

golden_sequence_lines.gif

The absolute differences between successive elements are:

{0.381966, 0.618034, 0.381966, 0.381966, 0.618034, 0.381966, 0.618034, 0.381966}

So they oscillate between the fractional part of the golden ratio (~0.618) and two minus the golden ratio (~0.382). Both numbers are distant enough from zero and one to produce a well-distributed sequence where successive samples add lots of information.

Even for a small number of “samples” in the sequence, they cover the whole 0-1 range very well:

golden_sequence_comparison.gif

Numbers plotted as colors also look “pleasant”:

golden1d.png

If we look at its periodogram:

GoldenNumberSequencePeriodogram.png

We also find some fascinating properties. First of all, energy seems to increase with frequency. There are visible “spikes” at some frequencies, and what is even more interesting is that each successive spike happens at a frequency that is the golden ratio times higher and has the golden ratio times more energy! I don't have any explanation for it… so if you are better than me at maths, please contribute with a comment!

This frequency characteristic is extremely useful; however, it doesn't satisfy all of our dithering needs. Why? Imagine that the source signal we are dithering contains the same frequencies. Then we would see extra aliasing at those frequencies. Any structure in the noise used for dithering can become visible and can produce undesired aliasing.

I still find this sequence extremely valuable and use it heavily, e.g. in temporal techniques (as well as hemispherical sampling). There are other very useful low-discrepancy sequences that are heavily used in rendering. I will not cover those here and instead reference the physically based rendering “bible”, “Physically Based Rendering, Third Edition: From Theory to Implementation” by Matt Pharr, Wenzel Jakob and Greg Humphreys, and chapter 7, which the authors were kind enough to provide for free!

For now let’s look at blue noise and theoretically “perfect” dithering sequences.

Blue noise

What is blue noise? Wikipedia defines it as:

Blue noise is also called azure noise. Blue noise’s power density increases 3 dB per octave with increasing frequency (density proportional to f ) over a finite frequency range. In computer graphics, the term “blue noise” is sometimes used more loosely as any noise with minimal low frequency components and no concentrated spikes in energy.

And we will use here this more liberal definition (with no strict definition of frequency distribution density increase).

We immediately see that the previous golden ratio sequence is not blue noise, as it has lots of visible spikes in its spectrum. Perfect blue noise has no spikes and is therefore not prone to aliasing / amplifying particular frequencies.

There are many algorithms for generating blue noise; unfortunately, many of them are heavily patented. We will have a look at two relatively simple techniques that can be used to approximate blue noise.

Generating blue noise – highpass and remap

The first technique we will have a look at comes from Timothy Lottes and his AMD GPUOpen blog post “Fine art of film grain”.

The technique is simple but brilliant: in step one, take noise with an undesired frequency spectrum and reshape it by applying a high-pass filter.

Unfortunately, an arbitrary high-pass filter will produce a signal with a very uneven histogram and a completely different value range than the original noise distribution:

afterHighpass.png

After an arbitrary high-pass operation on random noise originally in the 0-1 range.

This is where part two of the algorithm comes in: remapping the histogram to force it back into the 0-1 range! The algorithm is simple: sort all elements by value and then remap each value to its normalized position in the sorted list.
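A minimal 2D sketch of both steps, assuming a Gaussian-blur based high-pass filter (the original technique may use a different filter):

n = 64;
noise = RandomReal[1, {n, n}];
(* step 1: high-pass by subtracting a blurred (low-pass) copy, with wrapping borders *)
highpassed = noise - GaussianFilter[noise, 3, Padding -> "Periodic"];
(* step 2: histogram remap - replace every value with its normalized rank *)
remapped = ConstantArray[0., n*n];
remapped[[Ordering[Flatten[highpassed]]]] = (Range[n*n] - 0.5)/(n*n);
blueNoiseApprox = ArrayReshape[remapped, {n, n}];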

The effect is much better:

afterHighpassAndRemap.png

Unfortunately, the histogram remapping operation also changes the frequency spectrum. This is inevitable, as histogram remapping changes the relative values of elements non-linearly. Values in the middle of the histogram (corresponding to areas that originally had lots of low-frequency content) will be changed much more than values in areas with high-frequency content. This way, part of the high-pass filtering effect is lost:

highpass_periodogram_before_after_remap.png

Comparison of the periodogram before (red) and after (black) the remap, renormalized manually. Note how some low-frequency component reappeared.

Still, the effect looks pretty good compared to no high-pass filtering:

before_after_lowpass.png

Top – regular random noise. Bottom – with high pass and histogram remapping.

Its frequency spectrum also looks promising:

afterHighpassAndRemapPeriodogram.png

However, there is still this trailing low-frequency component. It doesn't contain a lot of energy, but it can still introduce some visible low-frequency error into the dithered image…

What we can do is simply apply the technique again!

This is what we get:

highPassRemapTwice.gif

The frequency spectrum definitely looks better, and the whole algorithm is very cheap, so we can apply it as many times as we need.

Unfortunately, no matter how many times we reapply it, it's impossible to “fix” all possible problematic spots.

I think about it this way: if some area of the image contains only very low frequencies, then after applying the high-pass filter it will contain a few adjacent values that are all very close to zero. After histogram remapping, they will again get remapped to similar, adjacent values.

HighpassAndRemapUnfixable.png

Small part of a sequence with a local minimum that the algorithm, even repeated 10 times, cannot get out of. Notice a few areas of almost uniform gray.

It's possible that using a different high-pass filter, adding some noise between iterations, or detecting those problematic areas and “fixing” them would help, but that is beyond the scope of this post and of the original technique.

What is worth noting is that the original algorithm gives a sequence that is not perfect, but often “good enough”: it leaves some quite bad local spots, but optimizes the frequency spectrum globally. Let's check it in action.

Results

Let’s have a look at our initial, simple 1D dithering for binary quantization:

HighpassAndRemapDitherComparison.png

Rows 1, 3, 5 – original sine function. Row 2 – dithering with regular noise. Row 4 – dithering with golden ratio sequence. Row 6 – dithering with “highpass and remap” blue-noise-like sequence.

We can see that both the golden ratio sequence and our highpass-and-remap noise are better than regular white noise. However, it seems that the golden ratio sequence performs better here due to less “clumping”. You can see, though, some frequency “beating” corresponding to its peak frequencies:

HighpassRemapDitherPeriodograms.gif

Black – white noise. Red – golden ratio sequence. Green – highpass and remap noise sequence.

So this is not a perfect technique, but a) very fast b) tweakable and c) way better than any kind of white noise.

Better? Slower blue noise

Ok, what could we do if we wanted a solution that doesn't contain those local “clumps”? We can have a look at the Siggraph 2016 paper “Blue-noise Dithered Sampling” by Iliyan Georgiev and Marcos Fajardo from Solid Angle.

The algorithm is built around the idea of using the probabilistic technique of simulated annealing to globally minimize a desired error metric (in this case, one based on the distances between adjacent elements and their values).
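A tiny 1D illustration of that idea, written as a plain random walk with a made-up energy term (so not the paper's exact formulation, nor exactly what my linked generator does):

n = 128;
seq = RandomSample[(Range[n] - 0.5)/n]; (* uniform histogram, random order *)
(* energy is high when nearby indices (with wrapping) hold similar values *)
energy[s_] := Sum[Exp[-Min[Abs[i - j], n - Abs[i - j]]^2/4. - Abs[s[[i]] - s[[j]]]], {i, n}, {j, i + 1, n}];
e = energy[seq];
Do[
 {a, b} = RandomSample[Range[n], 2]; (* propose swapping two values *)
 trial = seq; trial[[{a, b}]] = seq[[{b, a}]];
 If[(eTrial = energy[trial]) < e, seq = trial; e = eTrial],
 {2000}];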

I implemented a simple (not exactly simulated annealing; more like a random walk) and pretty slow version supporting 1, 2 and 3D arrays with wrapping: https://github.com/bartwronski/BlueNoiseGenerator/

As usual with probabilistic global optimization techniques, it can take pretty damn long! I was playing a bit with my naive implementation for 3D arrays, and on a 3-year-old MacBook, after running for a night, it converged to an at best average-quality sequence. However, this post is not about the algorithm itself (which is great and quite simple to implement), but about the dithering and the noise.

For the purpose of this post, I generated a 2000-element 1D sequence using my implementation.

This is a plot of first 64 elements:

generatedbluepiece.png

Looks pretty good! No clumping, pretty good distribution.

The frequency spectrum also looks very good, like the desired blue noise (an almost linear energy increase with frequency)!

generatedbluefrequencies.png

If we compare it with the frequency spectrum of “highpass and remap”, they are not that different: slightly fewer very low frequencies and much more of the desired very high frequencies:

highpassremapvsgenerated.gif

Highpass and remap (black) vs Solid Angle technique (red).

We can see it compared with all other techniques when applied to 1D signal dithering:

all1dtechniquescomparison.png

Every odd row is “ground truth”. Even rows: white noise, golden ratio sequence, highpass and remap and finally generated sequence blue noise.

It seems to me to be perceptually the best and most uniform (on par with the golden ratio sequence).

We can have a look at the frequency spectra of their errors:

all1dtechniquescomparedfrequencies.gif

Black – white noise. Red – golden ratio sequence. Green – highpass and remap. Yellow – generated sequence.

If we blur the resulting image, it starts to look quite close to the original simple sine signal:

comparisonall1dblurred.png

If I were to rate them under these constraints / this scenario, my order from best to worst would probably be:

  • Golden ratio sequence,
  • Blue noise generated by Solid Angle technique,
  • Blue noise generated by highpass and remap,
  • White noise.

But while it may seem that the golden ratio sequence is “best”, we also got lucky here, as our error didn't alias / “resonate” with the frequencies present in this sequence, so it wouldn't necessarily be the best choice for every scenario.

Summary

In this part of the blog post mini-series I mentioned the blue noise definition and referenced/presented two techniques of generating blue noise and one of many general-purpose high-frequency low-discrepancy sampling sequences. This was all still in the 1D domain, so in the next post we will have a look at how those principles can be applied to dithering during quantization of a 2D signal, like an image.

Blog post mini-series index.

References

https://www.graphics.rwth-aachen.de/media/papers/jgt.pdf “Golden Ratio Sequences For Low-Discrepancy Sampling”, Colas Schretter and Leif Kobbelt

“Physically Based Rendering, Third Edition: From Theory to Implementation”, Matt Pharr, Wenzel Jakob and Greg Humphreys.

https://en.wikipedia.org/wiki/Colors_of_noise#Blue_noise 

http://gpuopen.com/vdr-follow-up-fine-art-of-film-grain/ “Fine art of film grain”, Timothy Lottes

“Blue-noise Dithered Sampling”, Iliyan Georgiev and Marcos Fajardo


Dithering part one – simple quantization

Introduction

The first part of this mini-series will focus on the more theoretical side of dithering: some history and applying it to the quantization of 1D signals. I will try to do some frequency analysis of quantization errors and show how dithering helps. It is mostly theoretical, so if you are interested in more practical applications, be sure to check the index and the other parts.

You can find the Mathematica notebook to reproduce the results here and the pdf version here.

What is dithering?

Dithering can be defined as the intentional / deliberate addition of some noise to a signal to prevent large-scale / low-resolution errors that come from quantization or undersampling.

If you have ever worked with either:

  • Audio signals,
  • 90s palettized image file formats.

You have surely encountered dithering options that, by adding some noise and small-resolution artifacts, “magically” improved the quality of audio files or saved images.

However, I found on Wikipedia quite an amazing fact about when dithering was first defined and used:

…[O]ne of the earliest [applications] of dither came in World War II. Airplane bombers used mechanical computers to perform navigation and bomb trajectory calculations. Curiously, these computers (boxes filled with hundreds of gears and cogs) performed more accurately when flying on board the aircraft, and less well on ground. Engineers realized that the vibration from the aircraft reduced the error from sticky moving parts. Instead of moving in short jerks, they moved more continuously. Small vibrating motors were built into the computers, and their vibration was called dither from the Middle English verb “didderen,” meaning “to tremble.” Today, when you tap a mechanical meter to increase its accuracy, you are applying dither, and modern dictionaries define dither as a highly nervous, confused, or agitated state. In minute quantities, dither successfully makes a digitization system a little more analog in the good sense of the word.

— Ken Pohlmann, Principles of Digital Audio
This is an inspiring and interesting historical fact, and as I understand it, it works by avoiding bias in the computations and resonances, by randomly breaking up some mechanical vibration feedback loops.
But history aside, let’s look at the dithering process for 1D signals first, like audio.

Dithering quantization of a constant signal

We will start by analyzing the most boring possible signal: a constant one. If you know a bit about audio and audio-related DSP, you might ask: but you promised to look at audio, and audio by definition cannot have a constant term! (Furthermore, both audio software and hardware deliberately remove the so-called DC offset.)
That’s true and we will have a look at more complicated functions in a second, but first things first.
Imagine that we are doing a 1-bit quantization of a normalized floating-point signal. This means we will be dealing with final binary values, 0 or 1.
If our signal is 0.3, simple rounding without any dithering will be the most boring function ever – just zero!
The error is also constant, 0.3, and therefore the average error is also 0.3. This means that we introduced quite a big bias to our signal and completely lost the original signal information.
We can try to dither this signal and have a look at results.
Dithering in this case (with the rounding function used) means applying just plain, random white noise (a random value for every element, producing a uniform noise spectrum): adding a random value from the range (-0.5, 0.5) to the signal prior to quantization.

quantizedDitheredSignal =
  Round[constantSignalValue + RandomReal[] - 0.5] & /@ Range[sampleCount];

Constant_dither_noise.png
Constant_dither_noise_img.png
It's difficult to see anything here, just that the result of quantization is now some random ones and zeros… with (as hopefully expected) more zeros. It's not a terribly interesting signal on its own, but what is quite interesting is the plot of the error and the average error.
Constant_dither_error.png
Ok, we can see that, as expected, the error is also alternating… but what is quite scary is that the error sometimes got bigger (0.7 absolute value)! So our maximum error is worse, which is pretty unfortunate… however, the average error is:

Mean[ditheredSignalError]
0.013

Much, much smaller than the original error of 0.3. With a sufficiently large number of samples, this error would go towards zero (in the limit). So the error of the constant term got much smaller, but let's have a look at the frequency plot of the whole error.
spectrum_quantization_noise_comparison.gif

Red plot/spike = frequency spectrum of error when not using dithering (constant, no frequencies). Black – with white noise dithering.

Things are getting more interesting! This shows the first major takeaway of this post: dithering distributes quantization error / bias among many frequencies.
We will have a look in the next section at how this helps us.

Frequency sensitivity and low-pass filtering

So far we observed that dithering a quantized constant signal:
  • Increased maximal error.
  • Almost zeroed average, mean error.
  • Added constant white noise (full spectral coverage) to the error frequency spectrum, reducing the low-frequency error.
By itself this doesn't help us too much… However, we are not looking at quantization of an arbitrary mathematical function / signal. We are looking at signals that will be perceived by humans. Human perception is obviously limited; some examples:
  • Our vision has a limit of acuity. Lots of people are short-sighted and see blurred image of faraway objects without correction glasses.
  • We perceive medium scale of detail much better than very high or very low frequencies (small details of very smooth gradients may be not noticed).
  • Our hearing works in a specific frequency range (20 Hz - 20 kHz, but it gets worse with age) and we are most sensitive to the middle range of 2 kHz - 5 kHz.

Therefore, any error in frequencies closer to the upper range of perceivable frequencies will be much less noticeable.

Furthermore, our media devices are getting better and better and provide lots of oversampling. In TVs and monitors we have “retina”-style and 4K displays (where it’s impossible to see single pixels), in audio we use at least 44kHz sampling file formats even for cheap speakers that often can’t reproduce more than 5-10kHz.

This means that we can approximate the perceptual look of a signal by low-pass filtering it. Here I did a low-pass filtering (padding with zeros on the left, hence the “ramp up”):
Constant_dither_noise_lowpass.png

Red – desired non-quantized signal. Green – quantized and dithered signal. Blue – low pass filter of that signal.
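A sketch of such a low-pass step applied to the dithered signal from the earlier snippet (a simple moving average with zero padding on the left; the notebook's actual filter may differ):

kernelSize = 32;
lowPassed = ListConvolve[ConstantArray[1./kernelSize, kernelSize], Join[ConstantArray[0., kernelSize - 1], quantizedDitheredSignal]];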

The signal starts to look much closer to the original, unquantized function!
Unfortunately, we start to see some low frequencies that are very visible and were not present in the original signal. We will look at fixing this by using blue noise in part 3 of the series, but for now this is how it could look with a quasi-random noise function that has much less low-frequency content:
Constant_dither_noise_golden_lowpass.png
This is possible because our quasi-random sequence has the following frequency spectrum:
Constant_dither_noise_golden_spectrum.png
But enough looking at a simplistic, constant function. Let's have a look at a sine wave (which, if you know the Fourier theorem, is a building block of any periodic signal!).

Quantizing a sine wave

If we quantize a sine wave with 1-bit quantization, we get a simple… square wave.
sine_quantize.png
A square wave is quite interesting, as it comprises the base frequency as well as its odd harmonics.
This is an interesting property that is used heavily in analog subtractive synthesizers to get hollow / brassy sounding instruments. Subtractive synthesis starts with a complex, harmonically rich sound and filters it by removing some frequencies (with filter parameters varying over time) to shape the sound in the desired way.
Square wave frequency spectrum:
sine_quantize_spectrum.png
But in this post we are more interested in quantization errors! Let's plot the error as well as the frequency spectrum of the error:
sine_quantize_error.png
sine_quantize_error_spectrum.png
In this case we are in a much better situation: the average error is close to zero! Unfortunately, we still have lots of undesired low frequencies, very close to our base frequency (odd multiples with decreasing magnitudes). This is known as aliasing or dithering noise: frequencies that were not present in the original signal appear, and they have pretty large magnitudes.
Even low-pass filtering cannot help this signal much, as the error has so many low frequencies:
sine_quantize_lowpass.png

Low-pass filtered quantized sine

sine_quantize_error_lowpass.png

Low-pass filtered quantized sine error

Let’s have a look at how this changes with dithering. At first sight things don’t improve a lot:
sine_quantize_dither.png
However, if we display it as an image, it starts to look better:
sine_quantize_dither_img.png
And notice how again, quantization error gets distributed among different frequencies:
spectrum_quantization_noise_comparison_sine.gif
This looks very promising! Especially considering that we can try to filter it now:
sine_quantize_dither_lowpass.png
That's a slightly crooked sine, but it looks much closer to the original than the non-dithered one, with the exception of a phase shift introduced by the asymmetrical filter (I am not going to describe or cover it here; it is fixable simply by applying a symmetrical filter):
sine_quantize_dither_lowpass_comparison.png

Red – original sine. Green – low pass filtered undithered signal. Blue – low pass filtered dithered signal.

Plotting both error functions confirms numerically that the error is much smaller:

sine_quantize_dither_lowpass_error.png

Red – error of the low-pass filtered non-dithered signal. Blue – error of the low-pass filtered dithered signal.

Finally, let's quickly look at a signal dithered with a better function, containing primarily high frequencies:

sine_quantize_dither_vs_golden_img.png

Upper image – white noise function. Lower image – a function containing more high frequencies.

sine_quantize_dither_golden_lowpass_comparison.png

Low-pass filtered version dithered with a better function – almost perfect results if we don't count the filter phase shift!

And finally – all 3 comparisons of error spectra:

spectrum_quantization_noise_golden__comparison_sine.gif

Red – undithered quantized error spectrum. Black – white noise dithered quantized error spectrum. Blue – error spectrum for dithering with noise containing more high frequencies.

Summary

This is the end of part one. The main takeaways are:
  • Dithering distributes quantization error / bias among many different frequencies that depend on the dithering function instead of having them focused in lower frequency area.
  • Human perception of any signal (sound, vision) works best in very specific frequency ranges. Signals are often over-sampled relative to the end of the perceptual spectrum where perception is almost marginal; for example, common audio sampling rates allow reproducing signals that most adults will not be able to hear at all. This, combined with the previous point, makes dithering and trying to shift the error into that frequency range very attractive.
  • Different noise functions produce different error spectra, which can be used knowing which error spectrum is more desirable.

In the next part we will have a look at various dithering functions: the one I used here (the golden ratio sequence) and blue noise.

Blog post mini-series index.


Dithering in games – mini series

This is the opening post of a mini blog post series about various uses of dithering for quantization and sampling in video games. It is something most of us use intuitively in everyday work, so I wanted to write down some of those concepts and explain and analyze them in Mathematica.

This post is just a table of contents.

Part one – dithering – definition and simple 1D quantization.

Part two – dithering – golden ratio sequence, white and blue noise for 1D quantization.

Part three – dithering in quantization of 2D images, Bayer matrix, interleaved gradient noise and blue noise.

Part four – dithering for sampling in rendering – disk sampling and rotations. (coming soon, November)

Part five – adding time! Few ideas for improving dithering quality in temporal-supersampling scenarios. (coming soon, December)

You can find some supplemental material for posts here: https://github.com/bartwronski/BlogPostsExtraMaterial/tree/master/DitheringPostSeries


Short names are short

Intro

This blog post is a counter-argument and response to a post by Bob Nystrom that got very popular a few months ago and was often re-shared. I disagree with it so much that I thought it was worth sharing my opinion on the topic.
My response here is very subjective and contains opinions about quite a polarizing topic; if you want to read a pure facts / knowledge-sharing post, you may as well stop reading now. This post is supposed to be short, fun and a bit provocative.
opinion
Also, I don't think that a single piece of advice like “use shorter names” or “use longer names” without any context has any real value, so at the end, instead of just ranting, I will try to summarize my recommendations.
Having said that and if you want to read some rant, let’s go!

What I agree with

Some personal story here: I used to write terrible, hacky code that I am now ashamed of. Fortunately, working at Ubisoft with colleagues with much higher standards, and discussions with them, helped me understand the value of clean code. Working on one of the codebases that they had cleaned up, I was amazed how pleasant it can be to work with clear, good-quality and well-architected code, and it made me realize my wrongdoings!
Today I believe that readability and simplicity are the most important features of good code (assuming that it's correct, performs well enough, etc.), to the point that I would always keep a clear reference implementation around until it's really necessary to do magic, unclear optimization(s). You write code once, but you and your colleagues read it thousands of times over many years.
I definitely agree with Bob Nystrom that names should be:
Clear.
Precise.
However, this is, I guess, where the points we agree upon end. I don't think that short names for variables or functions serve readability at all!

Understandability without context

Imagine that you are debugging a crash or weird functionality in a piece of code you are not familiar with. You start unwinding a callstack and see a variable named “elements” or, to use the author's example, “strawberries”.
What the hell is that? What are those elements? Is it a local temp array, or a member of the debugged class? Wouldn't tempCollectedStrawberiesToGarnish or m_garnishedStrawberies be more readable?
If you look at a crashdump on some crash aggregation website / server, before downloading it and opening it in the IDE, it gets even more difficult! You will have no idea at all what the given code does.
And this is not an uncommon scenario: we work in teams and our teammates will debug our code many times. In a way, we are writing our code for them…

Confusion and ambiguity

Second thing: I don't believe that short names can be precise. Having worked in the past with some component-based game engines and seen “m_enabled” being checked / set with some extra logic around it, I just wanted to face-palm. Fortunately, people were not crazy enough to skip the member prefix and just operate on some “enabled” variable in a long function; that would lead to even more / extreme confusion!
What does it mean that a component is enabled / disabled? I guess that an animation component is not updating the skeleton / hierarchy (or is it?) and a mesh component is probably not rendered, but how can I be sure? Wouldn't m_isSkeletonUpdated, m_pauseTick or m_isVisible be more readable?
Side note: this point could as well be an argument against general class polymorphism / inheritance and reusing the same fields for even slightly different functionalities.

Less context switching

With slightly longer and more verbose names, it is easier to keep all information on one screen and within a single mental “context”. The shorter the name, the larger and less obvious the context you need to understand its purpose, role, lifetime and type.
If you need to constantly remind yourself of the class name or variable type, or even worse check some name/type of a parent class (again, not a problem if you don't (ab)use OOP), you become less effective and more distracted. In long, complex code this context switching can be detrimental to code understanding and make you less focused.

Search-ability

This is probably my biggest problem with short names and my biggest NO. I use grep-like tools all the time and find them better and more robust than any IDE-specific symbol searching. Don't get me wrong! For example, VisualAssistX is an amazing extension and I use it all the time. It's way faster than IntelliSense, though it can still choke on very large code solutions; but this is not the main issue.
The main issue is that every codebase I have worked in (and I guess any other serious, large codebase) mixes many different languages. I work daily with C/C++, HLSL/PSSL, Python, JSON, some form of makefiles and custom data definition languages. To look for some data that can be in any of those places (sometimes in many!) I use good old “search in files”. I can recommend here a plugin called Entrian Source Search (a colleague from Ubisoft, Benjamin Rouveyrol, recommended it to me and it completely transformed the way I work!) and it perfectly solves this problem. I can easily look for “*BRDF*” or “SpecularF0” and be sure that I'll find all HLSL, C++ and data definition references.
Going back to the main topic: this is where short, ambiguous names completely fail! If a search finds 1000 references of a given name, then it can be considered just useless.
Just some examples.
Let's look for the “brilliantly” named variable or function enable; why would anyone need more context?
enable.png
Note that this shows only whole-word matches! Hmm, not good… How about the “brilliant” short name update?
update.png
Good luck with checking if anyone uses your class “update” function!

Help with refactoring

Related to the previous topic: before starting a refactor, I always rely on code search. It's really useful to locate where a given variable / function / class is being used, why, and whether it can be removed.
Large codebases are often split into many sub-solutions to be lighter on memory (if you have worked in a huge one in Visual Studio, you know the pain!) and more convenient for the daily / common use scenario. This is where IDE symbol search fails completely and can become an obstacle in any refactoring.
Again, in my personal opinion and experience, a straight grep-like search works much better than any flaky symbol search, and it works across many code solutions and languages. Good, uncommon, clear and unique names really help it. I can immediately see when a function or variable is not used anywhere, who uses it, and which parts of the pipeline need information about it. So basically, all the necessary steps in planning a good refactor!

Risks with refactoring

This point is a mix of the previous one and the section about confusion / ambiguity. It is a) very easy to misunderstand code with common names and b) tempting to overload the original term, since it's short and still seems kind of accurate. This often leads to even more meaningless and confusing terms.
If you have a longer, more verbose name, then you will think twice before changing its type or application, and hopefully rename it and/or reconsider / remove all prior uses.

Self-documenting code

I believe in using comments to document “non-obvious” parts of an algorithm or some assumptions (much better than offline documents that get sent in emails and lost, or put on some always-outdated wiki pages), but I hate code constantly interleaved with short comments; for me, because of the context switching, it increases cognitive load. Code should be self-documenting to some extent, and I think it's possible, as long as you don't try to remove all context information from variable names.

When is using short names ok?

Ok, I have hopefully listed enough arguments against using short, context-less names, but I sometimes use them myself!
As I said at the beginning, it all depends, and a “one piece of advice fits all” attitude is usually a piece of crap.
My guidelines would be: use rather long, unique and memorable names, giving enough information about the context, unless:
  1. It’s something like iterator in a short loop. I use C-style names like i, j, k quite often and don’t see it as a major problem provided that the loop is really short and simple.
  2. In general, it’s a local variable in a short function with clear use. So think about it this way – if someone else “lands” there accidentally with a debugger, would they require understanding of the system to figure out its purpose?
  3. If they are class or function names, only if they are guaranteed to be local in the scope of a file and not accessible from the outside. If you change such function from static to global, make sure you change the name as well!
  4. It is a part of convention that all programmers in the codebase agreed upon. For example that your physics simulation classes will start with a “P”, not “Physics”.
  5. You use POD types like structs and never write functions inside them; it's fine to have their names short, as you know they relate to the struct type.
  6. Similar to 5: you use (not abuse!) namespaces and/or static class functions to provide this extra scope and information, and always access them with a prefix (no “using”, so search-ability is not impacted).

Summary

Rant mode over!🙂 I hope that this provocative post showed some disadvantages of over-simplified pieces of advice like “long names are long” and some benefits of slightly increased verbosity.

Image dynamic range

Intro

This post is a second part of my mini-series about dynamic range in games. In this part I would like to talk a bit about dynamic range, contrast/gamma and viewing conditions.

You can find the other post in the series here; it is about the challenges of properly exposing a scene with too large a luminance dynamic range. I recommend checking it out!🙂

This post is accompanied by a Mathematica notebook so you can explore those topics yourself.

.nb Mathematica version

PDF version of the notebook

Dynamic range

So, what is dynamic range? Most simply, it is the ratio between the highest possible value a medium can reproduce / represent and the lowest one. It is usually measured literally as a “contrast ratio” (a proportion, for example 1500:1), in decibels (a difference of base-10 logarithms) or in “stops” (or simply… bits, as it is a difference of base-2 logarithms). In analog media, it corresponds to the “signal-to-noise ratio”, so the lowest values are the ones that cannot be distinguished from noise. In a way, this is analogous to digital media, where below the lowest representable value there is only quantization noise.

This measure is not very useful on its own, without more information about the representation itself. Dynamic range has different implications for analog film (lots of precision in the bright and mid parts of the image, while dark parts quickly show grain), digital sensors (total clipping of whites and lots of information compressed into shadows, but mixed with unpleasant digital noise) and digital files (zero noise other than quantization artifacts / banding).

In this post I will focus purely on the dynamic range of the scene displayed on the screen. Therefore, I will use EV stops = exposure value stops, a base-2 logarithm (bits!).

Display dynamic range

It would be easy to assume that if we output images in 8 bits, the dynamic range of the displayed image would be 8 EV stops. This is obviously not true, as the information stored there is always (in any modern pipeline and OS) treated with an opto-electrical transfer function (OETF) defined for the display medium prior to the 8-bit quantization / encoding. The typically used OETF is some gamma operator, for example gamma 2.2 or sRGB (not the same, but more about that later).

Before going there, let's find a way of analyzing and displaying dynamic range, first for the simplest case: no OETF, just plain 8-bit storage of linear values.

Since 0 is “no signal”, I will use 1/256 as the lowest representable signal (anything lower than that gets lost in quantization noise) and 1 as the highest representable signal. If you can think of a better way, let me know in the comments. To help me with this process, I created a Mathematica notebook.

Here is the output showing a numerical analysis of the dynamic range:

dynrange_linear.png

Dynamic range of linear encoding in 8bits. Red line = dynamic range. Blue ticks = output values. Green ticks = stops of exposure. Dots = mappings of input EV stops to output linear values.

The representation of EV stops in linear space is obviously exponential in nature, and you can immediately see a problem with such naive encoding: many lower stops of exposure get “squished” into the darks, while the last two stops cover 3/4 of the range! Such linear encoding is extremely wasteful for a logarithmic-in-nature signal and would result in quite large, unpleasant banding after quantization. Such “linear” values are unfortunately not perceptually linear. This is where gamma comes in, but before we proceed to analyze how it affects the dynamic range, let's have a look at some operations in EV / logarithmic space.

Exposure in EV space

The first operation in EV space is trivial: adjusting the exposure. Quite obviously, addition in logarithmic space is multiplication in linear space.

exp2(x+y) == exp2(x)*exp2(y)

The exposure operation does not modify the dynamic range! It just shifts it on the EV scale. Let's have a look:

exposure.gif

Blue lines scale linearly, while green lines just shift – as expected.

underexposed_linear

Underexposed image – dynamic range covering too high values, shifted too far right.

Gamma in EV space

This is where things start to get interesting, as most people don't have intuition about logarithmic spaces; at least I didn't. Gamma is usually defined as a simple power function in linear space. What is interesting, though, is what happens when you try to convert it to EV space:

gamma(x,y) = pow(x, y)

log2(gamma(exp2(x),y)) == log2(exp2(x*y))

log2(exp2(x*y)) == x*y

The gamma operation becomes a simple multiplication! This property is actually used by GPUs to express a power operation through a combination of log2, multiply and exp2. However, if we stay in EV space, we only need the multiply part of this operation. Multiplication, as a linear transform, obviously preserves the linearity of the space, just stretching it. Therefore the gamma operation is essentially a dynamic range compression / expansion operation!

gamma.gif

You can see what it's doing to the original 8-bit dynamic range of 8 stops: it multiplies it by the inverse of the gamma exponent.

Contrast

Gamma is a very useful operator, but when it increases the dynamic range it makes all values brighter than before, and when it decreases the dynamic range it makes all of them darker. This comes from anchoring at zero (as 0 multiplied by any value stays zero), which translates to the “maximum” of one in linear space. This is usually not a desired property: when we talk about contrast, we want to make the image more or less contrasty without adjusting its overall brightness. We can compensate for it though! Just pick another, different anchor value, for example the mid-grey value.

Contrast is therefore quite similar to gamma, but we usually want to keep some other point fixed when the dynamic range gets squished, instead of 1: for example, the linear middle gray of 0.18.

contrast(x, y, midPoint) = pow(x,y) * midPoint / pow(midPoint, y)
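A quick symbolic check that in EV (log2) space this operator is just a linear rescale around the anchor (a sketch):

contrast[x_, y_, midPoint_] := x^y*midPoint/midPoint^y;
Simplify[PowerExpand[Log2[contrast[2^ev, y, 2^evMid]]]]
(* equivalent to evMid + y*(ev - evMid), i.e. a scale by y around the anchor's EV value *)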

contrast.gif

Increasing contrast reduces the represented dynamic range while “anchoring” some grey point so that its value stays unchanged.

How it all applies to local exposure

Hopefully, with this knowledge it's clear why localized exposure preserves local contrast, saturation and “punchiness” better than either the contrast or the gamma operator: it moves parts of the histogram (the ones that were too dark / bright), but:

  1. It does so only to the parts of the image / histogram that the artist wanted, not affecting others, e.g. existing midtones.
  2. It only shifts values (though the movement range can vary!), instead of squishing / rescaling everything like gamma would do.

This contrast preservation is highly desirable and maintains other properties as well, like saturation and “punchiness”.

gamma_vs_both

Comparison of gamma vs localized adaptive exposure – notice saturation and contrast difference.

EOTF vs OETF

Before talking about output functions, it's important to distinguish between the function used by display devices to transfer from electrical signals (digital or analog) to light, called the EOTF (Electro-Optical Transfer Function), and its inverse, the OETF (Opto-Electrical Transfer Function).

The EOTF came a long way, from CRT displays and their power response to voltage, which was seen as perceptually uniform in some viewing conditions.

crt_gamma.gif

CRT response to voltage

While early rendering pipelines completely ignored gamma, happily relied on its perceptual-linearity property and output values directly to be raised to a gamma power by the monitor, once we started getting more “serious” about things like working in linear spaces, we began using the inverse of this function, the OETF, and this is what we use to encode the signal for display, e.g. with sRGB or Rec709 (and many other curves like PQ).

While we don't use CRTs anymore, such power encoding has proven to be very useful on modern displays as well, as it allows encoding more signal, and this is why we still use it. There is even a standard called BT1886 (it specifies only the EOTF) that is designed to emulate the gamma power look of old CRTs!

Monitor OETF gamma 2.2 encoding

Equipped with that knowledge, we can get back to the monitor's 2.2 gamma. As the EOTF is a gamma of 2.2, the OETF will be a gamma function of 1/2.2 ≈ 0.45, and hopefully it's now obvious that, encoding from linear space to gamma space, we will have a dynamic range of 2.2 * 8 == 17.6 stops.

dynrange_gamma_2_2.png

Now someone might ask: ok, hold on, I thought that monitors use the sRGB function, which is a combination of a small linear part and an exponent of 2.4!

Why didn't I use the precise sRGB curve? Because this function was designed to deliver an average gamma of 2.2, with the added linear part in the smallest range to avoid numerical precision issues. So in this post, instead of sRGB, I will be using gamma 2.2 to make things simpler.

What is worth noting here is that just gamma 2.2 applied to linear values encoded in 8 bits has a huge dynamic range! I mean, the potential to encode over 17 stops of exposure / linear values; should be enough? Especially since their distribution also gets better; note how much more uniform the coverage of the y axis (blue “ticks”) is in the upper part. It is still not perfect, as it packs many more EV stops into the shadows and devotes most of the encoding range to the upper values (instead of the midtones), but we will fix that in a second.

Does this mean that an OETF of gamma 2.2 allows TVs and monitors to display those 17.6 stops of exposure and we don't need any HDR displays or localized tonemapping? Unfortunately not. Just think about the following question: how much information is really there between 1/256 and 2/256? How much do those 2.2 EV stops mean to the observer? Quite possibly they won't even be noticed! This is because displays have their own dynamic range (often expressed as a contrast ratio) that depends on viewing conditions. A presentation by Timothy Lottes explained it very well.

But before I proceed with further analysis of how tonemapping curves change it and what gamma really does, I performed a small experiment with viewing conditions (you be the judge of whether the results are interesting and worth anything).

Viewing conditions

I wanted to do a small experiment: take iPhone photos of the original image from my post about localized tonemapping (the one with no localized tonemapping / exposure / contrast) shown on my laptop, starting with perfect viewing conditions (an isolated room with curtains):

2016-08-21 13.31.05.jpg

The poor dynamic range of the phone doesn't really capture it, but it looked pretty good, with all the details preserved and lots of information in the shadows. An interesting, HDR scene with shadows that look really dark but are full of detail, and bright, punchy highlights. In such perfect viewing conditions, I really didn't need any adjustments; the image and the display looked “HDR” by themselves!

The second photo is in average / typical viewing conditions:

2016-08-21 13.29.59.jpg

The whole image was still clear and readable, though what may not be visible here is that the darks lost detail; the image looks more contrasty (Bartleson-Breneman effect) and Kratos' back was not readable anymore.

Finally, the same image taken outside (bright Californian sun around noon, but with the laptop maxed out on brightness!):

2016-08-21 13.30.31.jpg

Yeah, almost invisible. This is with brightness adjustments (close to the perception of a very bright outdoor scene):

2016-08-21 13.30.31_c.jpg

Not only is the image barely visible and stripped of all detail, it is also overpowered by reflections of the surroundings. Almost no gamma setting would be able to help with such dynamic range and brightness.

I am not sure whether this artificial experiment shows it clearly enough, but depending on the viewing conditions image can look great and clear or barely visible at all.

My point is to emphasize what Timothy Lottes shown in his presentation – viewing conditions define the perceived dynamic range.

I tried to model it numerically – imagine that due to bright room and sunlight, you cannot distinguish anything below encoded 0.25. This is how dynamic range of such scene would look like:

poor_viewing_conditions.png

Not good at all, only 4.4 EV stops! And this is one of the reasons why we might need some localized tonemapping.
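For reference, that number is easy to reproduce (a minimal sketch, assuming the plain gamma 2.2 decode used throughout this post): decode the encoded visibility threshold back to linear and measure the remaining range up to white in EV stops.

visibleStops[encodedThreshold_, gamma_] := Log2[1/encodedThreshold^gamma]
visibleStops[0.25, 2.2]  (* ~4.4 EV stops left in these viewing conditions *)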

Viewing conditions of EOTF / OETF

Viewing conditions and perception are also the reason for the various gamma EOTF curves and their inverses. The Rec709 EOTF transfer curve is different from the sRGB transfer curve (average gamma of 2.4 vs 2.2) precisely because of that – the darker viewing conditions of HDTV require different contrast and reproduction.

Due to the mentioned Bartleson-Breneman effect (we perceive more contrast and less dynamic range as the surroundings get brighter), a living room at night requires a different EOTF than the one for viewing web content in a standard office space (sRGB). The Rec709 EOTF gamma will mean more contrast produced by the output device.

Therefore, using the inverse of that function – the Rec709 OETF – you can store 2.4 * 8 == 19.2 stops of exposure; the TV is supposed to display them, and the viewing conditions are supposed to be good enough. This is obviously not always the case, and if you have ever tried to play console games in a sunlit living room, you know what I mean.

gamma_2_2vs2_4.gif

Gammas 2.2 and 2.4 aren’t the only ones used – before sRGB and Rec709 were standardized, different software and hardware manufacturers used different values! You can find historical references to Apple Mac gammas of 1.8 or 2.0, or crazy, extreme Gameboy gammas of 3.0 or 4.0 (I don’t really understand this choice for the Gameboy, a handheld that is supposed to work in various conditions – if you know, let me and the readers know in the comments!).

“Adjust gamma until logo is barely visible”

Varying conditions are the main reason for what you see in most games (especially on consoles!):

logo.png

Sometimes it has no name and is just a “slider”, sometimes it is called gamma, sometimes contrast / brightness (incorrectly!), but it is essentially a way of correcting for imperfect viewing conditions. This is not the same as the display gamma! It is an extra gamma that is used to reduce (or increase) the contrast to suit the viewing conditions.

It is about packing more EV stops into the upper and middle range of the histogram (scaling and shifting it to the right) – more than the display expects before it turns the signal back into linear values. So it is a contrast reduction / dynamic range compression operation, since in brighter viewing conditions the viewer will perceive more contrast anyway.

It often is “good enough”, but in my opinion here is what games usually do poorly:

  1. The user often sets it only once, but then plays the game in different conditions! Set once, but the same game is played during the day, at night, or at night with the lights on.
  2. Its purpose is not communicated well. I don’t expect a full lecture on the topic, but at least an explanation that it depends on the brightness of the room where the content is going to be viewed.

Let’s take our original “poor viewing conditions” graph and first apply a gamma of 0.7 (note: this additional gamma has nothing to do with the display EOTF! This is our “rendering intent”, based on knowledge about viewing conditions that is not part of the standard) to the linear signal before applying the output transform, so:

(x^0.7)^(1/2.2) == x^(0.7/2.2)

poor_viewing_conditions.gif

A little bit better, ~1.42 more stops. 🙂 However, in good viewing conditions (like a player of your game launching it at night), this will make the image look less contrasty and much worse…
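A quick side note on why this works: the extra adjustment and the display encoding are both pure power functions, so they compose into a single, gentler power curve. A minimal sketch (my own illustration, reusing the 0.25 visibility threshold from before):

userAdjustedOETF[x_, userGamma_] := (x^userGamma)^(1/2.2)  (* == x^(userGamma/2.2) *)
userAdjustedOETF[0.25^2.2, 1.0]  (* no adjustment: this linear value encodes right at the 0.25 threshold *)
userAdjustedOETF[0.25^2.2, 0.7]  (* with 0.7 gamma: it encodes to ~0.38, comfortably above the threshold *)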

Tonemapping operators

From my analysis so far it doesn’t seem immediately obvious why we need any tonemapping curves / operators – after all, we get a relatively large dynamic range with just a straight, linear “tonemapping” (no curve at all) in proper viewing conditions, right?

There are two problems:

  1. The first one is the non-uniform distribution of data – it is biased towards the last visible EV stops (values close to white), which occupy most of the encoded range. We would probably want to store most EV stops around the middle values and have the upper and lower ranges squish many EV stops into smaller precision. So a non-uniform dynamic range scaling, with most of the precision / bit range used for the midtones.
  2. The fixed cutoff value of 1.0 defining the white point. It is highly impractical with HDR rendering where, while after exposure most values can sit in the midrange, we can also have lots of (sometimes extreme!) dynamic range in bright areas. This is especially true with physically based rendering and very glossy speculars, or tiny emissive objects with extreme emitted radiance.

The methods presented so far don’t offer any solution for those very bright objects – we don’t want to reduce the contrast or shift the exposure, and we don’t want them to completely lose their saturation; we still want to perceive some detail there.

proper_exp_linear

A correctly exposed image, but with a linear tonemapping curve, shows the problem – completely clipped brights!

This is where tonemapping curves come into play and show their usefulness. I will show some examples using the good old Reinhard curve – mostly for its simplicity. By no means do I recommend this curve – there are better alternatives!

There is a curve that probably every engine has implemented as a reference – the Hable Uncharted tonemapping curve.

Another alternative is adopting either the full ACES workflow, or a cheaper (but less accurate; the ACES RRT and ODT are not just curves and have e.g. some saturation-preserving components!) approximation of it, like the one Kris Narkowicz proposed.

Finally, an option that I personally find very convenient is the nicely designed generic filmic tonemapping operator from Timothy Lottes.

I won’t focus on them here – instead, let’s try good old Reinhard in its simplest form. It’s defined simply as:

tonemap(x) = x/(1+x)

It’s a curve that never reaches the white point, so usually a rescaling correction factor is used (a division by tonemap(whitePoint)).
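A minimal Mathematica sketch of this white-point-corrected form (my own helper name, simply transcribing the formula above):

reinhard[x_, whitePoint_] := (x/(1 + x))/(whitePoint/(1 + whitePoint))
reinhard[128.0, 128.0]  (* the chosen white point maps exactly to 1.0 *)
reinhard[1.0, 128.0]    (* a value at EV 0 maps to ~0.5, leaving the upper half of the range for ~7 stops of brights, Log2[128] == 7 *)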

Let’s have a look at Reinhard dynamic range and its distribution:

reinhard.png

Effect of Reinhard tonemapping with white point defined as 128

This looks great – not only did we gain over 7 stops of exposure, tightly packed into the brights without affecting the midtones and darks too much, but the distribution of EV stops in the final encoding also looks almost perfect!

It also allows us to distinguish (to some extent) details in objects that are way brighter than the original value of EV 0 (after exposure) – like emissives, glowing particles, fire, or bright specular reflections. Anything that is not “hotter” than 128 will get some distinction and representation.

linear_vs_filmic

Comparison of linear and filmic operators. The filmic tonemapping operator used here is not Reinhard, so it doesn’t wash out the shadows, but the effect on the highlights is the same.

As I mentioned, I don’t recommend this simplest Reinhard for anything other than fighting fireflies in post effects or general variance reduction. There are better solutions (already mentioned), and you will want to do many more things in your tonemapping – add some perceptual correction of contrast/saturation in the darks (the “toe” of filmic curves), some minor channel crosstalk to prevent a total washout of the brights, etc.

Summary

I hope that this post helped a bit with understanding dynamic range, exposure and logarithmic EV space, gamma functions and tonemapping. I have shown some graphs and numbers to help not only “visualize” the impact of those operations, but also see how they play out numerically.

Perfect understanding of those concepts is in my opinion absolutely crucial for modern rendering pipelines and any future experiments with HDR displays.

If you want to play with this “dynamic range simulator” yourself, check the referenced Mathematica notebook!

.nb Mathematica version

PDF version of the notebook

Edit: I would like to thank Jorge Jimenez for a follow up discussion that allowed me to make some aspects of my post regarding EOTF / OETF hopefully slightly clearer and more understandable.

References

http://renderwonk.com/publications/s2010-color-course/ SIGGRAPH 2010 Course: Color Enhancement and Rendering in Film and Game Production

http://gpuopen.com/gdc16-wrapup-presentations/  “Advanced Techniques and Optimization of HDR Color Pipelines”, Timothy Lottes.

http://filmicgames.com/archives/75 John Hable “Filmic Tonemapping Operators”

https://github.com/ampas/aces-dev Academy Color Encoding Standard

https://knarkowicz.wordpress.com/2016/01/06/aces-filmic-tone-mapping-curve/ ACES filmic tonemapping curve

http://graphicrants.blogspot.com/2013/12/tone-mapping.html Brian Karis “Tone-mapping”

 


Localized tonemapping – is global exposure and global tonemapping operator enough for video games?

In this blog post, I wanted to address something that I have been thinking about for many years – since I started working on rendering in video games and with HDR workflows, and from my experience with various photographic techniques. For the question in the title of the article, I can immediately give you an answer – “it depends on the content” – and you could stop reading here. 🙂

However, if you are interested in what localized tonemapping is and why it’s a useful tool, I’ll try to demonstrate some related concepts using personal photographs and photographic experiments, as well as screenshots from Sony Santa Monica’s upcoming new God of War and the demo we showed this year at E3.

Note: all screenshots have some extra effects like bloom or film grain turned off – to avoid confusion and to simplify reasoning about them. They are also from an early milestone, so they are not representative of the final game quality.

Note 2: This is a mini-series. A second blog post accompanies this one. It covers dynamic range, gamma operations, tonemapping operators, some numerical analysis and notes about viewing conditions.

Global exposure and tonemapping

Before even talking about tonemapping and the need for localized exposure, I will start with a simple, global exposure setting.

Let’s start with a difficult scene that has some autoexposure applied:

underexposed_linear.png

For the whole demo we used reference physical radiance values for the natural light sources (sky, sun), and because the scene takes place in a cave, you can already see that it has a pretty big dynamic range.

I won’t describe all the details of the autoexposure system we used here (it is pretty standard, though I might describe it in a future blog post or in some presentation), but due to its center-weighted nature it slightly underexposed lots of details in the shade at the bottom of the screen – while the outdoors is perfectly exposed. In most viewing conditions (more about that in the second post) the shadows completely lose all detail, you can’t understand the scene, and the main hero character is indistinguishable…

Let’s have a look at the histogram for the reference:

hist_underexposed.png

An interesting observation here is that even though a big area of the image is clearly underexposed, there is still some clipping in the whites!

Let’s quickly fix it with an artist-set exposure bias:

proper_exp_linear.png

And have a look at the histogram:

hist_exp_linear.png

Hmm, details are preserved slightly better in the shadows (though they still look very dark – as intended) and the histogram is more uniform, but the whole outdoor area is completely blown out.

I have a confession to make now – for the purpose of demonstration I cheated here a bit and used a linear tonemapping curve. 🙂 This is definitely not something you would want to do, and as for why – there are excellent presentations on the topic. I want to point to two specific ones:

The first one is a presentation from a 2010 SIGGRAPH course organized by Naty Hoffman – “From Scene to Screen” by Josh Pines. It describes how filmic curves were constructed, the reasoning behind their shape and, in general, why we need them.

The second one is from GDC 2016 – “Advanced Techniques and Optimization of HDR Color Pipelines” by Timothy Lottes – and covers various topics related to displaying high dynamic range images.

So, this is the scene with a proper filmic tonemapping curve and some adjustable channel crosstalk:

proper_exp_filmic.png

And the histogram:

hist_exp_filmic.png

Much better! Quite a uniform histogram and no blown-out whites (in game, the additive bloom would make the whites read as white).

I made a gif of the difference before/after:

linear_vs_filmic.gif

And the animation of all three histograms:

hist_anim.gif

Much better! A few extra stops of dynamic range, and lots of detail and saturation preserved in bright areas. However, the histogram still contains many completely black pixels, there is a large spike in the lower 10th percentile, and depending on the viewing conditions the scene might still be too dark…

Gamma / contrast settings

Why not just reduce the dynamic range of the image? Raising linear pixel values to some power and rescaling them is equivalent to scaling and shifting in logarithmic EV space, and can give a result that looks like this:

contrast_gamma.png

hist_low_contrast.png

Definitely all the details are visible, but does the scene look good? In my opinion it does not; everything is milky, washed out, boring and lacks “punch”. Furthermore, both the environment and the characters started to look chalky. Due to the perceptual character of per-channel gamma adjustments, not only contrast but also saturation is lost. Even the histogram shows the problem – almost no whites or blacks in the scene, everything is packed into the grays!

Check this gif comparison:

filmic_vs_gamma.gif

Does this problem happen in real life / photography?

Yes, obviously it does! Here is a photo from Catalina Island – on the left you can see how the camera took an underexposed picture with extremely dark, zero-detail shadows, and on the right how I corrected it to feel more perceptually correct and similar to how I saw the scene.

real_photo_comp.png

So can we do anything about it? Yes, we can! And with the photograph above I have already teased the solution.

But first – some questions that probably immediately pop up in a discussion with lighting artists.

Why not just change the lighting like they do in movies?

The answer often is – yes, if you can do it right, just adjust the lighting in the scene to make the dynamic range less problematic. We recently interviewed many lighters coming from animation and VFX, and as a solution for large scene dynamic range they usually mention just adding more custom lights and eyeballing it until it looks “good”.

Obviously, by using the word right in my answer I kind of dodged the question. Let me first explain what it means in real-life photography and cinematography (you can find plenty of random YouTube tutorials on the topic). It means:

Diffusors

Pieces of material that partially transmit light, but completely diffuse the directionality and source position of the lighting. The equivalent of placing a large area light in a rendered scene and masking out the original light.

Reflectors

Simply a surface that bounces light and boosts the lighting on shadowed surfaces and on surfaces not facing the light direction.

In a game scene you could either place white material that would produce more GI / bounced lighting, or – in a cinematic – simply place a soft (area) fill light.

Additional, artificial light sources (fill flash, set lights etc.)

This is pretty obvious / straightforward. Movie sets and games with cinematic lighting have tons of them.

So as you can see – all those techniques apply to games! With a constrained camera, camera cuts, cinematics etc., you can place fill lights and area lights there, and most games (with enough budget) do that all the time.

This is also what is possible in VFX and animation – manually adding more lights, fixing and balancing things in compositing, and just throwing more people at the problem…

Note: I think this is a good, acceptable solution and in most cases also the easiest! This is where my answer “it depends” to the question in the title of this article comes from. If you can always safely add many artificial, off-screen light sources, your game content allows for it, and you have great lighters – then you don’t need to worry.

Is it always possible in games, and can it get us good results?

No. If you have a game with no camera cuts, or you simply need perfect lighting with a 100% free camera, you cannot put an invisible light source in the scene and hope it will look good.

On the other hand, one could argue that there is often a place for grounded light sources (torches, fire, bulbs… you name it) – and sure; however, just as often they make no sense in the context and level design of the scene, or you might have no performance budget for them.

Hacking Global Illumination – just don’t!

Some artists would desperately suggest “let’s boost the GI!”, “let’s boost the bounced lighting!”, “let’s boost just the ambient/sky lighting intensity!” – and sometimes you could get away with that on the previous generation of consoles and with non-PBR workflows, but in a PBR pipeline it’s almost impossible not to destroy your scene this way. Why? A large family of problems it creates concerns the balance of specular and diffuse lighting.

If you hack just the diffuse lighting component (by boosting the diffuse GI), your objects will look chalky or plastic, your metals will be too dark, and other objects will sometimes glow in the dark. If you also boost the indirect speculars, suddenly under some conditions objects will look oily and/or metallic, and your mirror-like glossy surfaces will look weird and lose their sense of grounding.

Finally, this is not GI-specific, but applies to any hacks in a PBR workflow – when you start hacking sun/sky/GI intensities, you lose the ability to quickly reason about material responses and the lighting itself, and to debug them – as you can’t trust what you see and many factors can be the source of a problem.

How does photography deal with the problem of too large a dynamic range when working with natural light?

This is a very interesting question and my main source of inspiration for the solution to this problem. Especially in the film / analog era, photographers had to know a lot about dynamic range, contrast and various tonemapping curves. Technique and process were highly interleaved with the artistic side of photography.

One of the (grand)fathers of photography, Ansel Adams, created the so-called Zone System.

https://en.wikipedia.org/wiki/Zone_System

http://photography.tutsplus.com/tutorials/understanding-using-ansel-adams-zone-system–photo-5607

https://luminous-landscape.com/zone-system/

I won’t describe it in detail here, but it is very similar to many principles that we are used to – mapping middle gray, finding the scene dynamic range, mapping it to the medium’s dynamic range, etc.

The fascinating part of it is the chemical / process side:

Picking the correct film stock (different films have different sensitivities and tonemapping curve shapes), the correct developer chemical, diluting it (my favourite developer, Rodinal, can deliver totally different contrast ratios and film acuity/sharpness depending on the dilution), adjusting the development time or even how often the developed film is agitated (yes! one rotation of the development tank per minute can produce different results than one rotation every 30 seconds!).

wilno1

A photo captured on Fuji Neopan developed in diluted Agfa Rodinal – absolutely beautiful tonality from a low-contrast, high-acuity film.

Manual localized exposure adjustment

This is all interesting, but it still lives in the global tonemapping / per-image, global process domain. What photographers had to do to adjust exposure and contrast locally was a tedious process called dodging and burning.

https://en.wikipedia.org/wiki/Dodging_and_burning

It meant literally blocking or boosting light while exposing the print. As film negatives had a very large dynamic range, it made it possible not only to adjust exposure/brightness, but to recover lots of detail in otherwise blown-out / too dark areas.

An easy alternative that works just great for landscape photography is using graduated filters:

https://en.wikipedia.org/wiki/Graduated_neutral-density_filter

Or, even more easily, by using a polarizer (it darkens and saturates the sky and can cancel out specular light / reflections on e.g. water).

https://en.wikipedia.org/wiki/Polarizing_filter_(photography)

Fortunately, in the digital era we can do it much more easily with localized adjustment brushes! It’s not a very interesting process, but it’s extremely simple in software like Adobe Lightroom. A (silly) example of manually boosting the exposure in the shadows:

localized_adjustment.gif

As a localized adjustment brush with exposure is only an exposure addition / linear-space multiplication (more about it in the second post in the series!), it doesn’t affect contrast in the modified neighborhood.

It is worth noting here that such an adjustment would probably be impossible (or lead to extreme banding / noise) with plain LDR bmp/jpeg images. Fortunately, Adobe Lightroom and Adobe Camera Raw (just like many other dedicated RAW processing tools) operate on RAW files that are able to capture 12-16 exposure stops of dynamic range with proper detail! Think of them as HDR files (like EXR), just stored in a compressed format and containing data that is specific to the input device transform.

This is not the topic of this post, but I think it’s worth mentioning that on God of War we implemented a similar capability for lighting artists – in the form of 3D shapes that we called “exposure lights”. Funnily enough, they are not lights at all – just spherical, localized exposure boosters / dimmers. We used the dimming capability in, for example, the first scene of our demo – the Kratos reveal – to make him completely invisible in the darkness (there was too much GI 🙂 ), and we use the brightness boosting capability in many scenes.
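Purely to illustrate the idea (a hypothetical sketch with a made-up falloff and parameter names, not the actual implementation), such an exposure light can be thought of as a distance-based EV offset applied to the scene radiance before exposure and tonemapping:

exposureLight[radiance_, dist_, radius_, evBoost_] := radiance*2^(evBoost*Clip[1 - dist/radius, {0, 1}])
exposureLight[0.5, 0.0, 3.0, 1.5]  (* at the center: boosted by the full 1.5 EV *)
exposureLight[0.5, 4.0, 3.0, 1.5]  (* outside the radius: unchanged *)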

Automatic localized adjustments – shadows / highlights

Manual localized exposure adjustments are great, but still – manual. What if we could do it automatically, without reducing the contrast of the whole image – so:

a) automatically

b) when necessary

c) preserving local contrast?

It seems like the Holy Grail of exposure settings, but let’s have a look at the tools already at photographers’ and artists’ disposal.

Enter… Shadows / Highlights! This is an image manipulation option available in Adobe Photoshop and Lightroom / Camera Raw. Let’s have a look at an image with normal exposure, but lots of bright and dark areas:

photo_orig_exposure.png

We can boost the shadows separately:

photo_shadows.png

(Notice how bright the trees got – with a slight “glow” / “HDR look”; more about it later.)

Now highlights:

photo_highlights.png

Notice more details and saturation in the sky.

And finally, both applied:

photo_both.png

acr_highlights_shadows.gif

What is really interesting is that it is not a global operator and doesn’t only reshape the exposure curve. It’s actually a contrast-preserving, very high quality localized tonemapping operator. Halo artifacts are barely visible (just some minor “glow”)!

Here is an extreme example that hopefully shows those artifacts well (if you cannot see them due to the small size, open the images in a separate tab):

localized_extreme.gif

halo.png

Interestingly, while the ACR/Lightroom HDR algorithm seems to work great until pushed to the extreme, the same Shadows/Highlights in Photoshop looks quite ugly at extreme settings:

photoshop_highlights_shadows.png

Aaaargh, my eyes!🙂 Notice halos and weird, washed out saturation.

Is the reason only that there is less information to work with (bilateral weighting in HDR can easily distinguish between -10EV and -8EV, while 1/255 vs 2/255 provides almost no context/information), or a different algorithm? I don’t know.

The actual algorithms used are way beyond the scope of this post – and still a topic I am investigating (trying to minimize artifacts within a runtime performance budget and maximize image quality – no halos), but I have been playing with two main categories of algorithms:

  • Localized exposure (brightness) adjustments that take just some neighborhood into account and use bilateral weighting to avoid halos (see the sketch after this list). I would like to thank our colleagues at Guerilla Games here for inspiring us with an example of how to apply it at runtime.
  • Localized histogram stretching / contrast adjustments – the methods producing those high-structure-visibility, oversaturated, “radioactive” pictures.
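To illustrate the first approach (a purely hypothetical, unoptimized sketch – made-up names, no boundary handling, and certainly not the shipped algorithm): the bilateral weight combines spatial distance with log-luminance difference, so the local exposure estimate does not leak across strong edges and create halos.

(* logLum is a 2D array of log2 luminance; x, y are assumed to be at least radius away from the borders *)
bilateralLocalEV[logLum_, x_, y_, radius_, sigmaSpatial_, sigmaRange_] :=
 Module[{c = logLum[[x, y]], pairs},
  pairs = Flatten[Table[
      With[{v = logLum[[x + i, y + j]]},
       {Exp[-(i^2 + j^2)/(2 sigmaSpatial^2)]*Exp[-(v - c)^2/(2 sigmaRange^2)], v}],
      {i, -radius, radius}, {j, -radius, radius}], 1];
  Total[#[[1]]*#[[2]] & /@ pairs]/Total[First /@ pairs]]

(* shadows-style boost: add exposure only where the local neighborhood is dark (below middle gray at log2 luminance 0) *)
shadowsBoost[lum_, localEV_, strength_] := lum*2^(strength*Max[0.0, -localEV])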

There are obviously numerous techniques and many publications available – sadly not many of them fit in a video game performance budget.

In “God of War”

Enough talking about photography and Adobe products, time to get back to God of War!

I implemented a basic shadows/highlights algorithm with artist-tweakable controls, trying to match the behavior of Lightroom. The first screenshot shows a comparison of the “shadows” manipulation against a regular, properly tonemapped screenshot with a filmic tonemapping curve.

filmic_vs_shadows.gif

hist_filmic_vs_shadows.gif

I set it to a value that is relatively subtle, but still visible (artists would use anything from more subtle settings to more pronounced ones in gameplay-sensitive areas). Now the same with the highlights option:

filmic_vs_highlights.gif

hist_filmic_vs_highlights.gif

One thing that you might notice here is haloing artifacts – they result both from the relatively strong setting and from some optimizations and limitations of the algorithm (it works at lower/partial resolution).

Finally, with both applied:

filmic_vs_shadows_and_highlights.gif

hist_filmic_vs_both.gif

As I mentioned – here it is shown in a slightly exaggerated manner that exposes the artifacts. However, it’s much better than a regular “gamma” low-contrast setting:

gamma_vs_both.gif

hist_gamma_vs_both.gif

The histogram shows the difference – while the gamma / contrast operator tends to “compact” the dynamic range and pack it all into the midtones / grays, the shadows/highlights operations preserve local contrast, saturation and some information about the darkest and brightest areas of the image.

Why does localized exposure preserve contrast and saturation? The main difference is that gamma in logarithmic space becomes a scale, scaling the whole histogram, while exposure / a linear-space multiplication becomes just a shift (more about it in part 2) and moves under- / over-exposed parts into the visible range with the histogram shape unchanged.
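To spell out that difference (nothing engine-specific, just writing a pixel value as x = 2^EV):

gamma:    x^g == 2^(g*EV) – a scale in EV space, which changes contrast
exposure: k*x == 2^(EV + log2(k)) – a shift in EV space, which preserves contrast and the histogram shape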

Summary

You can check the final image (with a bit more subtle settings) here:

final.png

To sum up – I don’t think that the problems of exposure and dynamic range in real-time rendering are solved. Sometimes scenes rendered using realistic reference values have way too large a dynamic range – just like photographs.

We can fix it with complicated adjustments of the lighting (like they do on movie sets), with some localized exposure adjustments (in 3D – “exposure lights”), or by using simple “procedural” image-space shadows/highlights controls.

The possible solutions depend heavily on the scenario. For example – if you can cut the camera, you have many more options than when it is 100% free and unconstrained with zero cuts. It also depends on how much budget you have – both in terms of milliseconds to spend on extra lights and in terms of lighting artists’ time.

Sometimes a single slider can make a scene look much better, and while localized exposure / localized tonemapping can have its own problems, I recommend adding it to your artists’ toolset to make their lives easier!

If you are interested in a bit more detail on dynamic range, tonemapping and gamma operations, check out the second post in the mini-series.

References

http://renderwonk.com/publications/s2010-color-course/ SIGGRAPH 2010 Course: Color Enhancement and Rendering in Film and Game Production

http://gpuopen.com/gdc16-wrapup-presentations/  “Advanced Techniques and Optimization of HDR Color Pipelines”, Timothy Lottes.

https://en.wikipedia.org/wiki/Zone_System

http://photography.tutsplus.com/tutorials/understanding-using-ansel-adams-zone-system–photo-5607

https://luminous-landscape.com/zone-system/

https://en.wikipedia.org/wiki/Dodging_and_burning

https://en.wikipedia.org/wiki/Graduated_neutral-density_filter

https://en.wikipedia.org/wiki/Polarizing_filter_(photography)

 

 
