Fixing screen-space deferred decals

Screen-space deferred decals are a very popular technique. There have been so many presentations and blog posts about them that I will just list a couple (to be honest, just the first page of Google search results) in no particular order:

Therefore I think it wouldn’t be an exaggeration to call it an “industry standard”.

The beauty of screen-space decals used together with deferred rendering is that they defer yet another part of the rendering pipeline – in this case layered materials and, in general, modifications to the rendered surface, both static and dynamic. Just like you can defer the actual lighting from generating the material properties (in deferred lighting / shading), you can do the same with composite objects and textures.

You don’t need to think about special UV mapping, unwrapping, shader or mesh permutations, difficult and expensive layered material shaders, even more difficult pipelines for artists (how do you paint 2 partially overlapping objects at once? How do you texture something in a unique way depending on the asset instance?) or techniques as complex and hard to maintain as virtual texturing with a unique space parametrization.

Instead you just render a bunch of quads / convex objects and texture them in world space – extremely easy to implement (a matter of hours, max days, even in complex, multi-platform engines), very easy to maintain (usually the only maintenance is making sure you don’t break the MRT separate blending modes and the normal en/decoding in the G-Buffer) and easy for artists to work with. I love those aspects of screen-space decals and how easily they work with the G-Buffer (no extra lighting cost). I have often seen deferred decals listed as one of the important advantages of deferred shading techniques and a con of forward shading!

However, I wouldn’t write this post if not for a serious problem with deferred screen-space decals that I believe every presentation failed to mention!

Later post edit: Humus actually described this problem in another blog post (not the original volume decals one). I will comment on it in one of the later sections.

(Btw. a digression – if you are a programmer, researcher, artist, or basically any author of a talk or a post – really, please talk about your failures, problems and edge cases! This is where 90% of engineering time is spent, and mentioning it doesn’t make any technique any less impressive…).

Dirty screen-space decal problem

Unfortunately, in all those “simple” implementations presented in blog posts, presentations and articles there is a problem with screen-space decals that in my opinion makes them unshippable without some “fix” or hack in the PS4/Xbox One generation of AAA games, with realistic and complex lighting, materials and sharp, anisotropic filtering. Funnily enough, I found only one (!) screenshot in all those posts with a camera angle that shows this problem… Edge artifacts. This is a screenshot from the Saints Row: The Third presentation.

Problem with screen-space decals – edges. Source: Lighting and Simplifying Saints Row: The Third

I hope the problem is clearly visible in this screenshot – some pixels near geometric edges perpendicular to the camera do not receive the decal properly and the background is clearly visible through it. I must add that in motion this kind of artifact looks even worse. :( Seeing it in some other engine, I at first suspected many other “obvious” reasons that cause edge artifacts – half-texel offsets, a wrong depth sampling method, wrong UV coordinates… But the reason for this artifact is quite simple – screen-space UV derivatives and the Texture2D.Sample/tex2D instruction!

Edit: there are other interesting problems with screen-space / deferred decals. I highly recommend reading Sébastien Lagarde’s and Charles de Rousiers’ presentation about moving Frostbite to PBR in general (in my opinion the best and most comprehensive PBR-related presentation so far!), but especially section 3.3 about problems with decals, materials and lighting.

Guilty derivatives

The guilty derivatives – a source of never-ending graphics programmer frustration, but also a solution to an otherwise unsolved problem. On the one hand a necessary feature for texture antialiasing and texturing performance, on the other hand a workaround with many problems of its own. They cause quad overshading and the inability to handle massive amounts of very small triangles (well, to be fair there are some other reasons too, like vertex assembly etc.), they are automatically calculated for textures only in pixel shaders (in every other shader stage you need to specify the LOD/derivatives manually to use texturing), their calculation is imprecise and possibly low quality; they can cause many types of edge artifacts and are incompatible with jittered rasterization patterns (like flip-quad).

In this specific case, let’s have a look at how the GPU would calculate the derivatives, first by looking at how per-quad derivatives are generated in general.

Rasterized pixels – note – different colors belong to the different quads.

In a typical rendering scenario with regular rendering (no screen-space techniques) of this example small cylinder object, there would be no problem. The quad containing pixels A and B would get proper derivatives for texturing; the different quad containing pixels C and D would cause some overshading, but would still have proper texture UV derivatives – no problem here either (except for the GPU power lost on those overshaded pixels).

So how do screen-space techniques make it not work properly? The problem lies in the way the UV texture coordinates are calculated and reprojected from screen space (so the core of the technique). And contrary to the triangle rasterization example, the problem with a decal being rendered behind this object is not with pixel D, but actually with pixel C!

Effect of projecting reconstructed position into decal bounding box

We can see in this diagram how the UVs for point C (reprojected from pixel C) will lie completely outside the bounding box of the decal (dashed-line box), while point D has proper UVs inside it.

While we can simply reject those pixels (texkill, branching out with alpha zero etc. – it doesn’t really matter), unfortunately they still contribute to the derivatives and the mip level calculation.

In this case, the calculated mip level would be extremely blurry – the calculated partial derivative sees a difference of 1.5 in UV space! As the further mip levels usually contain mip-mapped alpha as well, we end up with an almost transparent alpha from the alpha texture, or bright/blurred albedo, and many kinds of different edge artifacts depending on the decal type and blending mode…
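
To make the mechanism concrete, below is a minimal HLSL sketch of a typical deferred decal pixel shader (all resource and helper names are my own hypothetical illustrations, not code from any of the referenced engines). Note how rejected pixels still compute reprojected UVs and therefore still feed the implicit derivatives of the final Sample call within their 2x2 quad.

Texture2D    DepthBuffer   : register(t0);
Texture2D    DecalAlbedo   : register(t1);
SamplerState LinearSampler : register(s0);

float4x4 WorldToDecal;  // maps world space into the decal's unit box
float4x4 InvViewProj;   // camera inverse view-projection
float2   ScreenSize;

float3 ReconstructWorldPos(float2 pixelCoord, float deviceDepth)
{
    float2 ndc = (pixelCoord / ScreenSize) * float2(2.0, -2.0) + float2(-1.0, 1.0);
    float4 wp  = mul(InvViewProj, float4(ndc, deviceDepth, 1.0));
    return wp.xyz / wp.w;
}

float4 DecalPS(float4 svPos : SV_Position) : SV_Target
{
    // Reconstruct the world position of whatever the G-Buffer contains under this pixel.
    float  depth    = DepthBuffer.Load(int3(svPos.xy, 0)).r;
    float3 worldPos = ReconstructWorldPos(svPos.xy, depth);

    // Reproject into the decal's local unit box and derive UVs from it.
    float3 decalPos = mul(WorldToDecal, float4(worldPos, 1.0)).xyz;
    float2 uv       = decalPos.xy + 0.5;

    // Reject pixels that land outside the decal volume (like pixel C above)...
    clip(0.5 - abs(decalPos));

    // ...but within a 2x2 quad the rejected lanes keep executing as "helpers",
    // so their far-off UVs still drive the implicit derivatives of this Sample
    // call and can push the valid neighbor (pixel D) to an extremely blurry mip.
    return DecalAlbedo.Sample(LinearSampler, uv);
}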

Other screen-space techniques suffering

Screen-space/deferred decals are not the only technique suffering from this kind of problem. Any technique that relies on screen-space information reprojected to world space and used as the UV source for texturing will have such problems and artifacts.

Edit: The problem of mip-mapping, derivatives and how screen-space deferred lighting with projection textures can suffer from it was described very well by Aras Pranckevičius.

Other (most common) examples include projection textures for spot lights and cubemaps for environment specular/diffuse lighting. To be honest, in every single game engine I worked with there were some workarounds for this kind of problem (sometimes added unconsciously :) more about it in one of the next sections).

Non-working solution – clamping the UVs

The first, quite natural attempt to fix it is to clamp the UVs – also for the discarded pixels – so that the derivatives used for mip-mapping are smaller in such a problematic case. Unfortunately, it doesn’t solve the issue; it can make it less problematic or even completely fix it when the valid pixel is close to the clamped, invalid one, but it won’t work in many other cases… One example would be an edge between some rejected pixels close to a U or V of 0 and some valid pixels close to a U or V of 1; in this case we still get the full mip chain dropped due to a huge partial derivative change within this quad.

Still, if you can’t do anything else, it makes sense to throw in a free (on most modern hardware) saturate instruction (or instruction modifier) for those rare cases when it helps…

Brutal solution – dropping mip-maps

I mentioned a quite natural “solution” that I have seen in many engines and that is acceptable for most other screen-space techniques – not using mip-maps at all. Replace your Sample with SampleLevel and the derivative and mip level problem is solved, right? ;)

This works “ok” for shadow maps – as the aliasing is partially solved by the commonly used cascaded shadow mapping – further distances get lower resolution shadow maps (plus we filter some texels anyway)…

It is “acceptable” for projection textures, usually because they are rendered only close to the camera due to a) the high lighting cost and b) per-scene and per-camera-shot tweaking of lights.

It actually often works well with environment maps – as lots of engines have Toksvig or some other normal variance to roughness remapping, and the mip level for the cubemap look-up is derived manually from the roughness or gloss. :)

However, mip-mapping is applied to textures for a reason – removing aliasing and information in frequencies higher than the rasterizer can even reproduce. For things like shiny, normal-mapped deferred decals such as blood splats, the effect of having no mip maps can be quite extreme and the noise and aliasing unacceptable. Therefore I wouldn’t use this as a solution in an AAA game, especially if deferred, screen-space decals are used widely as a tool for the environment art department.

A middle ground here could be dropping just some of the further mip maps (for example keeping mips 0-3). This way one could get rid of the extreme edge artifacts (sampling the completely invalid last mip levels) and still get some basic antialiasing.
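
As a quick illustration, both variants look something like the sketch below (hypothetical resource names; the mip clamp could equally be achieved by simply not generating mips past 3, or via the sampler’s MaxLOD state):

// Variant A: the "brutal" one - no mip-mapping at all, so no derivatives involved,
// at the price of noise and aliasing:
float4 albedo = DecalAlbedo.SampleLevel(LinearSampler, uv, 0.0);

// Variant B: keep mip-mapping, but never let the (possibly garbage) LOD reach the
// last, blurriest levels - roughly equivalent to keeping only mips 0-3:
float lod = DecalAlbedo.CalculateLevelOfDetail(LinearSampler, uv);
albedo    = DecalAlbedo.SampleLevel(LinearSampler, uv, min(lod, 3.0));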

Possible solution – multi-pass rendering

This is again a partial solution that would fix problems in some cases, but not in most. The idea is to inject decal rendering in-between object rendering, with per-type object sorting. So for example “background”/”big”/”static” objects could be rendered first, decals projected on top of them, and then the other object layers.

This solution has many disadvantages – the first one is the complication of the rendering pipeline and many unnecessary device state changes. The second one – the performance cost: potential overshading and overdraw, wasting bandwidth and ALU on pixels that will be overwritten anyway…

Finally, the original problem can still be visible and unsolved! Imagine a terrain with high curvature and projecting decals onto it – a hill with a valley in the background can still produce completely wrong derivatives and mip level selection.

Possible solution – going back to world space

This category of solutions is a bit of a cheat, as it derives from the original screen-space decals technique but goes back to world space. In this solution, artists would prepare a simplified version of the mesh (in the extreme case a quad!), map UVs on it and use such source UVs instead of reprojected ones. Such UVs mip-map correctly and don’t suffer from the edge artifacts.

Other aspects and advantages of the deferred decals technique remain the same here – including the possibility of software Z-tests and rejecting based on object ID (or stencil).

manual_decals

On the other hand, this solution is suitable only for environment art. It doesn’t work at all for special effects like bullet holes or blood splats – unless you calculate the source geometry and its UVs on the CPU like in “old-school” decal techniques…

It can also suffer from a wrong, weird parallax offset due to the UVs not actually touching the target surface – but in general camera settings in games never allow for extreme close-ups, so it shouldn’t be noticeable.

Still, I mention this solution because it is very easy on the programming side, can be a good tool on the art side and actually works. It was used quite heavily in The Witcher 2 in the last level, Loc Muinne – as an easier alternative to messy 2nd UV sets and costly 2-layered materials.

I’m not sure if the specific assets in the following screenshot used it, but such partially hand-made decals were used on many similar “sharp-ended” assets, like those rock “teeth” on the left and right of the door frame in this level.

Loc_Muinne_sewers_screen1

It is much easier to place them and LOD them out quickly with distance (AFAIK they were present only together with LOD 0 of a mesh) than to create a multi-layered material system or virtual texturing. So even if you need some other, truly screen-space decals – give artists the possibility of authoring manual decal objects blended into the G-Buffer – I’m sure they will come up with great and innovative uses for them!

Possible solution – Forward+ decals

The second type of “cheated” solution – fetch the decal info from some pre-culled list and apply it during the background geometry rendering. Schemes like per-tile pre-culling as in Forward+ or clustered lighting can make it quite efficient. It is hard for me to estimate the cost of such rendered decals – it probably depends on how expensive your geometry pixel shaders are, how many different decals you have, whether they are bound on memory or ALU, whether they can hide some latency etc. One beauty of this solution is how easy it becomes to use anisotropic filtering, how easy it is to blend normals (blending happens before any encoding!), no need to introduce any blend states or decide what won’t be overwritten due to storage in the alpha channel; furthermore, it seems it should work amazingly well with MSAA.

The biggest disadvantages – complexity, the need to modify your material shaders (and all of their permutations that probably already eat too much RAM and game build time), increased register pressure, difficult debugging and potentially the biggest runtime cost. Finally, it works properly only with texture arrays / atlases, which adds a quite restrictive size limitation…
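
For illustration, a pre-culled decal loop inside the material pixel shader could look roughly like the sketch below (my own sketch, not code from any shipped engine; only albedo is blended for brevity, and the per-tile culling that fills TileDecalIndices is assumed to happen in a separate compute pass). Gradients are derived from the surface position outside the loop and fed to SampleGrad, which keeps anisotropic filtering correct even with the divergent control flow.

static const uint MAX_DECALS_PER_TILE = 16;
static const uint INVALID_INDEX       = 0xffffffff;

struct DecalData
{
    float4x4 worldToDecal;     // world space -> decal unit box
    float4   atlasScaleOffset; // xy = scale, zw = offset inside a shared atlas
};

StructuredBuffer<DecalData> Decals           : register(t8);
StructuredBuffer<uint>      TileDecalIndices : register(t9);  // pre-culled, per tile
Texture2D                   DecalAtlas       : register(t10);
SamplerState                AnisoSampler     : register(s2);

void ApplyDecals(uint tileIndex, float3 worldPos, inout float3 albedo)
{
    // Gradients of the *surface* position - correct within the quad, unlike the
    // screen-space reprojection case, and computed before any divergent flow.
    float3 dpdx = ddx(worldPos);
    float3 dpdy = ddy(worldPos);

    uint listOffset = tileIndex * MAX_DECALS_PER_TILE;
    for (uint i = 0; i < MAX_DECALS_PER_TILE; ++i)
    {
        uint decalIndex = TileDecalIndices[listOffset + i];
        if (decalIndex == INVALID_INDEX)
            break;

        DecalData d = Decals[decalIndex];
        float3 decalPos = mul(d.worldToDecal, float4(worldPos, 1.0)).xyz;
        if (any(abs(decalPos) > 0.5))
            continue; // surface outside this decal's box

        float2 uv    = (decalPos.xy + 0.5) * d.atlasScaleOffset.xy + d.atlasScaleOffset.zw;
        float2 uvDdx = mul((float3x3)d.worldToDecal, dpdx).xy * d.atlasScaleOffset.xy;
        float2 uvDdy = mul((float3x3)d.worldToDecal, dpdy).xy * d.atlasScaleOffset.xy;

        float4 decalColor = DecalAtlas.SampleGrad(AnisoSampler, uv, uvDdx, uvDdy);
        albedo = lerp(albedo, decalColor.rgb, decalColor.a);
    }
}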

Possible solution – pre-calculating mip map / manual mip selection

Finally, the most “future research” and “ideas” category – if you have played with any of them and have experience, or simply would like to share your opinion about them, please let me know in the comments! :)

So, if we a) want mip-mapping and b) our screen-space derivatives are wrong, then why not compute the mip level or even the partial derivatives (for anisotropic texture filtering) manually? We can do it in many possible ways.

One technique could utilize in-quad communication (available on GCN explicitly, or via tricks with multiple calls to ddx_fine / ddy_fine and masking operations on any DX11-level hardware) and compute the derivatives manually only when we know that the pixels are “valid” and/or come from the same source asset (via testing distances, material ID, normals, a decal mask or maybe something else). In the case of zero valid neighbors we could fall back to using the zero mip level. In general, I think this solution could work in many cases, but I have some doubts about its temporal stability under camera movement and geometric aliasing. It also could be expensive – it all depends on the actual implementation and the heuristics used.
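
A rough sketch of this idea on DX11-level hardware could look as follows (my own illustration, reusing decalPos and the decal textures from the earlier sketch, with a simplified validity test – in practice one would also compare depths, normals or material IDs as described above):

// UV reprojected from the depth buffer - possibly far outside [0,1] on invalid pixels.
float2 uv    = decalPos.xy + 0.5;
float  valid = all(abs(decalPos) <= 0.5) ? 1.0 : 0.0;

// If the horizontal / vertical neighbor in the 2x2 quad has a different validity
// flag, its UV cannot be trusted for derivative purposes.
bool neighborXValid = (ddx_fine(valid) == 0.0);
bool neighborYValid = (ddy_fine(valid) == 0.0);

// Use the real fine derivatives only when the corresponding neighbor is valid;
// a zero gradient falls back to sampling mip 0.
float2 uvDdx = neighborXValid ? ddx_fine(uv) : float2(0.0, 0.0);
float2 uvDdy = neighborYValid ? ddy_fine(uv) : float2(0.0, 0.0);

float4 albedo = DecalAlbedo.SampleGrad(LinearSampler, uv, uvDdx, uvDdy);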

Another possibility is calculating the derivatives analytically during reconstruction, given the target surface normal and the distance from the camera. Unfortunately a limitation here is how to read the source mesh normals without normal-mapping applied. If your G-Buffer layout has them lying around somewhere (an interesting example was in the Infamous: Second Son GDC 2014 presentation) then great – they can be used easily. :) If not, then IMO the normal-mapped information is useless. One could try to reconstruct normal information from the depth buffer, but this either doesn’t work the way we would like – when using simple derivatives (because we end up with exactly the same problem as the one we are trying to solve!) – or is expensive when analyzing a bigger neighborhood. If you have the original surface normals in the G-Buffer though, it is quite convenient and you can safely read from this surface even on the PC – as decals are not supposed to write to it anyway.

Post edit: In one older post Humus described a technique that is a hybrid of the ones I mentioned in the 2 previous paragraphs – calculating UV derivatives based on depth differences and rejection. It seems to work fine and probably is the best “easy” solution, though I would still be concerned about the temporal stability of the technique (with higher geometric complexity than in the demo) given that the approximations are calculated in screen space. All kinds of “information popping in and out” problems that exist in techniques like SSAO and SSR could be relevant here as well.

Post edit 2: Richard Mitton suggested on Twitter a solution that seems both smart and extremely simple – using the target decal normal instead of the surface normal and precomputing those derivatives in the VS. I personally would still scale it by per-pixel depth, but it seems this solution would really work in most cases (unless there is a huge mismatch of surface curvature – but then the decal would be distorted anyway…).

The final possibility that I would consider is pre-computing and storing the mip level or even the derivative information in the G-Buffer. During the material pass, the most useful information is easily available (one could even use CalculateLevelOfDetail with some texture of known UV mapping density and later simply rescale it to the target decal density – assuming that the projected decal tangent space is at least somewhat similar to the target tangent space) and depending on the desired quality it could probably be stored in just a few bits. The “expensive” option would be to calculate and store the derivatives for potential decal anisotropic filtering, or different densities for target triplanar mapping – but I honestly have no idea if that is necessary – it probably depends on what you intend to use the decals for.
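
A very rough sketch of what this could look like (purely illustrative – the spare G-Buffer channel, the encoding and the density rescaling constants are all hypothetical assumptions):

// G-Buffer / material pass: here the regular material UVs have correct derivatives.
float materialLod = AlbedoTexture.CalculateLevelOfDetail(AnisoSampler, materialUV);

// LOD is logarithmic, so converting between texel densities is just an additive bias:
// decalLod = materialLod + log2(decalTexelDensity / materialTexelDensity).
float decalLod = materialLod + DecalDensityLogBias;

// Store quantized into a few spare bits / a small channel of the G-Buffer.
output.miscChannel = saturate(decalLod / MAX_DECAL_LOD);

// Decal pass: read it back instead of trusting the screen-space derivatives.
float storedLod = GBufferMisc.Load(int3(svPos.xy, 0)).a * MAX_DECAL_LOD;
float4 albedo   = DecalAlbedo.SampleLevel(LinearSampler, uv, storedLod);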

This is the most promising and possibly cheap approach (many GDC and Siggraph game presentations proved that next-gen consoles seem to be quite tolerant of even very fat G-Buffers :) ), but it makes the screen-space decals less easy to integrate and use, and probably requires more maintenance, editing your material shaders etc.

This idea could be extended way further and generalized towards deferring other aspects of material shading – I have discussed it many times with my industry colleagues – and a similar approach was described by Nathan Reed in his post about “Deferred Texturing”. I definitely recommend it, a very interesting and inspiring article! Is it practical? It seems to me like it could be; the first game developers who do it right could convince others and maybe push the industry into exploring an interesting and promising area. :)

Special thanks

I would like to thank Michal Iwanicki, Krzysztof Narkowicz and Florian Strauss for inspiring discussions about those problems and their potential solutions, which led to me writing this post (as it seems that it is NOT a solved problem and many developers try to work around it in various ways).

Anamorphic lens flares and visual effects

Introduction

There are no visual effects more controversial than the various lens and sensor effects. Lens flares, bloom, dirty lens, chromatic aberrations… All of these have their lovers and haters. A couple of years ago many games used a cheap pseudo-HDR effect by blooming everything; then we had the light-shafts craze (almost every UE3 game had them, often set up terribly – “god rays” not matching the lighting environment and the light source at all) and more recently many lo-fi lens effects – dirty lens, chromatic aberrations and anamorphic flares/bloom.

They are extremely polarizing – on the one hand, for some reason art directors and artists love to use them and programmers implement them in their engines, but on the other hand lots of gamers and movie audiences seem to hate those effects and find their use over the top or even distracting. Looking for examples of those effects in games, it is way easier to find criticism like http://gamrconnect.vgchartz.com/thread.php?id=182932 (more on neogaf and basically any large enough gamer forum) than any actual praise… Hands up if you have ever heard from a player “wow, this dirty lens effect was soooo immersive, more of that please!”. ;)

Killzone lens flares – high dynamic range and highly saturated colors producing interesting visuals, or abused effect and unclear image?

It is visible not only in games, but also in movies – it went to the extreme point that after tons of criticism movie director J.J. Abrams supposedly apologized for over-using lens flares in his movies.

Star Trek: Into Darkness lens effects example, source: http://www.slashfilm.com/star-trek-lens-flares/

Among other graphics programmers and artists I have very often heard the quite strong opinion that “anamorphic effects are interesting, but are good only for the sci-fi genre or a modern FPS”.

Before stating any opinion of my own, I wanted to write a bit more about anamorphic effects, which IMO are quite fascinating and actually physically “inspired“. To understand them, one has to understand the history of cinematography and analog film stock.

Anamorphic lenses and film format

I am not a cinema historian or expert, so first I will refer you to two links that cover the topic much more in depth and in my opinion much better, and provide some information about the history:

Wikipedia entry

RED guide to anamorphic lenses

To sum it up, anamorphic lenses are lenses that (almost always) squeeze the image by a factor of two in the horizontal plane. They were introduced to provide much higher vertical resolution of the image when cinema started to experiment with widescreen formats. At that time, the most common film used was 35mm stock and obviously the whole industry didn’t want to exchange all of its equipment for a larger format (impractical equipment size, more expensive processes), especially just for some experiments. Anamorphic lenses allowed for that by using an essentially analog, optics-based compression scheme. This compression was a literal one – squeezing the image before exposing the film and later decompressing it by unsqueezing when screening it in the cinema.

The first example of a movie shot using anamorphic lenses is The Robe from 1953, over 60 years ago! Anamorphic lenses provided a simple 2:1 squeeze no matter what the target aspect ratio was – but there were various different target aspect ratios, depending on whether sound was encoded on the same tape, what the format was etc.

 

No anamorphic image stretching – limited vertical resolution. Source: wikipedia author Wapcaplet

 

Effect of increased vertical resolution due to anamorphic image stretching. Source: wikipedia author Wapcaplet

To compensate for the squeezed, anamorphic image, the inverse conversion – stretching – was performed during the actual movie projection. Such compression didn’t leave the image quality unaffected – due to lens imperfections it resulted in various interesting anamorphic effects (more about that later).

Anamorphic lenses are more or less a thing of the past – since the transition to digital formats, 4K resolution etc. they are not really needed anymore; they are expensive, incompatible with many cameras, have poor optical quality etc. I don’t believe anamorphic lenses are used much anymore, except maybe for some niche experiments – but please correct me in the comments if I’m wrong.

Lens flares

Before proceeding with the description of how it affects lens flares, I wanted to refer to a great write-up by Padraic Hennessy about the physical basis of lens flare effects in actual, physical lenses. This post comprehensively covers why all lenses (unfortunately) produce some flares, and the simulation of these effects.

In short – physical lenses used for movies and photography consist of many glass lens groups. Because of the Fresnel equations and the different IOR of every layer, light is never transmitted perfectly, at 100%, between the air and the glass. Note: lens manufacturers coat glass with special nano-coatings to reduce this as much as possible (except for some hipster “oldschool” lens versions) – but it’s impossible to eliminate it completely.

Optical elements – notice how close pieces of glass are together (avoiding glass/air contact)

With many groups and different transmission values, light ends up reflecting and bouncing multiple times inside the lens before hitting the film or sensor – and in effect we get some light leaking, flares, transmittance loss and ghosting. In low dynamic range scenes, due to the very small amount of light that gets reflected every time, the result is negligible – but it is worth noting that the image always contains some ghosting and flares, even if not measurable. However, with extremely high dynamic range light sources like the sun (orders and orders of magnitude higher intensity), the light after bouncing and reflecting can still be brighter than other actual image pixels!

Anamorphic lens flares

Ok, so at this point we should understand the anamorphic format, anamorphic lenses and lens flares – so where do anamorphic lens flares come from? This is relatively simple – light reflection at the glass-air contact surfaces can happen in many places inside the physical lens. It can happen both before and after the anamorphic lens components. Therefore the extra transmitted light producing a lens flare will be ghosted as if the image were not anamorphic and had a regular, not squished aspect ratio. If you look at such exposed and developed film, you will see a squished image, but with some regular-looking circular lens flares. Then, during film projection, it will be stretched and voilà – a horizontal, streaked, anamorphic lens flare and bloom! :)

Reproducing anamorphic effects – an experiment

Due to the extremely simple nature of anamorphic effects – your lens effects just happen in 2x horizontally squeezed texture space – you can reproduce them quite simply. I added an option to do so to my C#/.NET framework for graphics prototyping (git update soon) together with some very simple procedural, fake lens flares and bloom. I just squeezed my smaller resolution buffers used for blurring by 2 – that simple. :) Here are some comparison screenshots that I’ll comment on in the next paragraph – for the first 3 of them the blur is relatively smaller. For some of them I added an extra bloom color multiplier (for the cheap sci-fi look ;) ), some other ones have uncolored bloom.
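
If you don’t want to touch the render target sizes, the same thing can be expressed directly in the blur kernel – an “anamorphic” blur is just an isotropic blur performed in a 2:1 horizontally squeezed space, which is equivalent to stretching the horizontal sample offsets of your existing bloom/flare blur by the squeeze factor. A tiny, hedged HLSL sketch of the idea (the 3x3 box kernel is just a stand-in for whatever kernel you already use):

static const float ANAMORPHIC_SQUEEZE = 2.0;

float4 AnamorphicBlur(Texture2D src, SamplerState smp, float2 uv, float2 texelSize)
{
    float4 sum = 0.0;
    for (int y = -1; y <= 1; ++y)
    {
        for (int x = -1; x <= 1; ++x)
        {
            float2 offset = float2(x, y) * texelSize;
            offset.x *= ANAMORPHIC_SQUEEZE; // stretch the kernel horizontally 2:1
            sum += src.Sample(smp, uv + offset);
        }
    }
    return sum / 9.0;
}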

Please note that all of those screenshots are not supposed to produce an artistically, aesthetically pleasing image, but to demonstrate the effect clearly!

c1 c2 c3

In the following ones the bloom/flare blur is 2x stronger and the effect probably more natural:

c4 c5 c6

Bonus:

I tried to play a bit with anamorphic bokeh achieved in a similar way.

anamorphic_bokeh1 anamorphic_bokeh2

Results discussion

First of all, we can see that with simple, procedural effects and Gaussian blurs using the real stretch ratio of 2:1 it is impossible to achieve the crazy anamorphic flare and bloom effect seen in many movies and games, with single, thin lines across the whole screen. So be aware that this can be an artistic choice – but it has nothing to do with real, old-school movies and anamorphic lenses. Still, you will probably get such a request when working on an engine with real artists or for some customers – and there is nothing wrong with that.

Secondly, the fact that the procedural effects are anamorphic makes it more difficult to see the exact shape of the ghosting, blends them together and makes them less distracting. This is definitely a good thing. It is questionable whether it can be achieved only by a more aggressive blur on its own – in my opinion the non-uniform blurring and making the shapes not perfectly mirrored is more effective for this purpose.

Thirdly, I had no expectations for the anamorphic bokeh and played with it as a bonus… And I still don’t know what to think about it, as I’m not as convinced by the results. I never got a request from an artist to implement it and it definitely can look weird (more like a lazy programmer who got the aspect ratio wrong in the DOF ;) ), but it is worth knowing that such effects actually existed in the real, anamorphic lens/film format scenario.

I would probably prefer to spend some time investigating the physical basis of, and how to implement, busy, circular bokeh (probably just some anamorphic stretch perpendicular to the radius of the image).

My opinion

In my opinion, anamorphic effects like bloom, glare and lens flares are just some of the many effects and tools in the artist’s toolbox. There is a physical basis for such effects and they are well established in the history of cinema. Therefore viewers and audiences are used to their characteristic look and can even subconsciously expect to see them.

They can be abused, applied in an ugly or over-stylized manner that has nothing to do with reality – but that is not a problem of the technique; it is again an artistic choice, fitting some specific vision. Trust your artists and their art direction.

I personally really like subtle and physically inspired 2x ratio anamorphic lens flares, glare and bloom and think they make the scene look better (less distracting) than isomorphic, regular procedural effects. Everything just “melts” together nicely.

I would argue with anyone saying such effects fit only a sci-fi setting – in my opinion creating a simulation of the cinematic experience (and a reference to past movies…) is just as valid for any kind of game as trying to simulate human vision 100% – it is a matter of the creative and artistic direction of the game and its rendering. Old movies didn’t pick anamorphic lens flares selectively for a specific set & setting – it was a workaround for a technical limitation of film and used to exist in every genre of movie!

Therefore, I don’t mind them in a fantasy game – as long as the whole pipeline is created in a cinematic and coherent way. A good example of a 100% coherent and beautiful image pipeline is The Order: 1886 – their use of lens aberrations, distortion and film grain looks just right (and, being an engineer – technically amazing!) and doesn’t interfere with the fantasy-Victorian game setting. :)

Over-stylized and sterile, extreme anamorphic lens flares and bloom producing horizontal light streaks over the whole screen probably don’t fit into the same category though. I also find them quite uncanny in fantasy/historical settings. Still, as I said – at this point such extreme effects are probably a conscious decision of the art director and should serve some specific purpose.

I hope that my post helped at least a bit with understanding the history and reasoning behind anamorphic effects – let me know in the comments what you think!

Processing scanned/DSLR photos of film negatives in Lightroom

The topic I wanted to cover in this post is a non-destructive workflow for “developing” photographed or scanned negatives – in this case B&W film. Why even bother with film? Because I still love analog photography as a hobby. :) I wrote a post about it some time ago.

Previously I used to work on my scanned negatives in Photoshop. However, even with specialized scanning software like Silverfast it is a quite painful process. Using Photoshop as an additional step has many problems:

  • Either you are working on huge files or you lose lots of quality and resolution on save.
  • Even 16 bit output images from the scanning software are low dynamic range.
  • 16 bit uncompressed TIFF files output from scanning software are insanely big in size.
  • Batch processing is relatively slow and hard to control.
  • If you don’t keep your data in huge and slow to load PSD files and adjustment layers, you are going to lose information on save.
  • Photoshop is a much more complex and powerful tool, not very convenient for a quick photo collection edit. That’s why Adobe created Lightroom. :)

No scanner? No problem!

Previously I used a scanner on many of my all-time favourite film photos.

At the end of the post (not to make it too long) there are some examples of photos scanned using the excellent (for its price and size) Epson V700. I developed the B&W photos myself; the color one was developed at a pharmacy lab for ~$2 a roll.

They are some of my favourite photos of all time. All taken using the relatively compact Mamiya 6. Such high quality is possible only with medium format film though – don’t expect such results from a small-frame 35mm camera.

However, I foolishly left my scanner in Poland and wasn’t able to use it anymore (I’m afraid that such fragile equipment could get broken during shipping without proper packaging). Buying a new one is not super-cheap. Therefore I started to experiment with using a DSLR, or a camera in general, for getting a decent-looking positive digital representation of negatives. I can confirm that it is definitely possible even without buying an expensive macro lens – I hope to write a bit more about the reproduction process later, when it improves. For now I wanted to describe the non-destructive process I came up with in Lightroom, using a small trick with curves.

The Lightroom workflow

Ok, so you take a photo of your negative using some slide copying adapter. You get results looking like this:

step0-input

Far from perfect, isn’t it? The fact that it was a TriX 400 roll rated at ASA 1250 and developed using Diafine doesn’t help – it is a push developer, quite grainy and not very high detail, with a specific contrast that cannot be fixed using Adams’ Zone System and development times. It was also taken using a 35mm rangefinder (the cheapest Voigtlander Bessa R3A), so when taking the photo you can’t be sure of proper crop, orientation or sharpness.

But enough excuses – it’s imperfect, but we can try to work around many of those problems in our digital “lightroom”. :)

Ok, so let’s go and fix that!

Before starting, I just wanted to make it clear – everything I’ll describe is relevant only if you use RAW files and have a camera with good enough dynamic range – any DSLR or mirrorless bought within the last 5-6 years will be fine.

1. Rough and approximate crop

step1-approx-crop

The first step I recommend, for the convenience of even evaluating your film negatives, is doing some simple cropping and maybe rotating the shot. It not only helps you judge the photo and decide if you want to continue “developing” it (it could have been a completely missed shot), but also helps the histogram used in further parts of the development.

2. Adjusting white balance and removing saturation and noise reduction

step2-wb-and-saturation

In the next step I propose to adjust the white balance (just use the pipette and pick any gray point) and completely remove saturation from your photo if working with black and white. Every film has some color cast (it depends on the film and developer – it can be purple, brown etc.). Also, since you (should) take it using a small ISO like 100, you can safely remove any extra digital noise reduction, which unfortunately is on by default in Lightroom – that’s what I did here as well. Notice that in my example the color almost didn’t change at all – the camera was smart enough to adjust its WB automatically to the film color.

3. Magic! Inverting the negative

step3-inverting-negative

Ok, this is the most tricky part and a feature that Lightroom is lacking – a simple “invert”. In Photoshop it is one of the basic menu options, there is even a keyboard shortcut; here you have to… use the curves. Simply grab your white point and turn it to 0, and do the opposite with the black point – put it to 1. Simple (though the UI sometimes can get stuck, so adjust those points slowly) and works great! Finally you can see something in this photo. You can also see that this digital “scan” is far from perfect, as the film was not completely flat – blurriness on the edges. :( But in the era of desired lo-fi and Instagram maybe it is an advantage? ;)

4. (Optional) Pre-adjusting the exposure and contrast

step4-exposure

This step can be optional – it depends on the contrast of your developed film. In my case I decided to move the exposure and contrast a bit to make further operations on the curves easier – otherwise you might have to do very precise changes with the curves, which due to UI imprecision can be inconvenient. I also cropped a bit extra to make the histogram even better and not fooled by the fully lit or dark borders.

5. Adjusting white point

step5-white-point

Now, having a more equalized and useful histogram, you can set your white point using the curves. Obviously your histogram is now reversed – this can be confusing at first, but after working on a first scan you quickly get used to it. The guidelines here are the same as with regular image processing or photography – tweak your slider looking for the points of your scene you want to be white (with B&W one can be more radical in this step), using the histogram as a helper.

Why am I not using Lightroom’s “general” controls like blacks, whites etc.? Because their behavior is reversed and very confusing. They also do some extra magic and have a non-linear response, so it’s easier to work with the curves. Though if you can find an optimal workflow using those controls – let me know in the comments!

6. Adjusting the black point

step6-black-point

The next step is simple and similar – you proceed in the same way to find your black point and the darkest parts of your photo.

At this point your photo may look too contrasty, too dark or too bright – but don’t worry, we are going to fix it in the next step. Also, since all editing in Lightroom is non-destructive, it is still going to have the same quality.

7. Adjusting gamma curve / general brightness

step7-gamma

In this step you add another control point to your curve and, by dragging it, create a smooth gamma response. In this step look mainly at your midtones – medium grays – and the general ambiance of the photo.

You can make your photo brighter or darker – it all depends. In this case I wanted a slightly brighter one.

It can start to lack extra “punch” and the response can become too flat – we will fix that in the next point.

8. Adding extra contrast in specific tonal parts

step8-contrast-heel

By adding an extra point and a “toe” to your curve, you can boost the contrast in specific parts. I wanted this photo to be aggressive (I like how the black and white chess pieces work well in B&W), so I added quite an intense one – looking at it now I think I might have overdone it, but this whole post is instructional.

Fun fact for photographers who are not working in the games industry or are not technical artists or graphics programmers – such an S-shaped curve is often called a “filmic tonemapping” curve in games, named after analog photo or movie film.

9. Final crop and rotation

step9-final-crop

I probably should have done it earlier, but I added some extra cropping and straightened the photo. I also added some sharpening / unsharp mask to compensate for the original photo and the film scan both not being perfectly sharp.

10. Results after saving to the hard drive

DSC01359

This is how it looks after a resized save from Lightroom to disk – not too bad compared to the starting point, right?

The best advantage of Lightroom – you can extremely easily copy those non-destructive settings (or even create a preset!) and apply them to other photos from your scan! I spent no longer than 3 minutes on each of those 2 extra shots. :) Very convenient and easily controllable batch processing.

DSC01362 DSC01361

Conclusions

I hope I have shown that a non-destructive workflow for processing scans of negatives in Lightroom can be fast, easy and productive, and that you can batch-process many photos. It is an amazing tool and I’m sure other, better photographers will get even better results!

And I promise to write more about my scanning rig, assembled for around $100 (assuming you already have a camera), and to post some more scans from better quality, lower-ASA films or medium format shots.

Bonus

As I promised, some photos that I took a couple of years ago using the Mamiya 6. All scanned and processed manually (the color one was developed at a cheap pharmacy photo lab). Medium format 6×6 composition – another reason to start using film. :)

wilno3 wilno2 wilno1 warsaw3 warsaw2 warsaw1

Designing a next-generation post-effects pipeline

Hey, it’s been a while since my last post. Today I will focus on the topic of post-effects. Specifically, I wanted to talk about the next-gen post-process pipeline redesign I worked on while being a part of the Far Cry 4 rendering team. While I’m no longer a Ubisoft employee, my post doesn’t represent the company in any way and I can’t share, for example, internal screenshots or debug views of buffers and textures I no longer have access to, I think it is a topic worth discussing in a “general” way, and some ideas could be useful for other developers. Some other aspects of the game were discussed in Michal Drobot’s presentation [1]. Also, at GDC 2015 Steve McAuley will talk about Far Cry 4’s impressive lighting and vegetation technology [9] and Remi Quenin about the game engine, tools and pipeline improvements [12] – if you are there, be sure to check out their presentations!

Whole image post-processing in 1080p on consoles took around 2.2ms.

Introduction

Yeah, image post-processing – a usual and maybe even boring topic? It was described in detail by almost every game developer during the previous console generation. Game artists and art directors got interested in “cinematic” pipelines and movie-like effects that are used to build mood, attract the viewer’s attention to specific parts of the scene and in general – enhance the image quality. So it was covered very well and many games got excellent results. Still, I believe that most games’ post-effects can be improved – especially given the new, powerful hardware generation.

The definition of a post-effect can be very wide and cover anything from tone-mapping through AA up to SSAO or even screen-space reflections! Today I will cover only the “final” post-effects that happen after the lighting, so:

  • Tonemapping,
  • Depth of field,
  • Motion blur,
  • Color correction,
  • “Distortion” (refraction),
  • Vignetting,
  • Noise/grain,
  • Color separation (can serve as either glitch effect or fake chromatic aberration),
  • Various blur effects – radial blur, Gaussian blur, directional blur.

I won’t cover AA – Michal Drobot described it exhaustively at Siggraph and mentioned some of his work on SSAO during his Digital Dragons presentation. [1]

State of the art in post-effects

There have been many great presentations, papers and articles about post-effects. I would like to just give some references to the great work that we based ours on and tried to improve in some aspects:

– Crytek presentations in general; they always emphasize the importance of highest quality post-effects. I especially recommend Tiago Sousa’s Siggraph 2011-2013 presentations. [2]

– The DICE / Frostbite smart trick for hexagonal bokeh rendering. [3]

– Morgan McGuire’s work together with the University of Montreal on state-of-the-art quality motion blur. [4]

– And the recent, amazing and comprehensive publication by Jorge Jimenez, expanding the work of [4] and improving the real-time performance and plausibility of the visual results. [5]

Motivation

With so many great publications available, why didn’t we use exactly the same techniques on Far Cry 4?

There are many reasons, but the main one is performance and how the effects work together. Far Cry 3, Blood Dragon and then Far Cry 4 are very “colorful” and effect-heavy games; it is part of the games’ unique style and art direction. Depth of field, motion blur, color correction and many others are always active, and in heavy combat scenes 4-6 other effects kick in! Unfortunately they were all designed separately, often not working well together, and they didn’t work in HDR – so there were no interesting effects like bright bokeh sprites. But even with simple, LDR effects, their frame time often exceeded 10ms! It was clear to us that we needed to address post-processing in a unified manner – re-think, re-design and re-write the pipeline completely. We got a set of requirements from the art director and FX artists:

– Depth of field had to produce circular bokeh. I was personally relieved! :) I already wrote about how much I don’t like hexagonal bokeh and why IMO it makes no sense in games (a low-quality/cheap digital camera effect vs human vision, high definition cameras and cinematic lenses). [6]

– They wanted “HDR-ness” of the depth of field and potentially other blur and distortion effects. So bright points should cause bright motion blur streaks or bokeh circles.

– Proper handling of near and far depth of field and no visible lerp blend between the sharp and blurred image – so a gradual increase/decrease of CoC.

– Many other color correction, vignetting, distortion (refraction) and blur effects.

– Motion blur had to work stably and behave properly in high-velocity moving vehicles (no blurring of the vehicle itself) without hacks like masks for foreground objects.

– Due to the game’s fast tempo and many objects moving, with lots of blurs happening all the time, there was no need for proper “smearing” of moving objects; at first the art director prioritized per-object MB very low – fortunately we could sneak it in almost for free and get rid of many artifacts of the previous, “masked” motion blur.

– Most important – almost all effects active all the time! DoF was used for sniper rifle aiming, focus on the main weapon, binoculars, subtle background blurring etc.

The last point made it impossible to go with many techniques in 1080p with good performance. We made ourselves a performance goal – around 2ms spent on post-effects in total (not including post-fx AO and AA) per frame on consoles.

Some general GCN/console post-effect performance optimization guidelines

Avoid heavy bandwidth usage. Many post-effects do data multiplication and can eat huge amounts of the available memory bandwidth. Anything done to operate on smaller targets or smaller color bit depths, to cut the number of passes, or other forms of data bandwidth compression will help.

Reduce your number of full-screen passes as much as possible. Every such pass has a cost associated with reading and outputting a full-screen texture – there is some cache reload cost as well as export memory bandwidth cost. On next-gen consoles it is relatively small – a smaller cost than on the X360 (where you had to “resolve” after every pass if you wanted to read data back) even at a way higher resolution – but in 1080p and with many passes and effects it adds up!

Avoid weird data-dependent control flow to allow efficient latency hiding. I wrote about latency hiding techniques in the GCN architecture some time ago [7] and suggested that, in the case of many needed samples (so the typical post-effect use case), this architecture benefits more from batching samples together and hiding latency without wave switching. Therefore any kind of data-dependent control flow will prevent this optimization – watch out for branches (especially dynamically calculating the required number of samples – often planning for the worst case works better! But take it with a grain of salt – sometimes it is good to dynamically reject, for example, half of the samples; just don’t rely on a dynamic condition that can take 1-N samples!).

With efficient GPU caches it is easy to see a “discrete performance steps” effect. What I mean is that often adding a new sample from some texture won’t make the performance worse – the GPU will still fit roughly the same working set in cache and will be able to perfectly hide the latency. But add too many source textures or increase their size and suddenly the timing can increase even 2 times! It means you just exceeded the optimal cache working size and started to thrash your caches and cause their reloading. This advice doesn’t apply to ALU – it almost always scales with the number of instructions, and if you are not bandwidth-bound it is always worth doing some fast math tricks.

Often, advice from the previous console generation is counterproductive. One example is the practice from previous consoles of saving some ALU in the PS by moving trivial additions (like pixel offsets for many samples) to the VS and relying on hardware triangle parameter interpolation – this way we got rid of some instructions, and if we were not interpolation-bound we observed only a performance increase. However, on this architecture there is nothing like hardware interpolation – all interpolation is done in the PS! Therefore such code can actually be slower than doing those additions in the PS. And thanks to the “sample with literal offset” functions (the last parameter of almost all Sample / SampleLevel / Gather functions), if you have a fixed sample count you probably don’t need to do any ALU operations at all!

Be creative about non-standard instruction use. DX11+ has tons of Sample and Gather functions and they can have many creative uses. For example, to take N horizontal samples from a 1-channel texture (with no filtering) it is better to do N/2 gathers and just ignore half of the gathered results! It really can make a difference and allow for many extra passes with timings of e.g. 0.1ms.
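
A small sketch of the gather trick (illustrative names throughout; double-check the component order of Gather results on your target platform, as the .wz swizzles below are only indicative). The commented line at the end also shows the “sample with literal offset” parameter mentioned in the previous paragraph:

Texture2D<float> CocTexture     : register(t0);
SamplerState     PointSampler   : register(s0);
float2           InvTextureSize; // hypothetical constant: 1.0 / texture dimensions

float4 FourHorizontalTaps(float2 uv)
{
    // Each GatherRed fetches a 2x2 footprint of the red channel; we keep one row
    // of each footprint and simply ignore the other two returned texels.
    float4 g0 = CocTexture.GatherRed(PointSampler, uv + float2(-1.0, 0.0) * InvTextureSize);
    float4 g1 = CocTexture.GatherRed(PointSampler, uv + float2( 1.0, 0.0) * InvTextureSize);
    return float4(g0.wz, g1.wz);
}

// The "literal offset" variant - the fixed offset is encoded in the instruction,
// so it costs no ALU at all:
//   float extraTap = CocTexture.Sample(PointSampler, uv, int2(2, 0));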

Finally, I would like to touch on a quite controversial topic, and this is my personal opinion – I believe that when designing visual algorithms and profiling runtime performance we should aim to improve the worst case, not the average case. This point is valid especially for special (post) FX – they kick in exactly when the scenery is heaviest for the GPU because of particles, many characters and dynamic camera movement. I noticed that many algorithms rely on forms of “early out” and special optimal paths. This is great as an addition and to save some millis, but I wouldn’t rely on it. Having such fluctuations makes it much harder for technical artists to optimize and profile the game – I prefer to “eat” some part of the budget even if the effect is not visible at the moment. There is nothing worse than stuttering in action-heavy games during those intense moments when the demand for interactivity is highest! But as I said, this is a controversial topic, and I know many great programmers who don’t agree with me. There are no easy answers and single solutions – it depends on the specific case of the game, special performance requirements etc. For example, hitting 60fps most of the time with occasional drops to 30fps would probably be better than a constant 45 v-synced to 30.

Blur effects

The whole idea for the pipeline is not new or revolutionary; it has appeared on many internet forums and blogs for a long time (thanks to some people I now have the reference I was talking about – thanks! [13]). It is based on the observation that all blurs can be combined together if we don’t really care about their order. Based on this we started with combining motion blur and depth of field, but ended up including many more blurs: whole-screen blur, radial blur and directional blur. The Poisson disk of samples can be “stretched” or “rotated” in a given direction, giving the blur directionality and the desired shape.

Stretching of CoC Poisson disk in the direction of motion vector and covered samples.

If you do it at half screen resolution, take enough samples and calculate “occlusion” smartly – you don’t need more than one pass! To be able to fake occlusion we used a “pre-multiplied alpha” approach. The blur effect would be fed 2 buffers:

  1. A half-resolution blur RGB parameters / shape description buffer. The red channel contained the “radius”, the GB channels contained the directionality (a signed value – 0 in GB meant a perfectly radial blur with no stretch).
  2. Half-resolution color with the “blurriness”/mask in the alpha channel.

In the actual blur pass we wouldn’t care at all about the source of the blurriness – we just did 1 sample from the blur shape buffer, and then 16 or 32 samples (depending on whether it was a cut-scene or not) from the color buffer, weighting by the color alpha and renormalizing afterwards – that’s all! :)
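
A hedged sketch of what such a unified blur pass could look like (my reconstruction from the description above, not the shipped shader – the kernel, the exact disk-stretching formula and all names are illustrative assumptions):

static const int SAMPLE_COUNT = 16;

cbuffer BlurKernel
{
    float2 PoissonDisk[SAMPLE_COUNT]; // precomputed unit Poisson disk, filled by the app
};

Texture2D    BlurShapeBuffer : register(t0); // R = radius, GB = signed directional stretch
Texture2D    HalfResColor    : register(t1); // RGB = color, A = "blurriness" mask
SamplerState LinearSampler   : register(s0);

float4 UnifiedBlurPS(float4 svPos : SV_Position, float2 uv : TEXCOORD0) : SV_Target
{
    float3 shape  = BlurShapeBuffer.Sample(LinearSampler, uv).rgb;
    float4 center = HalfResColor.Sample(LinearSampler, uv);

    float3 colorSum  = center.rgb * center.a;
    float  weightSum = center.a;

    [unroll]
    for (int i = 0; i < SAMPLE_COUNT; ++i)
    {
        // One simple way to shape the disk: scale it by the radius and shear it
        // along the stored direction (GB == 0 keeps it a plain circular disk).
        float2 offset = PoissonDisk[i] * shape.r + PoissonDisk[i].x * shape.gb;

        float4 smp = HalfResColor.Sample(LinearSampler, uv + offset);

        // "Pre-multiplied alpha" style weighting: the per-pixel blurriness mask
        // acts as a cheap occlusion term for samples that shouldn't contribute.
        colorSum  += smp.rgb * smp.a;
        weightSum += smp.a;
    }

    // Renormalize; if everything got rejected, just keep the sharp center sample.
    float3 result = (weightSum > 1e-4) ? colorSum / weightSum : center.rgb;
    return float4(result, center.a);
}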

How were the blur shape and the blurriness alpha/mask calculated? It was a mixture of samples from the motion vector buffer, the Circle of Confusion buffer, some artist-specified masks (in the case of a generic “screen blur” effect) and some ALU for the radial or directional blur.

Ok, but what about the desired bleeding of out-of-focus near objects onto sharp, in-focus background objects? We used a simple trick of “smearing” the circle of confusion buffer – blurred objects in front of the focus plane would blur their CoC onto sharp, in-focus objects. To extend the near objects’ CoC efficiently and not extend far-blurred objects onto the sharp background we used a signed CoC. Objects behind the focus plane had a negative CoC sign, and during the CoC extension we would simply saturate() the fetched value and calculate the maximum with the unclamped, original value. No branches, no ALU cost – the CoC extension was separable and had an almost negligible cost of AFAIR 0.1ms.

Synthetic example of DoF CoC without near depth extension.

Synthetic example of DoF CoC with near depth extension. Notice how only the near CoC extends onto sharp areas – the far CoC doesn’t get blurred.

Obviously it was not as good as proper scatter-as-gather approaches or what Jorge Jimenez described in [5], but with some tweaking of the blur “shape” and “tail” it was very fast and produced plausible results.

Whole pipeline overview

You can see a very general overview of this pipeline in the following diagram.

postprocess_diagram

Steps 1-3 were already explained, but what also deserves some attention is how the bloom was calculated. Bloom used fp11-11-10 color buffers – HDR, with precision high enough when pre-scaled, good looking, and 2x less bandwidth!

For the blur itself, we borrowed an idea from Martin Mittring’s Unreal Engine 4 presentation [8]. The mathematical background is easy – according to the Central Limit Theorem, the average of many random variables from many distributions (including the uniform one) converges to a Gaussian distribution. Therefore we approximated a Gaussian blur with many octaves of the efficiently box-sampled, thresholded bloom buffer. The number of samples for every pass was relatively small to keep the data in L1 cache if possible, but with many of those passes the combined effect nicely approached a very wide Gaussian curve. They were combined together into a ½ resolution buffer in step 4, with artist-specified masks and a typical “dirty lens” effect texture applied (only the last octaves contributed to the dirty lens). There was also a combine with a “god-rays”/”lens-flare” post-effect in this step, but I don’t know if it was used in the final game (the cost was negligible, but it definitely is a past-gen effect…).
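
A sketch of the octave combine of step 4 could look like this – the octave count, the weights and the dirty lens split are made up for illustration; each octave is assumed to be the result of successively box-downsampling/blurring the thresholded bloom buffer:

static const int   OCTAVE_COUNT = 5;
static const float OctaveWeights[OCTAVE_COUNT] = { 0.35, 0.25, 0.18, 0.13, 0.09 };

Texture2D    BloomOctaves[OCTAVE_COUNT] : register(t0); // 1/4, 1/8, ... resolution chain
Texture2D    DirtyLensTexture           : register(t5);
SamplerState LinearSampler              : register(s0);

float3 CombineBloom(float2 uv)
{
    float3 bloom = 0.0;
    float3 wide  = 0.0; // only the widest octaves feed the dirty lens

    [unroll]
    for (int i = 0; i < OCTAVE_COUNT; ++i)
    {
        // Each octave is a small box blur of the previous one; by the Central
        // Limit Theorem their weighted sum behaves like a very wide Gaussian.
        float3 octave = BloomOctaves[i].Sample(LinearSampler, uv).rgb * OctaveWeights[i];
        bloom += octave;
        if (i >= OCTAVE_COUNT - 2)
            wide += octave;
    }

    return bloom + wide * DirtyLensTexture.Sample(LinearSampler, uv).rgb;
}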

The most complex, most expensive and only full-screen resolution pass was step 5.

It combined not only the bloom, the half-resolution blurs and the sharp image, but also performed the tone-mapping operator, 3D texture color correction and other ALU and texture operations, and a simple ALU-based noise/dithering effect (with the magnitude of the noise calculated to be at least 1 bit of sRGB). Please note that the tone-mapping didn’t include the exposure – the image was already exposed properly in the lighting / emissive / transparency shaders. It allowed for much better color precision, no banding and easier-to-debug color buffers. I hope that Steve McAuley will describe it more in his GDC talk, as part of the lighting pipeline he designed and developed.

But what I found surprising performance-wise, and think is worth sharing, is that we also calculated distortion / refraction and color separation there. It was cheaper to do color separation as 3x more samples of every combined buffer! The offset samples were usually not very far away from the original ones – localized in screen space and within adjacent pixels – so there was not much additional cost. Separate passes for those effects were more expensive (and harder to maintain) than this single “uber-pass”. Many more passes were combined in there and we applied similar logic elsewhere – sometimes it is possible to calculate a cascade of effects in a single pass. It saves bandwidth, reduces export cost and improves latency hiding – and post-process effects usually don’t have dependent flow in code, so even with lower occupancy the performance is great and the latency hidden.

Summary

The described solution performed very fast (and the worst case was only a bit slower than the average) and gave nice, natural-looking effects. The way all effects were unified and worked together allowed for good color precision. As everything was called from a single place and the order was clearly defined in one shader file, it was easy to refactor, maintain and change. The single blur shader provided a great performance optimization but also improved the quality (it was affordable to take many samples).

However, there are some disadvantages of this technique.

– There were some “fireflies” – artifacts caused by too much smearing of bright HDR pixels, especially when doing some intermediate steps in partial resolution. A smart and fast workaround seems to be the weighting operator suggested by Brian Karis [11] (see the sketch after this list). It would come at almost no additional cost (we were already doing premultiplied-alpha weighting). However, it would mean that artists would lose some of the HDR-ness of the DoF. So as always – if you cannot do “brute-force” supersampling, you have to face some trade-offs…

– There was no handling of motion-blurred objects “smearing” at all. If you find it a very important feature, it would probably be possible to do some blurring / extension of the motion vectors buffer while taking occlusion into account – but such a pass, even in half resolution, would add some extra cost.

– The circle of confusion extension/blur for near objects was sometimes convincing, but sometimes looked artificial. It depended a lot on tweaked parameters and fudge factors – after all, it was a bit of a “hack”, not a proper, realistic sprite-based scatter solution. [6]

– Finally, there were some half-resolution artifacts. This is pretty self-explanatory. The worst one was caused by taking bilinear samples from the half-resolution blur “mask” stored in the blur buffers’ alpha channels. The worst case was when moving fast along a wall: the gun was not moving in screen space, but the wall was moving very fast and accidentally grabbed some samples from the gun outline. We experimented with more aggressive weighting, changing the depth-minimizing operator to “closest” etc., but it only made the artifact less visible – it could still appear in the case of very bright specular pixels. Probably the firefly-reduction weighting technique could help here. Also, 3rd-person games would be much less prone to such artifacts.
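To make the firefly point above concrete, here is a minimal NumPy sketch of the luma-weighted average from [11] – the Rec. 709 luma coefficients and the sample values are just illustrative, not anything from the actual pipeline:

```python
import numpy as np

def luma(rgb):
    # Rec. 709 luminance coefficients.
    return rgb @ np.array([0.2126, 0.7152, 0.0722])

def karis_average(samples):
    """Firefly-resistant average of HDR samples (weighting idea from [11]).

    samples: (N, 3) array of linear HDR RGB values being merged
    (e.g. the taps of one box-filter / downsample step). Very bright
    outliers get weights ~1/luma, so a single hot pixel no longer
    dominates - and smears over - the whole neighborhood.
    """
    w = 1.0 / (1.0 + luma(samples))
    return (samples * w[:, None]).sum(axis=0) / w.sum()

# Example: one HDR "firefly" among ordinary pixels.
taps = np.array([[0.20, 0.20, 0.20],
                 [0.30, 0.25, 0.20],
                 [500.0, 450.0, 400.0],   # hot specular pixel
                 [0.25, 0.20, 0.15]])
print(taps.mean(axis=0))        # plain average: dominated by the firefly
print(karis_average(taps))      # weighted average: much tamer
```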

References

[1] http://michaldrobot.files.wordpress.com/2014/08/hraa.pptx

[2] http://www.crytek.com/cryengine/presentations

[3] http://publications.dice.se/attachments/BF3_NFS_WhiteBarreBrisebois_Siggraph2011.pptx

[4] http://graphics.cs.williams.edu/papers/MotionBlurHPG14/

[5] http://advances.realtimerendering.com/s2014/index.html#_NEXT_GENERATION_POST

[6] http://bartwronski.com/2014/04/07/bokeh-depth-of-field-going-insane-part-1/

[7] http://bartwronski.com/2014/03/27/gcn-two-ways-of-latency-hiding-and-wave-occupancy/

[8] http://advances.realtimerendering.com/s2012/index.html

[9] http://www.gdconf.com/news/see_the_world_of_far_cry_4_dec.html

[10] http://www.eurogamer.net/articles/digitalfoundry-2014-vs-far-cry-4

[11] http://graphicrants.blogspot.com/2013/12/tone-mapping.html

[12] http://schedule.gdconf.com/session/fast-iteration-for-far-cry-4-optimizing-key-parts-of-the-dunia-pipeline

[13] http://c0de517e.blogspot.com.es/2012/01/current-gen-dof-and-mb.html


CSharpRenderer Framework update

In a couple of days I’m saying goodbye to my big desktop PC for the next several weeks (relocation), so it’s time to commit some stuff to my CSharpRenderer GitHub repository that has been waiting for it for way too long. :)

Startup time optimizations

The goal of this framework was to provide as fast iterations as possible. At first, with just a few simple shaders, startup time wasn’t a big problem, but as the framework grew it became something to address. To speed it up I did the following two optimizations:

Geometry obj file caching

Fairly simple – create a binary cache instead of loading and parsing the obj text file every time. On my HDD in Debug mode it gives up to two seconds of start-up time speed-up.

Multi-tasked shader compilation

Shader compilation (pre-processing and building binaries) is trivially parallelizable, so I simply needed to make sure it’s stateless and that only loading binaries into the driver and device happens on the main, immediate context.

I highly recommend the .NET Task Parallel Library – it is both super simple and powerful, has very nice syntax with lambdas and allows for complex task dependencies (child tasks, task continuations etc.). It also hides problematic thread vs. task management from the user (think in tasks and multi-tasking, not multiple threads!). I didn’t use all of its power (like the Dataflow features, which would make sense), but it is definitely worth taking into consideration when developing any form of multitasking in .NET.

Additional tools for debugging

shapshots_features

I added simple feature toggles (auto-registered and auto-reloaded UI) to allow easier turning of features on and off from within the UI. To provide additional debugging help with this feature, and also with some others (like changing a shader when optimizing and checking if anything changed quality-wise and in which parts of the scene), I added an option of taking “snapshots” of the final image. It supports quickly switching between a snapshot and the current final image, or displaying the difference between them. Much faster than reloading a whole shader.

Half resolution / bilateral upsampling helpers

Some helper code to generate an offsets texture for bilateral upsampling. For every full-res pixel it generates offset information that, depending on the depth differences between the full-res and half-res pixels, either keeps the original bilinear fetch (offset equal to zero), snaps to edge-bilinear (instead of quad-bilinear), or even falls back to point sampling (closest depth) from the low-resolution texture when the depth differences are big. The benefit of doing it this way (and not in every upscale shader) is much lower shader complexity and potentially better performance (when having multiple half-res → full-res steps); also fewer used registers and better occupancy in the final shaders.

bilat_upsampling
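For illustration, here is a very rough CPU-side NumPy sketch of the offset-selection idea – it is not the framework’s helper code: it only distinguishes plain bilinear from a snap to the closest-depth texel, the footprint handling is simplified, and the depth threshold is a made-up value.

```python
import numpy as np

def nearest_depth_upsample_info(depth_full, depth_half, threshold=0.05):
    """For every full-res pixel, decide how to sample the half-res buffer.

    If all half-res depths in the 2x2 footprint are close to the full-res
    depth, plain bilinear is fine (encoded here as (-1, -1)). Otherwise we
    remember which footprint texel has the closest depth, so the upscale
    pass can snap to it (depth-aware point sampling). The edge-bilinear
    middle case is omitted in this sketch.
    """
    H, W = depth_full.shape
    snap = np.full((H, W, 2), -1, dtype=np.int32)   # (-1, -1) => plain bilinear

    for y in range(H):
        for x in range(W):
            hy = min(y // 2, depth_half.shape[0] - 2)
            hx = min(x // 2, depth_half.shape[1] - 2)
            footprint = depth_half[hy:hy + 2, hx:hx + 2]     # 2x2 neighborhood
            diff = np.abs(footprint - depth_full[y, x])
            if diff.max() > threshold:
                # Depth discontinuity: remember the closest-depth texel.
                dy, dx = np.unravel_index(diff.argmin(), (2, 2))
                snap[y, x] = (hy + dy, hx + dx)
    return snap
```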

Physically-correct env LUTs, cubemap reflections and (less correct) screen-space reflections

I added importance-sampling-based cubemap mip chain generation for the GGX distribution and usage of proper environment light LUTs – all based on last year’s Brian Karis Siggraph talk.
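Purely for illustration, here is a minimal Python sketch of the GGX half-vector importance sampling those prefiltered mips are built around, following the formulas from Karis’s talk; the Hammersley sequence, sample counts and the cubemap gathering loop are left out and the surrounding code is my own simplification:

```python
import numpy as np

def importance_sample_ggx(xi1, xi2, roughness):
    """GGX-distributed half-vector in tangent space (+Z = surface normal).

    xi1, xi2 are uniform random numbers in [0, 1), e.g. from a
    low-discrepancy (Hammersley) sequence.
    """
    a = roughness * roughness
    phi = 2.0 * np.pi * xi1
    cos_theta = np.sqrt((1.0 - xi2) / (1.0 + (a * a - 1.0) * xi2))
    sin_theta = np.sqrt(1.0 - cos_theta * cos_theta)
    return np.array([sin_theta * np.cos(phi),
                     sin_theta * np.sin(phi),
                     cos_theta])

# Prefiltering a mip for a given roughness then boils down to: for each output
# texel direction N (with the N = V = R assumption from the talk), generate
# many half-vectors H, reflect V around H to get L, and accumulate
# cubemap(L) weighted by saturate(dot(N, L)).
```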

I also added very simple screen-space reflections. They are not full performance (the reflection calculation code is “simple”, not super-optimized) or quality (noise and temporal smoothing) – more a demonstration of the technique and of why adding indirect specular occlusion is so important.

Screen-space reflections are temporally supersampled with an additional blurring step (the source of not being physically correct) and by default look very subtle due to the lack of metals or very glossy materials, but they are still useful for occluding indirect speculars.

without_ssrwith_ssr

As they re-use the previous frame’s lighting buffer, we actually get multi-bounce screen-space reflections at the cost of increased temporal smoothing and trailing of moving objects.

Whether to use them or not in a game is something I don’t have a clear opinion on – my views were expressed in one of the first posts on this blog. :)

Future

I probably won’t update the framework for a while because I will have only a MacBook Pro available for at least several weeks, possibly months (unless I need to integrate a critical fix), but I plan to do a quite big write-up about my experiences with creating an efficient next-gen game post-processing pipeline and optimizing it – and later definitely post some source code. :)


Review: “Multithreading for Visual Effects”, CRC Press 2014

Today I wrote a short review of a book I bought and read recently – “Multithreading for Visual Effects”, published by CRC Press in 2014 and including articles by Martin Watt, Erwin Coumans, George ElKoura, Ronald Henderson, Manuel Kraemer, Jeff Lait and James Reinders. A couple of friends asked me if I recommend it, so I will try to briefly describe its contents and who I can recommend it to.

BxN4wfoIcAE00_t

What this book is not

This book is a collection of various VFX-related articles. It is not meant to be a complete / exhaustive tutorial on designing multi-threaded programs or algorithms, or on how the VFX industry approaches multithreading in general. On the other hand, I don’t really feel it’s just a collection of technical papers / advancements like the ShaderX or GPU Pro books are. It doesn’t include a very detailed presentation of any algorithm or technique. Rather, it is a collection of post-mortems from various studios, groups and people working on a specific piece of technology – how they had to face multi-threading, the problems they encountered and how they solved them.

Lots of the articles have no direct translation to games or real-time graphics – you won’t get any ready-to-use recipe for a specific problem, so don’t expect one.

What I liked about it

I really enjoyed the practical aspects of the book – talking about actual problems. Most of the problems come from the fact that existing code bases contain tons of non-threaded / non-tasked legacy code with tons of global state and “hacks”. It is trivial to say “just rewrite the bad code”, but when talking about technology developed over many years, producing the desired results and already deployed in huge studios (it seems that VFX studios are often an order of magnitude larger than game ones…), it is obviously rarely possible. One article provides very interesting reasoning in the whole “refactor vs rewrite” discussion.

The authors are not afraid to talk about such imperfect code and provide practical information on how to fix it and avoid such mistakes in the future. At least a couple of articles mention best code practices and ideas about code design (like working on contexts, a stateless / functional approach, avoiding global state, thinking in tasks etc.).

I also liked that the authors provided a very clear description of “failures” and of the practicality of the final solutions – what did and what didn’t work, and why. This is definitely something most scientific / academic papers are lacking, but here it was described clearly and will definitely help readers.

Short chapters descriptions

“Introduction and Overview”, James Reinders

A brief introduction to the history of hardware, its multi-threading capabilities and why they are so important. A distinction between threading and tasking. A presentation of different parallel computation solutions easily available in C++ – OpenMP, Intel TBB, OpenCL and others. A very good book introduction.

“Houdini, Multithreading existing software”, Jeff Lait

A great article about the problem of multithreading existing, often legacy, code bases. A description of best practices when designing multi-threaded/tasked code and how to fix existing, imperfect code (and the various kinds of problems / anti-patterns you may face). I can honestly recommend this article to any game or tools programmer.

“The Presto Execution System: Designing for Multithreading”, George ElKoura

An introductory article about threaded system design when dealing with animation. Very beneficial for any engine or tools programmer, as it describes many options for parallelism strategies with their pros and cons. The final applied solution is not really applicable to a game’s runtime, but IMO this article is still a very practical and good read for game programmers.

“LibEE: Parallel Evaluation of Character Rigs”, Martin Watt

The second chapter exclusively about animation, but applicable to any node/graph-based system and its evaluation. Probably my favorite article in the book because of all the performance numbers, compared approaches and practical details. I really enjoyed its in-depth analysis of several cases – how multi-tasking worked on specific rigs and how content creators can (and probably at some point will have to) optimize their content for optimal, parallel evaluation. The last part is something rarely covered by any articles at all.

“Fluids: Simulation on the CPU”, Ronald Henderson

An interesting article describing the process of picking and evaluating the most efficient parallel data structures and algorithms for the specific case of fluid simulation. It is definitely not an exhaustive description of the fluid simulation problem, but rather an example analysis of parallelizing a specific problem – very inspiring.

“Bullet Physics: Simulation with OpenCL”, Erwin Coumans

An introduction to GPGPU with OpenCL, with a case study of the Bullet physics engine. An introduction to rigid body simulation and collision detection (tons of references to the great “Real-Time Collision Detection”) nicely overlapping with a description of OpenCL, GPGPU / compute simulations and the differences between them and classic simulation solutions.

“OpenSubdiv: Interoperating GPU Compute and Drawing”, Manuel Kraemer

IMO the most specialized article. As I’m not an expert on mesh topology, tessellation and Catmull-Clark surfaces, it was quite hard for me to follow. Still, the depiction of the title problem is clear and the proposed solutions can be understood even by someone who doesn’t fully understand the domain.

Final words / recommendation

I feel that with next-gen and bigger game levels, vertex counts and texture resolutions, we need not only better runtime algorithms, but also better content creation and modification pipelines. Tools need to be as responsive as they were a couple of years ago, but this time with order-of-magnitude bigger data sets to work on. This is the area where we have almost converged with the problems the VFX industry faces. From discussions with many developers, it seems to be the biggest concern of most game studios at the moment – tools are lagging in development compared to the runtime part, and we are just beginning to utilize network caches and parallel, multithreaded solutions.

I always put emphasis on short iteration times (they allow fitting more iterations in the same time and more prototypes, which translates directly into better final quality of anything – from core gameplay to textures and lighting), but with such big data sets to process, those times would have to grow unless we optimize pipelines for modern workstations. Multi-threading and multi-tasking is definitely the way to go.

Too many existing articles and books either only mention the parallelization problem or silently ignore it. “Multithreading for Visual Effects” is very good because it finally describes the practical side of designing code for multi-threaded execution.

I can honestly recommend “Multithreading for Visual Effects” to any engine, tools or animation programmer. Gameplay or graphics programmers will benefit from it as well, and hopefully it will help them create better-quality code that runs efficiently on modern multi-core machines.


Python as scientific toolbox – 8 months later

I started this blog with a simple post about my attempts to find a free Mathematica replacement for general scientific computing, with a focus on graphics. At that time I recommended scientific Python and the WinPython environment.
Many months have passed, I have used lots of numerical Python at home and a bit of Mathematica at work, and I would like to share my experiences – both good and bad – as well as some simple tips to increase your productivity. This is not meant to be any kind of detailed description, guide or even tutorial – so if you are new to Python as a scientific toolset, I recommend you check out the great Scientific Python 101 by Angelo Pesce before reading my post.
My post is definitely not exhaustive and is very personal – if you have different experiences or I got something wrong, please comment! :)

Use Anaconda distribution

In my original post I recommended WinPython. Unfortunately, I don’t use it anymore and at the moment I can definitely vote for Anaconda. One quite obvious reason is that I started to use a MacBook Pro and Mac OS X – WinPython doesn’t work there. I’m not a fan of having different working environments and different software on different machines, so I had to find something working on both Windows and Mac OS X.

Secondly, I’ve had some problems with WinPython. It works great as a portable distribution (it’s very handy to have it on a USB key), but once you want to make it an essential part of your computational environment, problems with its registration in the system start to appear. Some packages didn’t want to install, some others had problems updating and there were version conflicts. I even managed to break the distro with desperate attempts to make one of the packages work.

Anaconda is great. Super easy to install, has tons of packages, an automatic updater and it “just works”. Its registration with the system is also good. Not all interesting packages are available through its packaging system, but I have found no conflicts so far with Python pip, so you can work with both.

At the moment, my recommendation would be: if you have administrative rights on a computer, use Anaconda. If you don’t (working on somebody else’s computer), or want to go portable, keep WinPython on your USB key – it might come in handy.

Python 2 / 3 issue is not solved at all

This one is a bit sad and ridiculous – a perfect example of what goes wrong in all kinds of open source communities. When someone asks me if they should get Python 2.7+ or 3.4+, I simply don’t have an easy answer – I don’t know. Some packages don’t work with Python 3, some others don’t work with Python 2 anymore. I don’t feel there is any strong push for Python 3, for “compatibility / legacy reasons”… A very weird situation that definitely blocks the development of the language.

At the moment I use Python 2, but I try to use imports from __future__ and write everything compatible with Python 3, so I won’t have problems if and when I switch. Still, I find the lack of push in the community quite sad and really limiting for the development/improvement of the language.

Use IPython notebooks

My personal mistake was that for too long I didn’t use IPython and its amazing notebook feature. Check out this presentation – I’m sure it will convince you. :)

I was still doing an old-school code-execute-reload loop that was hindering my productivity. With Sublime Text and Python registered in the OS it is not that bad, but still, with IPython you can get much better results. Notebooks provide interactivity maybe not as good as Mathematica’s, but comparable – and much better than the regular software development loop. You can easily re-run code, change parameters, debug, see help and profile your code, and have nice text, TeX or image annotations. IPython notebooks are easy to share, store and come back to later.

IPython as a shell is also quite OK by itself – even as an environment to run your scripts from (with handy profiling magics, help and debugging).

NumPy is great and very efficient…

NumPy is almost all you need for basic numerical work. The SciPy linear algebra and related packages (like distance arrays, least-squares fitting or other regression methods) provide almost everything else. :) For stuff like Monte Carlo, numerical integration, pre-computing some functions and many others I found it sufficient and performing very well. The slicing and indexing options can be non-obvious at the beginning, but once you get some practice they are very expressive. Big volume operations can boil down to a single expression with implicit loops over many elements that are internally written in efficient C. If you have ever worked with Matlab / Octave you will feel very comfortable with it – to me it is definitely more readable than the weird Mathematica syntax. Interfacing with file operations and many libraries is also trivial – Python becomes expressive and efficient glue code.
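A trivial example of what I mean by a single expression with implicit loops – all pairwise Euclidean distances between two point sets, with the equivalent ready-made SciPy routine for comparison:

```python
import numpy as np
from scipy.spatial.distance import cdist

points_a = np.random.rand(1000, 3)
points_b = np.random.rand(500, 3)

# All 1000 x 500 pairwise Euclidean distances in one broadcasted expression -
# the loops over element pairs happen in C, not in Python.
dists = np.sqrt(((points_a[:, None, :] - points_b[None, :, :]) ** 2).sum(axis=-1))

# ...or just use the ready-made SciPy routine.
assert np.allclose(dists, cdist(points_a, points_b))
```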

…but you need to understand it and hack around silent performance killers

On the other hand, using NumPy very efficiently requires a quite deep understanding of its internal way of working. This is obviously true of any programming language, environment or algorithm – but unfortunately in the case of numerical Python it can be very counter-intuitive. I won’t cover examples here (you can easily find numerous tutorials on NumPy optimizations), but often writing efficient code means writing not very readable and not self-documenting code. Sometimes there are absurd situations, like specialized functions performing worse than generic ones, or the need to write incomprehensible hacks (the funniest one was a suggestion to use complex numbers as the most efficient way to do simple Euclidean distance calculations)… Hopefully after a couple of numerically heavy scripts you will understand when NumPy does internal copies (and it does them often!), that any Python iteration over elements will kill your performance, and that you need to rely on implicit loops and slicing etc.

There is no easy way to use multiple cores

Unfortunately, multithreading, multitasking and parallelism are simply terrible in Python. The whole language wasn’t designed to be multitasked / multithreaded, and the Global Interpreter Lock, as part of the language design, makes it a problem almost impossible to solve. Even if most NumPy code releases the GIL, there is quite a big overhead from doing so and from other threads becoming active – you won’t notice big speed-ups unless you have really huge volumes of work done in pure, single NumPy instructions. Every single line of Python glue code will become a blocking, single-threaded path. And according to Amdahl’s law, it will make any massive parallelism impossible. You can try to work around it using multiprocessing – but in such a case it is definitely more difficult to pass and share data between processes. I haven’t researched it exhaustively – but no simple, annotation-based solution (like in OpenMP / Intel TBB) exists.
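For completeness, a minimal sketch of the multiprocessing workaround mentioned above – the chunking, worker count and the dummy kernel are obviously just illustrative, not a recommendation:

```python
import numpy as np
from multiprocessing import Pool

def heavy_kernel(chunk):
    # Placeholder for some NumPy-heavy work on one independent chunk of data.
    return np.sin(chunk).sum()

if __name__ == "__main__":
    data = np.random.rand(8, 1000000)        # 8 independent chunks
    pool = Pool(processes=4)
    # Each chunk gets pickled and sent to a worker process - this is the
    # data-passing overhead mentioned above, so it only pays off when the
    # per-chunk work is substantial.
    results = pool.map(heavy_kernel, list(data))
    pool.close()
    pool.join()
    print(sum(results))
```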

SymPy cannot serve as replacement for Mathematica

I have played with SymPy just several times – it definitely is not a replacement for symbolic operations in Mathematica. It works ok for symbol substitution, trivial simplification or very simple integrals (like regular Phong normalization), but for anything more complex (normalizing Blinn-Phong level… yeah) it doesn’t work – after a couple of minutes (!) of calculations it produces no answer. Its syntax is definitely not as friendly for interactive work as Mathematica’s, either. So for symbolic work it’s not a replacement at all and isn’t very useful. One potential benefit of using it is that it embeds nicely and produces nice-looking results in IPython notebooks – which can be good for sharing them.
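For reference, this is roughly the level of symbolic work that did go smoothly for me – e.g. the Phong lobe normalization mentioned above (the exact output form may differ between SymPy versions):

```python
from sympy import symbols, integrate, sin, cos, pi, simplify

n = symbols('n', positive=True)
theta, phi = symbols('theta phi', positive=True)

# Normalization of a Phong / cosine-power lobe: integral of cos(theta)**n
# over the hemisphere's solid angle.
lobe_integral = integrate(integrate(cos(theta) ** n * sin(theta),
                                    (theta, 0, pi / 2)),
                          (phi, 0, 2 * pi))

# Expected result: 2*pi/(n + 1) - depending on the SymPy version the answer
# may come back wrapped in a Piecewise before simplification.
print(simplify(lobe_integral))
```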

No very good interactive 3D plotting

There is matplotlib. It works. It has tons of good features

…But its interactive version is not embeddable in IPython notebooks, 3D plotting runs very slowly and is quite ugly. In 2D there is the beautiful Bokeh, generating interactive HTML files, but nothing like that exists for 3D. Nothing on Mathematica’s level.

I played a bit with Vispy – if they create as good a WebGL backend for IPython notebooks as they promise, I’m totally for it (even if I have to code the visualizations myself). Until then it is “only” an early-stage project for quickly mapping between numerical Python data and simple OpenGL code – but a very cool and simple one, so it’s fun to play with anyway. :)

There are packages for (almost) everything!

Finally, while some Python issues are there and I feel won’t be solved in the near future (multithreading), the situation is very dynamic and changes a lot. Python is becoming a standard for scientific computing, and new libraries and packages appear every day. There are excellent existing ones and it’s hard to find a topic that hasn’t been covered yet. Image processing? Machine learning? Linear algebra? You name it. Just import the proper package and address the problem you are trying to solve, without wasting your time on coding everything from scratch or integrating obscure C++ libraries.
Therefore I really believe it is worth investing your time in learning it and adapting it to your workflow. I wish it became the standard for many CS courses at universities, instead of commercial Matlab, poorly interfaced Octave or professors asking students to write whole solutions in C++ from scratch. At least in Poland they definitely need more focus on problems, solutions and algorithms, not on coding and learning languages…
