Hair rendering trick(s)

I didn’t really plan to write this post as I’m quite busy preparing for Siggraph and enjoying the awesome Montreal summer, but after three similar discussions with developer friends I realized that the simple hair rendering trick I used during the prototyping stage at CD Projekt Red for The Witcher 3 and Cyberpunk 2077 (I have no idea if the team kept it, though) is worth sharing, as it’s not really obvious. It’s not about hair simulation or content authoring – I’m not really competent to talk about those subjects and they are well covered by AMD TressFX and NVIDIA HairWorks (plus I know that lots of game rendering engineers work on that topic as well), so check them out if you need awesome-looking hair in your game. The trick I’m going to cover aims to improve the quality of the typical alpha-tested hair meshes used in deferred engines. Sorry, but no images in this post!

Hair rendering problems

There are usually two problems associated with hair rendering that lot of games and game engines (especially deferred renderers) struggle with.

  1. Material shading
  2. Aliasing and lack of transparency

The first problem is quite obvious – hair shading and material. Using a standard Lambertian diffuse and Blinn/Blinn-Phong/microfacet specular model you can’t get a proper hair look; you need a hair-specific, strongly anisotropic model. Some engines try to hack hair properties into the G-Buffer and use branching / material IDs to handle it, but as John Hable recently wrote in his great post about the need for forward shading – it’s difficult to get hair right while fitting those properties into a G-Buffer.
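To illustrate what “hair-specific, strongly anisotropic” means in practice, here is a minimal sketch of the classic Kajiya-Kay specular term, which drives the specular response by the hair tangent instead of the surface normal. This is a generic illustration in Python, not the model used in any of the engines mentioned:

```python
import math

def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def kajiya_kay_specular(tangent, half_vector, exponent=64.0):
    """Anisotropic hair specular: sin(T, H)^exponent.
    The highlight stretches along the strand because it depends on the
    tangent T, not on a surface normal."""
    t_dot_h = dot(normalize(tangent), normalize(half_vector))
    sin_th = math.sqrt(max(0.0, 1.0 - t_dot_h * t_dot_h))
    return sin_th ** exponent
```

When the half vector is perpendicular to the strand the term peaks at 1; when it is aligned with the strand it vanishes – which is exactly the elongated highlight you cannot express with an isotropic Blinn-Phong lobe.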

I’m also quite focused on performance – I love low-level work and analyzing assembly – and it just hurts me to see the branches, tons of additional instructions (sometimes up to hundreds…) and registers used to branch between various materials in a typical deferred shading shader. I agree that the performance impact may not be significant compared to the bandwidth cost of fat G-Buffers and complex lighting models, but it’s still a cost you pay for the whole screen even though hair pixels don’t occupy much of the screen area.

One of the tricks we used on The Witcher 2 was faking hair specular using only the dominant light direction plus per-character cube-maps, applied as the “emissive” part of mesh lighting. It worked OK only because really great artists authored those shaders and cube-maps, but I wouldn’t call it an acceptable solution for a truly next-gen game.

Therefore hair really needs forward shading – but how do we do it efficiently, avoid the usual overdraw cost, and combine it with deferred shading?

Aliasing problem

A nightmare for anyone using alpha-tested quads or meshes with hair strands. Lots of games can look just terrible because of hair aliasing (the same applies to foliage like grass). Epic proposed fixing it with MSAA, but that definitely increases the rendering cost and doesn’t solve all the issues. I tried alpha-to-coverage as well, but the result was simply ugly.

Far Cry 3 and some other games used a screen-space blur along the hair tangent, and it can improve quality a lot, but usually the ends of hair strands either still alias or bleed some background onto the hair (or the other way around) in an unrealistic manner.

The obvious solution here is again forward shading and transparency, but then we face another family of problems: overdraw, composition with other transparents, and transparency sorting. Again, AMD TressFX solves this completely by using order-independent transparency on just the hair, but the cost and effort to implement it can be too much for many games.

Proposed solution

The solution I tried and played with is quite similar to what Crytek described in their GDC 2014 presentation. I guess we prototyped it independently in a similar time frame (mid-2012?). The Crytek presentation didn’t dig too much into details, so I don’t know how much it overlaps, but the core idea is the same. Another good reference is the old GDC 2004 presentation by Thorsten Scheuermann of ATI! Their technique was different and built on a forward-only shading pipeline, not aimed at combining with deferred shading – but the main principle of multi-pass hair rendering, treating the transparent and opaque parts separately, is quite similar. It’s worth noting that with DX11 and modern GPU-based forward lighting techniques this has become much easier to do. :)

The proposed solution is a hybrid of deferred and forward rendering techniques. It is aimed at engines that still rely on alpha-tested stripes for hair rendering and have smooth alpha transitions in the textures, where most hair strands are solid – not transparent and definitely not sub-pixel (in that case forget about it and hope you have the performance budget for MSAA or even supersampling…). You also need some form of forward shading in your engine, but I believe that’s the only way to go for next gen… Forward+/clustered shading is a must for material variety and properly lit transparency – even in mainly deferred rendering engines. I really believe in the advantages of combining deferred and forward shading for different rendering scenarios within a single rendering pipeline.

Let me first describe the proposed steps:

  1. Render your hair to the G-Buffer with full specular occlusion / zero specularity. Do the alpha testing in your shaders against a reference value Aref close to 1.0 (artist tweakable).
  2. Do your deferred lighting passes.
  3. Render a forward pass of hair specular with no alpha blending and z-testing set to “equal”. Do the alpha testing exactly as in step 1.
  4. Render a forward pass of hair specular and albedo for the transparent part of the hair with alpha blending (alpha rescaled from the 0–Aref range to 0–1), an inverted alpha test (1-Aref) and regular depth testing.

This algorithm assumes a regular Lambertian diffuse model for hair. You can easily swap it out: modify steps 1 and 3 to first draw black albedo into the G-Buffer and then add the different diffuse model in step 3.
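The steps above boil down to simple per-texel alpha logic: the near-opaque core goes through the deferred path, the soft fringe through the blended forward pass, and together they cover every texel exactly once. A minimal Python sketch of that logic (names are mine, not engine code):

```python
def solid_pass_alpha_test(a, a_ref):
    """Steps 1 and 3: keep only the (nearly) opaque core of the strand."""
    return a >= a_ref

def transparent_pass(a, a_ref):
    """Step 4: inverted alpha test, plus remapping of [0, a_ref] to [0, 1]
    so the blended fringe reaches full opacity exactly where the solid
    core begins. Returns (passes_test, blend_alpha)."""
    if a >= a_ref:
        return False, None          # already handled by the solid passes
    return True, min(a / a_ref, 1.0)
```

Note the complementary tests: a texel passes either the solid passes or the blended pass, never both, which is what keeps overdraw and sorting issues contained to the thin fringe.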

Advantages and disadvantages

There are lots of advantages to this trick/algorithm. Even with non-obvious hair mesh topologies I didn’t see any problems with alpha sorting, because the alpha-blended areas are small and usually lie on top of solid geometry. Because most of the rendered hair geometry writes depth values, it also works OK with particles and other transparents. You avoid hacking your lighting shaders, branching and high VGPR counts. You get smooth, aliasing-free results and a proper shading model of your choice (no need to pack material properties). It also avoids excessive forward shading overdraw (z-testing set to “equal”, and later regular depth testing against an almost complete scene). While there are multiple passes, not all of them need to read all the textures – for example there is no need to re-read albedo after step 1, and the G-Buffer pass can use a different normal map and skip the specular/gloss mask. The performance numbers I got were really good – hair usually covers a very small part of the screen outside of cutscenes – and the proposed solution adds zero overhead to regular mesh rendering or lighting.

Obviously, there are some disadvantages. First of all, there are three geometry passes for hair (you could get down to two by combining steps 3 and 4, at the cost of some of the advantages). It can be too much, especially with very complex spline/tessellation-based hair – but this is simply not an algorithm for such cases; they really do need more complex solutions… Again, see TressFX. There can still be problems with alpha-blend sorting and with composition with particles – but that depends a lot on the mesh topology and how much of it is alpha blended. Finally, this many passes complicate the renderer pipeline, and debugging can be problematic as well.


Bonus hack for skin subsurface scattering

As a bonus, a description of how we hacked skin shading in a very similar manner in The Witcher 2.

We couldn’t really separate speculars from diffuse into two buffers (we already had way too many local lights and a big lighting cost; increasing bandwidth in those passes wouldn’t have helped for sure). We also didn’t have ANY forward shading in Red Engine at the time! For skin I really wanted to do SSS without blurring either the albedo textures or the speculars. Therefore I came up with the following “hacked” pipeline.

  1. Render skin with white albedo and zero specularity into the G-Buffer.
  2. During the lighting passes, write specular not modulated by specular color or material properties into the alpha channel of the lighting buffer (using separate alpha blending).
  3. After all lights we had the diffuse response in RGB and the specular response in A – only for skin.
  4. Do a typical bilateral separable screen-space blur (Jimenez) on skin stencil-masked pixels. For masking skin I remember trying both a bit from the G-Buffer and a “hacked” test for zero specularity/white albedo in the G-Buffer – both worked well; I don’t remember which version we shipped, though.
  5. Render the skin meshes again – multiplying RGB from the blurred lighting pixels by albedo and adding specularity times the specular intensity.
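At a single pixel, the final composite of step 5 can be sketched like this (an illustrative Python version, not the actual shader; the screen-space blur is assumed to have already run over the lighting buffer):

```python
def composite_skin_pixel(blurred_lighting_rgba, albedo_rgb, spec_intensity):
    """Step 5 of the skin hack: the blurred lighting buffer holds the pure
    diffuse response in RGB (valid because albedo was written as white in
    step 1) and the monochrome specular response in A. Re-apply albedo to
    the diffuse, then add the unblurred-by-albedo specular on top."""
    r, g, b, spec = blurred_lighting_rgba
    return tuple(diffuse * alb + spec * spec_intensity
                 for diffuse, alb in zip((r, g, b), albedo_rgb))
```

Because albedo is applied only after the blur, texture detail stays sharp while the lighting underneath gets the subsurface softening – which was the whole point of the hack.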

The main disadvantage of this technique is losing all specular color from lighting (especially visible in dungeons), but AFAIK there was a global, per-environment, artist-specified specular color multiplier for skin. A hack, but it worked. The second, smaller disadvantage was the higher cost of the SSS blur passes (more surfaces to read to mask the skin).

In modern engines on current hardware I honestly wouldn’t bother – use separate lighting buffers for the diffuse and specular responses instead – but I hope it can inspire someone to creatively hack their lighting passes. :)







[6] “Forward+: Bringing Deferred Lighting to the Next Level”, Takahiro Harada, Jay McKee, and Jason C. Yang

[7] “Clustered deferred and forward shading”, Ola Olsson, Markus Billeter, and Ulf Assarsson

[8] “Screen-Space Perceptual Rendering of Human Skin”, Jorge Jimenez, Veronica Sundstedt, and Diego Gutierrez

[9] “Hair Rendering and Shading”, Thorsten Scheuermann, GDC 2004


C#/.NET graphics framework on GitHub + updates

As promised, I posted my C#/.NET graphics framework (more about it and the motivation behind it here) on GitHub:

This is my first GitHub submission ever and my first experience with Git, so it’s possible I didn’t do something properly – thanks for your understanding!

The list of changes since the initial release is quite big – tons of cleanup and some crash fixes in previously untested conditions – plus some features:

Easy render target management

I added helper functions to manage the lifetime of render targets and allow render target re-use. Using render target “descriptors” and the RenderTargetManager, you request a texture with all RT and shader resource views, and it is returned from a pool of available surfaces – or lazily allocated when no surface fitting the given descriptor is available. It saves some GPU memory and makes sure that code is 100% safe when changing configurations – no NULL pointers when enabling previously disabled code paths, adding new ones, etc.
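The descriptor-keyed pooling can be sketched roughly like this – a Python illustration with made-up names, not the framework’s actual C# API:

```python
class RenderTargetPool:
    """Illustrative sketch of descriptor-keyed render target pooling:
    surfaces are recycled when a request matches a freed descriptor,
    and lazily 'allocated' otherwise."""

    def __init__(self):
        self._free = {}  # descriptor tuple -> list of free surfaces

    def acquire(self, width, height, fmt):
        desc = (width, height, fmt)
        pool = self._free.setdefault(desc, [])
        if pool:
            return pool.pop()        # re-use a matching freed surface
        return {"desc": desc}        # stand-in for a real GPU allocation

    def release(self, surface):
        # return the surface to the pool for later re-use
        self._free[surface["desc"]].append(surface)
```

The key property is that requesting the same descriptor after a release hands back the same physical surface, while a different descriptor triggers a fresh allocation – no dangling NULLs when a code path is toggled on.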

I also added a very simple “temporal” surface manager that, for every surface created with it, stores N different physical textures for the requested N frames. All temporal surface pointers are updated automatically at the beginning of a new frame. This way you don’t need to hold state or ping-pong in your rendering pass code, and the code becomes much easier to follow, e.g.:

RenderTargetSet motionVectorsSurface = TemporalSurfaceManager.GetRenderTargetCurrent("MotionVectors");
RenderTargetSet motionVectorsSurfacePrevious = TemporalSurfaceManager.GetRenderTargetHistory("MotionVectors");
m_ResolveMotionVectorsPass.ExecutePass(context, motionVectorsSurface, currentFrameMainBuffer);

Cubemap rendering, texture arrays, multiple render target views

Nothing super interesting, but it makes it much easier to experiment with algorithms like GI (see the following point). In my backlog there is a task to add geometry shader and instancing support for amplification of data for cubemaps (with proper culling etc.) that should speed it up by an order of magnitude, but it wasn’t my highest priority.

Improved lighting – GI baker, SSAO

I added two elements: temporally supersampled SSAO, and simple pre-baked global illumination with a fully GPU-based naive GI baker. When adding those passes I was able to really stress my framework and check whether it works as it is supposed to – and I can confirm that adding new passes was extremely quick and iteration times were close to zero – the whole GI baker took me just one evening to write.


GI is stored in very low resolution, currently uncompressed volume textures – three 1 MB RGBA16 surfaces storing incoming flux as 2nd-order SH (not preconvolved with the cosine lobe – so flux, not irradiance). There are some artifacts due to the low resolution of the volume (64 × 32 × 64), but at a cost of 3 MB for such a scene I guess it’s good enough. :)

It is calculated by doing a cubemap capture at every 3D grid voxel, calculating irradiance for every texel and projecting it onto SH. I made sure (or I hope so! ;) but it seems to converge properly) that it is energy conserving, so N-bounce GI is achieved by simply feeding the previous N−1 bounce results into the GI baker and re-baking. I simplified it a bit more (which also improves baking times – it converges toward the asymptotic value faster), as the baker uses partial results, but as N → ∞ it should converge to the same value and be unbiased.
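The projection itself is just Monte Carlo integration against the first four SH basis functions (2nd-order SH: band 0 plus band 1). A hedged, CPU-side Python sketch of the idea – my own minimal version, not the GPU baker:

```python
import math

# Real SH basis, bands 0-1: Y_0^0, Y_1^-1, Y_1^0, Y_1^1
SH_BASIS = (
    lambda d: 0.2820948,
    lambda d: 0.4886025 * d[1],
    lambda d: 0.4886025 * d[2],
    lambda d: 0.4886025 * d[0],
)

def fibonacci_sphere(n):
    """Deterministic, roughly uniform directions on the unit sphere."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        pts.append((r * math.cos(golden * i), r * math.sin(golden * i), z))
    return pts

def project_to_sh(radiance, dirs):
    """Monte Carlo projection: c_i = (4*pi / N) * sum radiance(d) * Y_i(d)."""
    weight = 4.0 * math.pi / len(dirs)
    return [weight * sum(radiance(d) * basis(d) for d in dirs)
            for basis in SH_BASIS]

def eval_sh(coeffs, d):
    return sum(c * basis(d) for c, basis in zip(coeffs, SH_BASIS))
```

A useful sanity check for energy conservation: projecting a constant unit radiance and evaluating the SH back in any direction should return ~1, i.e. the projection neither gains nor loses energy for the DC term.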

It contains “sky” ambient lighting pre-baked as well, but I will probably split those terms and store them separately, quite possibly at a different storage resolution. This way I could simply “normalize” the flux and make it independent of sun/sky color and intensity (which could then be applied at runtime). There are tons of other simple improvements (compressing textures, storing luma/chroma separately in different-order SH, optimizing the baker etc.) and I plan to add them gradually, but for now the image quality is very good (for something without normal maps and speculars yet ;) ).

Improved image quality – tone-mapping, temporal AA, FXAA

Again, nothing super interesting – rather extremely simple and usually suboptimal code, just to help debug other algorithms (and make presenting them easier). Adding such features was again a matter of minutes, and I can confirm that my framework so far succeeds in its design goal.

Constant buffer constants scripting

A feature that I’m not 100% happy with.

For me, when working with almost anything in games – from graphics and shader programming through materials/effects to gameplay scripting – the biggest problem is finding the proper boundary between data and code. Where should the splitting point be? Should code drive data, or the other way around? Across the engines I have worked with (RedEngine, Anvil/Scimitar, Dunia, plus some very small experience just to familiarize myself with CryEngine, Unreal Engine 3 and Unity3D), it was in a different place in every one.

Coming back to shaders: a usually tedious task is putting some things on the engine side in code and some in the actual shaders, while both parts must match 100%. It not only makes it more difficult to modify such things or add new properties, but also makes the code harder to read and follow when trying to understand an algorithm, as it is split between multiple files – not necessarily by functionality but, for example, by performance (e.g. precalculating stuff on the CPU and passing it in constants).

Therefore my end goal would be a single meta shader language, using meta decorators to specify the frequency of every code part – for example, one part executed per frame, others per viewport, per mesh, per vertex, per pixel etc. I want to go in this direction, but I didn’t want to get into writing parsers and lexers, so temporarily I used Lua (extremely fast to integrate and quite decently performing).

An example would be one of my constant buffer definitions:

cbuffer PostEffects : register(b3)
 /// Bokeh
 float cocScale; // Scripted
 float cocBias; // Scripted
 float focusPlane; // Param, Default: 2.0, Range:0.0-10.0, Linear
 float dofCoCScale; // Param, Default: 0.0, Range:0.0-32.0, Linear
 float debugBokeh; // Param, Default: 0.0, Range:0.0-1.0, Linear
 focusPlaneShifted = focusPlane + zNear
 cameraCoCScale = dofCoCScale * screenSize_y / 720.0 -- depends on focal length & aperture, rescale it to screen res
 cocBias = cameraCoCScale * (1.0 - focusPlaneShifted / zNear)
 cocScale = cameraCoCScale * focusPlaneShifted * (zFar - zNear) / (zFar * zNear)

We can see that two constant buffer properties are scripted – there is zero C# code that would calculate them; instead a Lua script is executed every frame when we “compile” the constant buffer for use by the GPU.
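For the curious, the Lua snippet above encodes the standard “circle of confusion from the depth buffer” trick: cocScale and cocBias are chosen so that a single mad on post-projection depth yields cameraCoCScale * (1 − focusPlaneShifted / viewZ), which is zero exactly at the focus plane. A Python check of that algebra, assuming a standard projection (this is my verification, not code from the framework):

```python
def coc_constants(focus_plane, dof_coc_scale, screen_h, z_near, z_far):
    """Mirrors the Lua constant script: returns (coc_scale, coc_bias)."""
    focus_shifted = focus_plane + z_near
    camera_coc_scale = dof_coc_scale * screen_h / 720.0
    coc_bias = camera_coc_scale * (1.0 - focus_shifted / z_near)
    coc_scale = (camera_coc_scale * focus_shifted * (z_far - z_near)
                 / (z_far * z_near))
    return coc_scale, coc_bias

def coc_at(view_z, coc_scale, coc_bias, z_near, z_far):
    # post-projection device depth for a view-space distance view_z
    device_z = z_far * (view_z - z_near) / (view_z * (z_far - z_near))
    # the single mad the pixel shader would do: scale * depth + bias
    return coc_scale * device_z + coc_bias
```

Behind the focus plane the result is positive, in front of it negative, and at focusPlane + zNear it cancels to zero – all from one multiply-add per pixel, with the division folded into the precomputed constants.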

UI grouping by constant buffer

Simple change to improve readability of UI. Right now the UI code is the most temporary, messy part and I will change it completely for sure, but for the time being I focused on the use of it.


Further hot-swap improvements

Right now everything in shader files and related to shaders is hot-swappable – constant buffer definitions, includes, constant scripts. I can’t imagine working without it any more; it definitely helps iterate faster.

Known issues / requirements

I tested only the x64 version; the 32-bit one may not be configured properly and is definitely lacking the proper DLL versions.

One known issue (checked on a different machine with Windows 7 / x64 / VS2010) is a runtime exception complaining about a missing “lua52.dll” – it is probably caused by the lack of the Visual Studio 2012+ runtime.

Future plans

While I update stuff every week/day in my local repo, I don’t plan any public commits (except for something cosmetic or a serious bug/crash fix) until probably late August. I will be busy preparing my Siggraph 2014 talk, and I plan to release the source code for the talk using this framework as well.


Coming back to film photography

Yeah, I finally managed to get back to my favourite pastime hobby – film/analog photography, which I started when I was 10 years old with the following camera:


Now I’m a bit older and my photo gear has changed as well (but I really miss this camera!). :) So I’m using at the moment:


Why film and not digital? Don’t get me wrong – I love digital photography for its quality, ease of use and the possibility to document events and reality. It’s also very convenient on holiday (especially something small like my Fuji X100). However, lots of people (including me) find it easier to take more “artistic”, aesthetically better photos when working with film, especially medium format – simply because you have only 10, 12 or 15 photos per roll (depending on whether it’s 645, 6×6 or 6×7), so you think about every shot and composition and try to make the best of them. Shooting B&W is also quite an interesting challenge: we are easily attracted to colors and shoot photos based on them, while in B&W that’s impossible and you have to look for interesting patterns, geometric elements, surfaces of objects and the relations between them. An interesting way to “rewire” your brain and sense of aesthetics and learn a new skill.

Finally, developing your own film yourself is an amazing experience – you spend an hour in the darkroom, fully relaxed, carefully treating the film and obeying all the rules, and you still don’t know what the outcome will be; maybe no photo will be good at all. A great and relaxing experience for all OCD programmer guys. ;)


Some photos from the just-awesome Montreal summer – nothing special, just a test roll from the Mamiya I brought from Poland (it turns out it underexposes – probably an old battery; I will need to calibrate it properly with a light meter…).



Runtime editor-console connection in The Witcher 2

During Digital Dragons, among tons of inspiring talks and discussions, I was asked by a Polish game developer (he and his team are making a quite cool early-access Steam economy/strategy game about space exploration programmes that you can check out here) to write a bit more about the tools we had on The Witcher 2 for connecting the game editor with the final game running on a console. As increasing productivity and minimizing iteration times is one of my small obsessions (I believe that fast iteration, high productivity and efficient, robust pipelines are much more important than adding tons of shiny features), I agreed that it is quite a cool topic to write about. :) While I realize that lots of other studios probably have similar pipelines, it is still worth talking about, and multiple other (especially smaller) developers can benefit from it. As I don’t like sweeping problems under the carpet, I will also discuss the disadvantages and limitations of the solution we had at CD Projekt RED at that time.

Early motivation

The Xbox 360 version of The Witcher 2 was the first console game done 100% internally by CD Projekt RED. At that time the X360 was already almost 7 years old and far behind the capabilities of the modern PCs for which we had originally developed the game. The whole studio – artists, designers and programmers – was aware that we would need to cut down and change lots of stuff to get the game running on consoles, but we had to do it wisely, so as not to sacrifice the high quality of player experience our game was known for. Therefore, apart from porting and optimizing, the programming team had to design and implement multiple tools to aid the porting process.

Among these tools, a need appeared for a connection between the game editor and consoles. There were two specific topics that made us consider building it:

Performance profiling and real-time tweaking on console

The PC version sometimes had insane numbers of localized lights. If you look at the following scene – one of the game’s opening scenes – at specific camera angles it had up to 40 smaller or bigger localized deferred lights on a PC, and there were even more heavily lit scenes in our game!


Yeah, crazy, but how was it even possible?

Well, our engine didn’t have any kind of global illumination or baking solution; one of the early design decisions was that we wanted everything dynamic, switchable, changeable (quite important for such a nonlinear game – most locations had many “states” depending on game progress and the player’s decisions) and animated.

Therefore, GI was faked by our lighting and environment artists by placing many lights of various kinds – additive, modulative, diffuse-only, specular-only, character- or env-only, with different falloffs, gobo lights, and different types of animation on both light brightness and position (for shadow-casting lights this gives those awesome-looking torches and candles!). Especially interesting were the “modulative” lights that subtracted energy from the scene to fake large-radius AO/shadows – a small-radius modulative light is cheaper than rendering a shadow map and gives nice, soft light occlusion.

All of this goes totally against the current trend of doing everything “physically correct”, and while I see almost only benefits in the PBR approach and believe in coherency etc., I also trust great artists and believe they can achieve very interesting results when crossing those physical boundaries, given “advanced mode” magical knobs and tweaks for special cases – just like painters and other artists who are only inspired by reality.

Anyway, having 40+ lights on screen (very often overlapping and causing massive lighting overdraw) was definitely a no-go on the X360, even after we had optimized our lighting shaders and pipelines a lot. It was hard for our artists to decide which lights should be removed and which ones added significant cost (large overdraw / covered area). Furthermore, they wanted to decide in which specific camera takes big lighting costs were acceptable – even 12 ms of lighting is acceptable if the whole scene’s mesh rendering takes under 10 ms – to make the game as beautiful as possible we had flexible, scene-dependent budgets.

All of this would IMHO be impossible to simulate with any offline tools – visualizing light overdraw is easy, but seeing the final cost together with the scene drawing cost is not. Therefore we decided that artists needed a way to tweak, add, remove, move and change lights at runtime and see the performance change immediately on screen, and to create tools to support that.

Color precision and differences

Because of many performance considerations, on the X360 we went with an RGBA 1010102 lighting buffer (with some exp bias to move it into a “similar range” as on PC). We also changed our color grading algorithms, added a filmic tone mapping curve and adapted gamma curves for TV display. All of this had a simply devastating effect on our existing color precision – especially moving from 16-bit lighting to 10-bit while having multiple lighting, fog and particle passes – as you might expect, the difference was huge. Our artists also wanted some estimate of how the game would look on TVs, with a different and more limited color range etc. – on the PC version most of them used high-quality, calibrated monitors to ensure consistency of texturing and color work across the whole studio. Both to preview this TV look while tweaking color grading values and to fight the banding, they again wanted a live preview of all their tweaks and changes at runtime. I think it was the easier way to go (both in terms of implementation and code maintenance time) than trying to simulate the look of the X360 code path in the PC rendering path.
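To get a feel for why dropping from 16 to 10 bits hurt so much, here is a small, self-contained illustration (my own, unrelated to the engine code) counting how many distinct steps survive quantization inside a dark gradient band – the region where banding shows first after tone mapping:

```python
def quantize(value, bits):
    """Snap a [0, 1] value to an n-bit unorm grid."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels

def distinct_levels(bits, lo=0.0, hi=0.05, samples=4096):
    """Count distinct representable values a smooth gradient over
    [lo, hi] collapses to after n-bit quantization."""
    return len({quantize(lo + (hi - lo) * i / samples, bits)
                for i in range(samples + 1)})
```

In a dark band covering 5% of the range, a 10-bit unorm buffer leaves only ~50 steps, while a 16-bit one leaves thousands – and every extra blended pass (fog, particles) re-quantizes the result, compounding the banding.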

Obviously, we ended up with many more benefits that I will try to summarize.

Implementation and functionality

To implement this runtime console-editor connection, we wrote a simple custom command-based network protocol. 

Our engine and editor already had support for network-based debugging of the scripting system. We had a custom, internally written C-like scripting language (it automatically extended the RTTI, had access to all RTTI types, was aware of game saving/loading and had built-in support for state machines – in general quite an amazing piece of code and a well-designed system, probably worth a separate write-up). This scripting system even had its own small IDE, a debugger with breakpoints and a sampling profiler.

Gameplay programmers and script designers would connect with this IDE to a running editor or game, debug anything, or even hot-reload all the scripts and see the property grid layout change in the editor if they added, removed or renamed a dynamic property! Side note: anyone experienced with maintaining complex systems can guess how often those features got broken or crashed the editor after even minor changes… Which is unfortunate – it discouraged gameplay scripters from using those features, so we got fewer bug reports and repaired them even less frequently… The lesson learned is as simple as my advice: if you don’t have a huge team to maintain every single feature, KISS.

Having such a network protocol with support for commands sent both ways already in place, it was super easy to open another listener on another port and start listening for different types of messages!

I remember it took only around one day to get it running with the first couple of commands implemented. :)

So let’s see what kinds of commands we had:

Camera control and override

Extremely simple – a command that hijacked the in-game camera. After connecting from the editor and enabling camera control, every in-editor camera move was sent with all the camera properties (position, rotation, near/far planes and FOV) serialized over the network.
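A camera-override command like this needs almost nothing on the wire. A purely hypothetical sketch of such a message in Python – the command id and field layout here are made up for illustration, not the original protocol:

```python
import struct

# Hypothetical wire format: 1-byte command id, then 9 little-endian
# float32 fields (position xyz, rotation xyz, near, far, FOV).
CAMERA_CMD = 0x01
CAMERA_FMT = "<B9f"

def pack_camera(pos, rot, near, far, fov):
    return struct.pack(CAMERA_FMT, CAMERA_CMD, *pos, *rot, near, far, fov)

def unpack_camera(payload):
    fields = struct.unpack(CAMERA_FMT, payload)
    assert fields[0] == CAMERA_CMD
    return fields[1:4], fields[4:7], fields[7], fields[8], fields[9]
```

At 37 bytes per update, streaming this every time the editor camera moves is negligible even over a debug network link, which is why such a feature is so cheap to add once any command channel exists.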

The benefit of this feature was that it not only made working with all the remaining features easier – it also allowed debugging streaming, checking which objects were not present in the final build (and why), and debugging our cooking/exporting system in general. If something was missing from the screen in the final console build, an artist or level designer could analyze why – whether it was also absent in the editor, whether it had the proper culling flags, whether it was assigned to the proper streaming layer etc. – and either fix it or file a systemic bug with the programming team.

Loading streaming groups / layers

A simple command that sent a list of layers or layer groups to load or unload (as they got un/loaded in the editor), passed directly to the streaming system. Again, this allowed performance debugging and profiling of streaming and memory cost – optimizing CPU culling efficiency, minimizing the memory cost of loaded but invisible objects etc.

While in theory cool and helpful, I must admit this feature didn’t work 100% as expected and wasn’t commonly used for those goals in practice. Mostly because a lot of our streaming was affected by layers being hidden/unhidden by various gameplay conditions. As I mentioned, we had a very non-linear game, and streaming was also used for achieving some gameplay goals. I think it was a misconception and bad design of our entity system (lack of proper separation between object logic and visual representation), but we couldn’t easily change it for the Xbox 360 version of The Witcher 2.

Lights control and spawning

Another simple feature. We could spawn new lights at runtime, move existing ones and modify most of their properties – radius, decay exponent, brightness, color, “enabled” flag etc. Every time a light’s property was modified or a new light component was added to the game world, we sent a command over the network that replicated the event on the console.

A disadvantage of such simple replication was that if we restarted the game running on the console, we would lose all those changes. :( In that case either a save + re-export (so cooking the whole level again) or redoing the changes was necessary.

Simple mesh moving

Very similar to the previous one. We had many “simple” meshes in our game (ones without any gameplay logic attached) that got exported to a special, compressed list – to avoid the memory overhead of storing whole entities and entity templates – and they could be moved without re-exporting the whole level. As we used dynamic occlusion culling and a dynamic scene hierarchy structure (a beam-tree), we didn’t need to recompute anything; it just worked.

Environment presets control

The most complex feature. Our “environment system” was a container of multiple time-of-day control curves for all post-effects, sun and ambient lighting, light groups (under certain moods dynamic lights had different colors), fog, color grading etc. It was very complex, as it supported not only dynamic time of day but also multiple presets active at once with different priorities, overriding specific values only per environment area. To be able to control the final color precision on the X360 it was extremely important to allow editing them at runtime. IIRC, when we started editing them while connected to the console, the whole environment system on the console got turned off and we interpolated and passed all parameters directly from the editor.

Reloading post-process (hlsl file based) shaders

Obvious, simple, and I believe almost every engine has it. For me it is mandatory for productive work, so I understand how important it is to deliver similar functionality to teams other than graphics programmers. :)

What kind of useful features we lacked

While our system was very beneficial for the project – and seeing its benefits, on every next project in any company I will opt for something similar – we didn’t implement many other features that would have been just as helpful.

Changing and adding pre-compiled objects

Our system didn’t support adding or modifying any objects that got pre-compiled during export – mostly meshes and textures. It would have been useful to quickly swap textures or meshes at runtime (never-ending problems with dense foliage performance, anyone? :) – so far the biggest perf problem on any project I have worked on), but our mesh and texture caches were static. It would require making those cache files and systems partially dynamic, plus adding more export support to the editor (for exporting we didn’t use the editor, but a separate “cooker” process).

Changing artist-authored materials

While we supported recompiling hlsl-based shaders used for post-effects, our system didn’t support swapping artist-authored particle or material shaders. Quite similar to the previous one – we would need to add more dynamism to the shader cache system… It wouldn’t have been very hard to add if we weren’t already late in “game shipping” mode.

Changing navmesh and collision

While we were able to move some “simple” static objects, the navmesh and gameplay collision didn’t change. It wasn’t a very big deal – artists almost never played on those modified levels – but it could have made the lives of level and quest designers much easier. Just imagine hitting a “blocker” or wrong collision during a playthrough, quickly connecting with the editor, moving it and immediately checking the result – without having to restart a whole complex quest or start it in the editor. :)

Modifying particle effects

I think that being able to change particle system behaviors, curves and properties at runtime would be really useful for FX artists. Effects are often hard to balance – there is a very thin line of compromise between quality and performance due to multiple factors: resolution of the effect (half vs full res), resolution of flipbook textures, overdraw, alpha value, alpha testing etc. Being able to tweak such properties on a paused game during, for instance, an explosion could be a miracle cure for frame timing spikes during explosions, smoke or spell effects. Still, we didn’t do anything about it due to the complexity of particle systems in general and the multiple factors to take into account… I was thinking about simply serializing all the properties, replicating them over the network and deserializing them – it would work out of the box – but there was no time and we had many other, more important tasks to do.

Anything related to dynamic objects

While our system worked great on environment objects, we didn’t have anything for dynamic objects like characters. To be honest, I’m not really sure it would be possible to implement easily without a major refactor of many elements. There are many different systems that interact with each other, many global managers (which may not be the best “object-oriented” design strategy, but are often useful for building acceleration structures and as a part of data/structure-oriented design), and many objects that would need to have their state captured, serialized and then recreated after reloading some properties – definitely not an easy task, especially under console memory constraints. A nasty side effect of this gap was something I mentioned earlier – problems with modifying semi-dynamic/semi-static objects like doors, gameplay torches etc.

Reloading scripts on console

While our whole network debugging code was designed in the first place to enable script reloading between the editor and a scripting IDE, it was impossible to do on console the way it was implemented. The console version of the game had a simplified and stripped RTTI system that didn’t support (almost) any dynamism, and moving some editor code there would mean de-optimizing runtime performance. It could be a part of a “special” debug build, but the point of our dynamic console connection system was to be able to connect simply to any running game. Also, again, capturing state while the RTTI gets reinitialized and script code reloaded could be more difficult due to memory constraints. Still, this topic quite fascinates me and would be a kind of ultimate challenge and goal for such a connection system.


While our system lacked multiple useful features, it was extremely easy and fast to implement (a couple of days total?). Having an editor-console live connection is very useful and I’m sure the time spent developing it paid off multiple times. It provides a much more “natural” and artist-friendly interface than any in-game debug menu, and allows for faster work and implementing much more complex debug/live-editing features. It not only aids debugging and optimization – had it been a bit more complex, it could even have accelerated the actual development process. When your iteration times on various game aspects get shorter, you are able to do more iterations on everything – which gives you not only more content in the same time/for the same cost, but also a much more polished, bug-free and fun-to-play game! :)

Posted in Code / Graphics | Tagged , , , , , , | Leave a comment

Digital Dragons 2014 slides

This Friday I gave a talk at Digital Dragons 2014.

It was a presentation with tons of new, unpublished content and details about our:

  • Global Illumination solution – full description of baking process, storing data in 2D textures and runtime application
  • Temporal supersampled SSAO
  • Multi resolution ambient occlusion by adding “World Ambient Occlusion” (Assassin’s Creed 3 technique)
  • Procedural rain ripple effect using compute and geometry shaders
  • Wet surfaces materials approximation
  • How we used screenspace reflections to enhance look of wet surfaces
  • GPU driven rain simulation
  • Tons of videos and debug displays of every effect and procedural textures!

If you have seen my GDC 2014 talk, there is probably still lots of new content for you – I tried to avoid reusing my GDC talk contents as much as possible.


Here (and on the publications page) are my slides for the Digital Dragons 2014 conference:

PPTX Version, 226MB - but worth it (tons of videos!)

PPTX Version with extremely compressed videos, 47MB

PDF Version with sparse notes, 6MB

PDF Version, no notes, 7MB




Temporal supersampling pt. 2 – SSAO demonstration

This weekend I’ve been working on my Digital Dragons 2014 presentation (a great Polish game developers conference I was invited to – if you are anywhere near central Europe in early May, be sure to check it out) and finally got to take some screenshots/movies of temporal supersampling in action on SSAO. I promised to take them quite a while ago in my previous post about temporal techniques and almost forgot. :)

To be honest, I never really had time to properly “benchmark” its quality increase when developing for Assassin’s Creed 4 – it came very late in production, actually for a title update/patch – the same patch as the increased PS4 resolution and our temporal anti-aliasing. I had motion vectors, so I simply plugged it in, tweaked the parameters a bit, double-checked the profilers, asked other programmers and artists to help me assess the increase in quality (everybody was super happy with it) and review it, sent it for full testing and later submitted it.

Now I took my time to do proper before-after screenshots and the results are surprising even for me.

Let’s have a look at comparison screenshots:


Scalable Ambient Obscurance without temporal supersampling / smoothing

Scalable Ambient Obscurance with temporal supersampling / smoothing

On a single image with contrast boosted (click it to see in higher res):

Scalable Ambient Obscurance with/without temporal supersampling – comparison

Quite a decent improvement (if we take into account the negligible runtime cost), right? We see that the ugly pattern / noise around foliage disappeared and the undersampling behind the ladder became less visible.

But it’s nothing compared to how it behaves in motion – be sure to watch it in fullscreen!

(if you see poor quality/compression on wordpress media, check out direct link)

I think that in motion the difference is huge – orders of magnitude better! It fixes all the issues typical of SSAO algorithms that happen because of undersampling. I will explain in a minute why it gets so much better in motion.

You can see some artifacts on the video (minor trailing / slow appearance of some information), but I don’t know if I would notice them without knowing what to look for (and with lighting applied, our SSAO was quite subtle – which is great and exactly how SSAO should look – we had great technical art directors :) ).

Let’s have a look at what we did to achieve it.


Algorithm overview

Our SSAO was based on Scalable Ambient Obscurance algorithm by McGuire et al. [1]

The algorithm itself has very good console performance (around 1.6ms on consoles for full-res AO + two passes of bilateral blur!), decent quality, and is able to calculate ambient obscurance with quite a high radius (up to 1.5m in our case) at a fixed performance cost. The original paper presents multiple interesting concepts / tricks, so be sure to read it!

We plugged our temporal supersampling into the AO pass of the algorithm – we used 3 rotations of the SSAO sampling pattern (which was unique for every screen-space pixel position), alternating every frame (so the pattern repeated after 3 frames).

To combine them, we simply took the previous SSAO buffer (so it effectively became an accumulation texture), offset the read position based on motion vectors, and after deciding on rejection or acceptance (a smooth weight) blended it with the current result using a fixed exponential decay (a weight of 0.9 for the history accumulation buffer on acceptance, falling to zero on rejection) and output the AO.
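The accumulation step above is essentially a per-pixel exponential moving average. A minimal sketch in Python pseudocode – the names and the smooth-acceptance handling are my own illustration, not the shipped shader:

```python
HISTORY_WEIGHT = 0.9  # fixed exponential decay weight on acceptance

def blend_ao(current_ao, history_ao, acceptance):
    """Blend this frame's AO with the reprojected history sample.

    acceptance is a smooth weight in [0, 1]: 1.0 when the history is
    fully trusted, 0.0 when it is rejected (e.g. on a depth mismatch
    after motion-vector reprojection).
    """
    w = HISTORY_WEIGHT * acceptance
    return (1.0 - w) * current_ao + w * history_ao
```

With full acceptance each new frame contributes only 10% of the result, so samples from many previous frames survive in the accumulation buffer; on rejection the weight drops to zero and the current frame is used as-is.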

For a static image it meant tripling the effective sample count – plain supersampling, which is nice. But given that every screen-space pixel has a different sampling pattern, the number of samples contributing to the final image when moving the game camera could be hundreds of times higher! With the camera moving and pixel reprojection we were getting more and more different sampling patterns and information from different pixels, and they all accumulated together into one AO buffer – that’s why it behaves so well in motion.

Why did we supersample during the AO pass, not after the blur? My motivation was that I wanted to do actual supersampling – increase the number of samples taken by the AO by splitting them across multiple frames / pixels. It seemed to make more sense (temporal supersampling + smoothing, not just smoothing) and was much better at preserving details than doing it after the blur – when the information is already lost (a low-pass filter) and scattered around multiple pixels.

To calculate rejection/acceptance we used the fact that Scalable Ambient Obscurance has a simple but great trick of storing compressed depth in the same texture as the AO (it really accelerates the subsequent bilateral blurring passes – only 1 sample taken per tap): 16-bit depth gets stored in two 8-bit channels. Therefore we had depth information ready and available with the AO and could do depth rejection at no additional cost! Furthermore, as our motion vector and temporal AO surfaces were 8 bits only, they didn’t pollute the cache too much and fetching those textures pipelined very well – I couldn’t see any additional cost of temporal supersampling on a typical scene.
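For illustration, packing a 16-bit value into two 8-bit channels might look like this – a hedged Python sketch; the actual SAO paper packs a camera-space z key, and its exact encoding differs:

```python
def pack_depth16(depth01):
    """Pack a normalized depth in [0, 1] into two 8-bit channels (hi, lo)."""
    d = min(int(depth01 * 65535.0), 65535)
    return d >> 8, d & 0xFF

def unpack_depth16(hi, lo):
    """Recover the normalized depth from the two 8-bit channels."""
    return ((hi << 8) | lo) / 65535.0
```

The point is that the history pass can recover a ~16-bit depth from the very same fetch that reads the accumulated AO, so the rejection test costs no extra bandwidth.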

Depth rejection has a problem of information “trailing” (when an occluder disappears, the occluded pixel has no knowledge of it – and cannot reject the “wrong” history / accumulation), but it was much cheaper to do (information for a given pixel is compressed and fetched with the color) than a multi-tap color-based rejection, and as I said – neither we, nor any testers / players, saw any actual trailing issues.


Comparison to previous approaches

The idea of applying temporal smoothing to SSAO is not new. There were presentations from DICE [2] and Epic Games [3] about similar approaches (thanks to Stephen Hill for mentioning the second one – I had no idea about it), but they differed from our approach a lot, not only in implementation, but also in both reasoning and application. They used temporal reprojection to help smooth the effect and reduce flickering when the camera was moving, especially to reduce half-resolution artifacts when calculating SSAO in half res (essential for getting acceptable perf with the expensive HBAO algorithm). For us, on the other hand, it was not only about smoothing the effect, but about really increasing the number of samples and doing supersampling distributed across multiple frames in time – the main motivation/inspiration came from temporal antialiasing techniques. Therefore our rejection heuristic was totally different from the one in the DICE presentation – they wanted to do temporal smoothing only on “unstable” pixels, while we wanted to keep the history accumulation for as long as possible on every pixel and get proper supersampling.



I hope I have proved that temporal supersampling works extremely well on some techniques that take multiple stochastic samples like SSAO and solves common issues (undersampling, noise, temporal instability, flickering) at a negligible cost.

So… what is your excuse for not using it for AA, screen-space reflections, AO and other algorithms? :)



[1] Scalable Ambient Obscurance – McGuire et al

[2] Stable SSAO in Battlefield 3 with Selective Temporal Filtering – Andersson and Bavoil

[3] Rendering Techniques in GEARS OF WAR 2 – Smedberg and Wright



C#/.NET graphics framework

In my previous post about bokeh I promised to write a bit more about the simple C# graphics framework I use at home for prototyping various DX11 graphics effects.

You can download its early version with demonstration of bokeh effect here.

So, the first question I should probably answer is…

Why yet another framework?

Well, there are really not many. :) In the old days of DirectX 9, lots of coders seemed to be using ATI (now AMD) RenderMonkey. It is no longer supported and doesn’t support modern DirectX APIs. I really doubt that something similar could be created for the advanced DX10+ style of API with a full feature set – UAVs in all shader stages, tessellation, geometry and compute shaders.

Also, most newly developed algorithms today are much more complex.

Lots of coders seem to be using Shadertoy to showcase effects – quite an awesome example would be ben’s implementation of Brian Karis’ area lights. Unfortunately such frameworks work well only for fully procedural, usually raymarched rendering in a single pass – while you can demonstrate amazing visual effects (demoscene style), this is totally unlike regular rendering pipelines and often useless for prototyping shippable rendering techniques. Also, because everything is based on raymarching, the code becomes hard to follow and understand, with tons of magic numbers, hacks and functions to achieve even simple functionality…

There are two frameworks I would consider using myself and that caught my attention:

  • “Sample Framework” by Matt Pettineo. It seems to wrap very well lots of the common steps needed to set up a simple DirectX 11 app, and Matt adds new features from time to time. In the samples I tried it works pretty well, and the code and structure are quite easy to follow. If you like coding in C++ this is what I would look into first; however, I wanted something done more in a “scripting” style that would be faster to use (more about it later).
  • bgfx by Branimir Karadžić. I didn’t use it myself, so I cannot really tell you more about it, but it has the benefit of being multi-platform and multi-API, so it should make it easy to abstract lots of stuff – this way algorithms should be easier to present in a platform-agnostic way. But it is more of an API abstraction library than a prototyping playground / framework.

A year or two ago I started to write my own simple tool, so I didn’t look very carefully into them, but I really recommend you do so – both of them are for sure more mature and better written than my simple tech.

Let’s get to my list of requirements and must-haves when developing and prototyping stuff:

  • Possibility of doing multi pass rendering.
  • Mesh and texture loading.
  • Support for real GPU profiling – an FPS counter or a single timing counter is not enough! (btw. paper authors, please stop using FPS as a performance metric…)
  • DX11 features, but wrapped – DX11 is not a very clean API; you need to write tens of lines of code to create a simple render target and all of the “interesting” views like RTV, UAV and SRV.
  • Data drivenness and “scripting-like” style of creating new algorithms.
  • Shader and possibly code reloading and hot swapping (zero iteration times).
  • Simple to create UI and data driven UI creation.
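On the FPS-as-a-metric complaint above: frame rate is the reciprocal of frame time, so the same FPS delta corresponds to wildly different amounts of GPU time depending on where you start – which is why milliseconds are the honest unit. A quick illustration:

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps

# Going from 30 to 60 FPS saves ~16.7 ms of frame time...
saved_low = frame_ms(30) - frame_ms(60)
# ...while the "same" +30 FPS starting from 300 FPS saves only ~0.3 ms.
saved_high = frame_ms(300) - frame_ms(330)
```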

Why C# / .NET

I’m not a very big fan of C++ and its object-oriented style of coding. I believe that for some tasks (not performance-critical ones) scripting or data-driven languages are much better, while other things are expressed much better in a functional or data-oriented style. C++ can be a “dirty” language, doesn’t have a very good standard library, and templated extensions like boost (which you need for tasks as simple as regular expressions) are a nightmare to read. To make your program usable, you need to add tons of external library dependencies, and it gets quite hard to have them compile properly across multiple machines, configurations or library versions.

Obviously, C++ is here to stay, especially in games; I work with it every day and can enjoy it as well. But on the other hand I believe it is very beneficial if a programmer works in different languages with different working philosophies – this way they learn to “think” about problems and algorithms, not language-specific solutions. So I also love Mathematica, multi-paradigm Python, and C#/.NET.

As I said, I wanted to be able to code new algorithms in a “scripting” style, not really thinking about objects, but more about algorithms themselves – so I decided to use .NET and C#.

It has many benefits:

  • .NET has lots of ways of expressing solutions to a problem. You can even write in a more dynamic/scripting style; Emit and dynamic objects are extremely powerful tools.
  • It has amazingly fast compilation times and quite decent edit&continue support.
  • Its performance is not that bad if you don’t write code with it that is executed thousands of times per frame.
  • .NET on Windows is an excellent environment / library and has everything I need.
  • It should run on almost every developer’s Windows machine with Visual Studio Express (free!), and if you limit the used libraries (I use SlimDX) compilation / dependency resolving shouldn’t be a problem.
  • It is very easy to write complex functional-style solutions to problems with LINQ (yes, probably all game developers would look disgusted at me right now :) ).
  • It is trivial to code UI, windows etc.

So, here I present my C# / .NET framework!



Simplicity of adding new passes

As I mentioned, my main reason to create this framework was making sure that it is trivial to add new passes, especially with various render targets, textures and potentially compute. Here is an example of adding a simple pass, binding some resources and a render target, and then rendering a typical post-process fullscreen pass:


using (new GpuProfilePoint(context, "Downsample"))
{
    context.PixelShader.SetShaderResource(m_MainRenderTarget.m_RenderTargets[0].m_ShaderResourceView, 0);
    context.PixelShader.SetShaderResource(m_MainRenderTarget.m_DepthStencil.m_ShaderResourceView, 1);
    PostEffectHelper.RenderFullscreenTriangle(context, "DownsampleColorCoC");
}

We also get a wrapped GPU profiler for the given section. :)

To create interesting resources (a render target texture with all potentially interesting resource views), one simply types once:

m_DownscaledColorCoC = RenderTargetSet.CreateRenderTargetSet(device, m_ResolutionX / 2, m_ResolutionY / 2, Format.R16G16B16A16_Float, 1, false);

Ok, but how do we handle the shaders?

Data driven shaders

I wanted to avoid tedious manual compilation of shaders, creation of shader objects and determining their type. Adding a new shader should be done in just one place – the shader file – so I went with a data-driven approach.

A part of the code called ShaderManager parses all the fx files in the executable directory with multiple regular expressions, looks for shader definitions, sizes of compute shader dispatch groups etc., and stores all the data.

So all shaders are defined in hlsl with some annotations in comments, and they are automatically found and compiled. It also supports shader reloading; on a shader compilation error it presents a message box with the error message, which you can close after fixing all of the shader compilation errors (multiple retries possible).

This way shaders are automatically found and then referenced in code by name.

// PixelShader: DownsampleColorCoC, entry: DownsampleColorCoC
// VertexShader: VertexFullScreenDofGrid, entry: VShader
// PixelShader: BokehSprite, entry: BokehSprite
// PixelShader: ResolveBokeh, entry: ResolveBokeh
// PixelShader: ResolveBokehDebug, entry: ResolveBokeh, defines: DEBUG_BOKEH
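As a rough illustration of the regex-based parsing described above – the real ShaderManager is C# and its exact patterns are not shown here, so this is my own hypothetical sketch:

```python
import re

# One regex per annotation line: stage, shader name, entry point and an
# optional defines list.
SHADER_RE = re.compile(
    r"//\s*(?P<stage>Pixel|Vertex|Geometry|Compute)Shader:\s*(?P<name>\w+)"
    r",\s*entry:\s*(?P<entry>\w+)"
    r"(?:,\s*defines:\s*(?P<defines>\w+))?")

def parse_shader_annotations(fx_source):
    """Return a list of shader definitions found in an .fx file's text."""
    return [m.groupdict() for m in SHADER_RE.finditer(fx_source)]
```

Run on the annotation block above, this would yield one entry per shader, with the defines field set only for ResolveBokehDebug.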

Data driven constant buffers

I also support data-driven constant buffers and a manual reflection system – I never really trusted the DirectX effects framework / OpenGL reflection.

I use dynamic objects from .NET to access all constant buffer member variables just like regular C# member variables – both for read and write. It is definitely not the most efficient way to do it – forget about even hundreds of draw calls with different constant buffers – but raw performance was never the main goal of my simple framework; real speed of prototyping was.

Example of (messy) mixed read and write constant buffer code – none of the “member” variables are defined anywhere in code:

mcb.zNear = m_ViewportCamera.m_NearZ;
mcb.zFar = m_ViewportCamera.m_FarZ;
mcb.screenSize = new Vector4((float)m_ResolutionX, (float)m_ResolutionY, 1.0f / (float)m_ResolutionX, 1.0f / (float)m_ResolutionY);
mcb.screenSizeHalfRes = new Vector4((float)m_ResolutionX / 2.0f, (float)m_ResolutionY / 2.0f, 2.0f / (float)m_ResolutionX, 2.0f / (float)m_ResolutionY);
m_DebugBokeh = mcb.debugBokeh > 0.5f;

A nice and useful part of parsing constant buffers with regular expressions is that I can directly specify which variables are supposed to be user-driven. This way my UI is also created procedurally.


float ambientBrightness; // Param, Default: 1.0, Range:0.0-2.0, Gamma
float lightBrightness;   // Param, Default: 4.0, Range:0.0-4.0, Gamma
float focusPlane;        // Param, Default: 2.0, Range:0.0-10.0, Linear
float dofCoCScale;       // Param, Default: 6.0, Range:0.0-32.0, Linear
float debugBokeh;        // Param, Default: 0.0, Range:0.0-1.0, Linear

As you can see, it supports different curve responses for the sliders. Currently it is not very nice looking due to my low UI skills and laziness (“it kind of works, so why bother”) – but I promise to improve it a lot in the near future, both on the code side and in usability.
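For illustration, a hypothetical sketch of how such a “Param” annotation could be parsed and how the Gamma vs Linear slider response might be applied – Python pseudocode with assumed names and an assumed gamma exponent; the real parser and UI live in the C# framework:

```python
import re

# Matches lines like:
#   float focusPlane; // Param, Default: 2.0, Range:0.0-10.0, Linear
PARAM_RE = re.compile(
    r"(?P<name>\w+)\s*;\s*//\s*Param,\s*Default:\s*(?P<default>[\d.]+)"
    r",\s*Range:\s*(?P<lo>[\d.]+)\s*-\s*(?P<hi>[\d.]+),\s*(?P<curve>\w+)")

def slider_to_value(t, lo, hi, curve):
    """Map a normalized slider position t in [0, 1] to a parameter value."""
    if curve == "Gamma":      # finer control near the low end of the range
        t = t ** 2.2          # assumed exponent for the gamma response
    return lo + t * (hi - lo)
```

A gamma-curved slider spends most of its travel on small values, which suits brightness-like parameters; a linear one maps travel uniformly across the range.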


The final feature I wanted to talk about – something very important to me when developing the framework – was the ability to use multiple GPU profilers extensively.

You can place lots of them hierarchically and the profiling system will resolve them (DX11 disjoint queries are not obvious to implement); I also created a very crude UI that presents the results in a separate window.


Future and licence

Finally, some words about the future of this framework and the licence to use it.

This is 100% open source without any real licence name or restrictions, so use it however you want, at your own responsibility. If you use it and publish something based on it, and you respect the graphics programming community, please share your sources as well and mention where and from whom you got the original code – but you don’t have to.

I know it is in a very rough form, with lots of unfinished code, but it gets better every week (every time I use it and find something annoying or not easy enough, I fix it :) ) and I can promise to release updates from time to time.

Lots of the stuff is not very efficient – but it doesn’t really matter; I will improve it only if I need to. On the other hand, I aim to constantly improve code quality and readability.

My nearest plans are to fix the obj loader, add mesh and shader binary caching, better structured buffer object handling (like append/consume buffers), support more types in constant buffers and fix the UI. Further in the future: more reflection for texture and UAV resources, font drawing, and GPU buffer-based on-screen debugging.

