Why 40mm?
What does it mean that this lens is “classic”?

Optical elements – notice how close the pieces of glass are to each other (avoiding glass/air contact)

I have just pushed to GitHub a small new script – a Poisson-like distribution sampling generator suited for various typical rendering scenarios.
Unlike other small generators available, it supports many sampling patterns – disk, disk with a central tap, square, repeating grid.
It outputs ready-to-use (copy & paste) patterns for both HLSL and C++ code, and plots the patterns on very simple graphs.
The generated sequence maximizes the distance of every next point from all previous points in the sequence. Therefore you can use partial sequences (for example only half of the samples, or just a few, based on branching) and still get proper sampling variance. It can be useful for various importance sampling and temporal refinement scenarios, or for your DoF (branching on CoC).
Edit: I also added an option to optimize sequences for cache locality. It is a rough estimate, but should work for very large sequences on large sampling areas.
Just edit the options and execute the script: “python poisson.py“. 🙂
Options are edited directly in the code (I use it in Sublime Text and always launch it as a script, so sorry – no command-line parsing) and are self-describing.
# user defined options
disk = False                # this parameter defines if we look for Poisson-like distribution on a disk (center at 0, radius 1) or in a square (0-1 on x and y)
squareRepeatPattern = True  # this parameter defines if we look for "repeating" pattern so if we should maximize distances also with pattern repetitions
num_points = 25             # number of points we are looking for
num_iterations = 16         # number of iterations in which we take average minimum squared distances between points and try to maximize them
first_point_zero = disk     # should be first point zero (useful if we already have such sample) or random
iterations_per_point = 64   # iterations per point trying to look for a new point with larger distance
sorting_buckets = 0         # if this option is > 0, then sequence will be optimized for tiled cache locality in n x n tiles (x followed by y)
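For illustration, the core idea behind such a progressive, distance-maximizing sequence fits in a dozen lines of Python – a simplified “best candidate” loop on the unit square, not the actual code from the repository (the function and parameter names below are made up):

import math
import random

def best_candidate_sequence(num_points, candidates_per_point=64, seed=1):
    # Greedy "best candidate" sampling: every new point is the random candidate
    # that maximizes the distance to its closest already accepted point, so any
    # prefix of the resulting sequence is itself reasonably well distributed.
    rng = random.Random(seed)
    points = [(rng.random(), rng.random())]
    while len(points) < num_points:
        best_point, best_dist = None, -1.0
        for _ in range(candidates_per_point):
            candidate = (rng.random(), rng.random())
            dist = min(math.dist(candidate, p) for p in points)
            if dist > best_dist:
                best_point, best_dist = candidate, dist
        points.append(best_point)
    return points

print(best_candidate_sequence(8))

The actual script additionally handles the disk domain, repeating patterns and the cache-locality sorting mentioned above.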
This simple script requires some scientific Python environment like Anaconda or WinPython. Tested with Anaconda.
Have fun sampling! 🙂
This is a new post on one of my favourite “off-topic” subjects – photography. I recently (under 2 weeks ago) bought a Sony A7 and wanted to share some of my first impressions in a mini review.
Why did I buy a new piece of photo hardware? Well, my main digital camera for the last 3–4 years has been the Fuji FinePix X100. I also owned some Nikon 35mm/FF DSLRs, but since my D700 (which I bought used and cheaply, already with a big shutter count) broke beyond repair and I replaced it with a D600, I have hardly used my Nikon gear at all. The D600 is a terrible camera with broken AF, wrong metering (it exposes +/- 1EV at random, meaning lots of post-processing at home) and tons of other problems – honestly, I wouldn’t recommend it to anyone and I don’t use it anymore.
With the Fuji X100 I have a love & hate relationship. It has lots of advantages. Great image quality for such a tiny size and an APS-C sensor. It is very small and looks like a toy camera (a serious advantage if you want to travel to not-really-safe areas or simply don’t want to attract too much attention and just enjoy taking photos). A bright f/2.0 lens and an interesting focal length (one good photographer friend of mine once told me that there are no interesting photos taken with focal lengths of more than 50mm, and while it was supposed to be a joke, I hope you get the point). Finally, a nice small built-in flash and an excellent fill-light flash mode that works great with the leaf shutter and short sync times – it literally saved thousands of portraits in bright sunlight and other holiday photos. On the other hand, it is slow, has lots of quirks in usage (why do I need to switch to macro mode to take a regular situational portrait?!), has slow and inaccurate AF (you need to try taking a photo a couple of times, especially in low light…), it’s not pin-sharp, and the fixed 35mm-equivalent focal length can be quite limiting – too wide for standard shooting, too narrow for wide-angle shots.
For at least a year I had been looking around for alternatives / some additional gear and couldn’t find anything interesting enough. I looked into the Fuji X100s – but simply a slightly better AF and sensor wouldn’t justify such a big expense, plus software has problems with X-Trans sensor pixel color reconstruction. I read a lot about the Fuji X-series mirrorless system, but going into a new system and buying all the new lenses is a big commitment – especially on APS-C. Finally, a quite recent option is the Sony RX-1. It seemed very interesting, but Angelo Pesce described it quite well – it’s a toy (NO OVF/EVF???).
The Sony A7/A7R and the recent A7S looked like interesting alternatives and something that would compete with the famous Leica, so I looked into them, and after a couple of weeks of research I decided to buy the cheapest and most basic one – the A7 with the kit lens. What do I need a kit lens for? Well, to take photos. I knew that its IQ wouldn’t be perfect, but it’s cheap, not very heavy, and it’s convenient to have one just in case – especially until you complete your target lens set. After a few days of extensive use (a weekend trip to NYC, yay!) I feel like writing a mini review, so here we go!
I tested it with the kit lens (Sony FE 28-70mm f/3.5-5.6 OSS), Nikkor 50mm 1.4D and Voigtlander Nokton 40mm 1.4.
This one is pretty obvious. A full-frame 35mm camera smaller than many mirrorless APS-C or famous Leica cameras! Very light, so I just throw it in a bag or backpack. My neck doesn’t hurt even after a whole day of shooting. Discreet when doing street photography. Nice styling that is kind of a blend between modern and retro cameras – especially with M-mount lenses on: classic look and compact size. Really hard to beat in this area. 🙂
Its full-frame sensor has amazing dynamic range at low ISOs. 24MP resolution – way too much for anyone except pros taking shots for printing on billboards, but useful for cropping or for reducing high-ISO noise when downsizing. Very nice built-in color profiles and aesthetic color reproduction – I like them much better than the Adobe Lightroom ones. I hope I don’t sound like an audiophile, but you really should be able to see the effect of full frame and large pixel size on the IQ – just like there is a “medium-format look” even with mediocre scans, I believe there is a “full-frame look” better than APS-C or Micro 4/3.
Surprisingly pleasant in use, high resolution and dynamic range, and fast. I was used to the Fuji X100’s laggy EVF (still useful at night or when doing precise composition) and on the Sony A7 I feel a huge difference. It switches between the EVF and the back display quite quickly and the eye sensor works nicely. The back display can be tilted and I have already used that a couple of times (photos near the ground or above my head) – a nice feature to have.
This single advantage is really fantastic and I would buy this camera just because of it. Plugging in Voigtlander or Nikon lenses was super easy; the camera automatically switched into manual focus mode and operated very well. Focusing with magnification and focus assist is super easy and really pleasant. It feels like all those old manual cameras – the same pleasure of slowly composing, focusing, taking your time and enjoying photography – but much more precise. With the EVF and DoF preview always on, you constantly think about DoF and its effect on the composition, what will be sharp etc. To be honest, I have never taken such sharp photos in my life – almost none deleted afterwards. So you spend more time taking the photo (which may not be acceptable for your friends or for strangers asked to take a photo of you), but much less on post-processing and selection – again, kind of back to the roots of photography.

Photo of my wife. It was shot using a Nikkor 50mm f/1.4D and MF – no AF has ever given me such precise results…
I won’t write any detailed review of the kit lens – but it’s acceptably sharp, has nice micro-contrast and color reproduction, you can correct distortion and vignetting easily in Lightroom, and it’s easy to take great low-light photos with relatively longer exposure times thanks to the very good image stabilization. AF is usually accurate. While I don’t intend to use this lens a lot – I have much more fun with primes – I will keep it in my bag for sure and it proves itself useful. The only downside is its size (full-frame zoom lenses cannot be tiny…), because it is surprisingly light!
Again, I probably feel so good about the Sony A7’s speed and handling because I’m moving from the Fuji X100 – but the ergonomics are great, it is fast to use and reacts quickly. The only disadvantage is how long the default photo preview takes before the EVF shows the live image feed again – 2s is the minimum time selectable in the menu – way too long for me. There are tons of buttons, configured very wisely by default – changing ISO or exposure compensation without taking your eye off the camera is easy.
A pro photographer probably doesn’t need a panorama mode, or a night mode that automatically combines many frames to decrease noise / camera shake / blur, but I’m not a pro photographer and I like those features – especially the panoramas. Super easy to take, decent quality, and no need to spend hours post-processing or relying on stitching apps!
The current native FE (“full frame E-mount”) lens line-up is a joke. Apart from the kit lens there are only 2 primes (why is the 35mm only f/2.8 when it’s so big?) and 2 zoom lenses – all definitely over-priced and too large. There are some Samyang/Rokinon manual focus lenses available (I played a bit with the 14mm 2.8 on Nikon and it was cheap and good quality – but way too large). There are rumors of many first- and third-party (Zeiss, Sigma, maybe Voigtlander) lenses to be announced at Photokina, so we will see. For now one has to rely on adapters and manual focusing.
A big problem for me. I very often use flash as fill light and here it’s not possible. The smallest Sony flash, the HVL-F20AM, is currently not available (and not so small anyway).

Not a bad photo – but it would have been much better with some fill light from a flash… (ok, I know – it would be difficult to sync without ND filters / a leaf shutter 🙂 )
The system is very young, so I expect things to improve – but currently the availability of first- or third-party accessories (flashes, cases, screen protectors etc.) is way worse than, for example, for the Fuji X-series system. I hope this changes in the coming months.
Well, maybe I’m picky and expected too much, as I take tons of night photos and a couple of years ago this was one of the reasons I wanted to buy a full-frame camera. 🙂 But for a 2014 camera, the A7’s high-ISO degradation of detail (even in RAW files! they are not a “true” RAW sensor feed…), color and dynamic range is a bit too strong. The A7S is much better in this area. Also, the AF behavior is not perfect in low light…

Photo taken at night with the Nikkor 50mm at f/1.4 – not too bad, but some grain and detail loss are visible
The adapters I have for Nikon and M-mount are OK. Their build quality seems acceptable and I haven’t seen any problems yet. But they are expensive – 50-200 dollars for a piece of metal/plastic? It would also be nice to have some information in EXIF – for example an option to manually specify the focal length, or to detect the aperture. Also, the Nikon/Sony A-mount/Canon adapters are too big (they cannot be smaller because of the lens design – the flange focal distance must match the DSLRs’) – what’s the point of having a small camera with big, unbalanced lenses?

The kit zoom and the tiny Nikkor 50mm 1.4D with the adapter are too big… The M-mount adapter and Voigtlander lens are much smaller and more useful.
I don’t really like how the magnification button is placed and that by default it magnifies a lot (to 100% image crop level). I didn’t see any setting to change it – I would expect progressive magnification and better button placement, like on Nikon cameras.
I don’t think I will use it a lot – but sometimes it could be cool for remote control. When I tried to set it up, it took me 5 minutes or so to figure it out – definitely not something to do when you just want to take a single nice photo with your camera placed on a bench at night.
In the next couple of days (hopefully before Siggraph, as afterwards I will have a lot more to write about!) I promise to add, in separate posts:
So stay tuned!
I didn’t really plan to write this post, as I’m quite busy preparing for Siggraph and enjoying the awesome Montreal summer, but after 3 similar discussions with developer friends I realized that the simple hair rendering trick I used during the prototyping stage at CD Projekt Red for Witcher 3 and Cyberpunk 2077 (I have no idea if the guys kept it, though) is worth sharing, as it’s not really obvious. It’s not about hair simulation or content authoring – I’m not really competent to talk about those subjects and they are really well covered by AMD TressFX and NVIDIA HairWorks (plus I know that lots of game rendering engineers work on that topic as well), so check those out if you need awesome-looking hair in your game. The trick I’m going to cover improves the quality of typical alpha-tested hair meshes used in deferred engines. Sorry, but no images in this post though!
There are usually two problems associated with hair rendering that lots of games and game engines (especially deferred renderers) struggle with.
The first problem is quite obvious – hair shading and material. Using standard Lambertian diffuse and Blinn/Blinn-Phong/microfacet specular models you can’t get the proper look of hair; you need a hair-specific and strongly anisotropic model. Some engines try to hack some hair properties into the G-Buffer and use branching / material IDs to handle it, but as John Hable recently wrote in his great post about the need for forward shading – it’s difficult to get hair right while fitting those properties into a G-Buffer.
I’m also quite focused on performance, love low-level programming and analyzing assembly, and it just hurts me to see the branches and tons of additional instructions (sometimes up to hundreds…) and registers used to branch between various materials in a typical deferred shading shader. I agree that the performance impact may not be significant compared to the bandwidth usage of fat G-Buffers and complex lighting models, but it is still a cost you pay for the whole screen, even though hair pixels don’t occupy much of the screen area.
One of the tricks we used on The Witcher 2 was faking hair specular using only the dominant light direction plus per-character cube-maps, applied as the “emissive” part of mesh lighting. It worked ok only because really great artists authored those shaders and cube-maps, but I wouldn’t call it an acceptable solution for any truly next-gen game.
Therefore hair really needs forward shading – but how do we do it efficiently, without paying the usual overdraw cost, and combine it with deferred shading?
The second problem is aliasing – a nightmare for anyone using alpha-tested quads or meshes of hair strands. Lots of games can look just terrible because of this hair aliasing (the same applies to foliage like grass). Epic proposed to fix it by using MSAA, but this definitely increases the rendering cost and doesn’t solve all the issues. I tried to do it using alpha-to-coverage as well, but the result was simply ugly.
Far Cry 3 and some other games used a screen-space blur on hair strands along the hair tangent, and it can improve the quality a lot, but usually the end parts of hair strands either still alias or bleed some background onto the hair (or the other way around) in a non-realistic manner.
The obvious solution here is again to use forward shading and transparency, but then we face another family of problems: overdraw, composition with transparents, and transparency sorting. Again, AMD TressFX solved it completely by using order-independent transparency algorithms on just the hair, but the cost and the effort to implement it can be too much for many games.
The solution I tried and played with is quite similar to what Crytek described in their GDC 2014 presentation. I guess we prototyped it independently in a similar time frame (mid-2012?). The Crytek presentation didn’t dig too much into details, so I don’t know how much it overlaps, but the core idea is the same. Another good reference is the old presentation by Scheuermann from ATI at GDC 2004! Their technique was different and based only on a forward shading pipeline, not aimed at being combined with deferred shading – but the main principle of multi-pass hair rendering and treating transparent and opaque parts separately is quite similar. It is worth noting that with DX11 and modern GPU-based forward lighting techniques it has become much easier to do. 🙂
The proposed solution is a hybrid of deferred and forward rendering techniques that solves some of the problems with them. It is aimed at engines that still rely on alpha-tested strips for hair rendering and have a smooth alpha transition in the textures, but where most of the hair strands are solid, not transparent and definitely not sub-pixel (in that case forget about it and hope you have the perf to do MSAA and even supersampling…). You also need some form of forward shading in your engine, but I believe that’s the only way to go for next gen… Forward+/clustered shading is a must for material variety and properly lit transparency – even in mainly deferred rendering engines. I really believe in the advantages of combining deferred and forward shading for different rendering scenarios within a single rendering pipeline.
Let me first describe the proposed steps:
1. Render the hair using alpha testing with a high threshold (so only the solid, almost fully opaque parts pass) into the G-Buffer, writing depth like regular opaque geometry.
2. Perform the usual deferred lighting.
3. Re-render the solid hair parts with forward shading and the depth test set to EQUAL, applying the proper anisotropic hair shading on top.
4. After the rest of the opaque scene, render the soft, alpha-blended hair edges with forward shading, regular depth testing and no depth writes.
This algorithm assumes that you use a regular Lambertian diffuse model for hair. You can easily swap it: feel free to modify points 1 and 3 – first draw black albedo into the G-Buffer, then add the different diffuse model in step 3.
There are lots of advantages to this trick/algorithm. Even with non-obvious hair mesh topologies I didn’t see any problems with alpha sorting – because the alpha-blended areas are small and usually lie on top of solid geometry. Also, because most of the rendered hair geometry writes depth values, it works ok with particles and other transparents. You avoid hacking your lighting shaders, branching and hardcore VGPR counts. You get smooth, aliasing-free results and a proper, arbitrary shading model (no need to pack material properties). It also avoids any excessive forward shading overdraw (z-testing set to EQUAL, and later regular depth testing against an almost complete scene). While there are multiple passes, not all of them need to read all the textures (for example there is no need to re-read albedo after point 1, and the G-Buffer pass can use a different normal map and skip the specular/gloss mask). The performance numbers I got were really good – hair usually covers a very small part of the screen except in cutscenes – and the proposed solution meant zero overhead/additional cost on regular mesh rendering or lighting.
Obviously, there are some disadvantages. First of all, there are 3 geometry passes for hair (one could get that down to 2 by combining points 3 and 4, but at the cost of some of the advantages). It can be too much, especially when using some spline/tessellation-based, very complex hair – but this is simply not an algorithm for such cases; they really do need more complex solutions… Again, see TressFX. There can be a problem with the lack of alpha-blend sorting and with combining with particles – but it depends a lot on the mesh topology and how much of it is alpha blended. Finally, so many passes complicate the renderer pipeline, and debugging can be problematic as well.
As a bonus, a description of how we hacked skin shading in The Witcher 2 in a very similar manner.
We couldn’t really separate our speculars from diffuse into 2 buffers (we already had way too many local lights and a big lighting cost; increasing the BW in those passes wouldn’t have helped for sure). We also didn’t have ANY forward shading in Red Engine at the time! For skin shading I really wanted to do SSS without blurring either the albedo textures or the speculars. Therefore I came up with the following “hacked” pipeline.
The main disadvantage of this technique is losing all specular color from lighting (especially visible in dungeons), but AFAIK there was a global, per-environment, artist-specified specular color multiplier for skin. A hack, but it worked. A second, smaller disadvantage was the higher cost of the SSS blur passes (more surfaces to read in order to mask the skin).
In more modern engines and on current hardware I honestly wouldn’t bother – I would do separate lighting buffers for the diffuse and specular responses instead – but I hope it can inspire someone to creatively hack their lighting passes. 🙂
[1] http://www.filmicworlds.com/2014/05/31/materials-that-need-forward-shading/
[2] http://udn.epicgames.com/Three/rsrc/Three/DirectX11Rendering/MartinM_GDC11_DX11_presentation.pdf
[3] http://www.crytek.com/download/2014_03_25_CRYENGINE_GDC_Schultz.pdf
[5] https://developer.nvidia.com/hairworks
[6] “Forward+: Bringing Deferred Lighting to the Next Level” Takahiro Harada, Jay McKee, and Jason C.Yang https://diglib.eg.org/EG/DL/conf/EG2012/short/005-008.pdf.abstract.pdf
[7] “Clustered deferred and forward shading”, Ola Olsson, Markus Billeter, and Ulf Assarsson http://www.cse.chalmers.se/~uffe/clustered_shading_preprint.pdf
[8] “Screen-Space Perceptual Rendering of Human Skin”, Jorge Jimenez, Veronica Sundstedt, Diego Gutierrez
[9] “Hair Rendering and Shading”, Thorsten Scheuermann, GDC 2004
As I promised, I have posted my C#/.NET graphics framework (more about it and the motivation behind it here) on GitHub: https://github.com/bartwronski/CSharpRenderer
This is my first GitHub submission ever and my first experience with Git, so there is a possibility I didn’t do something properly – thanks for your understanding!
The list of changes since the initial release is quite big: tons of cleanup + some crash fixes in previously untested conditions, plus some features:
I added helper functions to manage the lifetime of render targets and allow render target re-use. Using render target “descriptors” and the RenderTargetManager you request a texture with all its RT and shader resource views, and it is returned from a pool of available surfaces – or lazily allocated when no surface fitting the given descriptor is available. It allows saving some GPU memory and makes sure the code is 100% safe when changing configurations – no NULL pointers when enabling previously disabled code paths, adding new ones etc.
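The framework itself is C#, but the pooling pattern is tiny and language-agnostic – here is a hedged Python sketch of such a descriptor-keyed pool (all names are made up; this is not the real RenderTargetManager API):

from collections import defaultdict, namedtuple

# A "descriptor" fully describes a surface and doubles as the pool key.
RenderTargetDescriptor = namedtuple(
    "RenderTargetDescriptor", ["width", "height", "format", "mips", "needs_depth"])

class SimpleRenderTargetPool:
    def __init__(self, allocate_fn):
        self._allocate = allocate_fn    # callback creating a real GPU surface
        self._free = defaultdict(list)  # descriptor -> list of idle surfaces

    def acquire(self, descriptor):
        # Reuse an idle surface matching the descriptor, or lazily allocate one.
        if self._free[descriptor]:
            return self._free[descriptor].pop()
        return self._allocate(descriptor)

    def release(self, descriptor, surface):
        # Return the surface to the pool so later passes can reuse it.
        self._free[descriptor].append(surface)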
I also added a very simple “temporal” surface manager – for every surface created with it, it stores N different physical textures for the requested N frames. All temporal surface pointers are updated automatically at the beginning of a new frame. This way you don’t need to hold state or ping-pong in your rendering pass code, and the code becomes much easier to follow, e.g.:
RenderTargetSet motionVectorsSurface = TemporalSurfaceManager.GetRenderTargetCurrent("MotionVectors");
RenderTargetSet motionVectorsSurfacePrevious = TemporalSurfaceManager.GetRenderTargetHistory("MotionVectors");
m_ResolveMotionVectorsPass.ExecutePass(context, motionVectorsSurface, currentFrameMainBuffer);
Nothing super interesting, but it allows much easier experimentation with algorithms like GI (see the following point). In my backlog there is a task to add support for geometry shader and instancing to amplify data for cubemaps (with proper culling etc.), which should speed it up by an order of magnitude, but it wasn’t my highest priority.
I added 2 elements: temporally supersampled SSAO and simple pre-baked global illumination + a fully GPU-based naive GI baker. When adding those passes I was able to really stress my framework and check whether it works as it is supposed to – and I can confirm that adding new passes was extremely quick and iteration times were close to zero – the whole GI baker took me just one evening to write.
GI is stored in very low resolution, currently uncompressed volume textures – 3 x 1MB R16 RGBA surfaces storing incoming flux in 2nd order SH (not preconvolved with the cosine lobe – so not irradiance). There are some artifacts due to the low resolution of the volume (64 x 32 x 64), but at a cost of 3MB for such a scene I guess it’s good enough. 🙂
It is calculated by doing a cubemap capture at every 3D grid voxel, calculating the irradiance for every texel and projecting it onto SH. I made sure (or I hope so! 😉 but it seems to converge properly) it is energy conserving, so N-bounce GI is achieved by simply feeding the previous N-1 bounce results into the GI baker and re-baking. I simplified it a bit more (which also improves baking times – it converges close to the asymptotic value faster), as the baker uses partial results, but with N -> ∞ it should converge to the same value and be unbiased.
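For reference, projecting a directional signal onto 2nd order (9-coefficient) SH is just a weighted sum of the basis functions over sampled directions. A minimal Monte Carlo sketch in Python (the real baker integrates over cubemap texels with proper solid-angle weights; radiance_fn here is a stand-in for the captured cubemap):

import math
import random

def sh_basis_order2(x, y, z):
    # Real spherical harmonics basis, bands 0..2 (9 coefficients).
    return [
        0.282095,
        0.488603 * y, 0.488603 * z, 0.488603 * x,
        1.092548 * x * y, 1.092548 * y * z,
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z, 0.546274 * (x * x - y * y),
    ]

def project_onto_sh(radiance_fn, num_samples=4096, seed=1):
    rng = random.Random(seed)
    coeffs = [0.0] * 9
    for _ in range(num_samples):
        # Uniformly sample a direction on the unit sphere.
        z = 1.0 - 2.0 * rng.random()
        phi = 2.0 * math.pi * rng.random()
        r = math.sqrt(max(0.0, 1.0 - z * z))
        x, y = r * math.cos(phi), r * math.sin(phi)
        basis = sh_basis_order2(x, y, z)
        value = radiance_fn(x, y, z)
        for i in range(9):
            coeffs[i] += value * basis[i]
    # Monte Carlo estimator: scale by the domain area (4*pi) over sample count.
    return [c * (4.0 * math.pi / num_samples) for c in coeffs]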
It also contains “sky” ambient lighting pre-baked in, but I will probably split those terms and store them separately, quite possibly at a different storage resolution. This way I could simply “normalize” the flux and make it independent of the sun / sky color and intensity (which could then be applied at runtime). There are tons of other simple improvements (compressing textures, storing luma/chroma separately in different-order SH, optimizing the baker etc.) and I plan to add them gradually, but for now the image quality is very good (as for something without normal maps and speculars yet 😉 ).
Again, nothing super interesting – rather extremely simple and usually unoptimal code, just to help debug other algorithms (and make their presentation easier). Again, adding such features was a matter of minutes and I can confirm that my framework so far succeeds in its design goal.
A feature that I’m not 100% happy with.
For me, when working with almost anything in games – from graphics and shader programming, through materials/effects, to gameplay scripting – the biggest problem is finding the proper boundary between data and code. Where should the splitting point be? Should code drive data, or the other way around? In every engine I have worked with (RedEngine, Anvil/Scimitar, Dunia, plus some very small experience just to familiarize myself with CryEngine, Unreal Engine 3 and Unity3D) it was in a different place.
Coming back to shaders: the usually tedious task is putting some stuff on the engine side in code and some in the actual shaders, while both parts must match 100%. This not only makes it more difficult to modify such code and add new properties, but also makes it harder to read and follow the algorithms, as they are split between multiple files – not necessarily by functionality, but for example for performance (e.g. precalculating stuff on the CPU and putting it into constants).
Therefore my final goal would be to have one meta shader language and, using some meta decorators, specify the frequency of every code part – for example one part executed per frame, another per viewport, per mesh, per vertex, per pixel etc. I want to go in this direction, but I didn’t want to get into writing parsers and lexers yet, so temporarily I used Lua (extremely fast to integrate and quite decently performing).
An example would be one of my constant buffer definitions:
cbuffer PostEffects : register(b3)
{
/// Bokeh
float cocScale; // Scripted
float cocBias; // Scripted
float focusPlane; // Param, Default: 2.0, Range:0.0-10.0, Linear
float dofCoCScale; // Param, Default: 0.0, Range:0.0-32.0, Linear
float debugBokeh; // Param, Default: 0.0, Range:0.0-1.0, Linear
/* BEGINSCRIPT
focusPlaneShifted = focusPlane + zNear
cameraCoCScale = dofCoCScale * screenSize_y / 720.0 -- depends on focal length & aperture, rescale it to screen res
cocBias = cameraCoCScale * (1.0 - focusPlaneShifted / zNear)
cocScale = cameraCoCScale * focusPlaneShifted * (zFar - zNear) / (zFar * zNear)
ENDSCRIPT */
};
We can see that 2 constant buffer properties are scripted – there is zero code on the C# side that would calculate them like this; instead, a Lua script is executed every frame when we “compile” the constant buffer for use by the GPU.
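What the script above does is fold the CoC calculation into a linear function of the hardware depth buffer value, so a shader can evaluate the blur radius as cocScale * deviceDepth + cocBias. A quick, hedged Python check of that identity, assuming a standard (non-reversed) perspective projection (the zNear/zFar and parameter values are arbitrary):

zNear, zFar = 0.1, 100.0
focusPlane, dofCoCScale, screenSize_y = 2.0, 6.0, 1080.0

focusPlaneShifted = focusPlane + zNear
cameraCoCScale = dofCoCScale * screenSize_y / 720.0
cocBias = cameraCoCScale * (1.0 - focusPlaneShifted / zNear)
cocScale = cameraCoCScale * focusPlaneShifted * (zFar - zNear) / (zFar * zNear)

def linear_depth(device_depth):
    # Standard perspective depth buffer value -> view-space depth.
    return (zFar * zNear) / (zFar - device_depth * (zFar - zNear))

for d in (0.0, 0.5, 0.9, 0.999):
    direct = cameraCoCScale * (1.0 - focusPlaneShifted / linear_depth(d))
    folded = cocScale * d + cocBias
    assert abs(direct - folded) < 1e-6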
A simple change to improve the readability of the UI. Right now the UI code is the most temporary, messy part and I will change it completely for sure, but for the time being I focused on its usability.
Right now everything in the shader files and related to shaders is hot-swappable – constant buffer definitions, includes, constant scripts. I can’t imagine working without it anymore; it definitely helps to iterate faster.
I was testing only the x64 version; the 32-bit one may not be configured properly and is for sure lacking the proper dll versions.
One known issue (checked on a different machine with Windows 7 / x64 / VS2010) is a runtime exception complaining about a missing “lua52.dll” – it is probably caused by a missing Visual Studio 2012+ runtime.
While I update stuff every week/day in my local repo, I don’t plan to do any public commits (except for something either cosmetic or a serious bug/crash fix) until probably late August. I will be busy preparing for my Siggraph 2014 talk, and I plan to release the source code for the talk using this framework as well.
Yeah, I finally managed to go back to my favourite pastime hobby – film/analog photography, which I started when I was 10 years old with the following camera:
Now I’m a bit older and my photo gear has changed as well (but I really miss that camera!). 🙂 So at the moment I’m using:
Why film and not digital? Don’t get me wrong – I love digital photography for its quality, ease of use and the possibility to document events and reality. It’s also very convenient on holiday (especially something small like my Fuji X100). However, lots of people (including me) find it easier to take more “artistic” / aesthetically better photos when working with film, especially on medium format – simply because you have only 10, 12 or 15 shots (depending on whether it’s 645, 6×6 or 6×7), so you think about every shot and composition and try to make the best ones. Shooting B&W is also quite an interesting challenge: we are easily attracted to colors and shoot photos based on them, while in B&W that’s impossible and you have to look for interesting patterns, geometric elements, the surfaces of objects and the relations between them. An interesting way to try to “rewire” your brain and sense of aesthetics and learn a new skill.
Finally, developing your own film yourself is an amazing experience – you spend an hour in the darkroom, fully relaxed, carefully treating the film and obeying all the rules, and still you don’t know what the outcome will be; maybe no photo will be good at all. A great and relaxing experience for all OCD programmer guys. 😉
Some photos from the simply awesome Montreal summer – nothing special, just a test roll from the Mamiya I brought from Poland (and it turns out it underexposes – probably an old battery; I will need to calibrate it properly with a light meter…).
During Digital Dragons, with its tons of inspiring talks and discussions, I was asked by a Polish game developer (he and his team are making a quite cool early-access Steam economy/strategy game about space exploration programmes that you can check out here) to write a bit more about the tools we had on The Witcher 2 for connectivity between the game editor and the final game running on a console. As increasing productivity and minimizing iteration times is one of my small obsessions (I believe that fast iteration times, high productivity and efficient, robust pipelines are much, much more important than adding tons of shiny features), I agreed that it is quite a cool topic to write about. 🙂 While I realize that lots of other studios probably have similar pipelines, it is still a cool topic to talk about, and multiple other (especially smaller) developers can benefit from it. As I don’t like sweeping problems under the carpet, I will also discuss the disadvantages and limitations of the solution we had at CD Projekt RED at that time.
The Xbox 360 version of The Witcher 2 was the first console game done 100% internally by CD Projekt RED. At that time the X360 was already almost 7 years old and far behind the capabilities of the modern PCs for which we had developed the game in the beginning. The whole studio – artists, designers and programmers – was aware that we would need to cut down and change a lot of stuff to get the game running on consoles – but we had to do it wisely, so as not to sacrifice the high quality of the player experience our game was known for. Therefore, apart from porting and optimizing, the programming team had to design and implement multiple tools to aid the porting process.
Among many different tools, the need for a connection between the game editor and the consoles appeared. There were 2 specific topics that made us consider building such tools:
The PC version sometimes had insane amounts of localized lights. If you look at the following scene – one of the game’s opening scenes – at specific camera angles it had up to 40 smaller or bigger localized deferred lights on PC – and there were even more heavily lit scenes in our game!

Yeah, crazy – but how was it even possible?
Well, our engine didn’t have any kind of global illumination or baking solution; one of the early design decisions was that we wanted to have everything dynamic, switchable, changeable (quite important for such a nonlinear game – most locations had many “states” that depended on game progress and the player’s decisions) and animated.
Therefore, GI was faked by our lighting and environment artists by placing many lights of various kinds – additive, modulative, diffuse-only, specular-only, character- or env-only, with different falloffs, gobo lights, different types of animation on both light brightness and position (for shadow-casting lights this gives those awesome-looking torches and candles!) etc. Especially interesting were the “modulative” lights that subtracted energy from the scene to fake large-radius AO / shadows – a small-radius modulative light is cheaper than rendering a shadow map and gives nice, soft light occlusion.
All of this goes totally against the current trend of doing everything “physically correct”, and while I see almost only benefits of the PBR approach and believe in coherency etc., I also trust great artists and believe they can achieve very interesting results when crossing those physical boundaries, given “advanced mode” magical knobs and tweaks for special cases – just like painters and other artists who are only inspired by reality.
Anyway, having 40+ lights on screen (very often overlapping and causing massive lighting overdraw) was definitely a no-go on the X360, even after we optimized our lighting shaders and pipelines a lot. It was hard for our artists to decide which lights should be removed and which ones added significant cost (large overdraw / covered area). Furthermore, they wanted to be able to decide in which specific camera takes big lighting costs were acceptable – even 12ms of lighting is acceptable if the whole scene mesh rendering takes under 10ms – to make the game as beautiful as possible we had flexible, scene-dependent budgets.
All of this would IMHO be impossible to simulate with any offline tools – visualizing light overdraw is easy, but seeing the final cost together with the scene drawing cost is not. Therefore we decided that artists needed a way to tweak, add, remove, move and change lights at runtime and see the performance changes immediately on screen, and that we would create tools to support it.
Because of many performance considerations, on X360 we went with an RGBA 1010102 lighting buffer (with some exponent bias to move it into a “similar range” as on PC). We also changed our color grading algorithms, added a filmic tone mapping curve and adapted the gamma curves for TV display. All of this had a simply devastating effect on our existing color precision – especially moving from 16-bit lighting to 10-bit while having multiple lighting, fog and particle passes – as you might expect, the difference was huge. Also, our artists wanted to have some estimate of how the game would look on TVs, with a different and more limited color range etc. – on the PC version most of them used high-quality, calibrated monitors to achieve consistency of texturing and color work across the whole studio. Both to preview this TV look while tweaking color grading values and to fight the banding, they again wanted a live preview of all their tweaks and changes at runtime. I think it was the easier way to go (both in terms of implementation and code maintenance time) than trying to simulate the look of the X360 code path in the PC rendering path.
Obviously, we ended up with many more benefits, which I will try to summarize.
To implement this runtime console-editor connection, we wrote a simple custom command-based network protocol.
Our engine and editor already had support for network-based debugging of the scripting system. We had a custom, internally written C-like scripting language (it automatically extended the RTTI, had access to all of the RTTI types, was aware of game saving/loading and had built-in support for state machines – in general quite an amazing piece of code and a well-designed system, probably worth a separate write-up). This scripting system even had its own small IDE, a debugger with breakpoints and a sampling profiler.
Gameplay programmers and script designers would connect with this IDE to a running editor or game, could debug anything or even hot-reload all of the scripts and see the property grid layout change in the editor if they added, removed or renamed a dynamic property! Side note: everyone experienced with maintaining complex systems can guess how often those features got broken or crashed the editor after even minor changes… Which is unfortunate – as it discouraged gameplay scripters from using those features, so we got fewer bug reports and worked on repairing them even less frequently… The lesson learned is as simple as my advice – if you don’t have a huge team to maintain every single feature, KISS.
Having such a network protocol with support for commands sent both ways already in place, it was super easy to open another listener on another port and start listening for different types of messages!
I remember it took only around one day to get it running and to implement the first couple of commands. 🙂
So let’s see what kinds of commands we had:
Extremely simple – a command that hijacked the in-game camera. After connecting from the editor and enabling camera control, every in-editor camera move was sent with all the camera properties (position, rotation, near/far planes and FOV) and serialized through the network.
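Just to show the scale of such a command: serializing a camera update really is only a handful of lines. A hedged Python sketch (the real implementation was C++ inside the engine’s own network layer; the command id, port and packing format below are entirely made up):

import socket
import struct

CAMERA_COMMAND_ID = 1  # hypothetical command identifier

def send_camera_update(sock, position, rotation, near_plane, far_plane, fov):
    # Pack command id + 3 floats of position, 3 of rotation (Euler angles),
    # near/far planes and FOV into a fixed-size little-endian message.
    payload = struct.pack(
        "<I9f", CAMERA_COMMAND_ID,
        *position, *rotation, near_plane, far_plane, fov)
    sock.sendall(payload)

# Usage, assuming the devkit listens on a hypothetical port:
# sock = socket.create_connection(("devkit-ip", 37015))
# send_camera_update(sock, (0.0, 1.8, -3.0), (0.0, 0.0, 0.0), 0.2, 2000.0, 55.0)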
The benefit of this feature was that it not only made working with all the remaining features easier – it also allowed debugging streaming, checking which objects were not present in the final build (and why), and debugging our cooking/exporting system in general. If something was not present on screen in the final console build, an artist or level designer could analyze why – is it also missing in the editor, does it have the proper culling flags, is it assigned to the proper streaming layer etc. – and either fix it or file a systemic bug to the programmers.
A simple command that sent a list of layers or layer groups to load or unload (as they got un/loaded in the editor), passed directly to the streaming system. Again, it allowed performance debugging and profiling of the streaming and memory cost – optimizing CPU culling efficiency, minimizing the memory cost of loaded objects that were not visible etc.
While in theory something cool and helpful, I must admit that this feature didn’t work 100% as expected and in practice wasn’t very useful or commonly used for those goals. This was mostly because a lot of our streaming was affected by layers being hidden/unhidden by various gameplay conditions. As I mentioned, we had a very non-linear game, and streaming was also used for achieving some gameplay goals. I think it was kind of a misconception and a bad design of our entity system (lack of proper separation between object logic and visual representation), but we couldn’t easily change it for the Xbox 360 version of The Witcher 2.
Another simple feature. We could spawn new lights at runtime, move existing ones and modify most of their properties – radius, decay exponent, brightness, color, the “enabled” flag etc. Every time a light property was modified or a new light component was added to the game world, we sent a command over the network that replicated the event on the console.
A disadvantage of such simple replication was that if we restarted the game running on the console, we would lose all those changes. 😦 In such a case either a save + re-export (so cooking the whole level again) or redoing those changes was necessary.
Very similar to the previous one. We had many “simple” meshes in our game (ones that didn’t have any gameplay logic attached) that got exported into a special, compressed list to avoid the memory overhead of storing whole entities and entity templates, and they could be moved without re-exporting the whole level. As we used dynamic occlusion culling and a dynamic scene hierarchy structure – a beam-tree – we didn’t need to recompute anything; it just worked.
The most complex feature. Our “environment system” was a container for multiple time-of-day control curves for all post-effects, sun and ambient lighting, light groups (under a certain mood, dynamic lights had different colors), fog, color grading etc. It was very complex, as it supported not only dynamic time of day, but also multiple presets being active with different priorities and overriding specific values only per environment area. To be able to control the final color precision on X360 it was extremely important to allow editing them at runtime. IIRC, when we started editing them while in console connection mode, the whole environment system on the console got turned off and we interpolated and passed all parameters directly from the editor.
Obvious, simple, and I believe almost every engine has it implemented. For me it is obligatory to be able to work productively, therefore I understand how important it is to deliver similar functionality to teams other than graphics programmers. 🙂
While our system was very beneficial for the project – and having seen its benefits, on every next project in any company I will opt for something similar – we didn’t implement many other features that would have been just as helpful.
Our system didn’t support adding or modifying any objects that got pre-compiled during export – mostly meshes and textures. It would have been useful to quickly swap textures or meshes at runtime (never-ending problems with dense foliage performance, anyone? 🙂 so far the biggest perf problem on any project I’ve worked on), but our mesh and texture caches were static. It would have required making those cache files partially dynamic, plus adding more support for exporting from the editor (for exporting we didn’t use the editor, but a separate “cooker” process).
While we supported recompiling the HLSL-based shaders used for post-effects, our system didn’t support swapping artist-authored particle or material shaders. Quite similar to the previous point – we would have needed to add more dynamism to the shader cache system… It wouldn’t have been very hard to add if we hadn’t already been late in “game shipping” mode.
While we were able to move some “simple” static objects, the navmesh and gameplay collision didn’t change. It wasn’t a very big deal – artists almost never played on those modified levels – but it could have made the lives of level and quest designers much easier – just imagine finding a “blocker” or wrong collision during a playthrough, quickly connecting with the editor, moving it and immediately checking the result – without the need to restart a whole complex quest or start it in the editor. 🙂
I think that being able to change particle system behaviors, curves and properties at runtime would be really useful for FX artists. Effects are often hard to balance – there is a very thin line of compromise between quality and performance due to multiple factors – the resolution of the effect (half vs full res), the resolution of flipbook textures, overdraw, alpha values and alpha testing etc. Being able to tweak such properties on a paused game during, for instance, an explosion could be a miracle cure for frame timing spikes during explosions, smoke or spell effects. Still, we didn’t do anything about it due to the complexity of particle systems in general and the multiple factors to take into account… I was thinking about simply serializing all the properties, replicating them over the network and deserializing them – it would work out of the box – but there was no time and we had many other, more important tasks to do.
While our system worked great for environment objects, we didn’t have anything for dynamic objects like characters. To be honest, I’m not really sure if it would be possible to implement easily without a major refactor of many elements. There are many different systems that interact with each other, many global managers (which may not be the best “object-oriented” design strategy, but often are useful for creating acceleration structures and as a part of data/structure-oriented design), many objects that need to have their state captured, serialized and then recreated after reloading some properties – definitely not an easy task, especially under console memory constraints. A nasty side effect of this gap was something I already mentioned – problems with modifying semi-dynamic/semi-static objects like doors, gameplay torches etc.
While our whole network debugging code was designed in the first place to enable script reloading between the editor and the scripting IDE, it was impossible to do it on the console the way it was implemented. The console version of the game had a simplified and stripped RTTI system that didn’t support (almost) any dynamism, and moving the editor code there would have meant de-optimizing runtime performance. It could have been part of a “special” debug build, but the point of our dynamic console connection system was to be able to simply connect to any running game. Also, capturing state while the RTTI gets reinitialized and the script code reloaded could be more difficult due to memory constraints. Still, this topic quite fascinates me and would be kind of the ultimate challenge and goal for such a connection system.
While our system lacked multiple useful features, it was extremely easy and fast to implement (a couple of days total?). Having an editor-console live connection is very useful and I’m sure that the time spent developing it paid off multiple times. It provides a much more “natural” and artist-friendly interface than any in-game debug menus, allows for faster work and enables much more complex debug/live-editing features. It not only aids debugging and optimization – if it were a bit more complete, it could even accelerate the actual development process. When your iteration times on various game aspects get shorter, you are able to do more iterations on everything – which gives you not only more content in the same time / for the same cost, but also a much more polished, bug-free and fun-to-play game! 🙂
This Friday I gave a talk at Digital Dragons 2014.
It was a presentation with tons of new, unpublished content and details about our:
If you have seen my GDC 2014 talk, there is probably still lots of new content for you – I tried to avoid reusing my GDC talk content as much as possible.
Here (and on publications page) are my slides for Digital Dragons 2014 conference:
PPTX Version, 226MB – but worth it (tons of videos!)
PPTX Version with extremely compressed videos, 47MB
PDF Version with sparse notes, 6MB
In my previous post about bokeh I promised that I would write a bit more about the simple C# graphics framework I use at home for prototyping various DX11 graphics effects.
You can download its early version with demonstration of bokeh effect here.
So, the first question I should probably answer is…
Well, there are really not many. 🙂 In the old days of DirectX 9, lots of coders seemed to be using ATI (now AMD) RenderMonkey. It is no longer supported and doesn’t support modern DirectX APIs. I really doubt that with the advanced DX10+ style APIs it would be possible to create something similar with the full feature set – UAVs in all shader stages, tessellation, geometry and compute shaders.
Also, most newly developed algorithms have become much more complex.
Lots of coders seem to be using Shadertoy or something similar to showcase effects – quite an awesome example is the implementation of Brian Karis’ area lights by ben. Unfortunately such frameworks work well only for fully procedural, usually raymarched rendering with a single pass – while you can demonstrate amazing visual effects (demoscene style), this is totally unlike regular rendering pipelines and is often useless for prototyping shippable rendering techniques. Also, because everything is based on raymarching, the code becomes hard to follow and understand, with tons of magic numbers, hacks and functions used to achieve even simple functionality…
There are two frameworks I would consider using myself and that caught my attention:
A year or two ago I started writing my own simple tool, so I didn’t look very carefully into them, but I really recommend you do – both of them are for sure more mature and better written than my simple tech.
Let’s get to my list of requirements and must-haves when developing and prototyping stuff:
I’m not a very big fan of C++ and its object-oriented style of coding. I believe that for some (not performance-critical) tasks scripting or data-driven languages are much better, while other things are expressed much better in a functional or data-oriented style. C++ can be a “dirty” language, doesn’t have a very good standard library, and templated extensions like boost (which you need for tasks as simple as regular expressions) are a nightmare to read. To make your program usable, you need to add tons of external library dependencies, and it gets quite hard to have them compile properly across multiple machines, configurations or library versions.
Obviously, C++ is here to stay, especially in games; I work with it every day and can enjoy it as well. But on the other hand I believe it is very beneficial if a programmer works in different languages with different working philosophies – this way you learn to “think” about problems and algorithms, not about language-specific solutions. So I also love Mathematica, multi-paradigm Python, and C#/.NET.
As I said, I wanted to be able to code new algorithms in a “scripting” style, not really thinking about objects but more about the algorithms themselves – so I decided to use .NET and C#.
It has many benefits:
So, here I present my C# / .NET framework!
As I mentioned, my main reason for creating this framework was making sure that it is trivial to add new passes, especially with various render targets, textures and potentially compute. Here is an example of adding a simple pass, binding some resources and a render target, and then rendering a typical fullscreen post-process pass:
using (new GpuProfilePoint(context, "Downsample"))
{
context.PixelShader.SetShaderResource(m_MainRenderTarget.m_RenderTargets[0].m_ShaderResourceView, 0);
context.PixelShader.SetShaderResource(m_MainRenderTarget.m_DepthStencil.m_ShaderResourceView, 1);
m_DownscaledColorCoC.Bind(context);
PostEffectHelper.RenderFullscreenTriangle(context, "DownsampleColorCoC");
}
We also get a wrapped GPU profiler for the given section. 🙂
To create interesting resources (a render target texture with all potentially interesting resource views), one simply types, once:
m_DownscaledColorCoC = RenderTargetSet.CreateRenderTargetSet(device, m_ResolutionX / 2, m_ResolutionY / 2, Format.R16G16B16A16_Float, 1, false);
Ok, but how do we handle the shaders?
I wanted to avoid tedious manual compilation of shaders, creation of shader objects and determining their types. Adding a new shader should be done in just one place – the shader file – so I went with a data-driven approach.
A part of the code called ShaderManager parses all the fx files in the executable directory with multiple regular expressions, looks for shader definitions, sizes of compute shader dispatch groups etc., and stores all the data.
So all shaders are defined in HLSL with some annotations in comments, and they are automatically found and compiled. It also supports shader reloading, and on a shader compilation error it presents a message box with the error message that you can close after fixing all of the shader compilation errors (multiple retries are possible).
This way shaders are automatically found and referenced in code by name.
// PixelShader: DownsampleColorCoC, entry: DownsampleColorCoC
// VertexShader: VertexFullScreenDofGrid, entry: VShader
// PixelShader: BokehSprite, entry: BokehSprite
// PixelShader: ResolveBokeh, entry: ResolveBokeh
// PixelShader: ResolveBokehDebug, entry: ResolveBokeh, defines: DEBUG_BOKEH
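The framework does this parsing in C#, but the idea is easy to show with a hedged Python sketch (the regex below is my own illustration, not the exact pattern the ShaderManager uses):

import re

SHADER_ANNOTATION = re.compile(
    r"//\s*(?P<type>Pixel|Vertex|Compute|Geometry)Shader:\s*(?P<name>\w+)"
    r",\s*entry:\s*(?P<entry>\w+)"
    r"(?:,\s*defines:\s*(?P<defines>[\w ]+))?")

def find_shader_definitions(fx_source):
    # Returns a list of dicts describing every shader annotated in the file.
    shaders = []
    for match in SHADER_ANNOTATION.finditer(fx_source):
        info = match.groupdict()
        info["defines"] = info["defines"].split() if info["defines"] else []
        shaders.append(info)
    return shaders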
I also support data-driven constant buffers and a manual reflection system – I never really trusted the DirectX effects framework / OpenGL reflection.
I use .NET dynamic objects to access all constant buffer member variables just like regular C# member variables – both for reads and writes. It is definitely not the most efficient way to do it – forget about even hundreds of draw calls with different constant buffers – but raw performance was never the main goal of my simple framework; real speed of prototyping was.
An example of (messy) mixed read and write constant buffer code – none of the “member” variables are defined anywhere in the C# code:
mcb.zNear = m_ViewportCamera.m_NearZ;
mcb.zFar = m_ViewportCamera.m_FarZ;
mcb.screenSize = new Vector4((float)m_ResolutionX, (float)m_ResolutionY, 1.0f / (float)m_ResolutionX, 1.0f / (float)m_ResolutionY);
mcb.screenSizeHalfRes = new Vector4((float)m_ResolutionX / 2.0f, (float)m_ResolutionY / 2.0f, 2.0f / (float)m_ResolutionX, 2.0f / (float)m_ResolutionY);
m_DebugBokeh = mcb.debugBokeh > 0.5f;
A nice and useful side of parsing constant buffers with regular expressions is that I can directly specify which variables are supposed to be user-driven. This way my UI is also created procedurally.
float ambientBrightness; // Param, Default: 1.0, Range:0.0-2.0, Gamma
float lightBrightness; // Param, Default: 4.0, Range:0.0-4.0, Gamma
float focusPlane; // Param, Default: 2.0, Range:0.0-10.0, Linear
float dofCoCScale; // Param, Default: 6.0, Range:0.0-32.0, Linear
float debugBokeh; // Param, Default: 0.0, Range:0.0-1.0, Linear
As you can see, it supports different response curves for the sliders. Currently it doesn’t look very nice due to my low UI skills and laziness (“it kind of works, so why bother”) – but I promise to improve it a lot in the near future, both on the code side and in usability.
The final feature I wanted to talk about – and something that was very important for me when developing my framework – was the possibility to use multiple GPU profilers extensively.
You can place lots of them hierarchically and the profiling system will resolve them (DX11 disjoint queries are not obvious to implement); I also created a very crude UI that presents the results in a separate window.
Finally, some words about the future of this framework and the licence to use it.
This is 100% open source without any real licence name or restrictions, so use it however you want, at your own responsibility. If you use it and publish something based on it, and you respect the graphics programming community and its development, please share your sources as well and mention where and from whom you got the original code – but you don’t have to.
I know that it is in a very rough form, with lots of unfinished code, but every week it gets better (every time I use it and find something annoying or not easy enough, I fix it 🙂 ) and I can promise to release updates from time to time.
Lots of stuff is not very efficient – but it doesn’t really matter; I will improve it only if I need to. On the other hand, I aim to constantly improve code quality and readability.
My nearest plans are to fix the obj loader, add mesh and shader binary caching, better structured buffer object handling (like append/consume buffers), more supported types in constant buffers and a fixed UI. Further in the future: more reflection for texture and UAV resources, font drawing and GPU buffer-based on-screen debugging.
Recently I was working on a console version of depth of field suitable for gameplay – so a simple, high-quality effect, running with decent performance on all target platforms and not eating a big percentage of the frame budget.
There are tons of publications about depth of field and bokeh rendering. Personally I like photographic, circular bokeh – it was also a request from the art director – so my approach is simple Poisson-like filtering: not separable, but it achieves nice circular bokeh. Nothing fancy to write about.
If you wanted to do it with other shapes, I have two recommendations:
1. For a hexagonal shape – a presentation on how to approximate it with a couple of passes of separable skewed box blurs, by John White and Colin Barré-Brisebois from Siggraph 2011. [1]
2. Probably best for “any” bokeh shape – the smart, modern DirectX 11 / OpenGL idea of extracting “significant” bokeh sprites by Matt Pettineo. [2]
But… I looked at some old screenshots of the game I spent a significant part of my life on – The Witcher 2 – and missed its bokeh craziness. Just look at this bokeh beauty! 🙂
I will write a bit about the technique we used and aim to start a small series about getting an “insane” high quality bokeh effect aimed only at cutscenes, and about how to optimize it (I already have some prototypes of tile-based and software-rasterizer-based approaches).
I am a big fan of analog and digital photography, I love medium format analog photography (nothing teaches you to expose and compose your shots better than 12 photos per quite expensive film roll, plus the time spent in the darkroom developing it 🙂 ) and based on my photography experience I sometimes really hate the bokeh used in games.
First of all – having “hexagon” bokeh in games, other than when aiming to simulate lo-fi cameras, is a very big art direction mistake for me. Why?
Almost all photographers just hate the hexagonal bokeh that comes from the aperture blade shape. Most “good quality” and modern lenses use either a higher number of blades or rounded aperture blades to fight this artificial effect, as it is something that photographers really want to get rid of.
So while I understand the need for it in racing games or Kane & Lynch-style gonzo, lo-fi art direction – it’s cool to simulate TV or cheap cameras with terrible lenses – having it in fantasy, historical or sci-fi games just makes no sense…
Furthermore, there are two quite contradictory descriptions of high quality bokeh that depend on the photo and the photographer themselves:
Both example photos were taken by me in Iceland. Even the first one (of my brother), taken with a portrait 85mm lens, doesn’t melt the background completely – a “perfect” portrait lens (135mm+) would.
So while the first kind of bokeh is quite cheap and easy to achieve (but since it doesn’t eat a couple of millis, nobody considers it a “truly next gen omg so many bokeh sprites wow” effect 😉 ), the second one is definitely more difficult and requires arbitrary, complex shapes for your bokeh sprites.
So… How did I achieve the bokeh effect in The Witcher 2? The answer is simple – full brute force with point sprites! 🙂 While other developers proposed it as well at a similar time [3], [4], I believe we were the first to actually ship a game with this kind of bokeh, and we didn’t have DX10/11 support in our engine, so I wrote everything using just vertex and pixel shaders.
Edit: Thanks to Stephen Hill for pointing out that actually Lost Planet was first… and much earlier, in 2007! [8]
The algorithm itself looked like:
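Roughly – and only as a hedged sketch with illustrative names (the SV_VertexID convenience is mine; the original was plain DX9-level vertex/pixel shaders) – it spawned one quad per half-res pixel, scaled it in the vertex shader by the CoC fetched for that pixel, textured it with the bokeh shape and accumulated everything additively:
Texture2D<float4> ColorAndCocTexture : register(t0); // rgb = color, a = CoC radius in pixels
Texture2D<float4> BokehShapeTexture  : register(t1);
SamplerState      LinearSampler      : register(s0);

cbuffer BokehConstants : register(b0)
{
    float2 cScreenSize;    // half-res buffer size in pixels
    float2 cInvScreenSize;
    float  cMaxCocRadius;  // clamp to keep overdraw bounded
};

struct VSOutput
{
    float4 position : SV_Position;
    float2 bokehUV  : TEXCOORD0;
    float4 color    : TEXCOORD1; // rgb premultiplied by weight, a = weight
};

VSOutput BokehSpriteVS(uint vertexId : SV_VertexID)
{
    // Two triangles (6 vertices) per source pixel.
    const float2 corners[6] = { float2(-1, -1), float2(1, -1), float2(-1, 1),
                                float2(-1, 1),  float2(1, -1), float2(1, 1) };
    uint pixelIndex  = vertexId / 6;
    uint cornerIndex = vertexId % 6;

    uint2  pixelCoord = uint2(pixelIndex % (uint)cScreenSize.x, pixelIndex / (uint)cScreenSize.x);
    float4 colorCoc   = ColorAndCocTexture.Load(uint3(pixelCoord, 0));
    float  cocRadius  = min(colorCoc.a, cMaxCocRadius);

    // Sprite center in clip space, expanded by the CoC radius.
    float2 centerUV = (pixelCoord + 0.5f) * cInvScreenSize;
    float2 centerCS = centerUV * float2(2.0f, -2.0f) + float2(-1.0f, 1.0f);
    float2 corner   = corners[cornerIndex];

    VSOutput output;
    output.position = float4(centerCS + corner * cocRadius * cInvScreenSize * 2.0f, 0.0f, 1.0f);
    output.bokehUV  = corner * 0.5f + 0.5f;
    // Normalize brightness by sprite area so large sprites don't get brighter.
    float weight = 1.0f / max(cocRadius * cocRadius, 1.0f);
    output.color = float4(colorCoc.rgb * weight, weight);
    return output;
}

// Accumulated with additive blending into the (half-res) bokeh buffer.
float4 BokehSpritePS(VSOutput input) : SV_Target
{
    return input.color * BokehShapeTexture.Sample(LinearSampler, input.bokehUV).r;
}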
Seems insane? Yes it is! 🙂 Especially for larger bokeh sprites the overdraw and performance costs were just insane… I think that some scenes could take up to 10ms for just the bokeh on the latest GPUs at that time…
However, it worked, due to a couple of facts:
Obviously, being older and more experienced, I see how many things we did wrong. AFAIR the code for calculating CoC and the later composition pass were totally hacked, I think I didn’t use indexed draw calls (so potentially no vertex reuse) and the multi-pass approach was naive as well – all those vertex texture fetches done twice…
On the other hand, I think that our lack of DX10+ kind of saved us – we couldn’t use expensive geometry shaders, so vertex shaders were probably more optimal. You can check some recent AMD investigations on this topic with nice number comparisons – they are quite similar to my experiences, even with the simplest geometry shaders. [5]
As I mentioned, I have some ideas to optimize this effect using modern GPU capabilities such as UAVs, LDS and compute shaders. They are probably obvious to other developers. 🙂
But before I do (as I said, I hope this will be a whole post series), I reimplemented this effect at home “for fun” and to have some reference.
Very often at home I work just for myself on something that I wouldn’t use in a shipping game, am unsure whether it will work or be shippable, or simply want to experiment with. That’s how I worked on Volumetric Fog for AC4 – I worked on it in my spare time and on weekends at home, and after realizing that it actually could be shippable, I brought it to work. 🙂
Ok, so some results for scatter bokeh.
I think it is a quite faithful representation of what we had quality-wise. You see some minor half-res artifacts (it won’t be possible to fully get rid of them… unless you do temporal supersampling :> ) and some blending artifacts, but the effect is quite interesting.
What is really nice about this algorithm is the possibility of a much better near-plane depth of field with better “bleeding” onto the background (not perfect though!) – example here.
Another nice side effect is the possibility of doing “physically-based” chromatic aberrations.
If you know the physical reasons for chromatic aberrations, you know that what games usually do (splitting RGB and offsetting it slightly) is completely wrong. But with a custom bokeh texture, you can do them accurately and properly! 🙂
Here is an example of a bokeh texture with some aberrations baked in (those are incorrect – I should scale the color channels, not shift them – but done like that they are more pronounced and visible on such non-HDR screenshots).
And some examples of how it affects the image – on non-HDR it is a very subtle effect, but you may have noticed it on other screenshots.

Instead of just talking about the implementation, here you have the whole source code!
This is my C# graphics framework – some not-very-optimal code written to make it extremely easy to prototype new graphics effects, and for me to learn some C# features like dynamic scripting etc.
I will write more about it, its features and reasoning behind some decisions this or next week, meanwhile go download and play for yourself! 🙂
The licence to use both this framework and the bokeh DoF code is 100% open source with no strings attached – but if you publish some modifications to it / use it in your game, just please mention me and where it comes from (you don’t have to). I used the Frank Meinl Sponza model [6] and the SlimDX C# DirectX 11 wrapper [7].
As I said, I promise I will write a bit more about it later.
Quality-wise the effect is 100% what was in The Witcher 2, but there are some performance improvements over the Witcher 2 version.
I think that this atlasing part might require some explanation. For bokeh accumulation I use a double-width texture and spawn “far” bokeh sprites into one half and near ones into the other. This way I avoid overdraw / drawing them multiple times (MRT), geometry shaders (necessary for texture arrays as render targets) and multiple vertex shader passes. Win-win-win!
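A hedged sketch of how that placement could look in the vertex shader (illustrative names, not the shipped code) – the accumulation target is twice as wide, far-field sprites go to one half and near-field ones to the other:
// clipPos is the sprite corner position computed as usual for a single-wide target.
float4 PlaceSpriteInAtlas(float4 clipPos, bool isNearField)
{
    // Squash horizontally into one half of the double-wide render target
    // and shift into the proper half (assumes orthographic output, w == 1).
    clipPos.x = clipPos.x * 0.5f + (isNearField ? 0.5f : -0.5f);
    return clipPos;
}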
I will write more about performance later – but you can try it for yourself and see that it is not great; I have even seen 11ms with an extremely blurry close DoF plane filling the whole screen on a GTX Titan! 🙂
1. “More Performance! Five Rendering Ideas from Battlefield 3 and Need for Speed: The Run”, John White, Colin Barré-Brisebois http://advances.realtimerendering.com/s2011/White,%20BarreBrisebois-%20Rendering%20in%20BF3%20(Siggraph%202011%20Advances%20in%20Real-Time%20Rendering%20Course).pptx
2. “Depth of Field with Bokeh Rendering”, Matt Pettineo and Charles de Rousiers, OpenGL Insights and http://openglinsights.com/renderingtechniques.html#DepthofFieldwithBokehRendering http://mynameismjp.wordpress.com/2011/02/28/bokeh/
3. The Technology Behind the DirectX 11 Unreal Engine Samaritan Demo (Presented by NVIDIA), GDC 2011, Martin Mittring and Bryan Dudash http://www.gdcvault.com/play/1014666/-SPONSORED-The-Technology-Behind
4. Secrets of CryENGINE 3 Graphics Technology, Siggraph 2011, Tiago Sousa, Nickolay Kasyan, and Nicolas Schulz http://advances.realtimerendering.com/s2011/SousaSchulzKazyan%20-%20CryEngine%203%20Rendering%20Secrets%20((Siggraph%202011%20Advances%20in%20Real-Time%20Rendering%20Course).ppt
5. Vertex Shader Tricks – New Ways to Use the Vertex Shader to Improve Performance, GDC 2014, Bill Bilodeau. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/Vertex-Shader-Tricks-Bill-Bilodeau.ppsx
6. Crytek Sponza, Frank Meinl http://www.crytek.com/cryengine/cryengine3/downloads
7. SlimDX
8. Lost Planet bokeh depth of field http://www.beyond3d.com/content/news/499 http://www.4gamer.net/news/image/2007.08/20070809235901_21big.jpg
I wanted to do another follow-up post to my GDC presentation, you can grab its slides here.
I talked for quite a long time about the shader occupancy concept, which is extremely important and allows for some memory latency hiding.
The question that arises is “when should I care”?
It is a perfect question, because sometimes high wave occupancy can have no impact on your shader cost, sometimes it can speed up a whole pass a couple of times, and sometimes it can be counter-productive!
Unfortunately my presentation covered only the very basics of our experiences with the GCN architecture, so I wanted to talk about it a bit more.
I’ve had some very good discussions and investigations on this topic with my friend Michal Drobot (you may recognize his work on area lights in Killzone: Shadow Fall [1] and his earlier work on Parallax Occlusion Mapping acceleration techniques [2]), and we came up with a set of general rules / guidelines.
Before I begin, please download AMD Sea Islands ISA [3] (modern GCN architecture), AMD GCN presentation[4] and AMD GCN whitepaper [5] and have them ready! 🙂
One of the most important instructions I will be referring to is
S_WAITCNT
According to the ISA this is a dependency resolve instruction – it waits for the completion of scalar or vector data loads.
Waits for scalar data (for example constants from a constant buffer that are coherent across a whole wavefront) are signalled as:
LGKM_CNT
In general we don’t care as much about these – you are unlikely to be bound by them, as the latency of the constant cache (a separate, faster cache unit – see page 4 of the GCN whitepaper) is much lower and you should have all such values ready.
On the other hand, there is:
VM_CNT
which is the vector memory load/write dependency counter and has much higher potential latency and/or cost – if you have an L1 or L2 cache miss, for instance…
So let’s look at an example of extremely simple shader disassembly (from my presentation):
s_buffer_load_dwordx4 s[0:3], s[12:15], 0x08
s_waitcnt lgkmcnt(0)
v_mov_b32 v2, s2
v_mov_b32 v3, s3
s_waitcnt vmcnt(0) & lgkmcnt(15)
v_mac_f32 v2, s0, v0
v_mac_f32 v3, s1, v1
We see some batched constant loading followed by an immediate wait for it before it is moved to vector registers, while later there is a wait for the vector memory loads into v0 and v1 (issued by earlier shader code, which I omitted – it was just there to load some basic data to operate on, so that the compiler doesn’t optimize everything out as scalar ops 🙂 ) before they can actually be used by the ALU.
If you want to understand the numbers in parentheses, read the explanation in the ISA – the counter is arranged in a kind of “stack” fashion, while reads are processed sequentially.
I will be mostly talking about s_waitcnt on vector data.
We have two ways of hiding latency: issuing enough independent instructions between a load and its first use within a single wave, or switching between multiple waves resident on the CU (which requires occupancy).
The first option should be quite familiar to every shader coder – previous hardware also had similar capabilities – but unfortunately it is not always possible. If we have dependent texture reads, dependent ALU or nested branches based on the result of a data fetch, the compiler will have to insert an s_waitcnt and stall the whole wave until the result is available. I will talk later about such situations.
While the second option existed before, it was totally hidden from PC shader coders (you couldn’t measure its impact in any way… especially on powerful NVIDIA cards), and in my experience it wasn’t as important on X360, nor were its effects as pronounced as on GCN. It allows you to hide lots of latency on dependent reads, branches or shaders with data-dependent flow control. I will also mention later the kinds of shaders that really need it to perform well.
If we think about it, those two ways are a bit contradictory – the first tends to cause a register count explosion (present for example when we unroll a loop that contains some texture reads and some ALU on the results), while the second requires a low shader register count to get large wave occupancy.
Ok, so we know about two ways of hiding latency, how are they applied in practice? By default, compilers do lots of loop unrolling.
So let’s say we have such a simple shader (old-school poisson DOF).
for (int i = 0; i < SAMPLE_COUNT; ++i)
{
    float4 uvs;
    uvs.xy = uv.xy + cSampleBokehSamplePoints[i].xy * samplingRadiusTextureSpace;
    uvs.zw = uv.xy + cSampleBokehSamplePoints[i].zw * samplingRadiusTextureSpace;

    float2 weight = 0.0f;
    float2 depthAndCocSampleOne = CocTexture.SampleLevel(PointSampler, uvs.xy, 0.0f).xy;
    float2 depthAndCocSampleTwo = CocTexture.SampleLevel(PointSampler, uvs.zw, 0.0f).xy;
    weight.x = depthAndCocSampleOne.x > centerDepth ? 1.0f : depthAndCocSampleOne.y;
    weight.y = depthAndCocSampleTwo.x > centerDepth ? 1.0f : depthAndCocSampleTwo.y;

    colorAccum += ColorTexture.SampleLevel(PointSampler, uvs.xy, 0.0f).rgb * weight.xxx;
    colorAccum += ColorTexture.SampleLevel(PointSampler, uvs.zw, 0.0f).rgb * weight.yyy;
    weightAccum += weight.x + weight.y;
}
The code is extremely simple and pretty self-explanatory, so there is no point writing much about it – but just to make it clear, I batched two sample reads to combine 2 Poisson xy offsets inside a single float4 for constant-loading efficiency (they are read into 4 registers with a single instruction).
Just a part of the generated ISA assembly (simplified a bit) could look something like:
image_sample_lz v[9:10], v[5:8], s[4:11], s[12:15]
image_sample_lz v[17:19], v[5:8], s[32:39], s[12:15]
v_mad_legacy_f32 v7, s26, v4, v39
v_mad_legacy_f32 v8, s27, v1, v40
image_sample_lz v[13:14], v[7:10], s[4:11], s[12:15]
image_sample_lz v[22:24], v[7:10], s[32:39], s[12:15]
s_buffer_load_dwordx4 s[28:31], s[16:19]
s_buffer_load_dwordx4 s[0:3], s[16:19]
s_buffer_load_dwordx4 s[20:23], s[16:19]
s_waitcnt lgkmcnt(0)
v_mad_legacy_f32 v27, s28, v4, v39
v_mad_legacy_f32 v28, s29, v1, v40
v_mad_legacy_f32 v34, s30, v4, v39
v_mad_legacy_f32 v35, s31, v1, v40
image_sample_lz v[11:12], v[27:30], s[4:11], s[12:15]
v_mad_legacy_f32 v5, s0, v4, v39
v_mad_legacy_f32 v6, s1, v1, v40
image_sample_lz v[15:16], v[34:37], s[4:11], s[12:15]
s_buffer_load_dwordx4 s[16:19], s[16:19]
image_sample_lz v[20:21], v[5:8], s[4:11], s[12:15]
v_mad_legacy_f32 v8, s3, v1, v40
v_mad_legacy_f32 v30, s20, v4, v39
v_mad_legacy_f32 v31, s21, v1, v40
v_mad_legacy_f32 v32, s22, v4, v39
v_mad_legacy_f32 v33, s23, v1, v40
s_waitcnt lgkmcnt(0)
v_mad_legacy_f32 v52, s17, v1, v40
v_mad_legacy_f32 v7, s2, v4, v39
v_mad_legacy_f32 v51, s16, v4, v39
v_mad_legacy_f32 v0, s18, v4, v39
v_mad_legacy_f32 v1, s19, v1, v40
image_sample_lz v[39:40], v[30:33], s[4:11], s[12:15]
image_sample_lz v[41:42], v[32:35], s[4:11], s[12:15]
image_sample_lz v[48:50], v[30:33], s[32:39], s[12:15]
image_sample_lz v[37:38], v[51:54], s[4:11], s[12:15]
image_sample_lz v[46:47], v[0:3], s[4:11], s[12:15]
image_sample_lz v[25:26], v[7:10], s[4:11], s[12:15]
image_sample_lz v[43:45], v[7:10], s[32:39], s[12:15]
image_sample_lz v[27:29], v[27:30], s[32:39], s[12:15]
image_sample_lz v[34:36], v[34:37], s[32:39], s[12:15]
image_sample_lz v[4:6], v[5:8], s[32:39], s[12:15]
image_sample_lz v[30:32], v[32:35], s[32:39], s[12:15]
image_sample_lz v[51:53], v[51:54], s[32:39], s[12:15]
image_sample_lz v[0:2], v[0:3], s[32:39], s[12:15]
v_cmp_ngt_f32 vcc, v9, v3
v_cndmask_b32 v7, 1.0, v10, vcc
v_cmp_ngt_f32 vcc, v13, v3
v_cndmask_b32 v8, 1.0, v14, vcc
v_cmp_ngt_f32 vcc, v11, v3
v_cndmask_b32 v11, 1.0, v12, vcc
s_waitcnt vmcnt(14) & lgkmcnt(15)
v_cmp_ngt_f32 vcc, v15, v3
v_mul_legacy_f32 v9, v17, v7
v_mul_legacy_f32 v10, v18, v7
v_mul_legacy_f32 v13, v19, v7
v_cndmask_b32 v12, 1.0, v16, vcc
v_mac_legacy_f32 v9, v22, v8
v_mac_legacy_f32 v10, v23, v8
v_mac_legacy_f32 v13, v24, v8
s_waitcnt vmcnt(13) & lgkmcnt(15)
I omitted the rest of the waits and ALU ops – this is only part of the final assembly – but note how much the scalar architecture makes your shaders longer and potentially less readable!
So we see that the compiler will probably unroll the loop and decide to pre-fetch all the required data into multiple VGPRs (a huge number of them!).
Our s_waitcnt on vector data is much later than the first texture read attempt.
But if we count the actual cycles (again – look into the ISA / whitepaper / AMD presentations) of all those small ALU operations that happen before it, we can estimate that if the data was in L1 or L2 (it probably was, as the CoC of the central sample must have been fetched before the actual loop), there will probably be no actual wait.
If you just look at the register count, it is huge (remember that a SIMD has only 256 VGPRs!) and the occupancy will be very low. Does it matter? Not really 🙂
My experiments with forcing a loop there (it is tricky and involves forcing the loop counter into a uniform…) show that even if you get much better occupancy, the performance can be the same or actually lower (cache thrashing, still not hiding all the latency, a limited number of texturing units).
So the compiler will probably guess properly in such a case and we get our latency hidden very well even within one wave. This is not always the case – so you should count those cycles manually (it’s not that difficult or tedious) or rely on special tools that help you track such stalls (I cannot describe them, for obvious reasons).
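For reference, a hedged sketch of how such loop forcing could look for the Poisson DoF shader above – read the trip count from a constant buffer so it is not a compile-time constant, and add the [loop] attribute (illustrative names):
cbuffer LoopConstants : register(b0)
{
    uint cSampleCount; // set to SAMPLE_COUNT from the CPU side
};

// Replaces the loop header of the shader above; the body stays unchanged.
[loop]
for (uint i = 0; i < cSampleCount; ++i)
{
    // ... same body as the Poisson DoF loop above ...
}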
I mentioned that sometimes it is just impossible to place the s_waitcnt much later than the actual texture fetch.
A perfect example is code like this (it isn’t useful in any way, just an example):
int counter = start;
float result = 0.0f;
while(result == 0.0f)
{
result = dataRBuffer0[counter++];
}
It is quite obvious that every next iteration of the loop, or the early-out, relies on a fetch that has just happened. 😦
Shader ISA disassembly will look something like:
label_before_loop:
v_mov_b32 v1, 0
s_waitcnt vmcnt(0) & lgkmcnt(0)
v_cmp_neq_f32 vcc, v0, v1
s_cbranch_vccnz label_after_loop
v_mov_b32 v0, s0
v_mov_b32 v1, s0
v_mov_b32 v2, 0
image_load_mip v0, v[0:3], s[4:11]
s_addk_i32 s0, 0x0001
s_branch label_before_loop
label_after_loop:
So in this case having decent wave occupancy is the only way to hide latency and keep the CU busy – and even then only if you have ALU-heavy code somewhere else in your shader or in a different wave on the CU.
This was the case, for instance, in the screenspace reflections and parallax occlusion mapping code I implemented for AC4 – that’s why I showed this concept of “wave occupancy” in my GDC presentation and why I find it very important. In such cases you must keep your vector register count very low.
I think that in general (take it with a grain of salt and always check yourself) low wave occupancy and a high unroll rate is a good way of hiding latency in all those “simple” cases where you have lots of independent texture reads and a moderate to high amount of simple ALU in your shaders.
Examples can be countless, but it definitely applies to various old-school simple post-effects taking numerous samples.
Furthermore, too high an occupancy could be counter-productive there, thrashing your caches (if you are using very bandwidth-heavy resources).
On the other hand, if you have only a small number of samples, require immediate calculations based on them or, even worse, do some branching relying on them, try to go for higher wave occupancy.
I think this is the case for lots of modern and “next-gen” GPU algorithms:
But in the end and as always – you will have to experiment yourself.
I hope that with this post I have also convinced you how important it is to look through the ISA and all the documents / presentations on the hardware, its architecture and the low-level, final disassembly – even if you consider yourself a “high level and features graphics / shader coder” (I believe that there is no such thing as a “high level programmer” who doesn’t need to know the target hardware architecture in real-time programming, and especially in high-quality console or PC games). 🙂
[1] http://www.guerrilla-games.com/publications.1
[3] http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture1.pdf
[4] http://developer.amd.com/wordpress/media/2013/06/2620_final.pdf
[5] http://www.amd.com/Documents/GCN_Architecture_whitepaper.pdf
After GDC I’ve had some great questions and discussions about techniques we’ve used to filter and upsample the screenspace reflections to avoid flickering and edge artifacts. Special thanks here go to Angelo Pesce, who convinced me that our variation of weighting the up-sampling and filtering technique is not obvious and worth describing.
As I mentioned in my presentation, there were four reasons to blur the screenspace reflections:
First I’m going to describe our up-sampling technique, as it is very simple.
For up-sampling we first tried the industry-standard depth-aware bilateral up-sampling. It worked just fine for geometric and normal edges, but we faced a different problem: due to the different gloss of various areas of the same surface, the blur kernel was also different (the blur was also done in half resolution).
We observed a quite serious problem on an important part of our environments – water puddles left after the rain. We saw typical jaggy-edge and low-res artifacts on the border of the very glossy and reflective water puddle surface (surrounded by quite rough ground / dirt).
As roughness also affects reflection / indirect specular visibility and intensity, the effect was even more pronounced. Therefore I tried adding a second up-sample weight based on a comparison of surface reflectivity (a combination of gloss-based specular response and Fresnel) and it worked just perfectly!
In our case it could even have been used on its own – we used it that way to save some ALU / bandwidth – but that may not be true for other games. For us it discriminated general geometric edges very well (characters / buildings had very different gloss values than the ground), but probably not every game or scene could rely on that.
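To make the idea concrete, here is a hedged sketch of such a combined weight (names, falloff shapes and the constant are my assumptions, not the shipped code):
cbuffer UpsampleConstants : register(b0)
{
    float cReflectivityScale; // assumed tweakable: how harshly reflectivity differences are penalized
};

float ComputeUpsampleWeight(float fullResDepth, float halfResDepth,
                            float fullResReflectivity, float halfResReflectivity)
{
    // Classic depth-aware bilateral term.
    float depthWeight = 1.0f / (abs(fullResDepth - halfResDepth) + 1e-3f);

    // Additional term: taps with a very different reflectivity (gloss-based specular
    // response combined with Fresnel) get strongly down-weighted - e.g. rough dirt
    // taps around a glossy puddle.
    float reflectivityWeight =
        1.0f / (abs(fullResReflectivity - halfResReflectivity) * cReflectivityScale + 1e-3f);

    return depthWeight * reflectivityWeight;
}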
We spent really a lot of time on getting the filtering of the reflections buffer right – probably more than on the actual raytracing code or optimizations.
As a kind of pre-pass to help with it, we did a slight cross-style blur during the downsampling of our color buffer for the screenspace reflections.
A similar technique was suggested by Mittring for bloom [1]; in general it is very useful for fighting various aliasing problems when using half-res colour buffers, and I recommend it to anyone using a half-res color buffer for anything. 🙂
Then we performed a weighted separable blur for performance / quality reasons – to get properly blurred screenspace reflections for very rough surfaces, the blur radius must be huge! Using a separable blur with a varying radius is in general improper (special thanks to Stephen Hill for reminding me of this), as the second pass can pick up some wrongly blurred samples (with a different blur radius in the orthogonal direction), but it worked in our case – as surface glossiness was quite coherent on screen, we didn’t have any mixed patterns that would break it.
Also, a screen-space blur is in general an improper approximation of convolving multiple rays against the BRDF kernel, but as both Crytek and Guerrilla Games also mentioned in their GDC presentations [2] [3], it looks quite convincing.
The filtering radius depended on just two factors. The quite obvious one is surface roughness. We ignored the effect of the cone widening with distance – I knew it would be “physically wrong”, but from my experiments and comparisons against a real multi-ray-traced reference convolved with the BRDF, the visual difference was significant only on rough but flat surfaces (like polished floors) and very close to the reflected surface – with normal maps, on organic and natural surfaces or at bigger distances it wasn’t noticeable as something “wrong”. Therefore, for performance / simplicity reasons, we ignored it.
At first I tried basing the blur radius on an approximation of a fixed-distance cone and surface glossiness (similar to the way of biasing mips of pre-filtered cubemaps). However, artists complained about the lack of control, and as our rendering was not physically based, I just gave them blur bias and scale controls based on the gloss.
There was a second filtering factor – when there was a “hole” in our reflections buffer, we artificially increased the blur radius, even for shiny surfaces. In effect we applied a form of push-pull filter.
It was better to fill the holes and look for proper samples in the neighbourhood than to have an ugly, flickering image.
Our filtering weight depended on just two factors: whether a given sample was a reflections-buffer “hole”, and a Gaussian falloff over the kernel.
The reason for the first one was, again, to ignore missing samples and pull proper information from the pixel neighbourhood. We didn’t weight hole samples down to 0.0f – AFAIR it was 0.3f. The reason was to still get some proper fadeout of reflections and to have a lower screen-space reflections weight in “problematic” areas, blending them out to the fall-back cubemap information.
Finally, the Gaussian function isn’t a 100% accurate approximation of the Blinn-Phong BRDF shape, but it smoothed out the result nicely. Furthermore, as I mentioned previously, no screen-space blur is a proper approximation of a 3D multi-ray convolution with the BRDF – but it can look right to the human brain.
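As an illustration, a minimal sketch of a per-tap weight along these lines (the 0.3 hole weight comes from the description above; the Gaussian sigma and the names are assumptions):
float ComputeReflectionBlurWeight(float normalizedOffset, // -1..1 across the kernel
                                  bool  tapIsHole)        // no reflection hit at this tap
{
    const float kSigma = 0.4f; // assumed; tweak to match your BRDF lobe
    float gauss = exp(-(normalizedOffset * normalizedOffset) / (2.0f * kSigma * kSigma));

    // Holes are not fully rejected, so reflections still fade out towards the
    // cubemap fallback in problematic areas.
    float holeWeight = tapIsHole ? 0.3f : 1.0f;
    return gauss * holeWeight;
}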
A thing worth noting here is that our filter didn’t use depth differences in the weighting function – but on depth discontinuities there was already no reflection information, so we didn’t see any visible artifacts from reflection leaking. The Guerrilla Games presentation by Michal Valient [3] also mentioned doing a regular full blur – without any depth or edge-aware logic.
[1] Mittring, “The Technology behind the Unreal Engine 4 Elemental Demo”
[2] Schulz, “Moving to the Next Generation: The Rendering Technology of Ryse”
[3] Valient, “Taking Killzone Shadow Fall Image Quality into the Next Generation”
Before I address temporal supersampling, just a quick reminder on what aliasing is.
Aliasing is a problem that is very well defined in signal theory. According to the sampling theorem, our signal spectrum must contain only frequencies lower than the Nyquist frequency. If it doesn’t (and when rasterizing triangles it never will, as a triangle edge is a step-like response with an infinite frequency spectrum), some frequencies will appear in the final signal (reconstructed from samples) that were not present in the original one. Visual aliasing can take different forms – regular patterns (so-called moiré), noise or flickering.
Classic supersampling is a technique that is extremely widely used by the CGI industry. For every target image fragment we sample multiple times at much higher frequency (for example by tracing multiple rays per pixel, or by shading fragments multiple times at various positions that cover the same on-screen pixel) and then perform signal downsampling / filtering – for example by averaging. There are various approaches to even the simplest supersampling (I talked about this in one of my previous blog posts), but the main problem with it is the associated cost – N-times supersampling usually means N times the basic shading cost (at least for some pipeline stages) and sometimes additionally N times the basic memory cost. Even simple, hardware-accelerated techniques like MSAA, which evaluate only some parts of the pipeline (pixel coverage) at higher frequency and don’t provide as good results, have quite a big cost on consoles.
But even if supersampling is often an impractical technique, its temporal variation can be applied at almost zero cost.
So what is temporal supersampling? Temporal supersampling techniques are based on a simple observation – from frame to frame most of the on-screen content does not change. Even with complex animations we see that multiple fragments just change their position; apart from that, they usually correspond to at least some other fragments in previous and future frames.
Based on this observation, if we know the precise texel position in the previous frame (and we often do – using the motion vectors that are already there for per-object motion blur, for instance), we can distribute the multiple-fragment-evaluation component of supersampling between multiple frames.
What is even more exciting is that this technique can be applied to any pass – to your final image, to AO, to screen-space reflections and others – to either filter the signal or increase the number of samples taken. I will first describe how it can be used to supersample the final image and achieve much better AA, and then give an example of using it to double or triple the number of samples and the quality of effects like SSAO.
I have no idea which game was the first to use temporal supersampling AA, but Tiago Sousa from Crytek had a great presentation at Siggraph 2011 on that topic and its usage in Crysis 2 [1]. Crytek proposed applying a sub-pixel jitter to the final MVP transformation matrix that alternates every frame – and combining the two frames in a post-effect-style pass. This way they were able to double the effective sampling resolution at almost no cost!
Too good to be true?
Yes, the result of such a simple implementation looks perfect on still screenshots (and you can implement it in just a couple of hours!***), but it breaks in motion. Previous-frame pixels that correspond to the current frame were in different positions. This one can easily be fixed by using motion vectors, but sometimes the information you are looking for was occluded or simply not there. To address that, you cannot rely on depth (as the whole point of this technique is having extra coverage and edge information from the samples missing in the current frame!), so Crytek proposed relying on a comparison of motion vector magnitudes to reject mismatching pixels.
***yeah, I really mean a maximum of one working day if you have a 3D-developer-friendly engine. Multiply your MVP matrix by a simple translation matrix that jitters between (-0.5 / w, -0.5 / h) and (0.5 / w, 0.5 / h) every other frame, plus write a separate pass that combines frame(n) and frame(n-1) together and outputs the result.
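For illustration, here is a hedged HLSL sketch of such a combine pass with the motion-magnitude rejection (texture names, the rejection constant and the exact blend are my assumptions – not the Crysis 2 implementation or ours):
cbuffer TemporalAAConstants : register(b0)
{
    float cRejectionScale; // assumed tweakable: how aggressively history is rejected
};

Texture2D<float4> CurrentColor       : register(t0);
Texture2D<float4> PreviousColor      : register(t1);
Texture2D<float2> MotionVectors      : register(t2); // current frame, in UV units
Texture2D<float2> PrevMotionVectors  : register(t3);
SamplerState      LinearClampSampler : register(s0);

float4 ResolveTemporalAA(float2 uv)
{
    float2 motion = MotionVectors.SampleLevel(LinearClampSampler, uv, 0);
    float2 prevUV = uv - motion; // reproject into the previous frame

    float4 current = CurrentColor.SampleLevel(LinearClampSampler, uv, 0);
    float4 history = PreviousColor.SampleLevel(LinearClampSampler, prevUV, 0);

    // Crytek-style rejection: compare motion magnitudes between the two frames
    // and fall back to the current sample where they diverge (to limit ghosting).
    float2 prevMotion = PrevMotionVectors.SampleLevel(LinearClampSampler, prevUV, 0);
    float  rejection  = saturate(abs(length(motion) - length(prevMotion)) * cRejectionScale);

    // Average the two jittered frames where the history is trusted.
    return lerp(lerp(current, history, 0.5f), current, rejection);
}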
For a long time during our game’s development we relied on FXAA (aided by depth-based edge detection) as a simple AA technique. This simple technique usually works “ok” on a static image and improves its quality, but breaks in motion – as the edge estimations and blurring factors change from frame to frame. While our motion blur (a simple and efficient implementation that used actual motion vectors for every skinned and moving object) helped to smooth the edge look of objects moving quite fast (a small motion vector dilation helped even more), it didn’t do anything for calm animations and subpixel detail. And our game was full of them – just look at all the ropes tied to sails, nicely tessellated wooden planks and dense foliage in the jungles! 🙂 Unfortunately motion blur did nothing to help the antialiasing of such slowly moving objects, and FXAA added some nasty noise during movement, especially on grass. We didn’t really have time to try so-called “wire AA”, and MSAA was out of our budget, so we decided to try temporal antialiasing techniques.
I would like to thank here especially Benjamin Goldstein, our Technical Lead with whom I had a great pleasure to work on trying and prototyping various temporal AA techniques very late in the production.
As a first iteration, we started with the single-frame variation of morphological SMAA by Jimenez et al. [2] Even in its most basic settings it proved a definitely better-quality alternative to FXAA (at a bit higher cost, but thanks to the much bigger computing power of next-gen consoles it stayed in almost the same budget as FXAA on current-gen consoles). There was less noise, fewer artifacts and much better morphological edge reconstruction, but obviously it wasn’t able to do anything to reconstruct all this subpixel detail.
So the next step was to try to plug in the temporal AA component. A couple of hours of work and voilà – we had much better AA. Just look at the following pictures.
Pretty amazing, huh? 🙂
Sure, but at first this was the result only for a static image – and this is where your AA problems start (not end!).
Ok, so we had some subtle and, we thought, “precise” motion blur – so getting motion vectors to allow proper reprojection of moving objects should be easy, right?
Well, it wasn’t. We were doing it right for most of the objects and the motion blur was ok – you can’t really notice a lack of motion blur, or slightly wrong motion blur, on some specific objects. However, for temporal AA you need them to be proper and pixel-perfect for all of your objects!
Otherwise you will get huge ghosting. If you try to mask out these objects and not apply temporal AA on them at all, you will get visible jittering and shaking from the sub-pixel camera position changes.
Let me list all the problems with motion vectors we faced, with some comments on whether we solved them or not:
Ok, we spent 1-2 weeks on fixing our motion vectors (and the motion blur also got much better! 🙂 ), but in the meantime we realized that the approach proposed by Crytek and used in SMAA for motion rejection is definitely far from perfect. I would divide the problems into two categories.
It was something we didn’t really expect, but temporal AA can break if a menu pops up quickly, you pause the game, you exit to the console dashboard (but the game remains visible), the camera teleports or some post-effect kicks in immediately. You will see some weird transition frame. We had to address each case separately – by disabling the jitter and frame combination on such a frame. Add another week or two to your original plan of enabling temporal AA to find, test and fix all such issues…
This is actually my biggest problem with the naive, SMAA-like way of rejecting blending by comparing the movement of objects.
First of all, we had a very hard time adjusting the “magic value” for the rejection threshold, and 8-bit motion vectors didn’t help. Objects were either ghosting or shaking.
Secondly, there were huge problems on, for example, the ground and shadows – the shadow itself was ghosting – well, there is no motion vector for a shadow or any other animated texture, right? 🙂 It was the same with explosions, particles and slowly falling leaves (which we simulated as particle systems).
For both of those issues we came up with a simple workaround – we were not only comparing the similarity of object motion, but on top of it added a threshold value: if an object moved faster than around ~2 pixels per frame in the current or previous frame, do not blend at all! We found such a value much easier to tweak and to work with. It solved the issue of shadows and visible ghosting.
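A hedged sketch of that extra check (the ~2 pixel constant comes from the description above; everything else is illustrative):
// Returns the history blend factor: skip temporal blending entirely if the pixel
// moved faster than ~2 pixels in the current or the previous frame.
float ComputeTemporalBlendWeight(float2 motion, float2 prevMotion, float2 pixelSizeUV)
{
    float speedNow  = length(motion / pixelSizeUV);     // speed in pixels per frame
    float speedPrev = length(prevMotion / pixelSizeUV);
    return (max(speedNow, speedPrev) > 2.0f) ? 0.0f : 0.5f;
}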
We also increased motion blur to reduce any potential visible shaking.
Unfortunately, it didn’t do anything for transparency or animated texture changes over time – they were blended and over-blurred – but as a cool side effect we got free antialiasing of rain drops and rain ripples, and our art director preferred such a soft, “dreamy” result. 🙂
Recently, Tiago Sousa in his Siggraph 2013 talk proposed addressing this issue by changing the metric to a color-based one, and we will investigate it in the near future [3].
I wanted to mention another use of temporal supersampling that made it into the final game on the next-gen consoles and that I really liked. I was inspired by Matt Swoboda’s presentation [4] and its mention of distributing AO calculation sampling patterns between multiple frames. For our SSAO we had 3 different sampling patterns (spiral-based) that changed (rotated) every frame, and we combined them just before blurring the SSAO results. This way we effectively tripled the number of samples, needed less blur and got much, much better AO quality and performance, for the cost of storing just two additional history textures. 🙂 Unfortunately I do not have screenshots to prove that and you have to take my word for it, but I will try to update my post later.
For the rejection technique I relied on a simple depth comparison – we do not really care about SSAO on geometric foreground object edges and depth discontinuities, as by the definition of AO there should be almost none there. The only visible problem was when an SSAO caster moved very fast along a static SSAO receiver – there was a visible trail lagging in time – but this was more of an artificial problem I investigated than a serious in-game situation. Unlike the temporal antialiasing, putting this in the game (after having proper motion vectors) and testing it took under a day and there were no real problems, so I really recommend using such techniques – for SSAO, screen-space reflections and many more. 🙂
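A minimal sketch of how such a multi-frame SSAO combine with depth-based rejection could look (my guess at one possible implementation, with assumed names and a tweakable threshold – not the shipped code):
cbuffer TemporalAOConstants : register(b0)
{
    float cDepthRejectionThreshold; // assumed tweakable
};

Texture2D<float>  CurrentAO     : register(t0); // AO from this frame's pattern
Texture2D<float>  HistoryAO1    : register(t1); // frame n-1 (different pattern)
Texture2D<float>  HistoryAO2    : register(t2); // frame n-2 (different pattern)
Texture2D<float>  CurrentDepth  : register(t3);
Texture2D<float>  HistoryDepth1 : register(t4);
Texture2D<float>  HistoryDepth2 : register(t5);
Texture2D<float2> MotionVectors : register(t6);
SamplerState      PointClampSampler : register(s0);

float CombineTemporalAO(float2 uv)
{
    float2 prevUV = uv - MotionVectors.SampleLevel(PointClampSampler, uv, 0);

    float depth  = CurrentDepth.SampleLevel(PointClampSampler, uv, 0);
    float sum    = CurrentAO.SampleLevel(PointClampSampler, uv, 0);
    float weight = 1.0f;

    // Accept history only where the reprojected depth is close enough to the
    // current one - a very simple rejection, as AO near depth discontinuities
    // matters little anyway.
    if (abs(HistoryDepth1.SampleLevel(PointClampSampler, prevUV, 0) - depth) < cDepthRejectionThreshold)
    {
        sum += HistoryAO1.SampleLevel(PointClampSampler, prevUV, 0);
        weight += 1.0f;
    }
    if (abs(HistoryDepth2.SampleLevel(PointClampSampler, prevUV, 0) - depth) < cDepthRejectionThreshold)
    {
        sum += HistoryAO2.SampleLevel(PointClampSampler, prevUV, 0);
        weight += 1.0f;
    }

    // With a different (rotated) spiral pattern each frame, accepted history
    // effectively triples the sample count before the blur pass.
    return sum / weight;
}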
Temporal supersampling is a great technique that will improve the final look and feel of your game a lot, but don’t expect that you can do it in just a couple of days. Don’t wait till the end of the project, “because it is only a post-effect, it should be simple to add” – it is not! Take weeks or even months to put it in, have testers report all the problematic cases and then properly and iteratively fix all the issues. Have proper and optimal motion vectors, think about how to write them for artist-authored materials and how to batch your objects in passes to avoid using an extra MRT if you don’t need to write them (static objects and camera-only motion vectors). Look at differences in quality between 16-bit and 8-bit motion vectors (or maybe an R11G11B10 format with some other G-Buffer property in the B channel?), test all the cases and simply take your time to do it all properly and early in production, while for example changing the skeleton calculation a bit or caching vertex skinning information (having a “vertex history”) is still an acceptable option. 🙂
[1] http://iryoku.com/aacourse/
[2] http://www.iryoku.com/smaa/
[3] http://advances.realtimerendering.com/s2013/index.html
[4] http://directtovideo.wordpress.com/2012/03/15/get-my-slides-from-gdc2012/
A couple of days ago a friend from a smaller gamedev company asked me a very interesting question – why, while smaller companies allow freelancers some freedom in choosing 3D software, do big AAA companies usually force people to learn and use one 3D environment? Even if they don’t require prior knowledge, job offers often say they expect people to learn and become proficient in it within the first months. Why won’t managers just let everyone use their favourite software?
Well, the answer is simple.
Productivity.
But it’s not necessarily the productivity of quickly delivering one asset (every artist will argue over which 3D package is best for them – and probably will be right!), but productivity in terms of a big studio delivering optimal and great-looking assets in huge amounts for a big game.
Let’s have a look at couple aspects of this kind of “productivity” – both on management and technical side.
Usually in bigger studios there are no “exclusive” assets that are touched by only a single person. This would add a big risk in the long run – what if someone quits the company? Gets sick before a very important milestone/demo? Goes on parental leave? Has too much stuff to do and somebody has to help them?
Just imagine – how could this work if two artists who were about to share some work used different software for the source data? Do you need to install and learn different software, or struggle with conversions where you could lose lots of important metadata and asset editing history?
How should source data from various programs be organized in the source data repository?
To avoid such issues, technical art directors enforce the software used by the whole company (and its version), the art pipelines, source and target data folder structures etc. to help coordinate the work at the big-team level.
A quite obvious one. 🙂 Bulk licenses for big teams are much cheaper than buying single licenses of various software packages. I just don’t want to imagine the nightmare of an IT team having to buy and install everyone’s favourite software + plugins…
If you export to Collada/obj/FBX, it means having an additional file on disk. Should it be submitted to the source repository? I think so (see point 1, and being able to reimport it without external software). But having such an additional file adds some complexity to the usual resource management problems when tracking bugs – was this file exported/saved/submitted properly? Where does the version mismatch come from – a wrong export or import? Again, another layer of files adds another layer of potential problems.
When I worked at CD Projekt Red, for a long time we used intermediate files to export from 3ds Max to the game engine. It was simply a nightmare for artists iterating on and optimizing their assets. Let’s look at a typical situation that happened quite often and was part of the daily pipeline:
An asset is created, exported (click click click), imported (click click click), materials are set up in the engine (click click click), it is saved and placed on the scene. And ok, the artist sees some bug/problem. They have to go through the same steps again (except for the material setup – if they are lucky!) and verify again. Iteration times were extremely long and iteration itself was tedious and bug-prone – sometimes on export materials could get reordered and had to be set up again!
I believe that artists should do art, not fight with the tools…
That’s why most big studios create tools for a live connection between the 3D software and the engine – with auto import/export/scene refresh on a key combination, a single button click or save. It makes multiple steps that potentially take minutes unnecessary – everything happens automatically in seconds!
This feature was always requested by artists and they loved it – however, it is not an immediate thing to write. Many smaller companies don’t have enough tool programmers to develop it… It takes some time, works with a given version, and I cannot imagine redoing such work for many 3D packages…
Raise your hand if you never had any problems with smoothing groups being interpreted differently in various software packages, different object scales or coordinate spaces (left- vs right-handed). Every package has its quirks and problems and you can easily fix them… but again, it is much easier for just a single program.
Another sub-point here is “predictability” – senior artists know what to expect after importing an asset from a given package, and it is much easier to quickly fix such bugs than to try to investigate and google them from scratch.
I’ll just describe an example texture pipeline using Photoshop.
Usually the typical pipeline for textures is creating them in “some” software, saving in TGA or a similar format and importing into the engine. It was acceptable and relatively easy (although it suffers from the problems described in 3, 4 and 5) in old-school pipelines when you needed just albedo + normal maps + sometimes gloss / specularity maps. But with PBR and tons of different important surface parameters it gets nightmarish – for a single texture set you need to edit and export/import 4-5 texture maps, and make sure they are swizzled properly and that the proper channel corresponds to the target physical value. What if technical artists decide to change the packing? Remove/add some channel? You need to reimport and edit all your textures…
So the pipeline that works in many bigger studios uses the ability of PSD files to have layers. You can have your PBR properties on different layers, packed and swizzled automatically on save. You don’t need to think about what was in the alpha for a given material type (“was it alpha test or translucency?”). A couple of textures are stored inside the engine as a single texture pack, not tons of files with names like “tree_bark_sc” where you can easily forget (or not know) what “s” or “c” stands for. You can again use the live connection and the option to hide/show layers to compare 2 texture versions within seconds. It really helps with debugging some assets and I love it myself when I have to do some technical art or want to check something. Another benefit is that if technical directors decide to change the texture packing you don’t need to worry – on the next save/import you won’t accidentally break the data.
Finally, you don’t lose any data when saving your source files – you keep non-compressed, non-flattened data and avoid intermediate files.
Why not store all this intermediate and source data directly in the editor/engine? The answer is simple – storage size. Source data takes terabytes and you don’t want your whole team, including LDs and programmers, to sync all of that.
Almost no big company uses “vanilla”, pure 3D software without any tools and plugins. Technical artists and 3D/tool programmers create plugins that enhance productivity a lot – for example they automate vertex painting for some physics/vertex-shader animation properties, automate layer blending or simplify some very common operations. Because of the existence of those plugins it is sometimes problematic to even switch to a new software version – but to have their authors rewrite them for different software or multiple products… that needs a really strong justification. 🙂
I think this point overlaps quite a bit with 3-7, but it is an important one – usually “pure” assets require some post steps inside the engine to make them skinned, assign proper materials or include them as part of some other mesh/template – and to make the artists’ job easier, programmers write special tools/plugins. It is much easier to automate this tedious work if you have only one pipeline and one 3D program to support.
LODs. Rigging. Skinning. Exporting animations. Impostors. Billboards.
All of this is a very technical kind of art and is very software-dependent. Companies write whole sets of plugins and tools to simplify it, and they are often written by technical artists as plugins for the 3D software.
Again an overlap with 7, but also important – artists should definitely be able to see their assets properly in the 3D software (it both helps them and saves time on iterating), with all the masking of layers, alpha testing and even lighting. It all changes per material type, so you need to support in-game materials in your 3D package (or the other way around). It is tricky to implement even for one environment (both automated shader export/import and writing a custom viewport renderer are “big” tasks) and almost impossible for multiple ones.
Controversial and “hardcore” one. I remember a presentation from Guerrilla Games – Siggraph 2011 I think – about how they use Maya as their whole game editor for everyone – from env artists and lighters to LDs. From talking with various friends at other studios it seems that while it is not the most common practice, still multiple studios are using it and are happy with it. I don’t want to comment it right now (worth a separate post I guess), I see lots of advantages and disadvantages, but just mention it as an option – definitely interesting and tempting one. :>
There are multiple reasons behind big studios using a single art pipeline and one dominating 3D package. It is not a matter of which one technical directors prefer, or in which it is easiest to create an asset of type X – simply, organizing the work of the whole studio depends on it. It means creating tools for potentially hundreds of people, buying licences and setting up the network infrastructure to handle huge source assets.
If you are either a beginning artist or an experienced one used to only one program, I have only one piece of advice for you – be flexible. 🙂 Learn different ways of doing assets, various new tools, different pipelines and approaches. I guarantee that you will be a great addition to any AAA team and will feel great in any studio’s working environment or pipeline – no matter whether for a given task you have to use 3ds Max, Maya, Blender, Houdini or the SpeedTree modeller.
I have some mixed feelings about the blog post I’m about to write. On the one hand, it is something obvious and rudimentary in graphics workflows, and lots of graphics blogs use such techniques; on the other hand, I’ve seen tons of blog posts, programmer discussions and even scientific papers that seem not to care about it at all. So I still feel that it is quite an important topic; I will throw out some ideas, so let’s get to it.
Imagine that you are working on some topic (new feature? major optimization that can sacrifice “a bit” of quality? new pipeline?) for couple days. You got some promising results, AD seems to like it, just some small changes and you will submit it. You watch your results on a daily basis (well, you see them all the time), but then you call another artist or programmer to help you evaluate the result and you start discussing/arguing/wondering, “should it really look like this”?
Well, that’s a perfect question. Is the image too bright? Or maybe your indirect light bounce is not strong enough? Is your area lights approximation plausible and energy conserving? What about the maths – did I forget the (infamous) divide by PI? Was my monitor de-calibrated by my cat, or maybe did the art director look at it earlier from a different angle? 🙂
Honestly, I always lost track of what looked ok and what didn’t after just a couple of iterations – and checked back against a piece of concept art, AD feedback or photographs.
Answering all of those questions is almost impossible just by looking at the image. It is also sometimes very difficult without a complex and long analysis of the code and maths. That’s why it is essential to have a reference / comparison version.
Just to clarify: I’m not talking about automatic testing. That is an important topic, lots of companies use it and it makes perfect sense, but my blog post has nothing to do with it. It is relatively easy to avoid breaking stuff that was already done right (or where you accepted some version), but it is very difficult to get things “right” when you don’t know what the final result should look like.
Ok, so what could this reference version be? I mean that you should have some implementation of a “brute-force” solution to the problem you are working on. Naive, without any approximations / optimizations, running even in seconds instead of your desired 16/33 millis.
For years most game 3D graphics people didn’t use any reference versions – and it has a perfect explanation. Games were so far away from CGI rendering that there was no point in comparing the results. Good game graphics were a product of clever hacks, tricks, approximations and interesting art direction. Therefore, lots of old-school programmers and artists still have a habit of only checking if something “looks ok”, or hacking at it until it does. While art direction will always be the most important part of amazing 3D visuals, since we discovered the power of physically based shading and started to use techniques like GI / AO / PBR / area lights etc. there is no turning back; some tricks must be replaced by terms that make physical / mathematical sense. Fortunately, we can compare them against ground truth.
I’m going to give just couple examples of applications of how it can be used and implemented for some selected topics.
Actually, the topic of area lights is the reason why I started thinking about writing this blog post. We have seen multiple articles and presentations on that topic, some discussing energy conservation or the look of the final light reflection shape – but how many have compared it against ground truth? And I’m not talking only about a comparison of incoming energy in Mathematica for some specific light / BRDF setup – that is important, but I believe that checking the results in real time in your game editor is way more useful.
Think about it – it is trivial to implement even a 64 x 64 loop in your shader that integrates over the light area by summing sub-lights – it will run at 10fps on your GTX Titan, but you will be able to immediately compare your approximation with ground truth. You will see the edge cases where it diverges from the expected results and will be able to truly evaluate the solution with your lighters.
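A hedged sketch of such a brute-force reference, integrating a rectangular area light as a 64×64 grid of point sub-lights (a simple Lambert term stands in for the BRDF; all names are illustrative, not engine code):
static const float PI = 3.14159265f;

float3 AreaLightGroundTruthDiffuse(float3 P, float3 N, float3 albedo,
                                   float3 lightCorner, float3 lightU, float3 lightV,
                                   float3 lightColor)
{
    const int GRID = 64;
    float3 sum = 0.0f;

    for (int y = 0; y < GRID; ++y)
    {
        for (int x = 0; x < GRID; ++x)
        {
            float2 st   = (float2(x, y) + 0.5f) / GRID;  // position on the light surface
            float3 Lpos = lightCorner + st.x * lightU + st.y * lightV;

            float3 L  = Lpos - P;
            float  d2 = max(dot(L, L), 1e-4f);
            L *= rsqrt(d2);

            // Replace the Lambert term with your full BRDF to validate specular too.
            sum += (albedo / PI) * saturate(dot(N, L)) / d2;
        }
    }

    // Scale by light area / number of sub-lights so the energy stays consistent.
    float area = length(cross(lightU, lightV));
    return lightColor * sum * (area / (GRID * GRID));
}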
You could even do it on the CPU side with a 64×64 grid of shadow-casting lights and check the (soft) shadowing errors of those area lights – how useful is that for checking your PCSS soft shadows?
A very important one – as signal aliasing is one of the basic problems of real-time computer graphics. There have recently been lots of talks about geometric aliasing, texture aliasing, shading aliasing (Toksvig, specular or diffuse AA anyone?), problems with alpha-tested geometry etc. Most presentations and papers fortunately do present comparisons with a reference version, but have you compared it yourself, in your engine? 🙂 Are you sure you got it right?
Maybe you have some MSAA bug, maybe your image-based AA works very poorly in motion or maybe your weights for temporal AA are all wrong? Maybe your specular / diffuse AA calculations are improper, or just the implementation has a typo in it? Maybe artist-authored vertex and pixel shaders are introducing some “procedural” aliasing? Maybe you have geometric normals shading aliasing (common techniques like Toksvig work only in normal-map space)? Maybe actually your shadow mapping algorithm is introducing some flickering / temporal instability?
There are tons of other potential aliasing problems that come from different sources (well… all the time we are trying to resample data containing information way above the Nyquist frequency), but we need to be sure which one is the source of our problem in a given case.
Obviously, doing a proper, reference super-resolution image rendering and resampling it helps here. I would recommend two alternate solutions:
I had good experiences with the second one (as it usually works well with blur-based post-effects like bloom), but to get it right don’t forget a small, simple trick – apply a negative mip bias (roughly -log2 of the supersampling level in one axis) and a geometric LOD bias. This way your mip-mapping will work as if you had a much higher screen resolution and you will potentially see some bugs that come from improper LODs. A fact that I find quite amusing – we implemented this for The Witcher 2 as a graphics option for future players (we were really proud of the graphics in the final game and thought that it would be awesome if the game looked as great in 10 years, right? 🙂 ) – but most PC enthusiasts hated us for it! They are used to putting everything to max to test their $3-5k PC setups (and justify the expense), but this option “surprisingly” (even though there was a warning in the menu!) cut their performance for example 4x on the GPU. 😉
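A minimal illustration of the mip bias part of that trick (a sketch only; the texture, sampler and function names are assumptions):
Texture2D<float4> AlbedoTexture : register(t0);
SamplerState      AnisoSampler  : register(s0);

// For NxN supersampling, bias mip selection by -log2(N) so texture filtering
// behaves as it would at the higher "virtual" resolution.
float4 SampleAlbedoSupersampled(float2 uv, float supersamplingPerAxis)
{
    float mipBias = -log2(supersamplingPerAxis); // e.g. -1.0 for 2x2 supersampling
    return AlbedoTexture.SampleBias(AnisoSampler, uv, mipBias);
}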
Probably the most controversial one – as it is very difficult and problematic to implement. I won’t cover all the potential problems here, but implementing reference GI could take weeks, and rendering will take seconds / minutes to complete. Your materials could look different. CPU/GPU solutions require completely different implementations.
Still I think it is quite important, because I had endless discussions like “are we getting enough bounced lighting here?”, “this looks too bright / too dark” etc. and honestly – I was never sure of the answer…
This one could be easier for those who use Maya or other 3D software as their game editor, but will probably be problematic for everyone else. Still, you could consider doing it step by step – a simple BVH/kd-tree and raytracing-based AO baker / estimator should be quite easy to write (a couple of days at most) and will help you evaluate your SSAO and larger-scale AO algorithms. In the future you could extend it to a multiple-light-bounce GI estimator. With PBR and next-gen gaming I think it will at some point become a crucial factor that could really speed up both your R&D and the final production – as artists used to working in CGI/movies will get the same, proper results in the game engine.
A perfect example was given by Brian Karis in the last Physically Based Shading course at Siggraph 2013, on the topic of the “environment BRDF”. By doing a brute-force integration over the whole hemisphere of the BRDF response to the incoming radiance from your env map, you can check how it is really supposed to look. I would recommend doing it without any importance sampling as a starting point – because you could also make a mistake or introduce some errors / bias doing so!
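A hedged sketch of such a brute-force check, uniformly sampling the hemisphere with no importance sampling (the cubemap, sampler and the simple Blinn-Phong stand-in BRDF are assumptions, just to keep the sketch self-contained):
TextureCube<float4> EnvMap        : register(t0);
SamplerState        LinearSampler : register(s0);

static const float PI = 3.14159265f;

// Stand-in normalized Blinn-Phong specular; replace with your real BRDF.
float3 StandInBRDF(float3 N, float3 V, float3 L)
{
    float3 H = normalize(V + L);
    float  specPower = 64.0f;
    float  spec = (specPower + 2.0f) / (2.0f * PI) * pow(saturate(dot(N, H)), specPower);
    return float3(spec, spec, spec);
}

float3 EnvBRDFGroundTruth(float3 N, float3 V, uint sqrtSampleCount)
{
    // Tangent frame around the normal.
    float3 up = abs(N.z) < 0.999f ? float3(0, 0, 1) : float3(1, 0, 0);
    float3 T  = normalize(cross(up, N));
    float3 B  = cross(N, T);

    float3 sum = 0.0f;
    for (uint i = 0; i < sqrtSampleCount; ++i)
    {
        for (uint j = 0; j < sqrtSampleCount; ++j)
        {
            // Uniform hemisphere sampling on a regular grid (pdf = 1 / (2*PI)).
            float cosTheta = (i + 0.5f) / sqrtSampleCount;
            float sinTheta = sqrt(saturate(1.0f - cosTheta * cosTheta));
            float phi      = 2.0f * PI * ((j + 0.5f) / sqrtSampleCount);

            float3 L = sinTheta * cos(phi) * T + sinTheta * sin(phi) * B + cosTheta * N;
            float3 radiance = EnvMap.SampleLevel(LinearSampler, L, 0).rgb;

            sum += StandInBRDF(N, V, L) * radiance * cosTheta;
        }
    }

    float sampleCount = sqrtSampleCount * sqrtSampleCount;
    return sum * (2.0f * PI / sampleCount);
}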
Having such reference version it is way easier to check your approximations – you will immediately see what are the edge cases and potential disadvantages of given approximation. Having such mode in your engine you will check if you pick proper mip maps or if you forgot to multiply/divide by some constant coefficient. You will see how much you are losing by ignoring the anisotropic lobe or by decoupling some integration terms. Just do it, it shouldn’t take you more than hours with all the proper testing!
Just a couple of thoughts on how it should be implemented: I think the big question is where you want to place your solution between two extremes:
On one hand, if developing a reference version takes too much time, you are not going to do it. 🙂 The least usable solution is probably still better than no solution – if you are scared to implement a reference version (or not allowed to by your manager) because it takes too long, you will not get any benefits.
On the other hand, if switching between versions takes too much time – you need to wait seconds to see any results, or even have to manually recompile some shaders or compare versions in Photoshop – the benefits of having a reference version will also be diminished and there might be no point in using it.
Every case is different – a reference BRDF integrator will probably take minutes to write, while reference GI screenshots / a live mode can take weeks to complete. Therefore I can only advise you to be reasonable about it. 🙂
One thing to think about is having some in-engine or in-editor support/framework that makes referencing various passes easier. Just look at photo applications like the great Adobe Lightroom – you have both a “slider” for split-image modes as well as options to place compared images on different monitors.
There is also a “preview before” button always available. It could be useful for other topics too – imagine how having such a button for lighting / post-effect settings would make life easier for your lighting artists! One click to compare with what they had 10 minutes ago – a great help for answering the classic “am I going in the right direction?” question. Having such tools as a part of your pipeline is probably not an immediate thing to develop – you will need the help of good tools programmers – but I think it may pay back quite quickly.
Having a reference version will help you during development and optimization. A ground-truth version is an objective reference point – unlike the judgement of people, which can be biased, subjective or depend on emotional / non-technical factors (see the list of cognitive biases in psychology – an amazing problem that you always need to take into account, not only when working with other people, but also alone). Implementing a reference version can take varying amounts of time (from minutes to weeks) and sometimes it is probably too much work or too difficult, so you need to be reasonable about it (especially if you work in a production, non-academic environment), but just keeping it in mind could help you solve some problems or explain them to other people (artists, other programmers).
The technique was first mentioned by Crytek among some of their improvements (like screenspace raytraced shadows) in their DirectX 11 game update for Crysis 2 [1], and it was later mentioned in a couple of their presentations, articles and talks. In my free time I implemented a prototype of this technique in CD Projekt’s Red Engine (without any filtering or reprojection, just total brute force) and the results were quite “interesting”, but definitely not usable. Also, at that time I was working hard on The Witcher 2 Xbox 360 version, so there was no way I could improve it or ship it in the game I worked on, so I just forgot about it for a while.
At Sony Devcon 2013, Michal Valient mentioned in his presentation about Killzone: Shadow Fall [2] using screenspace reflections together with localized and global cubemaps as a way to achieve a general-purpose and robust solution for indirect specular lighting and reflectivity, and the results (at least on screenshots) were quite amazing.
Since then, more and more games have used it and I was lucky to be working on one of them – Assassin’s Creed 4: Black Flag. I won’t dig deeply into the details of our exact implementation here – to learn them, come and see my talk at GDC 2014 or wait for the slides! [7]
In the meantime, I will share some of my experiences with the technique – its benefits and limitations, and conclusions from my numerous talks with friends at my company – because given the increasing popularity of the technique, I find it really weird that nobody seems to share their ideas about it…
The advantages of screenspace raymarched reflections are quite obvious, and they are the reason why so many game developers got interested in the technique:
We have seen all of those benefits in our game. On these two screenshots you can see how screenspace reflections easily enhanced the look of the scene, making objects more grounded and attached to the environment.
One thing worth noting is that in this level – Abstergo Industries – the walls had complex animations and emissive shaders on them, and it was all perfectly visible in the reflections – no static cubemap could have allowed us to achieve that futuristic effect.
Ok, so this is a perfect technique, right? Nope. The final look in our game is the result of quite long and hard work on tweaking the effect, optimizing it a lot and fighting various artifacts. It was heavily scene-dependent and sometimes it failed completely. Let’s have a look at what causes those problems.
Well, this one is obvious. With all screenspace-based techniques you will miss some information. For screenspace reflections, the artifacts are caused by three types of missing information:
Ok, it’s not perfect, but that was to be expected – all screenspace-based techniques reconstructing 3D information from the depth buffer have to fail sometimes. But is it really that bad? The industry accepted SSAO (although I think that right now we should already be transitioning to 3D techniques like the one developed for The Last of Us by Michal Iwanicki [3]) and its limitations, so what can be worse about SSRR? Most objects are non-metals, their reflectivity comes mostly from the Fresnel effect, and when the reflections are significant and visible, the required information should be somewhere around on screen, right?
If the problems caused by the lack of screenspace information were “stationary”, it wouldn’t be that bad. But the main issues are really ugly.
Flickering.
Blinking holes.
Weird temporal artifacts from characters.
I’ve seen them in videos from Killzone, during gameplay of Battlefield 4, and obviously I got tons of bug reports on AC4. Ok, so where do they come from?
They all come from the lack of screenspace information that changes between frames or varies a lot between adjacent pixels. When objects or the camera move, the information available on screen changes. So you will see various noisy artifacts from the variance in normal maps. Ghosting of reflections from moving characters. Whole reflections (or parts of them) suddenly appearing and disappearing. Aliasing of objects.
All of it gets even worse if we take into account the fact that all developers seem to be using partial screen resolution (e.g. half res) for this effect. Suddenly even more aliasing is present, more information is incoherent between frames and we see more intense flickering.
Obviously programmers are not helpless – we use various temporal reprojection and temporal supersampling techniques [4] (I will definitely write a separate post about them, as we managed to use them for AA and SSAO temporal supersampling), bilateral methods, conservative tests / pre-blurring of the source image, screenspace blur on the final reflection surface to simulate glossy reflections, hierarchical upsampling, filling the holes with flood-fill algorithms and finally, blending the results with cubemaps.
It all helps a lot and makes the technique shippable – but the problem is, and will always be, present… (simply due to the limited screenspace information).
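To make the temporal part a bit more concrete, here is a minimal, CPU-side sketch of the accumulation idea in Python/NumPy. It only shows an exponential history blend with a 3×3 neighbourhood clamp; a real implementation runs on the GPU and reprojects the history buffer with motion vectors first, which is not shown here:

import numpy as np

def temporal_accumulate(current, history, blend=0.1):
    # 'current' and 'history' are HxWx3 float arrays (this frame's result and
    # the accumulated history). Clamp the history to the min/max of the 3x3
    # neighbourhood of the current frame to limit ghosting, then blend.
    h, w, _ = current.shape
    padded = np.pad(current, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neighbourhood = np.stack([padded[dy:dy + h, dx:dx + w]
                              for dy in range(3) for dx in range(3)])
    lo, hi = neighbourhood.min(axis=0), neighbourhood.max(axis=0)
    return blend * current + (1.0 - blend) * np.clip(history, lo, hi)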
Ok, so given those limitations and ugly artifacts/problems, is this technique worthless? Is it just a 2013/2014 trend that will disappear in a couple of years?
I have no idea. I think that it can be very useful and I will definitely vote for utilizing it in the next projects I work on. It should never be the only source of reflections (for example, without any localized / parallax-corrected cubemaps), but as an additional technique it is still very interesting. Just a couple of guidelines on how to get the best out of it:
Also, some of the research going on right now around SSAO / screenspace GI etc. can be applicable here, and I would love to hear more feedback in the future about:
As probably all of you noticed, I deliberately didn’t mention the console performance and exact implementation details for AC4 – for those you should really wait for my GDC 2014 talk. 🙂
Anyway, I’m really interested in other developers’ findings (especially from those who have already shipped a game with similar technique(s)) and can’t wait for a bigger discussion about the problem of handling the indirect specular part of the BRDF, often neglected in academic real-time GI research.
[1] http://www.geforce.com/whats-new/articles/crysis-2-directx-11-ultra-upgrade-page-2/
[2] http://www.guerrilla-games.com/presentations/Valient_Killzone_Shadow_Fall_Demo_Postmortem.html
[3] http://miciwan.com/SIGGRAPH2013/Lighting%20Technology%20of%20The%20Last%20Of%20Us.pdf
[4] http://directtovideo.wordpress.com/2012/03/15/get-my-slides-from-gdc2012/
[5] http://blog.selfshadow.com/publications/s2013-shading-course/
[6] http://directtovideo.wordpress.com/2013/05/08/real-time-ray-tracing-part-2/
[7] http://schedule.gdconf.com/session-id/826051
[8] http://seblagarde.wordpress.com/2012/11/28/siggraph-2012-talk/
Mathematics is an essential part of (almost?) any game programmer’s work. It has always been especially important in the work of graphics programmers – all this lovely linear algebra and analytic geometry! – but with more powerful hardware and more advanced GPU rendering pipelines it becomes more and more complex. 4×4 matrices and transformations can be simplified trivially by hand with a notebook as your main tool, but recently, especially with physically based shading becoming an industry standard, we need to deal with integrals, curve-fitting and various function approximations.
Much of this comes from trying to fit complex analytical models or real captured data. It doesn’t matter whether you look at the rendering equation and BRDFs, atmospheric scattering or some global illumination – you need to deal with the complex underlying mathematics and learn to simplify it, to make it run with decent performance or to help your artists. And while doing so, you cannot introduce visible errors or inconsistency.
However, understanding the maths is not enough – getting results quickly, visualizing them and being able to share them with other programmers is just as important – and for this, good tools are essential.
Mathematica [1] is an awesome mathematics package from Wolfram and is becoming an industry standard among graphics programmers. We use it at work, and some programmers exchange and publish their Mathematica notebooks (for example, the downloadable material of last year’s Siggraph “Physically Based Shading” course [2] includes some). Recently there was an excellent post on #AltDevBlog by Angelo Pesce on Mathematica usage [3] and it can definitely help you get into using this awesome tool.
However, the rest of this post is not going to be about Mathematica. Why not stick with it as the main toolbox? There are a couple of reasons:
For various smaller personal tasks that are not performance-critical – like scripting, quick prototyping or even demonstrating algorithms – I have always loved “modern” or scripting languages. I often find myself coding in C#/.NET, but just the project setup for the smallest tasks can consume too much time relative to the time spent solving problems. Over the last couple of years I have used Python on several occasions and always really enjoyed it. Very readable, compact and self-contained code in one file was a big advantage for me – just like the great libraries (there is even OpenCL / CUDA Python support) or language-level support for basic collections. So I started looking into options for using Python as a mathematics and scientific toolset and voilà – NumPy and SciPy!
So what are those packages?
NumPy is a linear algebra package implemented mostly in native code. It supports n-dimensional arrays and matrices, various mathematical functions and some useful sub-packages such as random number generators. Its functionality is quite comparable to plain Matlab or the open-source GNU Octave [6] (which I used extensively during my university studies and it worked quite ok). Due to the native-code implementation, it is orders of magnitude faster than the same functions implemented in pure Python. Python serves only as glue code to tie everything together – load the data, define functions and algorithms etc. This way we keep the simplicity and readability of Python code and the performance of natively written applications. As NumPy releases Python’s GIL (global interpreter lock), its code is easily parallelizable and can be multi-threaded.
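A quick (and admittedly artificial) way to see the difference – summing squares in pure Python versus a single vectorized NumPy call:

import time
import numpy as np

a = np.random.default_rng(0).random(1_000_000)

t0 = time.perf_counter()
s_python = sum(x * x for x in a)          # pure Python loop over the array
t1 = time.perf_counter()
s_numpy = float(np.dot(a, a))             # one native-code call
t2 = time.perf_counter()

print("python: %.3fs, numpy: %.5fs, results agree: %s"
      % (t1 - t0, t2 - t1, np.isclose(s_python, s_numpy)))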
NumPy alone already saved me some time, but its capabilities are limited compared to full Matlab or Mathematica. It doesn’t even have plotting functions… That’s where SciPy comes into play. It is a fully featured mathematical and scientific package, containing everything from optimization (finding function extrema), numerical integration and curve-fitting to k-means, data mining and clustering methods… Just check it out on the official page or Wikipedia [8]. It also pairs well with the nice plotting library Matplotlib, which handles multiple types of plots, 2D, 3D etc. Using NumPy and SciPy I have almost everything I really need.
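As a small, made-up example of that workflow (the model and data below are purely illustrative), fitting a simple analytic curve to noisy samples and plotting the result takes just a few lines:

import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt

def model(x, a, b):
    # A toy analytic model we pretend to be fitting to measured data.
    return a * np.exp(-b * x)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 4.0, 64)
y = model(x, 2.5, 1.3) + 0.05 * rng.normal(size=x.size)  # noisy "measurements"

params, _ = curve_fit(model, x, y)
print("fitted parameters:", params)

plt.plot(x, y, ".", label="data")
plt.plot(x, model(x, *params), label="fit")
plt.legend()
plt.show()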
One of the big disadvantages of the Python environment, especially on Windows, is the quite terrible installation, setup and packaging: downloading tons of installers, hundreds of conflicting versions and no automatic updates. Linux (and probably Mac?) users are way luckier – automatic packaging systems solve most of those problems.
That’s why I use (and recommend to everyone) the WinPython package [9]. It is a portable distribution – you can use it right away without installing anything, and getting a new version is just a matter of downloading the new package. If you want, you can “register” it in Windows as a regular Python distribution, recognized by other packages. It contains not only the Python distribution, but also a package management system, editors, better shells (especially useful for non-programmers who want to use it in an interactive command-line style) and, most importantly, all the interesting packages [10]! Just download it, unpack it, register it with Windows using the “control panel” exe, maybe add it to the system PATH, and you can start your work.
I usually don’t use the text editor and shell that come with it. Don’t get me wrong – Spyder is quite decent, it has debugging support and lets you work without even setting any directories/paths in the system. However, as I mentioned previously, one of my motivations for looking for an environment other than Mathematica was the possibility of having one environment for “everything”, and running and learning yet another app doesn’t satisfy that condition.
Instead I use the general-purpose text editor Sublime Text, which a friend recommended to me, and I just love it. It is definitely the best and easiest programmer’s text editor I have used so far – it has some basic “Intellisense-like” features, syntax coloring for almost any language, build-system features, projects, a package manager, tons of plugins, and you can do everything using either your mouse or keyboard. It looks great on every platform and is very user friendly (so don’t try to convert me, vim users! 😉 ). The trial is unlimited and free, so give it a try or just check out its programmer-oriented text-editing features on the website – and if you like it, buy a licence.
So basically, to write a new script I just create a new tab in Sublime Text (which I always have open), write the code (which almost always ends up being 10-100 lines for simple math-related tasks), save it in my drafts folder with a .py extension, press Ctrl+B and get it running – definitely the workflow I was looking for. 🙂
One quite serious limitation for various graphics-related work is the lack of symbolic analysis in NumPy/SciPy comparable to Mathematica. We are very often interested in finding integrals and then simplifying them under some assumptions. There are Python packages to help with those tasks – SymPy is even part of the SciPy ecosystem and the WinPython package – but I’ll be honest, I haven’t used them, so I cannot really say anything more… If I gain any experience with them, I will probably write a follow-up post. For the time being I still stick with Mathematica as definitely the best toolset for symbolic analysis.
To demonstrate the use of Python in computer-graphics-related mathematical analysis, I will try to back up some of my future posts with simple Python scripts, and I hope this will help at least some of you.
[1] http://www.wolfram.com/mathematica/
[2] http://blog.selfshadow.com/publications/s2013-shading-course/
[3] http://www.altdevblogaday.com/2013/10/07/wolframs-mathematica-101/
[4] http://c0de517e.blogspot.ca/
[6] http://www.gnu.org/software/octave/
[8] http://en.wikipedia.org/wiki/SciPy#The_SciPy_Library.2FPackage
[9] http://winpython.sourceforge.net/
[10] http://sourceforge.net/p/winpython/wiki/PackageIndex_33/
My first post is going to be more of a post just for myself – practice with WordPress and its layouts. To be honest, I have never had a blog of any kind (well, except for a “homepage” in the early 00s – with the obligatory guestbook written in PHP, hobbies, ugly hover images for buttons etc. – but who didn’t own one at that time? 🙂 ), so I’m really inexperienced in this field and the beginning can be rough – especially since I had been thinking about starting one for quite a long time.
That’s why I’m starting with something relatively easy – just a couple of thoughts and a small gallery from my vacation – not much text for you to read or for me to write, just lots of settings to fight with.
I got to visit Cuba for the 2013 winter holidays – just two months after shipping Assassin’s Creed 4: Black Flag. For everyone who doesn’t know the game (check it out!) – it takes place during the “golden age of piracy” in various cities and villages around the Caribbean Sea, including Spanish-era Havana, Cuba. So I spent over a year of my life (after joining Ubisoft Montreal) helping to create technology to realistically depict something I only got to see in real life afterwards – which I find quite ironic. 🙂
Anyway, the visit itself was a really interesting experience. I won’t bore you to death with things that far more competent people have written about. But personally, I found travelling through the island interesting for three reasons.
First of all, I was really surprised by the similarities it bears to early-90s Eastern Europe and Poland in particular (where I come from) – the communist architecture, social structure and commercial organisation that were still present in the European post-communist period. Even some of the cars, like the old Fiat 125p/126p or Ladas, are the same ones that were a common dream of 70s/80s Polish families – that’s not something you can read about in travel guides written for English speakers. And seeing how dynamically the country is changing, I think it will probably fade away within a couple of years, so it’s worth hurrying up if you still want to “experience” it.
Secondly, I expected architectural and cultural variety, but it was still an amazing experience – the colonial Spanish era, African influences, native Caribbean culture, modernism and art deco, the revolution and Soviet-bloc influences… all of it creates a really unique and vivid mixture that you want to immerse yourself in.
Last but not least, from the perspective of a graphics programmer I was pleasantly surprised by the variety of landscapes (from jungles that you travel through in old Russian trucks, beautiful sandy beaches and rocky mountains full of extremely dense foliage to colourful colonial towns) and weather changes. I am quite proud of how accurately the Ubisoft teams (scattered around the world) captured it – for 6 platforms, even for the older consoles (while developing a launch title for next-generation consoles!) and without any physically based or even HDR/gamma-correct lighting pipeline. I think all of our artists did a really great job and the art direction was really accurate.
Just a couple of photos from my vacation. It was a visually really inspiring experience, so I hope you will enjoy some of them and maybe they will inspire you as well.