CSharpRenderer Framework update

In a couple of days I’m saying goodbye to my big desktop PC for the next several weeks (relocation), so it’s time to commit some stuff to my CSharpRenderer GitHub repository that has been waiting for it for way too long. :)

Startup time optimizations

The goal of this framework was to provide iterations as fast as possible. At first, with just a few simple shaders, it wasn’t a big problem, but as the framework grew it became something to address. To speed it up I made the following two optimizations:

Geometry obj file caching

Fairly simple – create a binary cache instead of loading and processing the obj text file every time. On my HDD in Debug mode this gives up to two seconds of start-up time speed-up.
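
The caching pattern is simple enough to sketch in a few lines – here in Python for illustration (the framework itself is C#), with a toy obj parser standing in for the real, slow one:

```python
import os
import pickle

def parse_obj_text(path):
    # Minimal stand-in for the slow path: parse only "v x y z" lines.
    verts = []
    with open(path) as f:
        for line in f:
            if line.startswith("v "):
                verts.append(tuple(float(x) for x in line.split()[1:4]))
    return verts

def load_mesh_cached(obj_path):
    # Use a binary cache next to the .obj, rebuilt when missing or stale.
    cache_path = obj_path + ".cache"
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(obj_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    mesh = parse_obj_text(obj_path)
    with open(cache_path, "wb") as f:
        pickle.dump(mesh, f, protocol=pickle.HIGHEST_PROTOCOL)
    return mesh
```

The mtime check makes the cache rebuild automatically whenever the source obj changes, so a stale binary never survives an asset edit.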

Multi-tasked shader compilation

Shader compilation (pre-processing and building binaries) is trivially parallelizable, so I simply needed to make sure it’s stateless and that only loading binaries to the driver and device happens on the main, immediate context.

I highly recommend the .NET Task Parallel Library – it is both super simple and powerful, has very nice syntax with lambdas, and allows for complex task dependencies (child tasks, task continuations etc.). It also hides problematic thread-vs-task management from the user (think in tasks and multi-tasking, not multiple threads!). I didn’t use all of its power (like the Dataflow features, which would make sense here), but it is definitely worth considering when developing any form of multitasking in .NET.
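
The compile-in-parallel / submit-on-the-main-thread split can be sketched in a few lines – shown here with Python’s concurrent.futures rather than the .NET TPL the framework actually uses, and with hypothetical compile_shader / load_binary_to_device stand-ins:

```python
from concurrent.futures import ThreadPoolExecutor

def compile_shader(source):
    # Stateless, thread-safe work: pre-processing + building the binary.
    return ("binary:" + source).encode()

def load_binary_to_device(binary):
    # This part must stay serial, on the main / immediate context.
    return len(binary)

sources = ["vs_main", "ps_main", "cs_blur"]
with ThreadPoolExecutor(max_workers=4) as pool:
    binaries = list(pool.map(compile_shader, sources))   # parallel, stateless
handles = [load_binary_to_device(b) for b in binaries]   # serial, main thread
```

The key property is the same in both languages: the parallel stage touches no shared state, so tasks can run in any order, and only the device submission is serialized.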

Additional tools for debugging


I added simple feature toggles (auto-registered and auto-reloaded UI) to allow turning features on and off more easily from within the UI. To provide additional debugging help for this feature and some others (like changing a shader while optimizing and checking if anything changed quality-wise, and in which parts of the scene), I added an option for taking “snapshots” of the final image. It supports quickly switching between the snapshot and the current final image, or displaying the difference between them. Much faster than reloading a whole shader.

Half resolution / bilateral upsampling helpers

Some helper code to generate an offsets texture for bilateral upsampling. For every full-res pixel it generates offset information that, depending on the depth differences between full-res and half-res pixels, either keeps the original bilinear information (offset equal to zero), snaps to edge-bilinear (instead of quad-bilinear), or even falls back to point sampling (closest depth) from the low-resolution texture when depth differences are big. The benefit of doing it this way (instead of in every upscale shader) is much lower shader complexity and potentially better performance (when having multiple half-res -> full-res steps); it also means fewer used registers and better occupancy in the final shaders.
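
A rough sketch of the per-pixel classification, in Python / NumPy for illustration (the real work happens in a shader writing the offsets texture); the function name and threshold here are made up:

```python
import numpy as np

def classify_upsample_taps(full_depth, quad_depths, threshold=0.1):
    # quad_depths: depths of the 2x2 half-res bilinear footprint.
    diffs = np.abs(np.asarray(quad_depths, dtype=float) - full_depth)
    close = diffs < threshold
    if close.all():
        return "bilinear"        # keep the original quad-bilinear weights
    if close.sum() >= 2:
        return "edge-bilinear"   # restrict the filter to the close taps
    return "point"               # snap to the single closest-depth tap
```

Because the decision is baked once into the offsets texture, every later half-res -> full-res pass becomes a plain (offset-adjusted) bilinear fetch.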


Physically-correct env LUTs, cubemap reflections and (less correct) screen-space reflections

I added importance-sampling-based cubemap mip chain generation for the GGX distribution and usage of proper environment light LUTs – all based on Brian Karis’s Siggraph talk from last year.
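
For reference, a minimal sketch of the GGX half-vector sampling from Karis’s course notes – mapping two uniform random numbers to a tangent-space direction (z = normal), with alpha = roughness squared; this is the core of the mip-chain prefiltering loop:

```python
import math

def importance_sample_ggx(xi1, xi2, roughness):
    # Maps uniform randoms (xi1, xi2) to a GGX-distributed half-vector
    # in tangent space, using alpha = roughness^2 as in Karis's talk.
    a = roughness * roughness
    phi = 2.0 * math.pi * xi1
    cos_theta = math.sqrt((1.0 - xi2) / (1.0 + (a * a - 1.0) * xi2))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)
```

Averaging the environment samples taken along these directions (one roughness per mip level) gives the prefiltered cubemap chain.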

I also added very simple screen-space reflections. They are not full performance (the reflection calculation code is “simple”, not super-optimized) or full quality (noise and temporal smoothing) – they serve more as a demonstration of the technique and show why adding indirect specular occlusion is so important.

Screen-space reflections are temporally supersampled with an additional blurring step (the source of them not being physically correct) and by default look very subtle due to the lack of metals or very glossy materials, but they are still useful for occluding indirect speculars.


As they reuse the previous frame’s lighting buffer, we actually get multi-bounce screen-space reflections at the cost of increased temporal smoothing and trailing of moving objects.

Whether to use them in a game or not is something I don’t have a clear opinion on – my views were expressed in one of the first posts on this blog. :)


I probably won’t update the framework for at least several weeks / possibly months, as I will have only a MacBook Pro available (unless I need to integrate a critical fix), but I plan to do a quite big write-up about my experiences with creating an efficient next-gen game post-processing pipeline and optimizing it – and later definitely post some source code. :)

Posted in Code / Graphics | Tagged , , , , , , , | 2 Comments

Review: “Multithreading for Visual Effects”, CRC Press 2014

Today I wrote a short review of a book I bought and read recently – “Multithreading for Visual Effects”, published by CRC Press in 2014 and including articles by Martin Watt, Erwin Coumans, George ElKoura, Ronald Henderson, Manuel Kraemer, Jeff Lait, and James Reinders. A couple of friends asked me if I recommend it, so I will try to briefly describe its contents and whom I can recommend it to.


What this book is not

This book is a collection of various VFX-related articles. It is not meant to be a complete / exhaustive tutorial on designing multi-threaded programs or algorithms, or on how the VFX industry approaches multithreading in general. On the other hand, I don’t really feel it’s just a collection of technical papers / advancements like the ShaderX or GPU Pro books are. It doesn’t include a very detailed presentation of any algorithm or technique. Rather, it is a collection of post-mortems from various studios, groups and people working on a specific piece of technology: how they had to face multi-threading, the problems they encountered and how they solved them.

Many articles have no direct translation to games or real-time graphics – you won’t get any ready-to-use recipe for a specific problem, so don’t expect one.

What I liked about it

I really enjoyed the practical aspects of the book – it talks about actual problems. Most of the problems come from the fact that existing code bases contain tons of non-threaded, legacy code with tons of global state and “hacks”. It is trivial to say “just rewrite the bad code”, but when talking about technology developed over many years, producing the desired results and already deployed in huge studios (it seems that VFX studios are often an order of magnitude larger than game ones…), this is obviously rarely possible. One article provides very interesting reasoning in the whole “refactor vs rewrite” discussion.

The authors are not afraid to talk about such imperfect code and provide practical information on how to fix it and avoid such mistakes in the future. There are at least a couple of articles that mention best code practices and ideas about code design (like working on contexts, a stateless / functional approach, avoiding global state, thinking in tasks etc.).

I also liked that the authors provided a very clear description of “failures” and of the practicality of the final solutions – what did and what didn’t work, and why. This is definitely something most scientific / academic papers lack, but here it was described clearly and will definitely help readers.

Short chapter descriptions

“Introduction and Overview”, James Reinders

A brief introduction to the history of hardware, its multi-threading capabilities and why they are so important. A distinction between threading and tasking. A presentation of the different parallel computation solutions easily available in C++ – OpenMP, Intel TBB, OpenCL and others. A very good book introduction.

“Houdini, Multithreading existing software”, Jeff Lait

A great article about the problem of multithreading existing, often legacy code bases. It describes best practices for designing multi-threaded / tasked code and how to fix existing, imperfect code (plus the various kinds of problems / anti-patterns you may face). I can honestly recommend this article to any game or tools programmer.

“The Presto Execution System: Designing for Multithreading”, George ElKoura

An introductory article about threaded system design when dealing with animations. Very beneficial for any engine or tools programmer, as it describes many options for parallelism strategies along with their pros and cons. The final applied solution is not really applicable to game runtimes, but IMO this article is still a very practical and good read for game programmers.

“LibEE: Parallel Evaluation of Character Rigs”, Martin Watt

The second chapter exclusively about animations, but applicable to any node/graph-based system and its evaluation. Probably my favorite article in the book because of all the performance numbers, compared approaches and practical details. I really enjoyed its in-depth analysis of several cases – how multi-tasking worked on specific rigs and how content creators can (and probably at some point will have to) optimize their content for optimal, parallel evaluation. That last part is something rarely covered by any articles at all.

“Fluids: Simulation on the CPU”, Ronald Henderson

An interesting article describing the process of picking and evaluating the most efficient parallel data structures and algorithms for the specific case of fluid simulation. It is definitely not an exhaustive description of the fluid simulation problem, but rather an example analysis of parallelizing a specific problem – very inspiring.

“Bullet Physics: Simulation with OpenCL”, Erwin Coumans

An introduction to GPGPU with OpenCL, with a case study of the Bullet physics engine. The introduction to rigid body simulation and collision detection (tons of references to the great “Real-Time Collision Detection”) nicely overlaps with the description of OpenCL, GPGPU / compute simulations, and the differences between them and classic simulation solutions.

“OpenSubdiv: Interoperating GPU Compute and Drawing”, Manuel Kraemer

IMO the most specialized article. As I’m not an expert on mesh topologies, tessellation and Catmull-Clark surfaces, it was quite hard for me to follow. Still, the depiction of the title problem is clear, and the proposed solutions can be understood even by someone who doesn’t fully understand the domain.

Final words / recommendation

I feel that with next-gen and bigger game levels, vertex counts and texture resolutions, we need not only better runtime algorithms, but also better content creation and modification pipelines. Tools need to be as responsive as they used to be a couple of years ago, but this time with order-of-magnitude bigger data sets to work on. This is the area where we have almost converged with the problems the VFX industry faces. From discussions with many developers, it seems to be the biggest concern of most game studios at the moment – tools are lagging in development compared to the runtime part, and we are just beginning to utilize network caches and parallel, multithreaded solutions.

I always put emphasis on short iteration times (they allow for more iterations in the same time and more prototypes, which directly translates to better final quality of anything – from core gameplay to textures and lighting), but with such big data sets to process, iteration times would have to grow unless we optimize pipelines for modern workstations. Multi-threading and multi-tasking is definitely the way to go.

Too many existing articles and books either only mention the parallelization problem or silently ignore it. “Multithreading for Visual Effects” is very good because it finally describes the practical side of designing code for multi-threaded execution.

I can honestly recommend “Multithreading for Visual Effects” to any engine, tools or animation programmer. Gameplay or graphics programmers will benefit from it as well, and hopefully it will help them create better-quality code that runs efficiently on modern multi-core machines.


Python as scientific toolbox – 8 months later

I started this blog with a simple post about my attempts to find a free Mathematica replacement for general scientific computing with a focus on graphics. At that time I recommended scientific Python and the WinPython environment.
Many months have passed, I have used lots of numerical Python at home and a bit of Mathematica at work, and I would like to share my experiences – both good and bad – as well as some simple tips to increase your productivity. This is not meant to be any kind of detailed description, guide or tutorial – so if you are new to Python as a scientific toolset, I recommend checking out the great Scientific Python 101 by Angelo Pesce before reading my post.
My post is definitely not exhaustive and is very personal – if you have different experiences or I got something wrong, please comment! :)

Use Anaconda distribution

In my original post I recommended WinPython. Unfortunately, I don’t use it anymore, and at the moment I can definitely vote for Anaconda. One quite obvious reason is that I started to use a MacBook Pro and Mac OS X – WinPython doesn’t work there. I’m not a fan of having different working environments and different software on different machines, so I had to find something that works on both Windows and Mac OS X.

Secondly, I’ve had some problems with WinPython. It works great as a portable distribution (it’s very handy to have it on a USB key), but once you want to make it an essential part of your computational environment, problems with its registration in the system start to appear. Some packages didn’t want to install, some others had problems updating, and there were version conflicts. I even managed to break the distro through desperate attempts to make one of the packages work.

Anaconda is great. It’s super easy to install, has tons of packages, an automatic updater, and “just works”. Its registration with the system is also good. Not all interesting packages are available through its packaging system, but I have found no conflicts so far with Python pip, so you can work with both.

At the moment, my recommendation would be: if you have administrative rights on a computer, use Anaconda. If you don’t (you’re not working on your own computer), or you want to go portable, keep WinPython on your USB key – it might be handy.

Python 2 / 3 issue is not solved at all

This one is a bit sad and ridiculous – a perfect example of what goes wrong in all kinds of open source communities. When someone asks me if they should get Python 2.7+ or 3.4+, I simply don’t have an easy answer – I don’t know. Some packages don’t work with Python 3, some others don’t work with Python 2 anymore. I don’t feel there is any strong push for Python 3, for “compatibility / legacy reasons”… A very weird situation that definitely blocks development of the language.

At the moment I use Python 2, but I try to use imports from __future__ and write everything compatible with Python 3, so I won’t have problems if and when I switch. Still, I find the lack of push in the community quite sad, and it really limits the development / improvement of the language.
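
The __future__ imports I mean are the usual ones – with them at the top of a file, Python 2.7 adopts the Python 3 semantics for division, printing and imports:

```python
from __future__ import absolute_import, division, print_function

# With these imports, the same file behaves the same under 2.7 and 3.x:
half = 3 / 2        # true division -> 1.5 (not 1, as in plain Python 2)
floored = 3 // 2    # floor division stays explicit -> 1
print("3 / 2 =", half)   # print is a function, not a statement
```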

Use IPython notebooks

My personal mistake was that for too long I didn’t use IPython and its amazing notebook feature. Check out this presentation – I’m sure it will convince you. :)

I was still doing an old-school code-execute-reload loop that was hindering my productivity. With Sublime Text and Python registered in the OS it is not that bad, but still, with IPython you can get much better results. Notebooks provide interactivity maybe not as good as Mathematica’s, but comparable, and much better than the regular software development loop. You can easily re-run, change parameters, debug, see help and profile your code, and add nice text, TeX or image annotations. IPython notebooks are easy to share, store and come back to later.

IPython as a shell is also quite good on its own – even as an environment to run your scripts from (with handy profiling macros, help and debugging).

NumPy is great and very efficient…

NumPy is almost all you need for basic numerical work. The SciPy linear algebra packages (like distance arrays, least-squares fitting and other regression methods) provide almost everything else. :) For stuff like Monte Carlo, numerical integration, pre-computing functions and many other tasks, I found it sufficient and very well performing. Slicing and indexing options can be non-obvious at the beginning, but once you get some practice they are very expressive. Big volume operations can boil down to a single expression, with implicit loops over many elements that are internally written in efficient C. If you have ever worked with Matlab / Octave you will feel very comfortable with it – to me it is definitely more readable than the weird Mathematica syntax. Interfacing with file operations and many libraries is also trivial – Python becomes expressive and efficient glue code.
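
A tiny example of what I mean by a single expression with implicit loops – a Monte Carlo estimate of pi over 100k samples without a single Python-level iteration:

```python
import numpy as np

rng = np.random.RandomState(1234)     # fixed seed for reproducibility
samples = rng.rand(100000, 2)         # 100k points in the unit square

# One implicit loop over all samples, executed in C – no Python iteration:
inside = (samples ** 2).sum(axis=1) <= 1.0
pi_estimate = 4.0 * inside.mean()
```

The same computation written as a Python `for` loop over rows would be orders of magnitude slower.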

…but you need to understand it and hack around silent performance killers

On the other hand, using NumPy very efficiently requires quite a deep understanding of its internal way of working. This is obviously true of any programming language, environment or algorithm – but unfortunately in the case of numerical Python it can be very counter-intuitive. I won’t cover examples here (you can easily find numerous tutorials on NumPy optimizations), but often writing efficient code means writing not very readable and not self-documenting code. Sometimes there are absurd situations, like specialized functions performing worse than generic ones, or the need to write incomprehensible hacks (the funniest one was a suggestion to use complex numbers as the most efficient way to do simple Euclidean distance calculations)… Hopefully after a couple of numerically heavy scripts you will understand when NumPy does internal copies (and it does them often!), that any Python iteration over elements will kill your performance, and that you need to rely on implicit loops and slicing etc.
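
One classic silent trap: basic slicing returns a view, while fancy indexing silently returns a copy – two lines that look almost identical behave completely differently:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

view = a[:, 1:3]      # basic slicing returns a view – no copy
fancy = a[:, [1, 2]]  # fancy indexing silently returns a copy

view[0, 0] = 99       # writes through to a
fancy[0, 1] = -1      # modifies only the hidden copy; a is untouched
```

In a hot loop, an accidental copy like `fancy` both doubles memory traffic and breaks in-place updates – and NumPy gives no warning about either.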

There is no easy way to use multiple cores

Unfortunately, multithreading, multitasking and parallelism are simply terrible in Python. The whole language wasn’t designed to be multitasked / multithreaded, and the Global Interpreter Lock, as part of the language design, makes this a problem that is almost impossible to solve. Even though most NumPy code releases the GIL, there is quite a big overhead from doing so and from other threads becoming active – you won’t notice big speed-ups unless you have really huge volumes of work done in pure, single NumPy instructions. Every single line of Python glue code will become a blocking, single-threaded path, and according to Amdahl’s law, this will make any massive parallelism impossible. You can try to work around it using multiprocessing – but in that case it is definitely more difficult to pass and share data between processes. I haven’t researched it exhaustively, but in any case no simple, annotation-based solution (like in OpenMP / Intel TBB) exists.
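
A minimal multiprocessing sketch of what I mean – it sidesteps the GIL by using separate processes, but note that every input and output has to be pickled across the process boundary (the kernel here is a made-up stand-in for real numerical work):

```python
from multiprocessing import Pool

def heavy_kernel(x):
    # Pure function: no shared state; inputs/outputs must be picklable.
    return sum(i * i for i in range(x)) % 1000

def run_parallel(jobs):
    # Each job runs in a separate process, each with its own GIL.
    with Pool(processes=2) as pool:
        return pool.map(heavy_kernel, jobs)

if __name__ == "__main__":
    print(run_parallel([10, 100, 1000]))
```

This works well for coarse, independent jobs; for fine-grained parallelism the pickling and process start-up overheads quickly dominate.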

SymPy cannot serve as replacement for Mathematica

I have played with SymPy just several times – it definitely is not a replacement for symbolic operations in Mathematica. It works OK for symbol substitution, trivial simplification or very simple integrals (like regular Phong normalization), but for anything more complex (normalizing Blinn-Phong… yeah) it doesn’t work – after a couple of minutes (!) of calculations it produces no answer. Its syntax is also definitely not as friendly for interactive work as Mathematica’s. So for symbolic work it’s not a replacement at all and isn’t very useful. One potential benefit is that it embeds nicely and produces nice-looking results in IPython notebooks – which can be good for sharing them.
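
The regular Phong normalization case is about the limit of what I would trust it with – a small sketch of that hemisphere integral over the cosine lobe:

```python
import sympy as sp

n, theta = sp.symbols("n theta", positive=True)

# Normalization of a regular Phong lobe: integrate cos(theta)^n over
# the hemisphere (the phi integral contributes the 2*pi factor).
lobe = sp.integrate(sp.cos(theta) ** n * sp.sin(theta), (theta, 0, sp.pi / 2))
norm = sp.simplify(2 * sp.pi * lobe)   # expected: 2*pi/(n + 1)
```

SymPy handles this one instantly; it is the half-angle (Blinn-Phong) variant of the same integral that it chokes on.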

No very good interactive 3D plotting

There is matplotlib. It works. It has tons of good features…

…But its interactive version is not embeddable in IPython notebooks, and 3D plotting runs very slowly and looks quite ugly. In 2D there is the beautiful Bokeh, which generates interactive HTML files, but nothing like that exists for 3D. Nothing on Mathematica’s level.

I played a bit with VisPy – if they can create as good a WebGL backend for IPython notebooks as they promise, I’m totally for it (even if I have to code the visualizations myself). Until then, it is “only” an early-stage project for quickly mapping between numerical Python data and simple OpenGL code – but a very cool and simple one, so it’s fun to play with anyway. :)

There are packages for (almost) everything!

Finally, while some Python issues are real and I feel won’t be solved in the near future (multithreading), the situation is very dynamic and changes a lot. Python is becoming a standard for scientific computing, and new libraries and packages appear every day. There are excellent existing ones, and it’s hard to find a topic that hasn’t been covered yet. Image processing? Machine learning? Linear algebra? You name it. Just import the proper package and address the problem you are trying to solve, instead of wasting your time coding everything from scratch or integrating obscure C++ libraries.
Therefore I really believe it is worth investing your time in learning it and adapting it to your workflow. I wish it would become the standard for many CS courses at universities, instead of commercial Matlab, poorly interfaced Octave, or professors asking students to write whole solutions in C++ from scratch. At least in Poland, they definitely need more focus on problems, solutions and algorithms, not on coding and learning languages…


New debugging options in CSharpRenderer framework

Hi, a minor update to my C#/.NET graphics rendering framework / playground just got submitted to GitHub. I implemented the following new features:

Surface debugging snapshots

One commentator asked me how to easily display the SSAO buffer for debugging – I had no easy answer (except for hacking shaders). I realized that debugging very often requires displaying various buffers that can change / get overwritten over time – we cannot simply rely on grabbing such a surface at the end of the frame for display…


Therefore I added an option to create various surface “snapshots” in code, which get copied to a debug buffer if needed. They are copied at a given time only if the user requested such a copy. You can display RGB, A or fractional values (useful for depth / world position display). In the future there could be some options for linearization / sRGB, range stretching / clamping etc., but for now I didn’t need them. :)

Its use is trivial – in code, after rendering information that you might want to debug – like SSAO and its blurs – write:

// Adding SSAO debug
SurfaceDebugManager.RegisterDebug(context, "SSAOMain", ssaoCurrent);

// Do SSAO H blurring (...)

// Adding SSAO after H blur debug
SurfaceDebugManager.RegisterDebug(context, "SSAOBlurH", tempBlurBuffer);

// Adding SSAO after V blur debug
SurfaceDebugManager.RegisterDebug(context, "SSAOBlurV", tempBlurBuffer);

No additional shader code is needed, and the debug display is handled at the “program” level. Passes get automatically registered and the UI refreshed.

GPU debugging using UAVs

Very often when writing complex shader code (especially compute shaders) we would like to have some “printf” / debugging options. There are many tools for shader debugging (I personally recommend the excellent and open-source RenderDoc), but launching external tools often adds time overhead, and it may not be possible to debug everything in the case of complex compute shaders.

We would like to have “printf”-like functionality on the GPU. While no API provides it, we can use append buffer UAVs, simply hack it in shaders, and later either display such values on screen or lock / read them on the CPU. This is not a new idea.

I implemented very rough and basic functionality for this in the playground and made it work for both pixel shaders and compute shaders.


It doesn’t require any regular code changes, just a shader file change plus switching an option in the UI to ON.

In a pixel shader one would write something like:

float3 lighting = finalLight.diffuse * albedo + finalLight.specular;

if (DEBUG_FILTER_VPOS(i.position, 100, 100))
 DebugInfo(i.position.xyz, lighting);

In a compute shader you would write, accordingly:

float4 finalOutValue = float4(lighting * scattering, scattering + absorbtion);

if (DEBUG_FILTER_TID(dispatchThreadID, 10, 10, 10))
 DebugInfo(dispatchThreadID, finalOutValue);

If you don’t want to specify / filter values manually in the shader, you don’t have to – you can override the position filter in the UI and use the DEBUG_FILTER_CHECK_FORCE_VPOS and DEBUG_FILTER_CHECK_FORCE_TID macros instead.

If you want to debug pixels, you can even automatically set those filter values in the UI from the last viewport click position (hacked – it returns coords only in full resolution, but it can be useful when trying to track negative values / NaN sources).

Minor / other

  • Improved the temporal AA a bit, based on luma clamping only – loosely inspired by Brian Karis’s excellent UE4 Siggraph AA talk. :)
  • Added small dithering / noise to the final image – I quite like this subtle effect.
  • Extracted a global / program constant buffer – very minor, but it could help reorganize code in the future.
  • Added an option to freeze the time – useful for debugging / comparison screenshots.
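
The luma-clamping idea from the first bullet can be sketched as a toy NumPy function (this is an illustration of the concept, not the framework’s shader code): clamp the history sample into the current neighborhood’s luma range before blending, which rejects stale ghosting samples:

```python
import numpy as np

def temporal_aa_resolve(history_luma, neighborhood, history_weight=0.9):
    # neighborhood: 3x3 lumas from the current frame around the pixel.
    # Clamping history into the neighborhood's range rejects stale
    # (ghosting) samples while keeping temporal accumulation cheap.
    lo, hi = neighborhood.min(), neighborhood.max()
    clamped = np.clip(history_luma, lo, hi)
    current = neighborhood[1, 1]
    return history_weight * clamped + (1.0 - history_weight) * current
```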



Updated Poisson-like generator with GUI and more


Just a super short note:

I updated my simple rendering-oriented Poisson-like pattern generator with:

  • Very simple GUI made in PyQt to make experimenting easier.
  • Option to generate a rotating disk (minimizing rotated point distance) for things like Poisson bokeh / shadow map PCF.
  • Better visualizations with guidelines.
  • …And I optimized the algorithm a bit.
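
For the curious, the simplest Poisson-like pattern generator is Mitchell’s best-candidate algorithm – not my generator’s actual code, just a sketch of the idea (each new point is the best of k random candidates, i.e. the one maximizing distance to the existing set):

```python
import math
import random

def best_candidate_points(n, candidates=20, seed=7):
    # Mitchell's best-candidate algorithm: a cheap Poisson-disc-like
    # pattern; each new point maximizes distance to existing points.
    rng = random.Random(seed)
    points = [(rng.random(), rng.random())]
    for _ in range(n - 1):
        best, best_d = None, -1.0
        for _ in range(candidates):
            c = (rng.random(), rng.random())
            d = min(math.hypot(c[0] - p[0], c[1] - p[1]) for p in points)
            if d > best_d:
                best, best_d = c, d
        points.append(best)
    return points
```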

I’m definitely done working on it (unless I find some bugs to fix); it was mostly done to learn to create Python GUIs quickly, and for fun. :)

I have also started writing a longer blog note about my experiences with Python as a scientific environment and a free / open-source alternative to Mathematica – what worked well, what didn’t, and some tips & tricks to avoid my mistakes. :)

Stay tuned!


Major C#/.NET graphics framework update + volumetric fog code!

As I have already promised too many times, here comes a major CSharpRenderer framework update!

As always, all code available on GitHub.

Note that the goal is still the same – not to write the most beautiful or fastest code, but to provide a prototype playground / framework for hacking and having fun, with iteration times approaching zero. :) It will still undergo some major changes.

Apart from the volumetric code serving as an example for my Siggraph talk (which is not in perfect shape code-quality-wise – it is supposed to be a quickly written demo of the technique; note also that this is not the code that was used for the shipping game, just a demo; the original code had some NDA’d and console-specific optimizations), the other major changes cover:

“Global” shader defines visible from code

You can define a constant as a “global” one in a shader and immediately have it reflected on the C# side after changing / reloading. This way I removed some data / code duplication and potential for mistakes.


// shader side

#define GI_VOLUME_RESOLUTION_X 64.0 // GlobalDefine

// C# side

m_VolumeSizeX = (int)ShaderManager.GetUIntShaderDefine("GI_VOLUME_RESOLUTION_X");
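
A sketch of how such a scan could work – in Python for illustration (the framework does this in C# as part of its shader reflection), with a made-up function name:

```python
import re

_GLOBAL_DEFINE_RE = re.compile(
    r"#define\s+(\w+)\s+([0-9.]+)\s*//\s*GlobalDefine")

def parse_global_defines(shader_source):
    # Scan shader text for constants tagged "// GlobalDefine" so the
    # application side can mirror them after every shader reload.
    return {name: float(value)
            for name, value in _GLOBAL_DEFINE_RE.findall(shader_source)}
```

Because the values are re-parsed on every reload, changing a constant in the shader file immediately updates the application side too – no duplicated definitions to keep in sync.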

Derivative maps

Based on an old but excellent post by Rory Driscoll. I didn’t see much sense in computing tangent frames in mesh preprocessing for the needs of such a simple framework. I used the “hack” of treating normal maps as a derivative map approximation – it doesn’t really matter in such a demo case.

“Improved” Perlin noise textures + generation

Just some code based on the state-of-the-art article from GPU Pro by Simon Green. Used in the volumetric fog for an animated, procedural effect.

Very basic implementation of BRDFs

GGX specular, based on a very good post about optimizing it by John Hable.

Note that the lighting code is a bit messy now; a major clean-up of it is my next task.


The minor changes added are:

  • UI code clean-up and dynamic UI reloading / recreating after a constant buffer / shader reload.
  • Major constants renaming clean-up.
  • Actually fixing structured buffers.
  • Some simple, basic geometric algorithms I found useful.
  • Adding shaders to the project (actually I had added them – I have no idea why they didn’t get into the first submit…).
  • Some more easy-to-use operations on the context (blend state, depth state etc.).
  • Simple integers supported in constant buffer reflection.
  • Another type of temporal AA – accumulation-based, trails a bit – I will later try to apply some ideas from the excellent Epic UE4 AA talk.
  • Time-delta based camera movement (well, yeah…).
  • A fixed-FPS clamp – my GPU was getting hot and loud. :)
  • More use of LUA constant buffer scripting – it is very handy and serves its purpose very well.
  • A simple basis for “particle” rendering based on vertex shaders and GPU buffer objects.
  • Some stupid animated point light.
  • A simple environment BRDF approximation by Dimitar Lazarov from Black Ops 2.

Future work

Within the next few weeks I should update it with:

  • Rewriting post-effects, tone-mapping etc.
  • Adding GPU debugging
  • Improving temporal techniques
  • Adding naive screen-space reflections and an env cube-map
  • Adding proper area light support (should work super-cool with volumetric fog!)
  • Adding local lights shadows

Siggraph 2014 talk slides are up!

As promised during my talk, I have just added my Siggraph 2014 Advances in Real-Time Rendering slides – check them out on my Publications page. Some extra future ideas I didn’t manage to cover in time are in the bonus slides section, so be sure to check those out.

They should also soon be online on the “Advances in Real-Time Rendering” web page. When they land there, check the whole page out, as there was lots of amazingly good and practical content in this course this year! Thanks again to Natalya Tatarchuk for organizing the whole event.

…Also I promised that I will release some source code soon, so stay tuned! :)
