In a couple of days I’m saying goodbye to my big desktop PC for the next several weeks (relocation), so it’s time to commit some stuff to my CSharpRenderer GitHub repository that has been waiting for way too long. 🙂
Startup time optimizations
The goal of this framework was to provide iterations that are as fast as possible. At first, with just a few simple shaders, startup time wasn’t a big problem, but as the framework grew it became something to address. To speed it up I made the following two optimizations:
Geometry obj file caching
Fairly simple: create a binary cache instead of loading and processing the obj file text every time. On my HD in Debug mode this gives up to two seconds of start-up time speed-up.
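To make the idea concrete, here is a minimal sketch in Python (not the framework's actual C# code). The names `parse_obj` and `load_mesh_cached`, the `.bin` suffix and the timestamp check are all assumptions made for illustration; the parser only understands `v` and triangular `f` records.

```python
import os
import pickle

def parse_obj(text):
    """Parse only 'v' and triangular 'f' records; a real loader handles far more."""
    vertices, faces = [], []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "v":
            vertices.append(tuple(float(x) for x in parts[1:4]))
        elif parts[0] == "f":
            # OBJ indices are 1-based and may look like "3/1/2"; keep the position index.
            faces.append(tuple(int(p.split("/")[0]) - 1 for p in parts[1:4]))
    return {"vertices": vertices, "faces": faces}

def load_mesh_cached(obj_path, cache_path=None):
    """Load a mesh, reusing a binary cache when it is at least as new as the obj file."""
    cache_path = cache_path or obj_path + ".bin"  # hypothetical naming scheme
    if (os.path.exists(cache_path)
            and os.path.getmtime(cache_path) >= os.path.getmtime(obj_path)):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    # Cache miss: do the slow text parse once, then store the processed result.
    with open(obj_path, "r") as f:
        mesh = parse_obj(f.read())
    with open(cache_path, "wb") as f:
        pickle.dump(mesh, f, protocol=pickle.HIGHEST_PROTOCOL)
    return mesh
```

The timestamp comparison is one simple way to invalidate the cache when the source file changes; hashing the file contents would be a stricter alternative.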
Multi-tasked shader compilation
Shader compilation (pre-processing and building binaries) is trivially parallelizable, so I simply needed to make sure it’s stateless and that only loading the binaries to the driver and device happens from the main, immediate context.
I highly recommend the .NET Task Parallel Library: it is both super simple and powerful, has very nice syntax with lambdas and allows for complex task dependencies (child tasks, task continuations etc.). It also hides the problematic thread vs task management from the user (think in terms of tasks and multi-tasking, not multiple threads!). I didn’t use all of its power (like the Dataflow features, which would make sense here), but it is definitely worth considering when developing any form of multitasking in .NET.
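The split between stateless compilation and main-context upload can be sketched as follows. This is a Python illustration using `concurrent.futures` as a stand-in for the Task Parallel Library, not the framework's code. `compile_shader`, `build_all` and the `upload` callback are hypothetical names, and the "bytecode" is just a hash standing in for a real compiler's output.

```python
from concurrent.futures import ThreadPoolExecutor
import hashlib

def compile_shader(name, source, defines):
    """Stateless stand-in for preprocessing + compilation: depends only on its arguments."""
    header = "".join(f"#define {k} {v}\n" for k, v in sorted(defines.items()))
    preprocessed = header + source
    bytecode = hashlib.sha256(preprocessed.encode("utf-8")).digest()  # placeholder "binary"
    return name, bytecode

def build_all(shaders, upload):
    """Compile every shader as a separate task, then upload from the calling thread only."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(compile_shader, n, src, defs) for n, src, defs in shaders]
        results = [f.result() for f in futures]  # wait for all compile tasks
    for name, bytecode in results:
        upload(name, bytecode)  # device/driver work stays on the main thread
    return dict(results)
```

Because `compile_shader` touches no shared state, the tasks need no locking; the only ordering constraint is that all uploads happen after compilation and on one thread.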
Additional tools for debugging
I added simple feature toggles (auto-registered and auto-reloaded UI) to make it easier to turn features on and off from within the UI. To provide additional debugging help with this and some other features (like changing a shader when optimizing and checking whether anything changed quality-wise, and in which parts of the scene), I added an option to take “snapshots” of the final image. It supports quickly switching between the snapshot and the current final image, or displaying the difference between the two. Much faster than reloading a whole shader.
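The difference view boils down to a per-pixel comparison. A minimal sketch, assuming images are stored as rows of RGB tuples with channels in [0, 1]; the function name and the `gain` parameter (used to amplify subtle differences) are my own additions for illustration, and in the framework this would naturally live in a shader:

```python
def snapshot_difference(snapshot, current, gain=1.0):
    """Per-pixel, per-channel absolute difference of two equally sized RGB images."""
    if len(snapshot) != len(current) or any(
            len(a) != len(b) for a, b in zip(snapshot, current)):
        raise ValueError("snapshot and current image must have the same dimensions")
    return [
        [
            # Amplify by `gain` and clamp to 1.0 so small changes remain visible.
            tuple(min(1.0, abs(c - s) * gain) for s, c in zip(sp, cp))
            for sp, cp in zip(srow, crow)
        ]
        for srow, crow in zip(snapshot, current)
    ]
```

Pixels that did not change come out black, so any region affected by a shader tweak stands out immediately.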
Half resolution / bilateral upsampling helpers
Some helper code to generate an offsets texture for bilateral upsampling. For every full-res pixel it generates offset information that, depending on the depth differences between the full-res and half-res pixels, either keeps the original bilinear sampling (offset equal to zero), snaps to edge-bilinear (instead of quad-bilinear), or even falls back to point sampling (closest depth) from the low-resolution texture when the depth differences are big. The benefit of doing it this way (rather than in every upscale shader) is much lower shader complexity and potentially better performance (when there are multiple half-res -> full-res steps), as well as fewer registers used and better occupancy in the final shaders.
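One possible reading of that selection logic, for a single full-res pixel, is sketched below in Python. The depth threshold, the corner and edge layout, and the function name are assumptions for illustration rather than the framework's actual shader. The returned offset is in half-res texel units and is meant to be added to the pixel's original bilinear sample position `(fx, fy)` inside the 2x2 half-res quad.

```python
# Corner positions of the 2x2 half-res quad, in half-res texel units.
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
# Edges as pairs of corner indices, plus the axis that gets pinned and its value.
EDGES = [((0, 1), "y", 0.0), ((2, 3), "y", 1.0), ((0, 2), "x", 0.0), ((1, 3), "x", 1.0)]

def upsample_offset(full_depth, half_depths, fx, fy, threshold=0.05):
    """Pick a sampling offset from depth similarity of the four half-res taps."""
    diffs = [abs(d - full_depth) for d in half_depths]
    ok = [d <= threshold for d in diffs]
    if all(ok):
        return (0.0, 0.0)  # all taps match: plain quad-bilinear
    # Edge-bilinear: both taps of an edge match, so pin the sample onto that edge.
    candidates = [(diffs[a] + diffs[b], axis, value)
                  for (a, b), axis, value in EDGES if ok[a] and ok[b]]
    if candidates:
        _, axis, value = min(candidates)
        return (value - fx, 0.0) if axis == "x" else (0.0, value - fy)
    # Otherwise point-sample the tap with the closest depth.
    best = min(range(4), key=lambda i: diffs[i])
    cx, cy = CORNERS[best]
    return (cx - fx, cy - fy)
```

Storing the result in a texture means each upscale shader only needs one extra fetch and an add before its regular bilinear sample.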
Physically-correct env LUTs, cubemap reflections and (less correct) screen-space reflections
I added importance-sampling-based cubemap mip chain generation for the GGX distribution and usage of proper environment light LUTs, all based on last year’s SIGGRAPH talk by Brian Karis.
I also added very simple screen-space reflections. They are tuned for neither performance (the reflection calculation code is “simple”, not super-optimized) nor quality (noise and temporal smoothing); they serve more as a demonstration of the technique and show why adding indirect specular occlusion is so important.
Screen-space reflections are temporally supersampled with an additional blurring step (the source of them not being physically correct) and by default look very subtle due to the lack of metals or very glossy materials, but they are still useful for occluding indirect speculars.
As they re-use the previous frame’s lighting buffer, we actually get multi-bounce screen-space reflections at the cost of increased temporal smoothing and trailing of moving objects.
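For reference, the core of GGX importance sampling is the mapping from two uniform random numbers to a half vector. The sketch below follows the formulation from Karis's course notes, with `a = roughness²`, written in Python and kept in tangent space (normal along +Z). Transforming into world space and the surrounding cubemap filtering loop are omitted, and the function name is my own.

```python
import math

def importance_sample_ggx(xi1, xi2, roughness):
    """Map uniform (xi1, xi2) in [0, 1) to a tangent-space half vector distributed by GGX."""
    a = roughness * roughness
    phi = 2.0 * math.pi * xi1
    cos_theta = math.sqrt((1.0 - xi2) / (1.0 + (a * a - 1.0) * xi2))
    sin_theta = math.sqrt(max(0.0, 1.0 - cos_theta * cos_theta))
    # Spherical to Cartesian, with the surface normal as the +Z axis.
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)
```

Lower roughness concentrates samples around the normal, which is why rougher mip levels need more samples to converge when prefiltering.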
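The smoothing versus trailing trade-off is easy to see with a simple exponential history blend. The sketch below is a generic temporal accumulation step in Python, not the framework's actual filter; the name `temporal_accumulate` and the blend factor `alpha` are assumptions for illustration.

```python
def temporal_accumulate(history, current, alpha=0.1):
    """Blend the new frame's value into the accumulated history (exponential moving average)."""
    return history + (current - history) * alpha

def frames_to_converge(target, alpha, tolerance=0.01):
    """Count frames until a value starting at 0 gets within `tolerance` of `target`."""
    value, frames = 0.0, 0
    while abs(target - value) > tolerance:
        value = temporal_accumulate(value, target, alpha)
        frames += 1
    return frames
```

A small `alpha` hides noise well but takes many frames to react to a change, which is what shows up as trailing behind moving objects.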
Whether to use them in a game or not is something I don’t have a clear opinion on; my views were expressed in one of the first posts on this blog. 🙂
I probably won’t update the framework for a while, because I will only have a MacBook Pro available for at least several weeks, possibly months (unless I need to integrate a critical fix). I do plan, however, to do quite a big write-up about my experiences with creating an efficient next-gen game post-processing pipeline and optimizing it, and later definitely post some source code. 🙂