<p>Ambient Dice Specular Approximation (2019-06-25, https://torust.me/2019/06/25/ambient-dice-specular)</p>
<p><em><a href="#ResultImages">Jump to result images</a></em></p>
<blockquote>
<p>Before we get into the technique: why might you want to use this? Perhaps you want specular lightmaps, have been considering spherical Gaussians, but want something cheaper and generally higher quality; this provides that.</p>
</blockquote>
<blockquote>
<p>Alternatively, you might currently store radiance for light probes in order-2 spherical harmonics (with 9 coefficients per colour channel) and would like indirect specular from them; for three more coefficients per channel, you can store radiance in the Ambient Dice format and get both diffuse and specular lighting with this technique.</p>
</blockquote>
<blockquote>
<p>For rough materials, this technique could even replace specular cubemaps; you could choose or blend between the Ambient Dice specular, light probes, and screen-space reflections based on roughness.</p>
</blockquote>
<p><a href="/2019/06/21/rbf-for-indirect-specular.html">In the last post</a>, I introduced the idea of using the <a href="http://miciwan.com/EGSR2017/AmbientDice.pdf">Ambient Dice</a> basis function for both diffuse and specular irradiance, and briefly described my method for the fit. This post will cover the specular fit in more detail. While it focuses on the Ambient Dice SRBF basis function and single-scattering GGX specular, it should be fairly straightforward to extend to other basis functions (particularly cosine-lobe-based ones) or BRDFs.</p>
<p>The ShaderToy below shows my fit against the reference (note: I’ve tested this to work in Firefox and Chrome, but it doesn’t appear to display in Safari):</p>
<iframe width="800" height="450" frameborder="0" src="https://www.shadertoy.com/embed/3d23Wz?gui=true&t=10&paused=true&muted=false"></iframe>
<p><a href="https://www.shadertoy.com/view/3d23Wz">ShaderToy link with source code</a></p>
<p><em>The left half of the sphere is the approximation, and the right half is the ground truth, where green is the f0 and blue is the f90 material scale factor. The sphere is parameterised by viewing direction; the edges of the sphere are at grazing angles, while the centre is viewing along the normal. The red dot is the lobe direction; you can move it around the sphere by dragging up and down with the mouse. Dragging left to right will change the surface roughness.</em></p>
<p>In general, finding the integral of a specular BRDF with illumination from an arbitrary basis function is a difficult problem due to the large number of free parameters. For a general specular model parameterised by some isotropic roughness \(\alpha\), normal direction \(n\), and reflectance at normal and grazing angles \(f_0\) and \(f_{90}\), the illumination from a light source in some linear basis is given by:</p>
\[C_j(\omega_o ) = \int_\Omega B_j(\omega_i ) f_{br}(\alpha, \omega_i, \omega_o, n, f_0, f_{90}) \, d\omega_i\]
<p>As far as I’m aware, there’s no closed-form solution to this integral for the GGX BRDF that I use for specular<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. Instead, we can either evaluate it with Monte Carlo integration or use a fitted approximation or lookup table. Unfortunately, Monte Carlo integration is too expensive for real-time applications; visual artefacts are still readily apparent with as many as 32 samples when estimating \(C_j(\omega_o )\) – try setting <code class="language-plaintext highlighter-rouge">sampleCount</code> to <code class="language-plaintext highlighter-rouge">32</code> in <code class="language-plaintext highlighter-rouge">groundTruth</code> in the ShaderToy to see this in effect.</p>
<p>However, it <em>is</em> possible to derive a reasonably good fit to the specular irradiance. There are three key observations that we can use:</p>
<ul>
<li>For a perfectly smooth specular reflector with the roughness parameter \(\alpha\) approaching zero, the BRDF becomes a delta function oriented in the surface’s reflection direction (given the view direction). The radiance in this case is just the basis function evaluated in the reflection direction multiplied by the <a href="https://en.wikipedia.org/wiki/Schlick%27s_approximation">Fresnel response</a>.</li>
<li>
<p>For a very rough surface, the specular response will approach a Lambertian diffuse response. It so happens that in this case, scaling the diffuse irradiance by the BRDF’s response in a split-sum approximation (as inspired by Karis’ <a href="https://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_slides.pdf">Real Shading in Unreal Engine 4</a>) is a reasonably close match to the ground truth:</p>
\[C_j(\omega_o ) \approx \left(\int_\Omega B_j(\omega_i ) \, d\omega_i\right) \left(\int_\Omega f_{br}(\alpha, \omega_i, \omega_o, n, f_0, f_{90}) \, d\omega_i\right)\]
</li>
<li>We can use a lookup texture to store the BRDF response for different viewing directions and across the roughness range in a manner similar to Karis’ approach for image-based lighting.</li>
</ul>
<p>These observations enable us to evaluate the specular irradiance at the two extremes of the roughness spectrum. The most obvious thing to do for the middle, therefore, is just to blend between them.</p>
<p>Unfortunately, using the split-sum approximation across the entire range gives fairly poor results – separating out the BRDF from the basis function response only really works for high roughnesses. Let’s take a step back and assume that we don’t have a lookup table at all. In that case, the best we can do is perform a parameterised lerp between the fully smooth response and the fully rough diffuse response, which could look something like this:<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">float</span> <span class="nf">ApproximateAmbientDiceLobeSpecular</span><span class="p">(</span><span class="n">float3</span> <span class="n">lobeDirection</span><span class="p">,</span> <span class="n">float3</span> <span class="n">viewDirection</span><span class="p">,</span> <span class="kt">float</span> <span class="n">ggxAlpha</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">float</span> <span class="n">NdotLobe</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">normal</span><span class="p">,</span> <span class="n">lobeDirection</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">reflectionDir</span> <span class="o">=</span> <span class="n">reflect</span><span class="p">(</span><span class="o">-</span><span class="n">viewDirection</span><span class="p">,</span> <span class="n">normal</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">RdotLobe</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">reflectionDir</span><span class="p">,</span> <span class="n">lobeDirection</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">basisInMirrorDir</span> <span class="o">=</span> <span class="n">AmbientDiceCosineBasisFunction</span><span class="p">(</span><span class="n">RdotLobe</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">diffuseParam</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="n">RdotLobe</span><span class="p">,</span> <span class="n">NdotLobe</span><span class="p">,</span> <span class="n">saturate</span><span class="p">(</span><span class="n">ggxAlpha</span><span class="p">));</span>
<span class="kt">float</span> <span class="n">diffuse</span> <span class="o">=</span> <span class="n">EvaluateAmbientDiceLobeDiffuse</span><span class="p">(</span><span class="n">diffuseParam</span><span class="p">);</span>
<span class="k">return</span> <span class="n">lerp</span><span class="p">(</span><span class="n">basisInMirrorDir</span><span class="p">,</span> <span class="n">diffuse</span><span class="p">,</span> <span class="n">saturate</span><span class="p">(</span><span class="n">ggxAlpha</span><span class="p">));</span>
<span class="p">}</span>
</code></pre></div></div>
<p>This is obviously going to be fairly inaccurate – we’re not accounting for the specific BRDF response at all. To fix this, we can reintroduce the lookup table; however, rather than storing the BRDF response, the table instead stores <em>whatever value will make our approximation match the true value</em> – in other words, the lookup table stores the true value divided by the lerped approximation. This means that we can just multiply our approximation by the lookup table’s value to get the true irradiance value.</p>
<p>Let’s bring that into the code snippet above. Note that this time I’ve included my fitted parameters for the interpolation;<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> I chose a quartic in \(\sqrt{\alpha}\), but it’s quite likely that there are better possibilities.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">const</span> <span class="kt">float</span> <span class="n">parameters</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">3</span><span class="p">.</span><span class="mo">04</span><span class="mi">98910522220495</span><span class="p">,</span> <span class="o">-</span><span class="mi">6</span><span class="p">.</span><span class="mi">983002509990005</span><span class="p">,</span> <span class="mi">7</span><span class="p">.</span><span class="mi">388270435580356</span><span class="p">,</span> <span class="o">-</span><span class="mi">2</span><span class="p">.</span><span class="mi">662756921813306</span><span class="p">,</span> <span class="o">-</span><span class="mi">0</span><span class="p">.</span><span class="mi">4005429486854629</span><span class="p">,</span> <span class="mi">5</span><span class="p">.</span><span class="mi">626699351644211</span><span class="p">,</span> <span class="o">-</span><span class="mi">6</span><span class="p">.</span><span class="mo">0400</span><span class="mi">98716506305</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">9006124935607012</span> <span class="p">};</span>
<span class="kt">float</span> <span class="nf">EvaluateAmbientDiceLobeDiffuse</span><span class="p">(</span><span class="kt">float</span> <span class="n">cosTheta</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">.</span><span class="mo">05</span><span class="mi">981</span> <span class="o">+</span> <span class="mi">0</span><span class="p">.</span><span class="mi">12918</span> <span class="o">*</span> <span class="n">cosTheta</span> <span class="o">+</span> <span class="mi">0</span><span class="p">.</span><span class="mo">07056</span> <span class="o">*</span> <span class="n">cosTheta</span> <span class="o">*</span> <span class="n">cosTheta</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">float</span> <span class="nf">EvaluateAmbientDiceLobeSpecular</span><span class="p">(</span><span class="n">float3</span> <span class="n">lobeDirection</span><span class="p">,</span> <span class="n">float3</span> <span class="n">viewDirection</span><span class="p">,</span>
<span class="kt">float</span> <span class="n">ggxAlpha</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">float</span> <span class="n">NdotLobe</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">normal</span><span class="p">,</span> <span class="n">lobeDirection</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">reflectionDir</span> <span class="o">=</span> <span class="n">reflect</span><span class="p">(</span><span class="o">-</span><span class="n">viewDirection</span><span class="p">,</span> <span class="n">normal</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">RdotLobe</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">reflectionDir</span><span class="p">,</span> <span class="n">lobeDirection</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">basisInMirrorDir</span> <span class="o">=</span> <span class="n">AmbientDiceCosineBasisFunction</span><span class="p">(</span><span class="n">RdotLobe</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">sqrtAlpha</span> <span class="o">=</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">ggxAlpha</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">focusLerp</span> <span class="o">=</span> <span class="n">parameters</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">*</span> <span class="n">sqrtAlpha</span> <span class="o">+</span> <span class="n">parameters</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">*</span> <span class="n">ggxAlpha</span> <span class="o">+</span>
<span class="n">parameters</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">*</span> <span class="n">sqrtAlpha</span> <span class="o">*</span> <span class="n">ggxAlpha</span> <span class="o">+</span> <span class="n">parameters</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">*</span> <span class="n">ggxAlpha</span> <span class="o">*</span> <span class="n">ggxAlpha</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">diffuseParam</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="n">RdotLobe</span><span class="p">,</span> <span class="n">NdotLobe</span><span class="p">,</span> <span class="n">saturate</span><span class="p">(</span><span class="n">focusLerp</span><span class="p">));</span>
<span class="kt">float</span> <span class="n">diffuse</span> <span class="o">=</span> <span class="n">EvaluateAmbientDiceLobeDiffuse</span><span class="p">(</span><span class="n">diffuseParam</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">alphaLerp</span> <span class="o">=</span> <span class="n">parameters</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">*</span> <span class="n">sqrtAlpha</span> <span class="o">+</span> <span class="n">parameters</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">*</span> <span class="n">ggxAlpha</span> <span class="o">+</span>
<span class="n">parameters</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">*</span> <span class="n">sqrtAlpha</span> <span class="o">*</span> <span class="n">ggxAlpha</span> <span class="o">+</span> <span class="n">parameters</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">*</span> <span class="n">ggxAlpha</span> <span class="o">*</span> <span class="n">ggxAlpha</span><span class="p">;</span>
<span class="k">return</span> <span class="n">lerp</span><span class="p">(</span><span class="n">basisInMirrorDir</span><span class="p">,</span> <span class="n">diffuse</span><span class="p">,</span> <span class="n">saturate</span><span class="p">(</span><span class="n">alphaLerp</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">evaluate</span><span class="p">()</span> <span class="p">{</span>
<span class="n">float2</span> <span class="n">lutValue</span> <span class="o">=</span> <span class="n">AmbientDiceLUTValue</span><span class="p">(</span><span class="n">NdotV</span><span class="p">,</span> <span class="n">ggxAlpha</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">specularIrradiance</span> <span class="o">=</span> <span class="n">float3</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span>
<span class="k">for</span> <span class="n">lobe</span> <span class="n">in</span> <span class="n">lobes</span> <span class="p">{</span>
<span class="n">specularIrradiance</span> <span class="o">+=</span> <span class="n">EvaluateAmbientDiceLobeSpecular</span><span class="p">(</span><span class="n">lobe</span><span class="p">.</span><span class="n">direction</span><span class="p">,</span> <span class="n">V</span><span class="p">,</span>
<span class="n">ggxAlpha</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">specularIrradiance</span> <span class="o">*=</span> <span class="n">materialF0</span> <span class="o">*</span> <span class="n">lutValue</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">materialF90</span> <span class="o">*</span> <span class="n">lutValue</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>That’s all great – we now have a simple and efficient way of evaluating the specular irradiance from an Ambient Dice lobe at runtime. However, if you’ve been following closely, you’ll notice that we’ve still got a problem. The lookup table still needs to somehow capture all of the free parameters of the integral – the angle between the normal and viewing direction, the BRDF roughness parameter, and the two angles between the viewing direction and the lobe (since both \(\theta\) and \(\phi\) affect the result).</p>
<p>To work around this, we can approximate by using a fixed lobe direction for each roughness value, using the assumption that the scale contained in the lookup table is reasonably independent of the lobe rotation. Doing so allows us to reduce the lookup table to be two-dimensional, parameterised by the roughness \(\alpha\) and the cosine of the angle between the normal and viewing direction <em>NdotV</em> in the same manner as Karis’ split-sum for IBLs. While this assumption doesn’t generally hold, it gives fairly good results in practice.<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup></p>
<p>This approach also means that the lookup table value is independent from the lobe direction, requiring only a single texture lookup per pixel (rather than one per pixel per lobe). The lerp parameters in <code class="language-plaintext highlighter-rouge">EvaluateAmbientDiceLobeSpecular</code> are also independent of the lobe direction and so can be factored out, making the per-lobe evaluation very inexpensive.</p>
<p>For the choice of lobe direction, I use an approximation from <a href="https://seblagarde.files.wordpress.com/2015/07/course_notes_moving_frostbite_to_pbr_v32.pdf">Moving Frostbite to Physically Based Rendering</a> for the dominant reflection direction for GGX – for smooth surfaces, this is the mirror reflection direction, while for rough surfaces this becomes aligned with the normal. The idea was to capture the response most accurately where the basis function would have highest intensity; however, it could be interesting to try different choices for the lobe direction to see how it affects the overall fit.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">float3</span> <span class="nf">GGXDominantDirection</span><span class="p">(</span><span class="n">float3</span> <span class="n">N</span><span class="p">,</span> <span class="n">float3</span> <span class="n">R</span><span class="p">,</span> <span class="kt">float</span> <span class="n">roughness</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">float</span> <span class="n">smoothness</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="n">f</span> <span class="o">-</span> <span class="n">roughness</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">lerpFactor</span> <span class="o">=</span> <span class="n">smoothness</span> <span class="o">*</span> <span class="p">(</span><span class="n">sqrt</span><span class="p">(</span><span class="n">smoothness</span><span class="p">)</span> <span class="o">+</span> <span class="n">roughness</span><span class="p">);</span>
<span class="k">return</span> <span class="n">normalize</span><span class="p">(</span><span class="n">lerp</span><span class="p">(</span><span class="n">N</span><span class="p">,</span> <span class="n">R</span><span class="p">,</span> <span class="n">lerpFactor</span><span class="p">));</span>
<span class="p">}</span>
<span class="n">float2</span> <span class="nf">IntegrateLUTAmbientDice</span><span class="p">(</span><span class="kt">float</span> <span class="n">NdotV</span><span class="p">,</span> <span class="kt">float</span> <span class="n">ggxAlpha</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="n">uint</span> <span class="n">sampleCount</span> <span class="o">=</span> <span class="mi">256u</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">float</span> <span class="n">sampleScale</span> <span class="o">=</span> <span class="mi">1</span><span class="p">.</span><span class="n">f</span> <span class="o">/</span> <span class="kt">float</span><span class="p">(</span><span class="n">sampleCount</span><span class="p">);</span>
<span class="k">const</span> <span class="n">float3</span> <span class="n">normal</span> <span class="o">=</span> <span class="n">float3</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">viewDirection</span> <span class="o">=</span> <span class="n">float3</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">sqrt</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="n">f</span> <span class="o">-</span> <span class="n">NdotV</span> <span class="o">*</span> <span class="n">NdotV</span><span class="p">),</span> <span class="n">NdotV</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">R</span> <span class="o">=</span> <span class="n">reflect</span><span class="p">(</span><span class="o">-</span><span class="n">viewDirection</span><span class="p">,</span> <span class="n">normal</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">lobeDirection</span> <span class="o">=</span> <span class="n">GGXDominantDirection</span><span class="p">(</span><span class="n">normal</span><span class="p">,</span> <span class="n">R</span><span class="p">,</span> <span class="n">ggxAlpha</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">fittedValue</span> <span class="o">=</span> <span class="n">EvaluateAmbientDiceLobeSpecular</span><span class="p">(</span><span class="n">lobeDirection</span><span class="p">,</span>
<span class="n">viewDirection</span><span class="p">,</span>
<span class="n">ggxAlpha</span><span class="p">);</span>
<span class="n">float2</span> <span class="n">groundTruth</span> <span class="o">=</span> <span class="n">float2</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">);</span> <span class="c1">// scale factors for f0 (x) and f90 (y)</span>
<span class="k">for</span> <span class="p">(</span><span class="n">uint</span> <span class="n">sampleIt</span> <span class="o">=</span> <span class="mi">0u</span><span class="p">;</span> <span class="n">sampleIt</span> <span class="o"><</span> <span class="n">sampleCount</span><span class="p">;</span> <span class="n">sampleIt</span> <span class="o">+=</span> <span class="mi">1u</span><span class="p">)</span> <span class="p">{</span>
<span class="n">float2</span> <span class="n">sampleUV</span> <span class="o">=</span> <span class="n">hammersley2D</span><span class="p">(</span><span class="n">sampleIt</span><span class="p">,</span> <span class="n">sampleCount</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">H</span> <span class="o">=</span> <span class="n">sampleGGXVNDF</span><span class="p">(</span><span class="n">viewDirection</span><span class="p">,</span> <span class="n">ggxAlpha</span><span class="p">,</span> <span class="n">ggxAlpha</span><span class="p">,</span>
<span class="n">sampleUV</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">sampleUV</span><span class="p">.</span><span class="n">y</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">lightDirectionTangent</span> <span class="o">=</span> <span class="n">reflect</span><span class="p">(</span><span class="o">-</span><span class="n">viewDirection</span><span class="p">,</span> <span class="n">H</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">Vis</span> <span class="o">=</span> <span class="n">SmithGGXMaskingShadowingG2OverG1Reflection</span><span class="p">(</span><span class="n">viewDirection</span><span class="p">,</span>
<span class="n">lightDirectionTangent</span><span class="p">,</span>
<span class="n">H</span><span class="p">,</span> <span class="n">ggxAlpha</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">f0Weight</span> <span class="o">=</span> <span class="mi">1</span><span class="p">.</span><span class="n">f</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">f90MinusF0Weight</span> <span class="o">=</span> <span class="n">pow</span><span class="p">(</span><span class="mi">1</span><span class="p">.</span><span class="n">f</span> <span class="o">-</span> <span class="n">saturate</span><span class="p">(</span><span class="n">dot</span><span class="p">(</span><span class="n">viewDirection</span><span class="p">,</span> <span class="n">H</span><span class="p">)),</span> <span class="mi">5</span><span class="p">.</span><span class="n">f</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">basis</span> <span class="o">=</span> <span class="n">AmbientDiceCosineBasisFunction</span><span class="p">(</span>
<span class="n">dot</span><span class="p">(</span><span class="n">lobeDirection</span><span class="p">,</span> <span class="n">lightDirectionTangent</span><span class="p">)</span>
<span class="p">);</span>
<span class="n">float2</span> <span class="n">brdf</span> <span class="o">=</span> <span class="n">float2</span><span class="p">(</span><span class="n">f0Weight</span> <span class="o">-</span> <span class="n">f90MinusF0Weight</span><span class="p">,</span> <span class="n">f90MinusF0Weight</span><span class="p">)</span> <span class="o">*</span>
<span class="n">Vis</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lightDirectionTangent</span><span class="p">.</span><span class="n">z</span> <span class="o">></span> <span class="mi">0</span><span class="p">.</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span>
<span class="n">groundTruth</span> <span class="o">+=</span> <span class="n">basis</span> <span class="o">*</span> <span class="n">brdf</span> <span class="o">*</span> <span class="n">sampleScale</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">groundTruth</span> <span class="o">/</span> <span class="n">fittedValue</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The lookup table generation uses Heitz’s method for <a href="http://jcgt.org/published/0007/04/01/">Sampling the GGX Distribution of Visible Normals</a>; see that paper or the ShaderToy attached to this post for the full source code.</p>
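<p>The <code class="language-plaintext highlighter-rouge">hammersley2D</code> helper used in the sampling loop isn’t shown in the listing. A common way to generate such a point set is the bit-reversal (van der Corput) construction sketched below in plain C. This is an assumption about the general technique rather than a copy of the ShaderToy’s implementation, and the names <code class="language-plaintext highlighter-rouge">RadicalInverseVdC</code>, <code class="language-plaintext highlighter-rouge">Float2</code>, and <code class="language-plaintext highlighter-rouge">Hammersley2D</code> are hypothetical.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Van der Corput radical inverse in base 2: reverses the 32 bits of the
   index and interprets the result as a fraction in [0, 1). */
double RadicalInverseVdC(uint32_t bits) {
    bits = (bits << 16u) | (bits >> 16u);
    bits = ((bits & 0x55555555u) << 1u) | ((bits & 0xAAAAAAAAu) >> 1u);
    bits = ((bits & 0x33333333u) << 2u) | ((bits & 0xCCCCCCCCu) >> 2u);
    bits = ((bits & 0x0F0F0F0Fu) << 4u) | ((bits & 0xF0F0F0F0u) >> 4u);
    bits = ((bits & 0x00FF00FFu) << 8u) | ((bits & 0xFF00FF00u) >> 8u);
    return (double)bits * 2.3283064365386963e-10; /* multiply by 2^-32 */
}

typedef struct { double x; double y; } Float2;

/* i-th point of an N-point Hammersley set: (i / N, radicalInverse(i)). */
Float2 Hammersley2D(uint32_t index, uint32_t count) {
    Float2 p;
    assert(count > 0u && index < count);
    p.x = (double)index / (double)count;
    p.y = RadicalInverseVdC(index);
    return p;
}
```

<p>A low-discrepancy set like this spreads the samples evenly over the unit square, which is why it is a popular choice for precomputing lookup tables with a fixed sample budget.</p>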
<h1 id="diffuse-fits-for-cosine-lobe-basis-functions">Diffuse Fits for Cosine-Lobe Basis Functions</h1>
<p>The specular solution depends on having a diffuse fit for the basis function. I’ve found polynomial fits for the Ambient Dice cosine-lobe basis function, where \(s\) is the normal direction and \(v_i\) is the lobe direction:</p>
\[\cos(\theta) = (s \cdot v_i) \\
B_i(s) = 0.35 \max(\cos(\theta), 0)^2 + 0.25 \max(\cos(\theta), 0)^4\]
<p>along with fits for basis functions created from increasing powers of clamped cosine. The quadratic fits are of the form \(f(x) = a + bx + cx^2\), where \(x = \cos(\theta)\), while the quartic fits are of the form \(f(x) = a + bx + cx^2 + ex^4\). Note that the \(x^3\) term had negligible contribution in all of the fits and was therefore dropped.</p>
<p>Note that the basis functions use the <em>clamped</em> cosine term (i.e. \(\max(s \cdot v_i, 0)^n\)), while the fits use the <em>unclamped</em> cosine (i.e. \(s \cdot v_i\)).</p>
<p>The fits are:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Ambient Dice (0.35x^2 + 0.25x^4) (Quadratic Fit) (RMSE = 0.00039830040057973803, max delta = 0.0011889899178523927):
(a: 0.059806690913006784, b: 0.12917904381845316, c: 0.07056134282329878)
Ambient Dice (0.35x^2 + 0.25x^4) (Quartic Fit) (RMSE = 2.2924203751463493e-06, max delta = 5.9859991145602e-06):
(a: 0.05935860814656071, b: 0.12917904381815673, c: 0.07503324753667442, e: -0.005206825865963849)
x^2 (Quadratic Fit) (RMSE = 3.0408681122397585e-06, max delta = 6.544694088506109e-06):
(a: 0.12496957942811276, b: 0.25002412959296777, c: 0.1250610948589435)
x^2 (Quartic Fit) (RMSE = 3.0405945870056214e-06, max delta = 6.423524563908889e-06):
(a: 0.1249695335534175, b: 0.2500241295938416, c: 0.12506155284179532, e: -5.33276807285143e-07)
x^4 (Quadratic Fit) (RMSE = 0.001593126030041, max delta = 0.004746797107934839):
(a: 0.06426935244942271, b: 0.1666823938414644, c: 0.10715983849997654)
x^4 (Quartic Fit) (RMSE = 5.021965714382778e-06, max delta = 1.4951021221017158e-05):
(a: 0.062477085624468985, b: 0.16668239384246453, c: 0.12504681624769828, e: -0.02082655700848173)
x^6 (Quadratic Fit) (RMSE = 0.0021760428533152162, max delta = 0.005930189360399074):
(a: 0.041656708863260575, b: 0.12501163876845992, c: 0.08928511926559841)
x^6 (Quartic Fit) (RMSE = 0.0001519926731062455, max delta = 0.0005550233310728514):
(a: 0.039214627502226256, b: 0.12501163876761548, c: 0.11365730224790668, e: -0.02837755296728098)
x^8 (Quadratic Fit) (RMSE = 0.0023537857692170917, max delta = 0.0060368080144615754):
(a: 0.030296404520220775, b: 0.10000916682096359, c: 0.07574957031520438)
x^8 (Quartic Fit) (RMSE = 0.0002839638757536194, max delta = 0.0009460013572480663):
(a: 0.027667723047883053, b: 0.10000916682187795, c: 0.10198403855930246, e: -0.03054588966300734)
</code></pre></div></div>
<p>If you’re interested in sharp specular highlights (as you might get from spherical Gaussians with high \(\lambda\) values) then using a basis function like \(B_i(s) = \max(\cos(\theta), 0)^8\) might be a good fit; note, however, that this means your environment map may start to look like a series of point lights, as it does with spherical Gaussians, and the accuracy of diffuse irradiance will likely suffer.</p>
<p><a name="ResultImages"></a></p>
<h1 id="results">Results</h1>
<p>The table below compares spherical Gaussians (\(\lambda = 6\)) with 9 or 12 lobes against Ambient Dice with 9 or 12 lobes (where the nine-lobe variant contains only the lobes oriented towards the upper hemisphere) on the <a href="http://gl.ict.usc.edu/Data/HighResProbes">Ennis environment map</a>. The spherical Gaussians use an <a href="https://cg.cs.tsinghua.edu.cn/people/~kun/asg/">anisotropic spherical Gaussian fit</a> for specular as <a href="https://mynameismjp.wordpress.com/2016/10/09/sg-series-part-4-specular-lighting-from-an-sg-light-source/">detailed by MJP</a>.</p>
<table>
<thead>
<tr>
<th> </th>
<th style="text-align: center">Reference</th>
<th style="text-align: center">AD9</th>
<th style="text-align: center">AD12</th>
<th style="text-align: center">SG9</th>
<th style="text-align: center">SG12</th>
</tr>
</thead>
<tbody>
<tr>
<td>Radiance</td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/radianceMCIS.png" alt="MCIS Radiance" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/radianceADRBF-Hemi.png" alt="Hemi AD Radiance" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/radianceADRBF.png" alt="AD Radiance" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/radianceSGLS9.png" alt="SG9 Radiance" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/radianceSGLS12.png" alt="SG12 Radiance" /></td>
</tr>
<tr>
<td>RMSE</td>
<td style="text-align: center">-</td>
<td style="text-align: center">5.25</td>
<td style="text-align: center">5.83</td>
<td style="text-align: center">6.45</td>
<td style="text-align: center">5.54</td>
</tr>
<tr>
<td>Lambert</td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/irradianceMCIS.png" alt="MCIS Lambert" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/irradianceADRBF-Hemi.png" alt="Hemi AD Lambert" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/irradianceADRBF.png" alt="AD Lambert" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/irradianceSGLS9.png" alt="SG9 Lambert" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/irradianceSGLS12.png" alt="SG12 Lambert" /></td>
</tr>
<tr>
<td>RMSE</td>
<td style="text-align: center">-</td>
<td style="text-align: center">0.148</td>
<td style="text-align: center">0.105</td>
<td style="text-align: center">0.327</td>
<td style="text-align: center">0.225</td>
</tr>
<tr>
<td>GGX \(\alpha = 0.1\)</td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.1-MCIS.png" alt="MCIS GGX-0.1" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.1-ADRBF-Hemi.png" alt="Hemi AD GGX-0.1" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.1-ADRBF.png" alt="AD GGX-0.1" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.1-SGLS9.png" alt="SG9 GGX-0.1" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.1-SGLS12.png" alt="SG12 GGX-0.1" /></td>
</tr>
<tr>
<td>RMSE</td>
<td style="text-align: center">-</td>
<td style="text-align: center">2.17</td>
<td style="text-align: center">2.74</td>
<td style="text-align: center">2.97</td>
<td style="text-align: center">2.34</td>
</tr>
<tr>
<td>GGX \(\alpha = 0.2\)</td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.2-MCIS.png" alt="MCIS GGX-0.2" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.2-ADRBF-Hemi.png" alt="Hemi AD GGX-0.2" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.2-ADRBF.png" alt="AD GGX-0.2" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.2-SGLS9.png" alt="SG9 GGX-0.2" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.2-SGLS12.png" alt="SG12 GGX-0.2" /></td>
</tr>
<tr>
<td>RMSE</td>
<td style="text-align: center">-</td>
<td style="text-align: center">0.947</td>
<td style="text-align: center">1.37</td>
<td style="text-align: center">1.47</td>
<td style="text-align: center">1.12</td>
</tr>
<tr>
<td>GGX \(\alpha = 0.4\)</td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.4-MCIS.png" alt="MCIS GGX-0.4" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.4-ADRBF-Hemi.png" alt="Hemi AD GGX-0.4" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.4-ADRBF.png" alt="AD GGX-0.4" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.4-SGLS9.png" alt="SG9 GGX-0.4" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.4-SGLS12.png" alt="SG12 GGX-0.4" /></td>
</tr>
<tr>
<td>RMSE</td>
<td style="text-align: center">-</td>
<td style="text-align: center">0.276</td>
<td style="text-align: center">0.409</td>
<td style="text-align: center">0.566</td>
<td style="text-align: center">0.574</td>
</tr>
<tr>
<td>GGX \(\alpha = 0.6\)</td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.6-MCIS.png" alt="MCIS GGX-0.6" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.6-ADRBF-Hemi.png" alt="Hemi AD GGX-0.6" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.6-ADRBF.png" alt="AD GGX-0.6" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.6-SGLS9.png" alt="SG9 GGX-0.6" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.6-SGLS12.png" alt="SG12 GGX-0.6" /></td>
</tr>
<tr>
<td>RMSE</td>
<td style="text-align: center">-</td>
<td style="text-align: center">0.130</td>
<td style="text-align: center">0.137</td>
<td style="text-align: center">0.378</td>
<td style="text-align: center">0.466</td>
</tr>
<tr>
<td>GGX \(\alpha = 0.8\)</td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.8-MCIS.png" alt="MCIS GGX-0.8" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.8-ADRBF-Hemi.png" alt="Hemi AD GGX-0.8" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.8-ADRBF.png" alt="AD GGX-0.8" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.8-SGLS9.png" alt="SG9 GGX-0.8" /></td>
<td style="text-align: center"><img src="/assets/thesis/SG-vs-AD/Ennis/specular-alpha=0.8-SGLS12.png" alt="SG12 GGX-0.8" /></td>
</tr>
<tr>
<td>RMSE</td>
<td style="text-align: center">-</td>
<td style="text-align: center">0.119</td>
<td style="text-align: center">0.091</td>
<td style="text-align: center">0.299</td>
<td style="text-align: center">0.374</td>
</tr>
</tbody>
</table>
<p>More comparisons of this type on a wide range of environment maps are available in Appendix C of <a href="/thesis">my thesis</a>.</p>
<table>
<thead>
<tr>
<th style="text-align: center">Indirect lighting from baked lightmaps in Crytek Sponza with only single-scattering GGX specular materials</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><img src="/assets/thesis/MetallicSponza/Sponza-Metals-Metallic-AmbientDiceHemiNN-Crop.jpg" alt="Hemispherical Ambient Dice" /></td>
</tr>
<tr>
<td style="text-align: center">Non-negative hemispherical Ambient Dice (nine lobes) (0.9ms per frame)</td>
</tr>
<tr>
<td style="text-align: center"><img src="/assets/thesis/MetallicSponza/Sponza-Metals-Metallic-PT-Crop.jpg" alt="Path-traced reference" /></td>
</tr>
<tr>
<td style="text-align: center">Path-traced reference</td>
</tr>
<tr>
<td style="text-align: center"><img src="/assets/thesis/MetallicSponza/Sponza-Metals-Metallic-SG12NN-Crop.jpg" alt="Non-negative spherical Gaussians" /></td>
</tr>
<tr>
<td style="text-align: center">Non-negative spherical Gaussians (twelve lobes, \(\lambda = 8\)) (2.5ms per frame)</td>
</tr>
</tbody>
</table>
<table>
<thead>
<tr>
<th style="text-align: center">The <a href="https://github.com/TheRealMJP/BakingLab">Baking Lab</a> scene using indirect illumination lightmaps</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><img src="/assets/thesis/BakingLabSceneGallery/PathTraced.jpg" alt="Path-Traced Reference" /></td>
</tr>
<tr>
<td style="text-align: center">Path-Traced Reference</td>
</tr>
<tr>
<td style="text-align: center"><img src="/assets/thesis/BakingLabSceneGallery/ADLightmap-Only.jpg" alt="Hemispherical Ambient Dice" /></td>
</tr>
<tr>
<td style="text-align: center">Hemispherical Ambient Dice (nine lobes)</td>
</tr>
<tr>
<td style="text-align: center"><img src="/assets/thesis/BakingLabSceneGallery/ADLightmap+SSR.jpg" alt="Hemispherical Ambient Dice with SSR" /></td>
</tr>
<tr>
<td style="text-align: center">Hemispherical Ambient Dice (nine lobes) with screen-space reflections</td>
</tr>
<tr>
<td style="text-align: center"><img src="/assets/thesis/BakingLabSceneGallery/SG12Lightmap-Only.jpg" alt="Hemispherical spherical Gaussians" /></td>
</tr>
<tr>
<td style="text-align: center">Spherical Gaussians (twelve lobes)</td>
</tr>
<tr>
<td style="text-align: center"><img src="/assets/thesis/BakingLabSceneGallery/SG12Lightmap+SSR.jpg" alt="Spherical Gaussians with SSR" /></td>
</tr>
<tr>
<td style="text-align: center">Spherical Gaussians (twelve lobes) with screen-space reflections</td>
</tr>
</tbody>
</table>
<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=default.js">
</script>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>More precisely, this fit is for the single-scattering GGX specular model using the Smith height-correlated masking-shadowing function; Heitz provides details in <a href="http://jcgt.org/published/0003/02/03/paper.pdf">Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs</a>. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>We can do better than directly using <code class="language-plaintext highlighter-rouge">ggxAlpha</code> for the lerp parameter; this code snippet is just for the sake of example. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>I used <a href="https://www.physics.wisc.edu/~craigm/idl/fitting.html">MPFIT</a> to find the parameters that minimised the error of my LUT-multiplied approximation over the parameter space. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:4" role="doc-endnote">
<p>One possibility would be to use a 3D lookup table, with the angle between the normal and lobe direction as the third coordinate for the table. I was happy enough with the quality of the 2D LUT that I didn’t feel the need to try this. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
Spherical Radial Basis Functions for Lighting
2019-06-21T22:58:26+00:00
https://torust.me/2019/06/21/rbf-for-indirect-specular
<table>
<thead>
<tr>
<th style="text-align: center"><strong>Indirect lighting from baked lightmaps in Crytek Sponza with only single-scattering GGX specular materials</strong></th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><img src="/assets/thesis/MetallicSponza/Sponza-Metals-Metallic-AmbientDiceHemiNN-Crop.jpg" alt="Hemispherical Ambient Dice" /></td>
</tr>
<tr>
<td style="text-align: center">Non-negative hemispherical Ambient Dice (nine lobes) (0.9ms per frame)</td>
</tr>
<tr>
<td style="text-align: center"><img src="/assets/thesis/MetallicSponza/Sponza-Metals-Metallic-PT-Crop.jpg" alt="Path-traced reference" /></td>
</tr>
<tr>
<td style="text-align: center">Path-traced reference</td>
</tr>
<tr>
<td style="text-align: center"><img src="/assets/thesis/MetallicSponza/Sponza-Metals-Metallic-SG12NN-Crop.jpg" alt="Non-negative spherical Gaussians" /></td>
</tr>
<tr>
<td style="text-align: center">Non-negative spherical Gaussians (twelve lobes, \(\lambda = 8\)) (2.5ms per frame)</td>
</tr>
</tbody>
</table>
<p>With <a href="/thesis">my thesis now published</a>, I wanted to break down one of its key contributions: namely, the extension of Iwanicki and Sloan’s <a href="http://miciwan.com/EGSR2017/AmbientDice.pdf">Ambient Dice</a> basis function to store and evaluate both diffuse and specular irradiance. This content mainly comes from Chapters 7 & 8 of the thesis.</p>
<p>Firstly, it’s worth quickly covering what linear bases are. When we multiply each of a set of functions \(B_i(s)\) by a <em>basis coefficient</em> \(b_i\) and add the results together, we get a <em>linear basis</em>, where \(B_i(s)\) is the \(i^{th}\) basis function of the linear basis:</p>
\[\sum_i b_i B_i(s)\]
<p>Determining the basis coefficient vector \(b\) that best fits some function \(f(s)\) such that</p>
\[f(s) \approx \sum_i b_i B_i(s)\]
<p>can be done <a href="/2018/10/02/running-average-derivation">by least-squares encoding</a>; essentially, least-squares encoding tries to find the coefficients \(b\) that will enable the weighted sum of the basis functions to best approximate the original function \(f(s)\).</p>
<table>
<thead>
<tr>
<th style="text-align: center">\(f(s)\)</th>
<th style="text-align: center">\(\sum_i b_i B_i(s)\)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align: center"><img src="/assets/thesis/AmbientDiceBasis/wells.png" alt="Wells HDR Environment Map" /></td>
<td style="text-align: center"><img src="/assets/thesis/AmbientDiceBasis/Wells-ADSRBFRadiance.png" alt="Wells HDR Ambient Dice approximation" /></td>
</tr>
</tbody>
</table>
<table>
<tbody>
<tr>
<td>Approximation of the <a href="http://dativ.at/lightprobes">Wells HDR environment map</a> with the <a href="http://miciwan.com/EGSR2017/AmbientDice.pdf">Ambient Dice SRBF</a> basis functions.</td>
</tr>
</tbody>
</table>
<p>Many graphics programmers will be aware of the spherical harmonic basis functions, which are separated into bands where successive bands represent increasingly high-frequency components of the source signal. Even if you’re familiar with spherical harmonics, however, you may not have realised that the functions form a linear basis just like any other, and you can mix and match basis functions from different families together. In doing so, however, you will likely lose the orthonormality property of spherical harmonics.</p>
<p>Orthonormality means that encoding can be performed by simple projection of the function values onto the basis functions:</p>
\[b_i = m_i = \int_S f(s) B_i(s) \mathrm{d} s\]
<p>where \(m_i\) here means the \(i^{th}\) <em>moment</em> of the linear basis. If you don’t have an orthonormal basis, encoding becomes slightly more complex, generally involving a matrix multiplication with the precomputed <em>Gram matrix inverse</em>. The Gram matrix can be generated by Monte-Carlo sampling and integration, and generally requires a linear algebra library to perform the inversion. As a simpler, approximate alternative, my <a href="/rendering/irradiance-caching/spherical-gaussians/2018/09/21/spherical-gaussians">progressive least-squares encoding method</a> doesn’t require any precomputation, which makes it easy to experiment with different basis functions. See Chapter 7 of the thesis for more details on both encoding techniques. Given those tools, we can expand to basis functions beyond spherical harmonics.</p>
<p>There’s a particularly useful set of basis functions called <a href="https://en.wikipedia.org/wiki/Radial_basis_function">radial basis functions</a>. In the context of encoding light over a sphere (i.e. in all directions), radial just means that the basis function has some 3D direction and a value determined by how close the query direction is to the basis direction - e.g. a dot product. The simplest spherical RBF (SRBF) is:</p>
\[B_i(s) = s \cdot v_i\]
<p>where \(v_i\) is some fixed direction on the unit sphere. Note that, due to the definition of the dot product<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>, this basis function returns the cosine of the angle between \(s\) and \(v_i\).</p>
<p>RBFs can be used to form a linear basis with a set of <em>lobes</em> pointing in different directions; in other words, the only part of the basis function that changes between lobes is the lobe direction \(v_i\). Using our basis function above, we can approximate a radiance function \(f(s)\) as:</p>
\[f(s) \approx \sum_i b_i (s \cdot v_i)\]
<p>These lobes can be thought of as independent light sources. If you’re storing radiance in the basis, you can evaluate radiance by summing the lobes, adding together the radiance from all the lights. If you have a way to evaluate diffuse irradiance from one lobe, you can then evaluate diffuse for all of them – just add up each lobe’s diffuse contribution.</p>
<p>In the <a href="http://miciwan.com/EGSR2017/AmbientDice.pdf">Ambient Dice</a> paper, Iwanicki and Sloan present an SRBF with the lobes aligned with the twelve vertices of an icosahedron (i.e. a twenty-sided die):</p>
\[B_i(s) = 0.35 \max((s \cdot v_i), 0)^2 + 0.25 \max((s \cdot v_i), 0)^4\]
<p>In their paper, Iwanicki and Sloan directly stored diffuse irradiance into the linear basis. For my use-case, I wanted to reconstruct both diffuse <em>and</em> specular irradiance; therefore, I stored radiance into the basis.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
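<p>As a minimal sketch of what evaluating radiance from such a basis looks like (Python with NumPy, not code from the thesis), the snippet below builds twelve lobe directions from the icosahedron vertices and sums the lobes. The function names and the particular orientation of the icosahedron are assumptions made for illustration; the paper may orient its lobes differently.</p>

```python
import numpy as np

def icosahedron_vertices():
    """Twelve unit vectors: cyclic permutations of (0, +-1, +-phi), normalised.
    The orientation is an arbitrary choice made for this sketch."""
    phi = (1.0 + np.sqrt(5.0)) / 2.0
    verts = []
    for a in (-1.0, 1.0):
        for b in (-phi, phi):
            verts.append((0.0, a, b))
            verts.append((a, b, 0.0))
            verts.append((b, 0.0, a))
    verts = np.array(verts)
    return verts / np.linalg.norm(verts, axis=1, keepdims=True)

def ambient_dice_basis(s, lobe_dirs):
    """B_i(s) = 0.35 max(s . v_i, 0)^2 + 0.25 max(s . v_i, 0)^4 for every lobe i."""
    c = np.maximum(lobe_dirs @ s, 0.0)
    return 0.35 * c**2 + 0.25 * c**4

def evaluate_radiance(s, coefficients, lobe_dirs):
    """Weighted sum of the lobes; coefficients may be (12,) or (12, 3) for RGB."""
    return ambient_dice_basis(s, lobe_dirs) @ coefficients

lobes = icosahedron_vertices()
coefficients = np.ones(12)  # placeholder coefficients, one per lobe
radiance_at_vertex = evaluate_radiance(lobes[0], coefficients, lobes)
```

<p>With every coefficient set to one, the value at a lobe centre comes out at 1 (up to floating-point rounding): the lobe itself contributes 0.6 and its five nearest neighbours contribute 0.08 each.</p>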
<p>Given an SRBF basis storing radiance, the next step is to find fits that allow us to reconstruct diffuse or specular lighting from each lobe. Note that each lobe is a sort of area light – in fact, you could form a basis out of e.g. rectangular area lights in different directions and use that to encode radiance. For the Ambient Dice SRBF, I found a simple and accurate polynomial fit for diffuse irradiance (along with a far more complex analytic solution; see Chapter 8 of the thesis):</p>
\[\theta_{i}(s) = \cos^{-1}(s \cdot v_{i}) \\
I_{i\text{ approx}}(\theta_{i}) = 0.05981 + 0.12918 \cos(\theta_{i}) + 0.07056 \cos^2(\theta_{i})\]
<table>
<tbody>
<tr>
<td><img src="/assets/thesis/CosineLobeFitADSRBF.png" alt="Cosine Lobe fit for Ambient Dice" /></td>
</tr>
<tr>
<td>The polynomial fit (black) against the true value of the function (blue)</td>
</tr>
</tbody>
</table>
<p>With that fit, we can evaluate both radiance and irradiance from an Ambient Dice SRBF basis in any direction.</p>
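<p>The quadratic fit above can be sanity-checked numerically. The following Python/NumPy sketch (again, not code from the thesis) compares the polynomial against a brute-force integral of a single lobe against a clamped cosine. The \(1/\pi\) normalisation of the reference and the function names are assumptions of this sketch; that normalisation is used because it reproduces the fitted values.</p>

```python
import numpy as np

def ambient_dice_lobe(cos_angle):
    """Ambient Dice SRBF as a function of the cosine to the lobe direction."""
    c = np.maximum(cos_angle, 0.0)
    return 0.35 * c**2 + 0.25 * c**4

def irradiance_fit(cos_theta):
    """Quadratic fit from the text; cos_theta is the cosine between normal and lobe."""
    return 0.05981 + 0.12918 * cos_theta + 0.07056 * cos_theta**2

def irradiance_reference(cos_theta, n_z=400, n_phi=800):
    """Midpoint-rule integral over the sphere of lobe(s) * max(n . s, 0), divided by pi.
    The lobe points along +z and the normal lies in the xz-plane."""
    z = (np.arange(n_z) + 0.5) / n_z * 2.0 - 1.0
    phi = (np.arange(n_phi) + 0.5) / n_phi * 2.0 * np.pi
    zz, pp = np.meshgrid(z, phi, indexing="ij")
    sx = np.sqrt(1.0 - zz**2) * np.cos(pp)
    sin_theta = np.sqrt(max(0.0, 1.0 - cos_theta**2))
    n_dot_s = sin_theta * sx + cos_theta * zz
    integrand = ambient_dice_lobe(zz) * np.maximum(n_dot_s, 0.0)
    d_omega = (2.0 / n_z) * (2.0 * np.pi / n_phi)  # solid angle of one cell
    return integrand.sum() * d_omega / np.pi

for c in (-1.0, 0.0, 1.0):
    print(c, irradiance_fit(c), irradiance_reference(c))
```

<p>Under these assumptions the two columns should agree to within a few thousandths across the whole range of angles.</p>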
<p>The remaining problem is to evaluate specular from those same lobes. I don’t claim to have an ideal solution here (nor, for that matter, do I claim that the Ambient Dice basis function is a great choice for representing specular, since it’s inherently fairly low-frequency); however, I do have a general approach that could be useful.</p>
<p>Firstly, note that evaluating specular for a mirror-like surface is the same as evaluating the radiance in the mirror reflection direction. Together with the diffuse fit, this observation lets us approximate both mirror-like indirect specular and diffuse lighting from the same basis.</p>
<p>Let’s say we want to use the basis to evaluate specular for a range of roughness values. Well, it turns out that rough specular is fairly accurately approximated by multiplying the diffuse lighting by the specular BRDF response (which should be 1 if it’s energy conservative). For values in the middle of the roughness range, I found that blending between the two extremes – mirror reflectance and BRDF-multiplied diffuse – with the help of a 2D lookup table (roughness vs. viewing angle) actually works fairly well.</p>
<p>I won’t go into the details of my specular fit here – that will come in <a href="/2019/06/25/ambient-dice-specular.html">a future post</a>, or you can look at Chapter 8 of <a href="/thesis">my thesis</a> (page 124). The part I want to emphasise is that, since all the RBF lobes share the same basis function, we only need one lookup table! We couldn’t do this for e.g. spherical harmonics since each SH band is evaluated differently – this is in fact a problem Chen and Liu ran into when using spherical harmonics for specular in the <a href="https://developer.amd.com/wordpress/media/2012/10/S2008-Chen-Lighting_and_Material_of_Halo3.pdf">Lighting and Material of Halo 3</a>.</p>
<p>The diffuse and specular fits in combination make a good-quality, efficiently-reconstructed method of storing low-frequency indirect diffuse and specular lighting. Although I haven’t tried, I’m sure you could get even better results with a 3D lookup table or different radial basis functions – for example, \((s \cdot v_i)^6\) or higher powers of cosine are an obvious choice to preserve higher-frequency information.</p>
<p><a href="/2019/06/25/ambient-dice-specular.html">In the next post</a>, I go into more detail about my fit for Ambient Dice specular.</p>
<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=default.js">
</script>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The dot product is the sum of the element-wise products of two vectors \(a\) and \(b\), and is equal to the length of \(a\) times the length of \(b\) times the cosine of the angle between them. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>Storing radiance rather than irradiance means more coefficients are required for runtime reconstruction; rather than only retrieving the coefficients for the non-zero basis functions in the sample direction, we need to retrieve coefficients for any lobe which may have influence over the hemisphere around the sample direction (which, in practice, usually means we need to use all of the lobes). <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
So, I Wrote a Master’s Thesis
2019-06-20T23:08:26+00:00
https://torust.me/2019/06/20/i-wrote-a-thing
<p>My Master’s Thesis – Interactive Generation of Path-Traced Lightmaps – is finally <a href="/thesis">publicly available</a>! I spent a year writing and working on it, and I really hope that other people can find some use from it.</p>
<p>It can be rather dense in parts (an unfortunate side-effect of the word limit), so please feel free to contact me (see the page footer) if you have any questions – I can help explain any part of it in more detail.</p>
<p>I also plan/hope to extract parts of it out into longer, more casually-written blog posts – watch for that in the future. The first candidate for that is likely to be the <a href="https://www.activision.com/cdn/research/ambient_dice_web.pdf">Ambient Dice</a> specular reconstruction – it’s surprisingly easy to achieve better quality and performance than e.g. spherical Gaussian lightmaps for specular by using other basis functions and approximations, and I’m sure I’ve barely scratched the surface of what could be done.</p>
<p>Finally, you may notice my <a href="/portfolio">CV and Portfolio</a> has been updated, and that I’m looking for work. If you have any remote work, particularly in low-level, graphics, or engine development, then I’d love to hear from you. I’m based in Wellington, New Zealand, which puts me in a friendly time zone to Australia, much of Asia, and the west coast of the US. I’m also open to full-time employment in Wellington starting from September or October.</p>
<p><em><a href="https://www.cryengine.com/marketplace/sponza-sample-scene">Crytek Sponza</a> rendered using a hemispherical Ambient Dice lightmap (a technique developed for the thesis) for diffuse and specular</em>:</p>
<p><img src="/assets/thesis/Sponza-LookingUp-AmbientDiceHemi.jpg" alt="Sponza rendered using a hemispherical Ambient Dice lightmap for diffuse and specular" /></p>
<p><em>Path-traced reference:</em></p>
<p><img src="/assets/thesis/Sponza-LookingUp-PathTracedReference.jpg" alt="Sponza rendered from the same perspective by a path tracer" /></p>
Running Average Encoding - Why It Works
2018-10-01T22:51:00+00:00
https://torust.me/2018/10/01/running-average-derivation
<p>In the <a href="/rendering/irradiance-caching/spherical-gaussians/2018/09/21/spherical-gaussians">previous post</a> I introduced an as-far-as-I-know novel method for performing progressive least squares optimisation with spherical basis functions. Here, I’ll go into more detail about how it works, and also derive my <a href="/2018/09/21/spherical-gaussians-old">original, approximate method</a> from the corrected version.</p>
<p>Many thanks to Peter-Pike Sloan for <a href="https://twitter.com/PeterPikeSloan/status/1044482721223856128">providing the first part</a> of this derivation.</p>
<p>We’ll be dealing with spherical integrals for the sake of this post, but everything is equally applicable to hemispheres by restricting the integration domain. For example:</p>
\[\int_S f(s)\]
<p>will be used as shorthand for ‘the integral over the sphere of the function \(f(s)\)’, where \(s\) is a direction vector. All integrals will be taken with respect to \(s\).</p>
<p>\(f(s)\) is taken to mean the value of the function we’re trying to fit in direction \(s\); this value will usually be obtained using Monte Carlo sampling.</p>
<p>We’ll also assume fixed basis functions parameterised only by their direction, such that \(B_i(s)\) is the value of the <em>i</em>th basis function in direction \(s\). The basis functions will be evaluated by multiplying with a per-basis amplitude \(b_i\) and summing, such that the result \(R\) in direction \(s\) is given by:</p>
\[R(s) = \sum_i b_i B_i(s)\]
<p>In the case of spherical Gaussians, \(b_i = \mu_i\), and \(B_i(s) = e^{\lambda_i (s \cdot \vec{p}_i - 1)}\), or the value of the <em>i</em>th lobe evaluated in direction \(s\).</p>
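<p>For concreteness, evaluating a set of spherical Gaussian lobes in this form could be sketched as follows (Python with NumPy; the function and parameter names are illustrative rather than taken from any particular implementation):</p>

```python
import numpy as np

def evaluate_sgs(s, amplitudes, axes, sharpness):
    """R(s) = sum_i mu_i * exp(lambda_i * (s . p_i - 1)).

    s:          unit direction, shape (3,)
    amplitudes: mu_i, shape (m,) or (m, 3) for RGB
    axes:       unit lobe axes p_i, shape (m, 3)
    sharpness:  lambda_i, shape (m,)
    """
    weights = np.exp(sharpness * (axes @ s - 1.0))  # B_i(s) for every lobe
    return weights @ amplitudes

axes = np.array([[0.0, 0.0, 1.0]])
amplitudes = np.array([2.0])
sharpness = np.array([6.0])
peak = evaluate_sgs(np.array([0.0, 0.0, 1.0]), amplitudes, axes, sharpness)
```

<p>Along its own axis a lobe evaluates to exactly its amplitude, since the exponent is zero there.</p>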
<p>Our goal is to minimise the squared difference between \(R(s)\) and \(f(s)\) so that the fit matches the original function as closely as possible. Mathematically, that can be expressed as:</p>
\[\min \int_S ( \sum_i b_i B_i(s) - f(s))^2\]
<p>To minimise, we differentiate the function with respect to each unknown \(b_i\) and then set the derivative to 0.</p>
\[E = \int_S ( \sum_i b_i B_i(s) - f(s))^2\]
\[\frac{\partial E}{\partial b_i} = 0\]
<p>Let \(g(s) = \sum_k b_k B_k(s) - f(s)\). Therefore, \(\frac{\partial}{\partial b_k} \begin{bmatrix} g(s) \end{bmatrix} = B_k(s)\) for each \(b_k\).</p>
\[\begin{align*}
\frac{\partial E}{\partial b_i} &= \frac{\partial}{\partial b_i} \begin{bmatrix} \int_S ( \sum_k b_k B_k(s) - f(s))^2 \end{bmatrix} \\
&= \frac{\partial}{\partial b_i} \begin{bmatrix} \int_S ( g(s) )^2 \end{bmatrix} \\
&= 2 \int_S g(s) \frac{\partial}{\partial b_i} \begin{bmatrix} g(s) \end{bmatrix} \\
&= 2 \int_S g(s) B_i(s) \\
&= 2( \sum_k b_k \int_S ( B_i(s) \cdot B_k(s) ) ) - 2 \int_S (B_i(s) \cdot f(s))
\end{align*}\]
<p>Therefore, by setting \(\frac{\partial E}{\partial b_i} = 0\),</p>
\[\begin{equation} \label{LeastSquaresMinimiser}
\int_S (B_i(s) \cdot f(s)) = \sum_k b_k \int_S ( B_i(s) \cdot B_k(s) )
\end{equation}\]
<p>At this step, we now have a method for producing a Monte Carlo estimate of the raw moments \(\int_S (B_i(s) \cdot f(s))\): as each sample comes in, multiply it by each basis function and add it to the estimate for each lobe. This is in fact what was done for the naïve projection <a href="https://mynameismjp.wordpress.com/2016/10/09/sg-series-part-5-approximating-radiance-and-irradiance-with-sgs/">used in The Order: 1886</a>. To reconstruct the lobe amplitudes \(b_i\) we need to multiply the vector of moments by the inverse of the matrix whose entries are \(\int_S ( B_i(s) \cdot B_k(s) )\):</p>
\[\begin{bmatrix}
b_{1} \\
b_{2} \\
\vdots \\
b_{m}
\end{bmatrix}
=
\begin{pmatrix}
\int_S ( B_1(s) \cdot B_1(s) ) & \int_S ( B_1(s) \cdot B_2(s) ) & \dots & \int_S ( B_1(s) \cdot B_m(s) ) \\
\int_S ( B_2(s) \cdot B_1(s) ) & \int_S ( B_2(s) \cdot B_2(s) ) & \dots & \int_S ( B_2(s) \cdot B_m(s) ) \\
\vdots & \vdots & \ddots & \vdots \\
\int_S ( B_m(s) \cdot B_1(s) ) & \int_S ( B_m(s) \cdot B_2(s) ) & \dots & \int_S ( B_m(s) \cdot B_m(s) )
\end{pmatrix}^{-1}
\begin{bmatrix}
\int_S (B_1(s) \cdot f(s)) \\
\int_S (B_2(s) \cdot f(s)) \\
\vdots \\
\int_S (B_m(s) \cdot f(s))
\end{bmatrix}\]
<p>This is a perfectly valid method of performing least squares without storing all of the samples at every step, although it <a href="https://twitter.com/PeterPikeSloan/status/1044787585900400640">can be noisier</a> than if all samples were used to perform the fit. However, it does require a large matrix multiplication to reconstruct the \(b_i\) amplitudes, which is unsuitable for progressive rendering.</p>
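<p>The matrix form above can be sketched in a few lines of Python with NumPy. This is an illustration rather than the thesis implementation: the clamped cosine-power lobes, the six axis-aligned lobe directions and all function names are assumptions chosen to keep the example small. Both the Gram matrix and the moment vector are plain sums over samples, so in a real renderer they could be accumulated one sample at a time; the batch form is used here only for brevity.</p>

```python
import numpy as np

def cosine_power_basis(directions, lobe_dirs, power=4):
    """B_i(s) = max(s . v_i, 0)^power for N directions and m lobes, shape (N, m)."""
    return np.maximum(directions @ lobe_dirs.T, 0.0) ** power

def uniform_sphere_samples(count, rng):
    """Uniformly distributed unit vectors (normalised Gaussian samples)."""
    v = rng.normal(size=(count, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def least_squares_from_moments(samples, values, lobe_dirs):
    """Solve gram @ b = moments using Monte Carlo estimates of both sides."""
    B = cosine_power_basis(samples, lobe_dirs)
    scale = 4.0 * np.pi / len(samples)  # uniform-sphere Monte Carlo weight
    gram = scale * (B.T @ B)            # estimates of int_S B_i B_k
    moments = scale * (B.T @ values)    # estimates of int_S B_i f
    return np.linalg.solve(gram, moments)

rng = np.random.default_rng(0)
lobe_dirs = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                      [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
b_true = np.array([1.0, 0.5, 2.0, 1.5, 0.75, 1.25])
samples = uniform_sphere_samples(4096, rng)
values = cosine_power_basis(samples, lobe_dirs) @ b_true  # f lies in the span
b_fit = least_squares_from_moments(samples, values, lobe_dirs)
```

<p>Because the test function here is exactly representable by the basis and the same samples feed both estimates, the solve recovers the original amplitudes; with a real radiance signal the result is the least-squares fit to the sampled values instead.</p>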
<p>In the ‘running average’ algorithm, we want to reconstruct the \(b_i\) amplitudes as every sample comes in so that the results can be displayed at every iteration. There are therefore a few more steps we need to perform.</p>
<p>Let’s rearrange the above equation to solve for a single \(b_i\).</p>
\[\begin{align*}
\int_S ( B_i(s) \cdot f(s) ) &= \sum_k b_k \int_S ( B_i(s) \cdot B_k(s) ) \\
&= b_i \int_S B_i(s)^2 + \sum_{k, k \not= i} b_k \int_S ( B_i(s) \cdot B_k(s) )
\end{align*}\]
\[b_i \int_S B_i(s)^2 = \int_S ( B_i(s) \cdot f(s) ) - \sum_{k, k \not= i} b_k \int_S ( B_i(s) \cdot B_k(s) )\]
<p>We can bring the entire right hand side under the same integral due to the linearity of integration.</p>
\[\begin{align*}
b_i \int_S B_i(s)^2 &= \int_S ( B_i(s) \cdot f(s) - \sum_{k, k \not= i} b_k ( B_i(s) \cdot B_k(s) ) ) \\
&= \int_S ( B_i(s) \cdot ( f(s) - \sum_{k, k \not= i} b_k \cdot B_k(s) ) )
\end{align*}\]
<p>Finally, we end up with the following equation for \(b_i\):</p>
\[b_i = \frac{\int_S ( B_i(s) \cdot ( f(s) - \sum_{k, k \not= i} b_k \cdot B_k(s) ) )}{\int_S B_i(s)^2 }\]
<p>The two spherical integrals here can be computed in tandem using Monte-Carlo integration. The estimate for \(b_i\) given a single sample in direction \(\omega\) with a value \(v\) (where \(v\) is an estimate of \(f(\omega)\)) is given by:</p>
\[b_i = \frac{B_i(\omega) \cdot ( v - \sum_{k, k \not= i} b_k \cdot B_k(\omega) )}{\int_S B_i(s)^2 }\]
<p>The average value of \(b_i\) across all samples will tend towards the true least-squares value.</p>
<p>Likewise, the estimator for \(\int_S (B_i(s))^2\) is given by averaging \(B_i(\omega)^2\).</p>
<p>To solve this for \(b_i\), we need to know the amplitudes \(b_k\) for all \(k\) where \(k \not= i\). We can approximate this by using the \(b_k\) values solved for in the previous iteration of the algorithm. As the number of samples increases, the \(b\) vector will gradually converge to the true value. The convergence could potentially be improved by seeding the \(b\) vector with the estimate from a low-sample-count run rather than with the 0 vector; in practice, the error seems to disappear fairly quickly.</p>
<p>Similarly, since \(v\) is often only an estimator for the function value \(f(\omega)\) and not the true value, high variance in its estimate can cause errors in the \(b\) vector. One possible strategy to counter this is to gradually increase the sample weights over time (e.g. with \(w = 1 - \exp(-\frac{\text{sampleIndex}}{\text{sampleCount}})\)); however, in my implementation I haven’t found this to be necessary.</p>
<p>In this running average method, the integral in the denominator is calculated using Monte Carlo integration in the same way that \(b_i\) is. In fact, it turns out that computing both of them in lockstep improves the accuracy of the algorithm since any sampling bias in the numerator will be partially balanced out by the bias in the denominator. However, it’s also true that the integral may be wildly inaccurate at small sample counts and end up amplifying small values; therefore, to balance that out, I recommend clamping the estimator for the integral to at least the true integral. Alternatively, it’s possible to always use the precomputed true integral on the denominator and only estimate the \(b\) vector, although this results in slightly increased error.</p>
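<p>Putting the pieces together, the running-average update might look like the Python/NumPy sketch below. It is one reading of the equations above, not the thesis code: the clamped cosine-power lobes, the axis-aligned lobe directions, the equal sample weights and all names are assumptions. Samples are drawn uniformly, so the \(4\pi\) factor of the Monte Carlo estimator appears in both numerator and denominator and cancels; the code therefore works with plain averages, and the clamp uses the true average of \(B_i^2\) over the sphere.</p>

```python
import numpy as np

def lobe_values(direction, lobe_dirs, power=4):
    """B_i(w) = max(w . v_i, 0)^power for every lobe i (an illustrative SRBF)."""
    return np.maximum(lobe_dirs @ direction, 0.0) ** power

def running_average_fit(samples, values, lobe_dirs, true_mean_sq):
    b = np.zeros(len(lobe_dirs))        # current amplitude estimates b_i
    mean_sq = np.zeros(len(lobe_dirs))  # running average of B_i(w)^2
    for n, (w, v) in enumerate(zip(samples, values), start=1):
        B = lobe_values(w, lobe_dirs)
        mean_sq += (B * B - mean_sq) / n
        # Clamp the estimated denominator to at least its true value.
        denom = np.maximum(mean_sq, true_mean_sq)
        # sum over k != i of b_k B_k(w), using the previous iteration's b.
        others = b @ B - b * B
        new_b = B * (v - others) / denom
        b += (new_b - b) / n            # running average of the estimates
    return b

rng = np.random.default_rng(1)
lobe_dirs = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                      [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
b_true = np.array([1.0, 0.5, 2.0, 1.5, 0.75, 1.25])
samples = rng.normal(size=(30000, 3))
samples /= np.linalg.norm(samples, axis=1, keepdims=True)
values = np.array([lobe_values(w, lobe_dirs) @ b_true for w in samples])
# Average of max(x, 0)^8 over the unit sphere is 1 / 18.
b_fit = running_average_fit(samples, values, lobe_dirs, np.full(6, 1.0 / 18.0))
```

<p>In this toy setup the fitted amplitudes should settle close to the amplitudes used to generate the samples. With strongly overlapping lobes or a noisy \(v\), convergence can be expected to be slower.</p>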
<hr />
<p><br /></p>
<p>My <a href="/rendering/irradiance-caching/spherical-gaussians/2018/09/21/spherical-gaussians-old.html">original algorithm</a> was created by experimentation. I thought it would be worth going through why it worked and the approximations it made. Note that none of this is necessary to understand the corrected equation – it’s purely for curiosity and interest!</p>
<p>Effectively, at each step, it solved the following equation:</p>
\[b_i = \frac{ B_i(\omega) \cdot (v - \sum_k b_k B_k(\omega) ) } { \int_S ( B_i(s) ) } + b_i\]
<p>If we rearrange that to get into a form vaguely resembling our proper solution above:</p>
\[\begin{align*}
b_i &= \frac{ B_i(\omega) \cdot (v - \sum_k b_k B_k(\omega) ) } { \int_S ( B_i(s) ) } + b_i \\
&= \frac{ B_i(\omega) \cdot (v - \sum_k b_k B_k(\omega) ) + b_i \int_S B_i(s) } { \int_S B_i(s) } \\
&\approx \frac{ B_i(\omega) \cdot (v - \sum_k b_k B_k(\omega) ) + b_i B_i(\omega) } { \int_S B_i(s) } \\
&\approx \frac{ B_i(\omega) \cdot (v - \sum_k b_k B_k(\omega) + b_i ) } { \int_S B_i(s) } \\
&\approx \frac{ B_i(\omega) \cdot (v - \sum_{k, k \not= i} b_k B_k(\omega) - b_i B_i(\omega) + b_i ) } { \int_S B_i(s) } \\
&\approx \frac{ B_i(\omega) \cdot (v - \sum_{k, k \not= i} b_k B_k(\omega) + (1 - B_i(\omega)) b_i) } { \int_S B_i(s) }
\end{align*}\]
<p>For a spherical Gaussian basis function with sharpness \(\lambda\), the spherical integral of the lobe is approximately twice the spherical integral of its square, \(\int_S (B_i(s)) \approx 2 \int_S (B_i(s))^2\), since \(\int_S (B_i(s))^2 = \frac{\pi}{\lambda} (1 - e^{-4 \lambda})\) and \(\int_S B_i(s) = \frac{2 \pi}{\lambda} (1 - e^{-2 \lambda})\). Therefore,</p>
\[\begin{align*}
b_i &\approx \frac{ B_i(\omega) \cdot (v - \sum_{k, k \not= i} b_k B_k(\omega) + (1 - B_i(\omega)) b_i) } { 2 \int_S (B_i(s))^2 } \\
&\approx \frac{ B_i(\omega) \cdot (v - \sum_{k, k \not= i} b_k B_k(\omega))} { 2 \int_S (B_i(s))^2 } + \frac{ B_i(\omega) \cdot (1 - B_i(\omega)) b_i } { 2 \int_S (B_i(s))^2 }
\end{align*}\]
<p>This is very close to our ‘correct’ equation above. In fact, it becomes equal when</p>
\[v - \sum_{k, k \not= i} b_k B_k(\omega) = (1 - B_i(\omega))b_i\]
<p>We can rearrange that a little further:</p>
\[\begin{align*}
v &= b_i - b_i B_i(\omega) + \sum_{k, k \not= i} b_k B_k(\omega) \\
&= b_i + \sum_{k} b_k B_k(\omega) - 2 b_i B_i(\omega)
\end{align*}\]
<p>Since \(v\) is an estimator for \(f(\omega)\) and we assume that, as the fit converges, \(f(s) \approx \sum_{k} b_k B_k(s)\), we’re left with:</p>
\[\begin{align*}
2 b_i B_i(\omega) &= b_i \\
B_i(\omega) &= \frac{1}{2}
\end{align*}\]
<p>In other words, for a given sample, the error of the original algorithm is mostly determined by how close \(B_i(\omega)\) is to \(\frac{1}{2}\). Since the influence of samples with higher basis weights \(B_i(\omega)\) is greater anyway, this turned out to be a reasonable approximation. However, given the option, I’d still recommend using the corrected algorithm!</p>
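<p>As a quick check on the factor of two relating \(\int_S B_i(s)\) and \(\int_S (B_i(s))^2\) in the derivation above, dividing the two closed forms for a lobe of sharpness \(\lambda\) gives:</p>
\[\frac{\int_S B_i(s)}{\int_S (B_i(s))^2} = \frac{2 (1 - e^{-2 \lambda})}{1 - e^{-4 \lambda}} = \frac{2}{1 + e^{-2 \lambda}}\]
<p>For \(\lambda = 1\) this is roughly 1.76, while for \(\lambda = 4\) it is roughly 1.9993, so the approximation tightens quickly as the lobes become sharper.</p>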
<script type="text/javascript" async="" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.1/MathJax.js?config=default.js">
</script>In the previous post I introduced an as-far-as-I-know novel method for performing progressive least squares optimisation with spherical basis functions. Here, I’ll go into more detail about how it works, and also derive my original, approximate method from the corrected version.Spherical Gaussian Encoding2018-09-21T03:37:26+00:002018-09-21T03:37:26+00:00https://torust.me/rendering/irradiance-caching/spherical-gaussians/2018/09/21/spherical-gaussians<blockquote>
<p>Update: This post has now been published in paper form as <a href="http://jcgt.org/published/0009/01/02/">“Progressive Least-Squares Encoding for Linear Bases”</a> in the open-access journal JCGT. Take a look there for a more formalised derivation and more details.</p>
</blockquote>
<p>Spherical Gaussians are a useful tool for encoding precomputed radiance within a scene. Matt Pettineo has <a href="https://mynameismjp.wordpress.com/2016/10/09/sg-series-part-1-a-brief-and-incomplete-history-of-baked-lighting-representations/">an excellent series</a> describing the technical details and history behind them which I’d suggest reading before the rest of this post.</p>
<p>Recently, I’ve needed to build spherical Gaussian representations of a scene on the fly in the GPU path tracer I’m building for my Master’s project. Unlike, say, spherical harmonics, spherical Gaussian lobes don’t form an orthonormal basis; their amplitudes need to be computed with a least-squares solve, which is generally<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> done with access to all radiance samples at once. On the GPU, in a memory-constrained environment, this is highly impractical.</p>
<p>Instead, games like The Order: 1886 just projected the samples onto the spherical Gaussian lobes <a href="https://mynameismjp.wordpress.com/2016/10/09/sg-series-part-5-approximating-radiance-and-irradiance-with-sgs/">as if the lobes formed an orthonormal basis</a>, which gave low-quality results that lacked contrast. For my research, I wanted to see if I could do better.</p>
<p>After a bunch of experimentation, I found a new (to my knowledge) algorithm for progressively accumulating spherical Gaussian samples that gives results matching a standard least squares solve.</p>
<p>The algorithm is as follows, and supports both non-negative and regular solves. While the implementation focuses on spherical Gaussians, the algorithm should work for <em>any</em> set of spherical basis functions, provided those basis functions don’t overlap too much.</p>
<figure class="highlight"><pre><code class="language-swift" data-lang="swift"><span class="kd">struct</span> <span class="kt">SphericalGaussianBasis</span> <span class="p">{</span>
<span class="k">var</span> <span class="nv">lobes</span> <span class="p">:</span> <span class="p">[</span><span class="kt">SphericalGaussian</span><span class="p">]</span>
<span class="k">var</span> <span class="nv">totalAccumulatedWeight</span> <span class="p">:</span> <span class="kt">Float</span> <span class="o">=</span> <span class="mf">0.0</span>
<span class="k">lazy</span> <span class="k">var</span> <span class="nv">lobeMCSphericalIntegrals</span> <span class="o">=</span> <span class="p">[</span><span class="kt">Float</span><span class="p">](</span><span class="nv">repeating</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span> <span class="nv">count</span><span class="p">:</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="o">.</span><span class="n">count</span><span class="p">)</span> <span class="c1">// Optional, can be precomputed at a slight increase in error. Lazy so that the initial value can refer to self.lobes.</span>
<span class="k">let</span> <span class="nv">nonNegativeSolve</span> <span class="p">:</span> <span class="kt">Bool</span>
<span class="k">mutating</span> <span class="kd">func</span> <span class="nf">accumulateSample</span><span class="p">(</span><span class="n">_</span> <span class="nv">sample</span><span class="p">:</span> <span class="kt">RadianceSample</span><span class="p">,</span> <span class="nv">sampleWeight</span><span class="p">:</span> <span class="kt">Float</span> <span class="o">=</span> <span class="mf">1.0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">totalAccumulatedWeight</span> <span class="o">+=</span> <span class="n">sampleWeight</span>
<span class="k">let</span> <span class="nv">sampleWeightScale</span> <span class="o">=</span> <span class="n">sampleWeight</span> <span class="o">/</span> <span class="n">totalAccumulatedWeight</span>
<span class="k">var</span> <span class="nv">delta</span> <span class="o">=</span> <span class="n">sample</span><span class="o">.</span><span class="n">value</span>
<span class="k">var</span> <span class="nv">sampleLobeWeights</span> <span class="o">=</span> <span class="p">[</span><span class="kt">Float</span><span class="p">](</span><span class="nv">repeating</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span> <span class="nv">count</span><span class="p">:</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="o">.</span><span class="n">count</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">lobe</span><span class="p">)</span> <span class="k">in</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="o">.</span><span class="nf">enumerated</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="nv">dotProduct</span> <span class="o">=</span> <span class="nf">dot</span><span class="p">(</span><span class="n">lobe</span><span class="o">.</span><span class="n">axis</span><span class="p">,</span> <span class="n">sample</span><span class="o">.</span><span class="n">direction</span><span class="p">)</span>
<span class="k">let</span> <span class="nv">weight</span> <span class="o">=</span> <span class="nf">exp</span><span class="p">(</span><span class="n">lobe</span><span class="o">.</span><span class="n">sharpness</span> <span class="o">*</span> <span class="p">(</span><span class="n">dotProduct</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">))</span>
<span class="n">delta</span> <span class="o">-=</span> <span class="n">lobe</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">*</span> <span class="n">weight</span>
<span class="n">sampleLobeWeights</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">weight</span>
<span class="p">}</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..<</span><span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="o">.</span><span class="n">count</span> <span class="p">{</span>
<span class="k">let</span> <span class="nv">weight</span> <span class="o">=</span> <span class="n">sampleLobeWeights</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="k">let</span> <span class="nv">sphericalIntegralGuess</span> <span class="o">=</span> <span class="n">weight</span> <span class="o">*</span> <span class="n">weight</span>
<span class="c1">// Update the MC-computed integral of the lobe over the domain.</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobeMCSphericalIntegrals</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="p">(</span><span class="n">sphericalIntegralGuess</span> <span class="o">-</span> <span class="k">self</span><span class="o">.</span><span class="n">lobeMCSphericalIntegrals</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="o">*</span> <span class="n">sampleWeightScale</span>
<span class="c1">// The most accurate method requires using the MC-computed integral, </span>
<span class="c1">// since then bias in the estimate will partially cancel out.</span>
<span class="c1">// However, if you don't want to store a weight per-lobe you can instead substitute it with the</span>
<span class="c1">// precomputed integral at a slight increase in error.</span>
<span class="c1">// Interpolate from 1 to the true spherical integral as the sample count increases to avoid excess noise</span>
<span class="c1">// in the early estimates.</span>
<span class="k">let</span> <span class="nv">sphericalIntegral</span> <span class="o">=</span> <span class="n">sampleWeightScale</span> <span class="o">+</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="n">sampleWeightScale</span><span class="p">)</span> <span class="o">*</span> <span class="k">self</span><span class="o">.</span><span class="n">lobeMCSphericalIntegrals</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="c1">// Alternatively, you can also use this for the spherical integral:</span>
<span class="c1">// let sphericalIntegral = max(self.lobeMCSphericalIntegrals[i], self.lobes[i].precomputedSphericalIntegral)</span>
<span class="c1">// The acceleration coefficient helps to avoid local minima, although it can make the solve noisier.</span>
<span class="c1">// 3.0 seems to be a good value in my tests; 1.0 means no acceleration.</span>
<span class="k">let</span> <span class="nv">accelerationCoefficient</span> <span class="p">:</span> <span class="kt">Float</span> <span class="o">=</span> <span class="mf">3.0</span>
<span class="k">let</span> <span class="nv">deltaScale</span> <span class="o">=</span> <span class="n">accelerationCoefficient</span> <span class="o">*</span> <span class="n">sampleWeightScale</span> <span class="o">*</span> <span class="n">weight</span> <span class="o">/</span> <span class="n">sphericalIntegral</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">+=</span> <span class="n">delta</span> <span class="o">*</span> <span class="n">deltaScale</span>
<span class="k">if</span> <span class="p">(</span><span class="k">self</span><span class="o">.</span><span class="n">nonNegativeSolve</span><span class="p">)</span> <span class="p">{</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">=</span> <span class="nf">max</span><span class="p">(</span><span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span><span class="p">,</span> <span class="nf">float3</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
<span class="p">}</span>
<span class="c1">// Optional, slightly improves convergence:</span>
<span class="c1">// This step is more important when the lobes overlap significantly:</span>
<span class="n">delta</span> <span class="o">*=</span> <span class="mf">1.0</span> <span class="o">-</span> <span class="n">deltaScale</span> <span class="o">*</span> <span class="n">weight</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>I’ve called this a ‘running average’ since each new sample is evaluated against the Monte-Carlo estimate of the function based on the previous samples. If we want each lobe to be non-negative, we simply clamp it; the next sample to come in will then be evaluated against the set of non-negative lobes.</p>
<p>I provide a mathematical derivation of the method <a href="/rendering/irradiance-caching/spherical-gaussians/2018/10/02/running-average-derivation.html">here</a>.</p>
<p>So how does it look? Well, here are the results for the ‘ennis.hdr’ environment map using Halton-sequence sample directions and twelve lobes, where ‘Running Average’ is my new method. In these images, I’m using <a href="https://mynameismjp.wordpress.com/2016/10/09/sg-series-part-3-diffuse-lighting-from-an-sg-light-source/">Stephen Hill’s fitted approximation for a cosine lobe</a> to evaluate the irradiance for all encoding methods.</p>
<table>
<tr><td><b>Radiance</b></td><td><b>Irradiance</b></td><td><b>Irradiance Error (sMAPE)</b></td><td><b>Encoding Method</b></td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceMCIS.png" /></td><td valign="top"><img src="/assets/spherical-gaussians/irradianceMCIS.png" /></td><td><center>N/A</center></td><td>Reference</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSG.png" /><br />RMS: 4.24341</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSG.png" /><br />RMS: 0.44259</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSG.png" /></td><td>Naïve</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSGLS.png" /><br />RMS: 3.81043</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSGLS.png" /><br />RMS: 0.241267</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSGLS.png" /></td><td>Least Squares</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSGRA2.png" /><br />RMS: 3.80723</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSGRA2.png" /><br />RMS: 0.243617</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSGRA2.png" /></td><td>Running Average</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSGNNLS.png" /><br />RMS: 3.93677</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSGNNLS.png" /><br />RMS: 0.808848</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSGNNLS.png" /></td><td>Non-Negative Least Squares</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSGNNRA2.png" /><br />RMS: 3.93653</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSGNNRA2.png" /><br />RMS: 0.807047</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSGNNRA2.png" /></td><td>Non-Negative Running Average</td></tr>
</table>
<p>It’s a marked improvement over the naïve projection, and, depending on the sample distribution, can even achieve results as good as a standard least squares solve. And it all works on the fly, on the GPU, requiring only a per-lobe mean and the total sample weight for all lobes to be stored. For best quality, I recommend also storing a per-lobe estimate of the spherical integral, as is done above; however, that can be replaced with the precomputed spherical integral at a small increase in error.</p>
<p>The precomputed spherical integrals referenced in the code above are the integral of each basis function (e.g. each SG lobe) squared over the sampling domain. If the samples are distributed over a sphere then this has a closed-form solution; otherwise it can be precomputed using Monte-Carlo integration. Note that I’ve omitted dividing by the sampling PDF since the factors cancel out with the sampling in my algorithm above.</p>
<figure class="highlight"><pre><code class="language-swift" data-lang="swift"><span class="kd">struct</span> <span class="kt">SphericalGaussian</span> <span class="p">{</span>
<span class="k">var</span> <span class="nv">amplitude</span> <span class="p">:</span> <span class="n">float3</span>
<span class="k">var</span> <span class="nv">axis</span> <span class="p">:</span> <span class="n">float3</span>
<span class="k">var</span> <span class="nv">sharpness</span> <span class="p">:</span> <span class="kt">Float</span>
<span class="k">var</span> <span class="nv">sphericalIntegral</span> <span class="p">:</span> <span class="kt">Float</span> <span class="p">{</span>
<span class="nf">return</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">-</span> <span class="nf">exp</span><span class="p">(</span><span class="o">-</span><span class="mf">4.0</span> <span class="o">*</span> <span class="k">self</span><span class="o">.</span><span class="n">sharpness</span><span class="p">))</span> <span class="o">/</span> <span class="p">(</span><span class="mf">4.0</span> <span class="o">*</span> <span class="k">self</span><span class="o">.</span><span class="n">sharpness</span><span class="p">)</span>
<span class="p">}</span>
<span class="k">var</span> <span class="nv">hemisphericalIntegral</span> <span class="p">:</span> <span class="kt">Float</span> <span class="p">{</span>
<span class="k">var</span> <span class="nv">total</span> <span class="o">=</span> <span class="mf">0.0</span> <span class="k">as</span> <span class="kt">Float</span>
<span class="k">let</span> <span class="nv">sampleCount</span> <span class="o">=</span> <span class="mi">2048</span>
<span class="k">for</span> <span class="n">_</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..<</span><span class="n">sampleCount</span> <span class="p">{</span>
<span class="k">let</span> <span class="nv">direction</span> <span class="o">=</span> <span class="n">float3</span><span class="o">.</span><span class="n">randomOnHemisphere</span>
<span class="k">let</span> <span class="nv">dotProduct</span> <span class="o">=</span> <span class="nf">dot</span><span class="p">(</span><span class="k">self</span><span class="o">.</span><span class="n">axis</span><span class="p">,</span> <span class="n">direction</span><span class="p">)</span>
<span class="k">let</span> <span class="nv">weight</span> <span class="o">=</span> <span class="nf">exp</span><span class="p">(</span><span class="k">self</span><span class="o">.</span><span class="n">sharpness</span> <span class="o">*</span> <span class="p">(</span><span class="n">dotProduct</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">))</span>
<span class="n">total</span> <span class="o">+=</span> <span class="n">weight</span> <span class="o">*</span> <span class="n">weight</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">total</span> <span class="o">/</span> <span class="kt">Float</span><span class="p">(</span><span class="n">sampleCount</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>I originally posted <a href="/rendering/irradiance-caching/spherical-gaussians/2018/09/21/spherical-gaussians-old.html">a variant of this algorithm</a> that contained a few approximations. After that post, <a href="https://twitter.com/PeterPikeSloan/status/1044482721223856128">Peter-Pike Sloan pointed out on Twitter</a> that least squares doesn’t necessarily require all samples to be stored; instead, you can accumulate the raw moments and then multiply by a lobeCount × lobeCount matrix to reconstruct the result. I realised my method was effectively an online approximation to this, and was able to correct a couple of approximations to bring it to match the equation.</p>
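<p>To illustrate the moment-based approach, here is a minimal sketch for just two lobes and scalar sample values. The type and member names are hypothetical, and for simplicity it accumulates the Gram matrix from the samples themselves; since that matrix depends only on the lobes and the sample distribution, its inverse could instead be precomputed.</p>

```swift
/// Hypothetical two-lobe accumulator: stores raw moments and Gram-matrix sums
/// rather than the samples themselves, then solves the 2x2 normal equations.
struct TwoLobeMomentAccumulator {
    var moment0: Float = 0   // sum of v * B_0
    var moment1: Float = 0   // sum of v * B_1
    var gram00: Float = 0    // sum of B_0 * B_0
    var gram01: Float = 0    // sum of B_0 * B_1
    var gram11: Float = 0    // sum of B_1 * B_1

    mutating func accumulate(value: Float, basis0: Float, basis1: Float) {
        moment0 += value * basis0
        moment1 += value * basis1
        gram00 += basis0 * basis0
        gram01 += basis0 * basis1
        gram11 += basis1 * basis1
    }

    /// Returns the least-squares amplitudes, or nil if the Gram matrix
    /// is numerically singular.
    func solve() -> (Float, Float)? {
        let determinant = gram00 * gram11 - gram01 * gram01
        guard abs(determinant) > 1e-8 else { return nil }
        let b0 = (gram11 * moment0 - gram01 * moment1) / determinant
        let b1 = (gram00 * moment1 - gram01 * moment0) / determinant
        return (b0, b1)
    }
}
```

<p>The storage cost is fixed regardless of the sample count, which is the property that matters here; the running average method trades the final matrix multiply for a per-sample update.</p>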
<p>The results are now as good as a standard least-squares solve if the sample directions are randomly distributed, although they do deteriorate if the sample directions are correlated (as they would be, say, if you were reading pixels row by row from a lat-long image map). Conveniently, when we’re accumulating samples from path tracing the sample directions are usually stratified or uniformly random.<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup></p>
<p>This method does require at least 32-bit intermediates for accumulation; half-precision produces obvious visual artefacts and biasing towards high-intensity samples.</p>
<p>While testing this, I used <a href="https://github.com/kayru/Probulator">Probulator</a>, a useful open-source tool for testing different lighting encoding strategies. This method has also been merged into Probulator, and the source can be viewed <a href="https://github.com/kayru/Probulator/blob/86351e5f3ed78f086837e215f028a344b058dfb5/Source/Probulator/ExperimentSG.h#L155">here</a>.</p>
<p>If you want to see this method running in a lightmap baking context, Matt Pettineo has integrated it into <a href="https://github.com/TheRealMJP/BakingLab">The Baking Lab</a>. You can find it under the ‘Running Average’ and ‘Running Average Non-Negative’ solve modes.</p>
<hr />
<p><br /></p>
<p>Below is a comparison from within <a href="https://github.com/TheRealMJP/BakingLab">The Baking Lab</a> of the indirect specular from nine spherical Gaussian lobes using 10,000 samples per texel. The exposure has been turned up to more clearly show the results. There are slight visual differences if you flick back and forward, but it’s very close! The main difference is that the running average algorithm can find different local minima per-texel, and so results appear noisier across the texels.</p>
<p>Running Average:</p>
<p><img src="/assets/spherical-gaussians/BakingLab-RunningAverage.png" alt="Baking Lab Indirect Specular Running Average" /></p>
<p>Least Squares:</p>
<p><img src="/assets/spherical-gaussians/BakingLab-LeastSquares.png" alt="Baking Lab Indirect Specular Least Squares" /></p>
<hr />
<p><br /></p>
<p>The code above has been factorised to try to limit the number of operations on colours. For example, the lines:</p>
<figure class="highlight"><pre><code class="language-swift" data-lang="swift"><span class="k">let</span> <span class="nv">deltaScale</span> <span class="o">=</span> <span class="n">accelerationCoefficient</span> <span class="o">*</span> <span class="n">sampleWeightScale</span> <span class="o">*</span> <span class="n">weight</span> <span class="o">/</span> <span class="n">sphericalIntegral</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">+=</span> <span class="n">delta</span> <span class="o">*</span> <span class="n">deltaScale</span></code></pre></figure>
<p>are more naturally expressed as:</p>
<figure class="highlight"><pre><code class="language-swift" data-lang="swift"><span class="k">let</span> <span class="nv">projection</span> <span class="o">=</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">*</span> <span class="n">weight</span>
<span class="k">let</span> <span class="nv">newValue</span> <span class="o">=</span> <span class="p">(</span><span class="n">delta</span> <span class="o">+</span> <span class="n">projection</span><span class="p">)</span> <span class="o">*</span> <span class="n">weight</span> <span class="o">/</span> <span class="n">sphericalIntegral</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">+=</span> <span class="p">(</span><span class="n">newValue</span> <span class="o">-</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span><span class="p">)</span> <span class="o">*</span> <span class="n">sampleWeightScale</span></code></pre></figure>
<p>These two snippets aren’t exactly equivalent. In factorising the code, I approximated <code class="language-plaintext highlighter-rouge">projection * weight * weight / sphericalIntegral</code> as simply <code class="language-plaintext highlighter-rouge">projection</code>; it turns out that making this approximation helps to avoid error in the method. To make them equivalent, the first snippet would be:</p>
<figure class="highlight"><pre><code class="language-swift" data-lang="swift"><span class="k">let</span> <span class="nv">deltaScale</span> <span class="o">=</span> <span class="n">accelerationCoefficient</span> <span class="o">*</span> <span class="n">sampleWeightScale</span> <span class="o">*</span> <span class="n">weight</span> <span class="o">/</span> <span class="n">sphericalIntegral</span>
<span class="k">let</span> <span class="nv">dampingTerm</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">+</span> <span class="p">(</span><span class="n">deltaScale</span> <span class="o">*</span> <span class="n">weight</span> <span class="o">-</span> <span class="n">sampleWeightScale</span><span class="p">)</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">*=</span> <span class="n">dampingTerm</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">+=</span> <span class="n">delta</span> <span class="o">*</span> <span class="n">deltaScale</span></code></pre></figure>
<p>Similarly,</p>
<figure class="highlight"><pre><code class="language-swift" data-lang="swift"><span class="n">delta</span> <span class="o">*=</span> <span class="mf">1.0</span> <span class="o">-</span> <span class="n">deltaScale</span> <span class="o">*</span> <span class="n">weight</span></code></pre></figure>
<p>is equivalent to:</p>
<figure class="highlight"><pre><code class="language-swift" data-lang="swift"><span class="k">let</span> <span class="nv">oldAmplitude</span> <span class="o">=</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">-</span> <span class="n">delta</span> <span class="o">*</span> <span class="n">deltaScale</span>
<span class="n">delta</span> <span class="o">=</span> <span class="n">delta</span> <span class="o">+</span> <span class="n">oldAmplitude</span> <span class="o">*</span> <span class="n">weight</span> <span class="o">-</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">*</span> <span class="n">weight</span></code></pre></figure>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>It’s possible to perform least squares on the raw moments (the naïve projection) by <a href="https://twitter.com/PeterPikeSloan/status/1044783992191385600">multiplying by the inverse of the Gram matrix</a>, as is alluded to later in this post. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>One note of caution: some low-discrepancy sequences (e.g. fixed-length ones like the Hammersley sequence) will not work well since successive samples are correlated, even though the sequence is well-distributed over the entire domain. One way around this is to uniformly randomly shuffle the samples. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Update: This post has now been published in paper form as “Progressive Least-Squares Encoding for Linear Bases” in the open-access journal JCGT. Take a look there for a more formalised derivation and more details.(Archived Original Version) Spherical Gaussian Encoding2018-09-21T01:37:26+00:002018-09-21T01:37:26+00:00https://torust.me/2018/09/21/spherical-gaussians-old<blockquote>
<p>Note: this was the originally posted version of the running average algorithm. See <a href="/rendering/irradiance-caching/spherical-gaussians/2018/09/21/spherical-gaussians.html">the updated post</a> for a more accurate version.</p>
</blockquote>
<p>Spherical Gaussians are a useful tool for encoding precomputed radiance within a scene. Matt Pettineo has <a href="https://mynameismjp.wordpress.com/2016/10/09/sg-series-part-1-a-brief-and-incomplete-history-of-baked-lighting-representations/">an excellent series</a> describing the technical details and history behind them which I’d suggest reading before the rest of this post.</p>
<p>Recently, I’ve had need to build spherical Gaussian representations of a scene on the fly in the GPU path tracer I’m building for my Master’s project. Unlike, say, spherical harmonics, spherical Gaussian lobes don’t form a set of orthonormal bases; they need to be computed with a least-squares solve, which generally requires having access to all radiance samples at once (although, as Peter-Pike Sloan points out on Twitter, <a href="https://twitter.com/PeterPikeSloan/status/1044482721223856128">that doesn’t always have to be the case</a>). On the GPU, in a memory-constrained environment, this is highly impractical.</p>
<p>Instead, games like The Order: 1886 just projected the samples onto the spherical Gaussian lobes <a href="https://mynameismjp.wordpress.com/2016/10/09/sg-series-part-5-approximating-radiance-and-irradiance-with-sgs/">as if the lobes form an orthonormal basis</a>, which gives low-quality results that lack contrast. For my research, I wanted to see if I could do better.</p>
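<p>For contrast with what follows, a naïve projection might look something like the sketch below. It is not taken from The Order: 1886 or from Probulator; the <code>Lobe</code> and <code>Sample</code> types, the scalar radiance value, and the 4π/N Monte Carlo scale (which assumes directions distributed uniformly over the sphere) are all assumptions made for illustration.</p>

```cpp
#include <cmath>
#include <vector>

// Hypothetical minimal types for illustration only.
struct Vec3 { float x, y, z; };
struct Lobe { Vec3 axis; float sharpness; float amplitude; };
struct Sample { Vec3 direction; float value; };

static float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Treat every lobe as if it were one function of an orthonormal basis:
// its amplitude is a Monte Carlo estimate of the integral of
// (radiance * lobe weight) over the sphere, computed independently per lobe.
void projectNaive(std::vector<Lobe>& lobes, const std::vector<Sample>& samples) {
    if (samples.empty()) { return; }
    const float kPi = 3.14159265358979f;
    // Assumed normalisation: uniform sphere sampling, so each sample covers 4*pi/N.
    const float scale = 4.0f * kPi / static_cast<float>(samples.size());
    for (Lobe& lobe : lobes) {
        float sum = 0.0f;
        for (const Sample& s : samples) {
            const float weight = std::exp(lobe.sharpness * (dot(lobe.axis, s.direction) - 1.0f));
            sum += s.value * weight;
        }
        lobe.amplitude = sum * scale;
    }
}
```

<p>Each lobe here is fitted without regard to what its neighbours already represent, which is consistent with the low-contrast results described above.</p>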
<p>After a bunch of experimentation, I found a new algorithm for accumulating spherical Gaussian samples that’s almost as good as a least-squares solve if the sample directions are randomly distributed, and slightly better than a naïve projection if the sample directions are correlated (as they would be, say, if you were reading pixels row by row from a lat-long image map). Conveniently, when we’re accumulating samples from path tracing the sample directions are usually stratified or uniformly random.</p>
<p>One note of caution: some low-discrepancy sequences (e.g. fixed-length ones like the Hammersley sequence) will not work well if successive samples are correlated, even though the sequence is well-distributed over the entire domain.</p>
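<p>When all of the samples are available up front (a fixed-length sequence, or pixels read from an image), one way to break up the correlation between successive samples is to shuffle their order before accumulating them. A minimal sketch, where <code>shuffledCopy</code> and its <code>seed</code> parameter are hypothetical names rather than anything from Probulator:</p>

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Returns a copy of the samples in a uniformly random order.
// The input is left untouched so the caller can still use the original ordering.
template <typename T>
std::vector<T> shuffledCopy(const std::vector<T>& samples, unsigned seed) {
    std::vector<T> result = samples;
    std::mt19937 rng(seed);  // fixed seed keeps runs reproducible
    std::shuffle(result.begin(), result.end(), rng);
    return result;
}
```

<p>This only changes the order in which samples are consumed; the set of directions, and so its coverage of the sphere, stays the same.</p>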
<p>The algorithm is as follows, and supports both non-negative and regular solves:</p>
<figure class="highlight"><pre><code class="language-swift" data-lang="swift"><span class="kd">struct</span> <span class="kt">SphericalGaussianBasis</span> <span class="p">{</span>
<span class="k">var</span> <span class="nv">lobes</span> <span class="p">:</span> <span class="p">[</span><span class="kt">SphericalGaussian</span><span class="p">]</span>
<span class="k">lazy</span> <span class="k">var</span> <span class="nv">lobeWeights</span> <span class="p">:</span> <span class="p">[</span><span class="kt">Float</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="kt">Float</span><span class="p">](</span><span class="nv">repeating</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span> <span class="nv">count</span><span class="p">:</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="o">.</span><span class="n">count</span><span class="p">)</span>
<span class="k">let</span> <span class="nv">nonNegativeSolve</span> <span class="p">:</span> <span class="kt">Bool</span>
<span class="k">mutating</span> <span class="kd">func</span> <span class="nf">accumulateSample</span><span class="p">(</span><span class="n">_</span> <span class="nv">sample</span><span class="p">:</span> <span class="kt">RadianceSample</span><span class="p">)</span> <span class="p">{</span>
<span class="k">var</span> <span class="nv">currentValue</span> <span class="o">=</span> <span class="nf">float3</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
<span class="k">var</span> <span class="nv">sampleLobeWeights</span> <span class="o">=</span> <span class="p">[</span><span class="kt">Float</span><span class="p">](</span><span class="nv">repeating</span><span class="p">:</span> <span class="mf">0.0</span><span class="p">,</span> <span class="nv">count</span><span class="p">:</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="o">.</span><span class="n">count</span><span class="p">)</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">lobe</span><span class="p">)</span> <span class="k">in</span> <span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="o">.</span><span class="nf">enumerated</span><span class="p">()</span> <span class="p">{</span>
<span class="k">let</span> <span class="nv">dotProduct</span> <span class="o">=</span> <span class="nf">dot</span><span class="p">(</span><span class="n">lobe</span><span class="o">.</span><span class="n">axis</span><span class="p">,</span> <span class="n">sample</span><span class="o">.</span><span class="n">direction</span><span class="p">)</span>
<span class="k">let</span> <span class="nv">weight</span> <span class="o">=</span> <span class="nf">exp</span><span class="p">(</span><span class="n">lobe</span><span class="o">.</span><span class="n">sharpness</span> <span class="o">*</span> <span class="p">(</span><span class="n">dotProduct</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">))</span>
<span class="n">currentValue</span> <span class="o">+=</span> <span class="n">lobe</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">*</span> <span class="n">weight</span>
<span class="n">sampleLobeWeights</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">weight</span>
<span class="p">}</span>
<span class="k">let</span> <span class="nv">deltaValue</span> <span class="o">=</span> <span class="n">sample</span><span class="o">.</span><span class="n">value</span> <span class="o">-</span> <span class="n">currentValue</span>
<span class="k">for</span> <span class="n">i</span> <span class="k">in</span> <span class="mi">0</span><span class="o">..<</span><span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="o">.</span><span class="n">count</span> <span class="p">{</span>
<span class="k">let</span> <span class="nv">weight</span> <span class="o">=</span> <span class="n">sampleLobeWeights</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="k">if</span> <span class="n">weight</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">{</span> <span class="k">continue</span> <span class="p">}</span>
<span class="n">lobeWeights</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+=</span> <span class="n">weight</span>
<span class="k">let</span> <span class="nv">weightScale</span> <span class="o">=</span> <span class="n">weight</span> <span class="o">/</span> <span class="k">self</span><span class="o">.</span><span class="n">lobeWeights</span><span class="p">[</span><span class="n">i</span><span class="p">]</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">+=</span> <span class="n">deltaValue</span> <span class="o">*</span> <span class="n">weightScale</span>
<span class="k">if</span> <span class="p">(</span><span class="k">self</span><span class="o">.</span><span class="n">nonNegativeSolve</span><span class="p">)</span> <span class="p">{</span>
<span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span> <span class="o">=</span> <span class="nf">max</span><span class="p">(</span><span class="k">self</span><span class="o">.</span><span class="n">lobes</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">.</span><span class="n">amplitude</span><span class="p">,</span> <span class="nf">float3</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>The basic idea is fairly simple: at each step, we evaluate the sum of the spherical Gaussian lobes in the new sample’s direction. Then, we evaluate the difference between the sample’s value and the current value. Finally, we adjust each lobe’s amplitude towards the sample’s value according to its weight. If we want the lobes to be non-negative, we simply clamp them; the next sample to come in will then be evaluated against the set of non-negative lobes. I’ve called this a ‘running average’; perhaps a better description is that it uses gradient descent to solve for the radiance.</p>
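<p>To see where the ‘running average’ name comes from, it can help to look at a degenerate case. The sketch below is a scalar restatement of the update described above (the <code>ScalarLobe</code>, <code>Dir</code> and <code>accumulate</code> names are hypothetical, and radiance is reduced to a single float). With one lobe of zero sharpness, every sample has weight 1, and the update reduces to the usual incremental mean.</p>

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical scalar types for illustration only.
struct Dir { float x, y, z; };

struct ScalarLobe {
    Dir axis;
    float sharpness;
    float amplitude;    // running estimate of the lobe amplitude
    float totalWeight;  // sum of the weights of all samples seen so far
};

static float lobeWeight(const ScalarLobe& lobe, Dir d) {
    const float cosine = lobe.axis.x * d.x + lobe.axis.y * d.y + lobe.axis.z * d.z;
    return std::exp(lobe.sharpness * (cosine - 1.0f));
}

// One accumulation step: evaluate the current fit in the sample direction,
// then move each lobe towards the sample in proportion to weight / totalWeight.
void accumulate(std::vector<ScalarLobe>& lobes, Dir direction, float value, bool nonNegative) {
    std::vector<float> weights(lobes.size());
    float current = 0.0f;
    for (std::size_t i = 0; i < lobes.size(); ++i) {
        weights[i] = lobeWeight(lobes[i], direction);
        current += lobes[i].amplitude * weights[i];
    }
    const float delta = value - current;
    for (std::size_t i = 0; i < lobes.size(); ++i) {
        const float weight = weights[i];
        if (weight == 0.0f) { continue; }
        lobes[i].totalWeight += weight;
        lobes[i].amplitude += delta * (weight / lobes[i].totalWeight);
        if (nonNegative) {
            lobes[i].amplitude = std::max(lobes[i].amplitude, 0.0f);
        }
    }
}
```

<p>Feeding the values 2, 4 and 6 to a single zero-sharpness lobe takes its amplitude through 2, 3 and 4, which is the mean of the samples seen so far at each step. With several overlapping lobes, <code>delta</code> is measured against the combined fit instead, which is what distinguishes this from the naïve projection.</p>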
<p>So how does it look? Well, here are the results for the ‘ennis.hdr’ environment map using uniformly randomly shuffled sample directions and twelve lobes (‘Running Average’ is my new method):</p>
<table>
<tr><td><b>Radiance</b></td><td><b>Irradiance</b></td><td><b>Irradiance Error (sMAPE)</b></td><td><b>Encoding Method</b></td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceMCIS.png" /></td><td valign="top"><img src="/assets/spherical-gaussians/irradianceMCIS.png" /></td><td><center>N/A</center></td><td>Reference</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSG.png" /><br />RMS: 4.24835</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSG.png" /><br />RMS: 0.441404</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSG.png" /></td><td>Naïve</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSGLS.png" /><br />RMS: 3.80738</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSGLS.png" /><br />RMS: 0.228353</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSGLS.png" /></td><td>Least Squares</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSGRA.png" /><br />RMS: 3.81673</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSGRA.png" /><br />RMS: 0.221796</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSGRA.png" /></td><td>Running Average</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSGNNLS.png" /><br />RMS: 3.9339</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSGNNLS.png" /><br />RMS: 0.786142</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSGNNLS.png" /></td><td>Non-Negative Least Squares</td></tr>
<tr><td valign="top"><img src="/assets/spherical-gaussians/radianceSGNNRA.png" /><br />RMS: 3.92593</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceSGNNRA.png" /><br />RMS: 0.670631</td><td valign="top"><img src="/assets/spherical-gaussians/irradianceErrorSGNNRA.png" /></td><td>Non-Negative Running Average</td></tr>
</table>
<p>It’s a marked improvement over the naïve projection, and, depending on the sample distribution, can be imperceptibly different from a standard least squares solve. And it all works on the fly, on the GPU, requiring only a per-lobe mean and weight to be stored.</p>
<p>In these images, I’m using <a href="https://mynameismjp.wordpress.com/2016/10/09/sg-series-part-3-diffuse-lighting-from-an-sg-light-source/">Stephen Hill’s fitted approximation for a cosine lobe</a> to evaluate the irradiance for all encoding methods.</p>
<blockquote>
<p>Update: Matt Pettineo has integrated this new method into <a href="https://github.com/TheRealMJP/BakingLab">The Baking Lab</a>. If you want to take a look you can find it under the ‘Running Average’ and ‘Running Average Non-Negative’ solve modes.</p>
</blockquote>
<p>As I progress on my thesis, I hope to uncover more of the reasoning behind <em>why</em> it works so well, and I’m also hopeful that it can find applications in other encoding schemes (<a href="https://research.activision.com/t5/Publications/Ambient-Dice/ba-p/10284641">Ambient Dice</a>, perhaps?).</p>
<p><em>While testing this, I used <a href="https://github.com/kayru/Probulator">Probulator</a>, a useful open-source tool for testing different lighting encoding strategies. The source code for the implementation of this method within Probulator is below.</em></p>
<p><em>Note that the original implementation of irradiance for the naïve projection in Probulator cheats a little bit since it uses a non-cosine BRDF to evaluate the spherical Gaussian lobes. If we’re also using the spherical Gaussian to encode radiance for specular lighting, compensating with the BRDF doesn’t really work; we want things to be consistent.</em></p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">class</span> <span class="nc">ExperimentSGRunningAverage</span> <span class="o">:</span> <span class="k">public</span> <span class="n">ExperimentSGBase</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">solveForRadiance</span><span class="p">(</span><span class="k">const</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">RadianceSample</span><span class="o">>&</span> <span class="n">_radianceSamples</span><span class="p">)</span> <span class="k">override</span>
<span class="p">{</span>
<span class="k">const</span> <span class="n">u32</span> <span class="n">lobeCount</span> <span class="o">=</span> <span class="p">(</span><span class="n">u32</span><span class="p">)</span><span class="n">m_lobes</span><span class="p">.</span><span class="n">size</span><span class="p">();</span>
<span class="kt">float</span> <span class="n">lobeWeights</span><span class="p">[</span><span class="n">lobeCount</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="n">u32</span> <span class="n">lobeIt</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">lobeIt</span> <span class="o"><</span> <span class="n">lobeCount</span><span class="p">;</span> <span class="o">++</span><span class="n">lobeIt</span><span class="p">)</span> <span class="p">{</span>
<span class="n">lobeWeights</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">]</span> <span class="o">=</span> <span class="mf">0.</span><span class="n">f</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">RadianceSample</span><span class="o">></span> <span class="n">radianceSamples</span> <span class="o">=</span> <span class="n">_radianceSamples</span><span class="p">;</span>
<span class="c1">// The samples should be uniformly randomly distributed (or stratified) for best results.</span>
<span class="n">std</span><span class="o">::</span><span class="n">random_shuffle</span><span class="p">(</span><span class="n">radianceSamples</span><span class="p">.</span><span class="n">begin</span><span class="p">(),</span> <span class="n">radianceSamples</span><span class="p">.</span><span class="n">end</span><span class="p">());</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">sampleIdx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">sampleIdx</span> <span class="o"><</span> <span class="n">radianceSamples</span><span class="p">.</span><span class="n">size</span><span class="p">();</span> <span class="n">sampleIdx</span> <span class="o">+=</span> <span class="mi">1</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="n">RadianceSample</span><span class="o">&</span> <span class="n">sample</span> <span class="o">=</span> <span class="n">radianceSamples</span><span class="p">[</span><span class="n">sampleIdx</span><span class="p">];</span>
<span class="n">vec3</span> <span class="n">currentValue</span> <span class="o">=</span> <span class="n">vec3</span><span class="p">(</span><span class="mf">0.</span><span class="n">f</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">sampleLobeWeights</span><span class="p">[</span><span class="n">lobeCount</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="n">u32</span> <span class="n">lobeIt</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">lobeIt</span> <span class="o"><</span> <span class="n">lobeCount</span><span class="p">;</span> <span class="o">++</span><span class="n">lobeIt</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">float</span> <span class="n">dotProduct</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">m_lobes</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">].</span><span class="n">p</span><span class="p">,</span> <span class="n">sample</span><span class="p">.</span><span class="n">direction</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">weight</span> <span class="o">=</span> <span class="n">exp</span><span class="p">(</span><span class="n">m_lobes</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">].</span><span class="n">lambda</span> <span class="o">*</span> <span class="p">(</span><span class="n">dotProduct</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">));</span>
<span class="n">currentValue</span> <span class="o">+=</span> <span class="n">m_lobes</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">].</span><span class="n">mu</span> <span class="o">*</span> <span class="n">weight</span><span class="p">;</span>
<span class="n">sampleLobeWeights</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">]</span> <span class="o">=</span> <span class="n">weight</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">vec3</span> <span class="n">deltaValue</span> <span class="o">=</span> <span class="n">sample</span><span class="p">.</span><span class="n">value</span> <span class="o">-</span> <span class="n">currentValue</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">u32</span> <span class="n">lobeIt</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">lobeIt</span> <span class="o"><</span> <span class="n">lobeCount</span><span class="p">;</span> <span class="o">++</span><span class="n">lobeIt</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">float</span> <span class="n">weight</span> <span class="o">=</span> <span class="n">sampleLobeWeights</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">weight</span> <span class="o">==</span> <span class="mf">0.</span><span class="n">f</span><span class="p">)</span> <span class="p">{</span> <span class="k">continue</span><span class="p">;</span> <span class="p">}</span>
<span class="n">lobeWeights</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">]</span> <span class="o">+=</span> <span class="n">weight</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">weightScale</span> <span class="o">=</span> <span class="n">weight</span> <span class="o">/</span> <span class="n">lobeWeights</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">];</span>
<span class="n">m_lobes</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">].</span><span class="n">mu</span> <span class="o">+=</span> <span class="n">deltaValue</span> <span class="o">*</span> <span class="n">weightScale</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">m_nonNegativeSolve</span><span class="p">)</span> <span class="p">{</span>
<span class="n">m_lobes</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">].</span><span class="n">mu</span> <span class="o">=</span> <span class="n">max</span><span class="p">(</span><span class="n">m_lobes</span><span class="p">[</span><span class="n">lobeIt</span><span class="p">].</span><span class="n">mu</span><span class="p">,</span> <span class="n">vec3</span><span class="p">(</span><span class="mf">0.</span><span class="n">f</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>