Spherical Gaussian Encoding
Spherical Gaussians are a useful tool for encoding precomputed radiance within a scene. Matt Pettineo has an excellent series describing the technical details and history behind them which I’d suggest reading before the rest of this post.
Recently, I’ve had need to build spherical Gaussian representations of a scene on the fly in the GPU path tracer I’m building for my Master’s project. Unlike, say, spherical harmonics, spherical Gaussian lobes don’t form a set of orthonormal bases; they need to be computed with a leastsquares solve, which is generally^{1} done with access to all radiance samples at once. On the GPU, in a memoryconstrained environment, this is highly impractical.
Instead, games like The Order: 1886 just projected the samples onto the spherical Gaussian lobes as if the lobes form an orthonormal basis, which gave lowquality results that lack contrast. For my research, I wanted to see if I could do better.
After a bunch of experimentation, I found a new (to my knowledge) algorithm for progressively accumulating spherical Gaussian samples that gives results matching a standard least squares solve.
The algorithm is as follows, and supports both nonnegative and regular solves. While the implementation focuses on spherical Gaussians, the algorithm should work for any set of spherical basis functions, provided those basis functions don’t overlap too much.
I’ve called this a ‘running average’ since each new sample is evaluated against the MonteCarlo estimate of the function based on the previous samples. If we want each lobe to be nonnegative, we simply clamp it; the next sample to come in will then be evaluated against the set of nonnegative lobes.
I provide a mathematical derivation of the method here.
So how does it look? Well, here are the results for the ‘ennis.hdr’ environment map using Haltonsequence sample directions and twelve lobes, where ‘Running Average’ is my new method. In these images, I’m using Stephen Hill’s fitted approximation for a cosine lobe to evaluate the irradiance for all encoding methods.
Radiance  Irradiance  Irradiance Error (sMAPE)  Encoding Method 
Reference  
RMS: 4.24341  RMS: 0.44259  Naïve  
RMS: 3.81043  RMS: 0.241267  Least Squares  
RMS: 3.80723  RMS: 0.243617  Running Average  
RMS: 3.93677  RMS: 0.808848  NonNegative Least Squares  
RMS: 3.93653  RMS: 0.807047  NonNegative Running Average 
It’s a marked improvement over the naïve projection, and, depending on the sample distribution, can even achieve results as good as a standard least squares solve. And it all works on the fly, on the GPU, requiring only a perlobe mean and the total sample weight for all lobes to be stored. For best quality, I recommend also storing a perlobe estimate of the spherical integral, as is done above; however, that can be replaced with the precomputed spherical integral at a small increase in error.
The precomputed spherical integrals referenced in the code above are the integral of each basis function (e.g. each SG lobe) squared over the sampling domain. If the samples are distributed over a sphere then this has a closedform solution; otherwise it can be precomputed using MonteCarlo integration. Note that I’ve omitted dividing by the sampling PDF since the factors cancel out with the sampling in my algorithm above.
I originally posted a variant of this algorithm that contained a few approximations. After that post, PeterPike Sloan pointed out on Twitter that least squares doesn’t necessarily require all samples to be stored; instead, you can accumulate the raw moments and then multiply by a lobeCount × lobeCount matrix to reconstruct the result. I realised my method was effectively an online approximation to this, and was able to correct a couple of approximations to bring it to match the equation.
The results are now as good as a standard leastsquares solve if the sample directions are randomly distributed, although it does deteriorate if the sample directions are correlated (as they would be, say, if you were reading pixels row by row from a latlong image map). Conveniently, when we’re accumulating samples from path tracing the sample directions are usually stratified or uniformly random. ^{2}
This method does require at least 32bit intermediates for accumulation; halfprecision produces obvious visual artefacts and biasing towards highintensity samples.
While testing this, I used Probulator, a useful opensource tool for testing different lighting encoding strategies. This method has also been merged into Probulator, and the source can be viewed here.
If you want to see this method running in a lightmap baking context, Matt Pettineo has integrated it into The Baking Lab. You can find it under the ‘Running Average’ and ‘Running Average NonNegative’ solve modes.
Below is a comparison from within The Baking Lab of the indirect specular from nine spherical Gaussian lobes using 10,000 samples per texel. The exposure has been turned up to more clearly show the results. There are slight visual differences if you flick back and forward, but it’s very close! The main difference is that the running average algorithm can find different local minima pertexel, and so results appear noisier across the texels.
Running Average:
Least Squares:
The code above has been factorised to try to limit the number of operations on colours. For example, the lines:
are more naturally expressed as:
These two snippets aren’t exactly equivalent. In factorising the code, I approximated highlight swift
projection * weight * weight / sphericalIntegral
as simply projection
; it turns out that making this approximation helps to avoid error in the method. To make them equivalent, the first snippet would be:
Similarly,
is equivalent to:

It’s possible to perform least squares on the raw moments (the naïve projection) by multiplying by the inverse of the Gram matrix, as is alluded to later in this post. ↩

One note of caution: some lowdiscrepancy sequences (e.g. fixedlength ones like the Hammersley sequence) will not work well since successive samples are correlated, even though the sequence is welldistributed over the entire domain. One way around this is to uniformly randomly shuffle the samples. ↩