The slides of the GDC tutorials are already online. I was not at GDC on Monday and Tuesday, so I just gave them a quick glance to see if there was anything interesting.
One that caught my attention was AMD’s presentation on high quality shadow filtering. It seems that most of the talk was dedicated to showing that Gaussian shadow filtering looks better than uniform filtering, and that the former can be implemented much more efficiently using Gather4.
In particular, they claim that with Gather4 it’s possible to sample an N×N kernel with (N-1)*(N-1)/4 samples, whereas with bilinear filtering you need at least (N-1)*(N-1)/2. There is some truth to that, but it’s not entirely correct.
Gather4 is a new texture sampling feature available in D3D10.1. It was previously exposed in D3D9 as Fetch4, through one of those IHV-specific FOURCC codes.
Gather4 allows you to sample a single-channel texture and fetch the 4 adjacent texels instead of the bilinearly filtered result. The 4 values are returned to the shader as a single float4. This doesn’t reduce the bus traffic, but it does reduce the number of texture samples.
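To make the difference between the two fetch modes concrete, here is a minimal CPU emulation on a single-channel texture stored as a list of rows. The helpers `gather4` and `bilinear` are hypothetical names for this sketch, not the D3D API. The component order of the gathered texels is an arbitrary choice and does not claim to match the hardware, and border clamping is omitted.

```python
import math

# Hypothetical CPU emulation of the two fetch modes. Texel (x, y) is tex[y][x];
# coordinates are in texel units, with texel centers at integer + 0.5.
# Border clamping is omitted for brevity.


def gather4(tex, u, v):
    """Return the 4 texels of the 2x2 footprint around (u, v), unfiltered.

    Order: (x0,y0), (x0+1,y0), (x0,y0+1), (x0+1,y0+1). This order is an
    arbitrary choice for the sketch, not the order the hardware uses.
    """
    x0 = math.floor(u - 0.5)
    y0 = math.floor(v - 0.5)
    return (tex[y0][x0], tex[y0][x0 + 1], tex[y0 + 1][x0], tex[y0 + 1][x0 + 1])


def bilinear(tex, u, v):
    """Return a single bilinearly filtered value at (u, v)."""
    x0 = math.floor(u - 0.5)
    y0 = math.floor(v - 0.5)
    fx = (u - 0.5) - x0  # fractional position inside the 2x2 footprint
    fy = (v - 0.5) - y0
    t00, t10, t01, t11 = gather4(tex, u, v)
    return ((1 - fx) * (1 - fy) * t00 + fx * (1 - fy) * t10
            + (1 - fx) * fy * t01 + fx * fy * t11)
```

With the gathered values, the shader can apply any 4 weights, for example `sum(w * t for w, t in zip(weights, gather4(tex, u, v)))`. A bilinear fetch only exposes the blend controlled by `(fx, fy)`, which the shader can then scale.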
The argument of the talk is that sampling shadow maps with Gaussian convolution filters is one of the cases where this functionality is useful. Let’s see if that’s correct.
With Gather4 you can weight the four texels independently, while with regular texture fetches you are more constrained; you can only bilinearly blend the 4 texels and scale the result. Ideally you would like to choose the bilinear coordinates and the scale so that:
$$w_i = s\,B_i(b_0, b_1), \qquad i = 0, 1, 2, 3$$
where w_i are the desired filter weights, s is the scale, b_0, b_1 are the bilinear coordinates, and B_i are the basis functions of the bilinear interpolation:
$$B_0 = (1-b_0)(1-b_1), \quad B_1 = b_0\,(1-b_1), \quad B_2 = (1-b_0)\,b_1, \quad B_3 = b_0\,b_1$$
However, that gives you 4 equations with 3 unknowns, and in general the system cannot be solved exactly. Instead, you can approximate the solution in the least squares sense:
$$\min_{s,\,b_0,\,b_1} \sum_{i=0}^{3} \left( w_i - s\,B_i(b_0, b_1) \right)^2$$
Note that this is not a typical linear least squares problem, because the bilinear basis functions are not linear but quadratic in the bilinear coordinates. I suppose there are better ways of solving it, but since the search space is fairly small (the square from (0,0) to (1,1)), I just do a brute force search.
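The search can be sketched as follows. This is not the author’s program. It is a minimal reimplementation under these assumptions: the four weights are ordered to match B_0..B_3, and a uniform grid with a hypothetical `steps` parameter is fine enough. It also uses one simplification. For fixed (b_0, b_1) the error is quadratic in s, so setting its derivative to zero gives the best scale in closed form, s = Σ w_i B_i / Σ B_i², and only the two bilinear coordinates need to be searched.

```python
# Minimal brute-force fit of one bilinear tap to a 2x2 block of weights.
# Function names and the grid resolution are illustrative assumptions.


def bilinear_basis(b0, b1):
    """Bilinear basis functions B0..B3 at bilinear coordinates (b0, b1)."""
    return ((1 - b0) * (1 - b1), b0 * (1 - b1), (1 - b0) * b1, b0 * b1)


def fit_bilinear_tap(w, steps=256):
    """Find (s, b0, b1) minimizing sum_i (w_i - s * B_i(b0, b1))^2.

    Searches a (steps+1) x (steps+1) grid over [0, 1] x [0, 1].
    Returns (s, b0, b1, squared_error).
    """
    best = None
    for i in range(steps + 1):
        b0 = i / steps
        for j in range(steps + 1):
            b1 = j / steps
            B = bilinear_basis(b0, b1)
            # The error is quadratic in s; its minimum is at <w, B> / <B, B>.
            # <B, B> >= 0.25 because the B_i are non-negative and sum to 1.
            s = sum(wk * Bk for wk, Bk in zip(w, B)) / sum(Bk * Bk for Bk in B)
            err = sum((wk - s * Bk) ** 2 for wk, Bk in zip(w, B))
            if best is None or err < best[3]:
                best = (s, b0, b1, err)
    return best


# Usage: the top-left 2x2 quadrant of the 4x4 Gaussian kernel.
quadrant_fit = fit_bilinear_tap((0.01808, 0.04915, 0.04915, 0.13361))
print(quadrant_fit)
```

For that quadrant the fit should land near b_0 = b_1 ≈ 0.73 with a scale close to 0.25. This corresponds to a sample about 0.77 texels from the kernel center in each direction.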
I wrote a little program that does that. Given a 2D convolution kernel, it computes the optimal location of the texture samples and the corresponding weights, and measures the error of the approximation. You can download it here.
I ran it on a 4×4 Gaussian kernel like the one used in the presentation. Since the standard deviation is not given, I assume it was 1. I don’t window the filter, but I normalize the weights so that they sum to 1. The kernel looks as follows:
```
0.01808 0.04915 0.04915 0.01808
0.04915 0.13361 0.13361 0.04915
0.04915 0.13361 0.13361 0.04915
0.01808 0.04915 0.04915 0.01808
```
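The table above can be reproduced with a short script. It assumes what the text states (σ = 1, no window, weights normalized to sum to 1) plus one convention of this sketch: the weights are sampled at texel centers, which for a 4×4 kernel sit at ±0.5 and ±1.5 texels from the kernel center. The function name is illustrative.

```python
import math


def gaussian_kernel(n, sigma=1.0):
    """n x n Gaussian weights sampled at texel centers, normalized to sum to 1.

    Texel centers sit at offsets -(n-1)/2, ..., (n-1)/2 from the kernel center,
    i.e. +-0.5 and +-1.5 for n = 4. No window is applied.
    """
    offsets = [k - (n - 1) / 2 for k in range(n)]
    raw = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma)) for x in offsets]
           for y in offsets]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


for row in gaussian_kernel(4):
    print(" ".join(f"{v:.5f}" for v in row))
```

Printed with 5 decimals, this gives the same 4 rows as the table.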
The output of my program (sample position in texels relative to the kernel center, followed by the sample weight) is the following:
```
(-0.768555, -0.768555), 0.250000
(0.768555, -0.768555), 0.250000
(-0.768555, 0.768555), 0.250000
(0.768555, 0.768555), 0.250000
```
The resulting average error is 0.000163. I tried larger kernels and various standard deviations, and in all cases the measured error was insignificant. Note that this technique not only works on all GPUs, it is also faster: moving part of the computation to the texture sampler reduces the number of shader instructions, and bus traffic is significantly reduced because only the filtered result is returned.
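For this particular kernel the fit can also be checked in closed form. The Gaussian is separable, so each 2×2 quadrant is the outer product of two 1D weight pairs (a, c). That is exactly of the form s·B_i(b_0, b_1) with bilinear coordinate c/(a+c) and scale (a+c)², so one tap represents it without error. The sketch below relies on that property. It assumes an even kernel size, and the function names are illustrative. It is not meant to describe how the author’s program works.

```python
import math


def gaussian_1d(n, sigma=1.0):
    """1D Gaussian weights at texel centers, normalized to sum to 1."""
    offsets = [k - (n - 1) / 2 for k in range(n)]
    raw = [math.exp(-x * x / (2 * sigma * sigma)) for x in offsets]
    total = sum(raw)
    return offsets, [r / total for r in raw]


def separable_taps(n, sigma=1.0):
    """Merge each pair of adjacent 1D texels into one linearly filtered tap.

    Assumes n is even. Returns (offset, weight) pairs, with offsets in texels
    from the kernel center. A 2D tap is then ((ox, oy), wx * wy).
    """
    offsets, w = gaussian_1d(n, sigma)
    taps = []
    for k in range(0, n, 2):
        a, c = w[k], w[k + 1]
        t = c / (a + c)  # bilinear coordinate between the two texel centers
        taps.append((offsets[k] + t, a + c))
    return taps


taps_1d = separable_taps(4)
taps_2d = [((ox, oy), wx * wy) for ox, wx in taps_1d for oy, wy in taps_1d]
for (ox, oy), weight in taps_2d:
    print(f"({ox:.6f}, {oy:.6f}), {weight:.6f}")
```

For σ = 1 the offsets come out as ±(0.5 + 1/(1+e)) ≈ ±0.768941 with weight 0.25. This is close to the brute-force positions listed above.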
Even if you could find a case where the errors became significant, the number of samples required to evaluate the filter exactly is not always (N-1)*(N-1)/2, as claimed in the talk, but approaches (N-1)*(N-1)/3 in the limit. You just have to notice that a pair of adjacent texture samples provides 6 degrees of freedom (3 per sample: the scale and the two bilinear coordinates), which is equal to the number of texels touched by the pair of samples:
```
*---*---*
| x | x |
*---*---*
```
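One way to see the degrees-of-freedom argument at work is to solve the 6 equations directly for a 2×3 block. The following is my own derivation, not necessarily the author’s. Names are hypothetical, and the block is assumed to have strictly positive weights.

The derivation works as follows:

- The outer columns are touched by only one tap each, so they fix that tap’s vertical coordinate and the part of its scale that falls on them.
- The shared middle column then reduces to a 2×2 linear system.

The solution is only accepted when both taps keep their bilinear coordinates inside [0, 1]. Otherwise the function returns `None`.

```python
# Exact fit of a 2x3 block of texel weights with two horizontally adjacent
# bilinear taps. Each tap is (s, u, v): scale, horizontal and vertical
# bilinear coordinate. Tap 1 covers columns 0-1, tap 2 covers columns 1-2.


def solve_tap_pair(w, eps=1e-12):
    """w[r][c] is the weight of texel (row r, column c), r in 0..1, c in 0..2.

    Assumes strictly positive weights. Returns ((s1, u1, v1), (s2, u2, v2)),
    or None when no exact solution with coordinates in [0, 1] exists.
    """
    m0 = w[0][0] + w[1][0]  # column 0 is covered only by tap 1: s1 * (1 - u1)
    m2 = w[0][2] + w[1][2]  # column 2 is covered only by tap 2: s2 * u2
    v1 = w[1][0] / m0
    v2 = w[1][2] / m2
    if abs(v2 - v1) > eps:
        # Shared column: A*(1-v1) + B*(1-v2) = w01 and A*v1 + B*v2 = w11,
        # with A = s1*u1 and B = s2*(1-u2). Solve by Cramer's rule.
        det = v2 - v1
        a = (w[0][1] * v2 - w[1][1] * (1 - v2)) / det
        b = (w[1][1] * (1 - v1) - w[0][1] * v1) / det
    else:
        # Both taps have the same vertical coordinate, so column 1 must share
        # that ratio; its mass can then be split arbitrarily (evenly here).
        m1 = w[0][1] + w[1][1]
        if abs(w[1][1] / m1 - v1) > eps:
            return None
        a = b = m1 / 2
    if a < 0 or b < 0:
        return None
    s1, s2 = m0 + a, m2 + b
    return (s1, a / s1, v1), (s2, m2 / s2, v2)


def reconstruct(taps):
    """Expand two taps back into the 2x3 block of texel weights."""
    out = [[0.0] * 3 for _ in range(2)]
    for (s, u, v), c0 in zip(taps, (0, 1)):
        for r, fy in ((0, 1 - v), (1, v)):
            out[r][c0] += s * (1 - u) * fy
            out[r][c0 + 1] += s * u * fy
    return out


# Usage: build a block from two known taps and recover them (up to rounding).
true_taps = ((0.3, 0.4, 0.2), (0.5, 0.7, 0.6))
print(solve_tap_pair(reconstruct(true_taps)))
```

The 6 unknowns matching the 6 texels is what makes the system square. In this sketch the extra requirement that A and B be non-negative is what decides whether a given block is reachable.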
2 Trackbacks
[...] while ago I wrote about high quality shadow filtering, and in particular about how to approximate a Gaussian using bilinear taps. Today I just noticed [...]
[...] complete without some shadow links. This one is no exception. First, Ignacio has a good post about Gaussian filtering of shadow maps, though his observation is (as with almost everything) previously known. This pair [...]