
Global Illumination in a Nutshell

(a very big nutshell)

As much as possible, I'm going to try and trim this down, but there's a limit to how much one can simplify the concept of Global Illumination (GI). I will try to work from more basic to more complicated, though in practice that ordering isn't entirely fair. Personally, I find path tracing simpler than raytracing even though it qualifies as a more advanced topic, but maybe that's just me. I'm going to assume for now that everyone who'd actually read this has some idea about graphics programming and the math involved, and is probably familiar with basic lighting computations and at least the basic concepts. Of course, I may be overestimating how much people already know, but I'll try to make it introductory anyway.
Let's first consider "what is the goal of illumination models when we're
rendering?" As much as one might tell you otherwise, we're not really trying to
simulate light as it really works. We're trying to get something that looks good
enough to fool the eye. And there are a few well-known, well-accepted models to
handle these operations. Though far fewer can really handle all sorts of light
transport. The first illumination method that we all probably learn is raytracing
and/or raycasting. Even the simple lighting we perform on our beloved polygon-pushing engines to make them look pretty is really just plain old raycasting,
albeit not necessarily performed at every pixel. Raycasting and raytracing are
essentially the same, except that raycasting is what is referred to as only "the first
hit" in the process of raytracing.
Of course, there's a simple problem with raytracing (or, more accurately, the
common Whitted raytracer that we associate with the term); it is simply a solver
for direct illumination. That leads us to defining what "Global Illumination"
really is. Direct illumination as solved by a Whitted raytracer only solves a single
interaction - that between objects and emitters of light. And more than that, the
only type of light emitter it handles natively is point light sources. In real life,
there's nothing in existence that really emits a significant amount of light from a
source that is infinitesimally small. Anything that emits light in extents that we
can really see actually takes up some amount of space. We can fake this by
simply having a small distribution of point lights occupying an area, but we'd
need many of them to get a decent result. This is not to say that simple direct
illumination has no place when we're trying to create realistic renders. Pixar's
Renderman, for instance, is strictly direct illumination, and yet it has consistently produced quite convincing results. But producing convincingly realistic renders does mean that, however we accomplish it, we need to simulate indirect lighting
in some way or another. In old-fashioned polygon rendering and simple raytracing, we fake this by using a constant "ambient" lighting term. All is not well with just a flat constant value, though; if nothing else, it makes a scene look dead. If one looks at indirect illumination by itself, with the direct illumination taken out, it can look almost flat in many cases, but there are subtle local variations, and more importantly, hue variations in light sources and objects within a scene are a major contributor to local variations in the indirect illumination.
Note that when I say "fooling the eye", we do have to take into account matters
of relative significance. Our eyes are actually sensitive enough to capture light
flux as small as 5 individual photons passing through to strike the retina. But this
does not mean that we won't fool a viewer until we achieve that level of
precision. One of the things our brains do for our vision is adjust to varying light
conditions. We most certainly would not be able to see a 5-photon flux in broad
daylight. And yet even though the sunlight beating down is so bright as to blind
us, when we look at objects in daylight, the lighting appears fairly even. Even
with all that, we can look at a television screen showing a comparatively dim and dull image of the sun and still be convinced that it is, in fact, bright. This is in
spite of the fact that no display technology in the world can come close to
covering the entire gamut of human visual perception. And in the end, this is
what affects how we reach our goal of fooling the eye. If we get too caught up in accurately simulating optics, we'll end up with results that may be numerically correct but aesthetically wrong.
See the world through the eyes of a polygon...
So what can we do to step up from the direct illumination of a raytracer? Well,
perhaps the first thing we might think is "raytracing is no good for indirect
lighting, let's look somewhere else." Which takes us to a GI algorithm which is
probably both the most underrated and most overrated at the same time -- radiosity. Radiosity is a lighting technique that is actually born out of finite element methods for radiant heat transfer. As a finite element method itself, one would actually break up the scene into very small elements. Radiosity takes a somewhat backwards approach compared to what one might immediately expect. Rather than having paths from lightsources to surfaces, radiosity simulates diffuse light transport in terms of visibility between surfaces. That is, rather than putting lighting in terms of how much light reaches a surface, we put it in terms of how much light the surface can see. Although, to be complete about it, there are "shooting" variants of radiosity, which do show their strength over the collecting (gathering) variants in that they affect more surfaces earlier.
A traditional radiosity implementation puts the question of visibility of light in
terms of a massive matrix called the form factor matrix. Actually, the term "form
factor" refers to individual pairings of surfaces and their respective fractions of
light leaving one surface that reaches the other. The "form factor matrix" is
simply the matrix that contains the full set of form factors. The resulting linear system is then solved, step by step (typically by an iterative relaxation method such as Jacobi or Gauss-Seidel), until an equilibrium solution is reached. That is, there is the same amount of light leaving every surface as there is hitting it. In practice, this is very expensive as the scene gets larger and/or the element sizes are reduced. More importantly, let's not forget that we don't really need to go so far - we only need to get a good enough result that it fools the eye.
The form factor between two surfaces is expressed in the form --

F_{ij} = \frac{1}{A_i} \int_{A_i} \int_{A_j} \frac{\cos\phi_i \, \cos\phi_j}{\pi \, |v|^2} \, dA_j \, dA_i
F is the form factor between surfaces i and j. A(i) is the area of surface patch i.
"v" is any vector connecting the two surfaces. Phi(i) and Phi(j) are both the
angles between v and the respective surface normals. Note that the only thing that
affects the transmission intensity is simply the cosines of the angles between light
transport direction and the surface normals. This is just the same as Lambert's
law which we all know and love. And that is one of the main weak points of radiosity -- it only works with Lambertian surfaces. Of course, this does give us
one little advantage - the lighting we calculate by radiosity is view-independent,
so it only needs to be calculated a single time for a single scene. If we were to
take the full set of all pairings and put them into a matrix, what we have is simply --

M_{ij} = \delta_{ij} - \rho_i F_{ij}, \qquad
M = \begin{pmatrix}
1 - \rho_1 F_{11} & -\rho_1 F_{12} & \cdots & -\rho_1 F_{1n} \\
-\rho_2 F_{21} & 1 - \rho_2 F_{22} & \cdots & -\rho_2 F_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
-\rho_n F_{n1} & -\rho_n F_{n2} & \cdots & 1 - \rho_n F_{nn}
\end{pmatrix}
The "rho" terms are the reflectivity of each surface - in simpler terms, rho(i) is
essentially the color of surface i. Some sense of reality does need to be taken into
account when choosing these colors. There is no such thing in real life as a perfectly black or a perfectly red or perfectly blue surface. Any surface that looks blue will still reflect some amount of red and green. Of course, in practice, an RGB color model often causes the color bleeding from indirect lighting to be overly exaggerated or extremely underrepresented. Serious implementations use spectral color models to remedy this. Regardless, this matrix, if multiplied by the n-vector of all the radiosities of the n patches in a scene, will produce a result that is supposed to equal the n-vector of the emissivities of all patches in the scene.
Well, there are better, more straightforward ways to compute radiosity, and
generally, the most common approach (or set of approaches) is progressive
refinement. This is where a simple approximation of lighting is made throughout
a scene, and then all lightsources that were used in this approximation are taken
out of consideration, and the lighting approximation is treated as a set of
lightsources by which to compute another lighting approximation. The new
lighting approximation then gets used as a set of new lightsources, and the
process repeats itself. What is happening is that we end up iteratively improving
the result and converging ever slowly towards an ideal solution. Of course, we
can really stop plenty of iterations short of complete convergence and still have a
result that looks good, and that's the main reason progressive refinement
solutions are preferred.
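To make the idea concrete, here is a minimal sketch of one common progressive-refinement formulation (the "shooting" style mentioned earlier), under my own assumptions about the data layout -- the Patch fields and the formFactor callback are placeholders for whatever your implementation provides, not anything standardized:

    #include <vector>
    #include <functional>

    // One color band only, for brevity.
    struct Patch {
        float area;
        float reflectance;   // rho: fraction of incoming light re-emitted diffusely
        float radiosity;     // B: total light leaving the patch so far
        float unshot;        // light received but not yet redistributed to the scene
    };

    // formFactor(from, to) is assumed to return the fraction of light leaving
    // 'from' that reaches 'to' (visibility included), e.g. computed via hemicubes.
    void progressiveRadiosity(std::vector<Patch>& patches,
                              const std::function<float(const Patch&, const Patch&)>& formFactor,
                              int iterations)
    {
        for (int it = 0; it < iterations; ++it) {
            // Pick the patch with the most undistributed energy and "shoot" it.
            Patch* shooter = &patches[0];
            for (Patch& p : patches)
                if (p.unshot * p.area > shooter->unshot * shooter->area)
                    shooter = &p;

            for (Patch& receiver : patches) {
                if (&receiver == shooter) continue;
                float dB = receiver.reflectance * shooter->unshot
                         * formFactor(*shooter, receiver)
                         * (shooter->area / receiver.area);
                receiver.radiosity += dB;   // light now leaving the receiver...
                receiver.unshot    += dB;   // ...that still has to be shot onward
            }
            shooter->unshot = 0.0f;         // this patch's energy is now distributed
        }
    }

Each pass takes the brightest "unshot" patch and distributes its energy to everything it can see, which is exactly why shooting-style solutions brighten the whole scene early instead of one patch at a time.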
The one type of radiosity form factor computation method that probably gets
mentioned most is hemicubes. A hemicube is exactly what it sounds like; half of
a cube. Really, it's no different if we use a hemicube or a hemisphere. The
advantage of using a cube is the fact that when we render the scene from a patch's
point of view, we can use the normal linear camera model we're so used to and
maintain a certain, *almost* constant level of detail no matter what direction we
sample in. With hemicubes, we simply render the scene from the viewpoint of
every patch. Any sources of light, whether it is from an actual lightsource or a
light reflecting off of a surface in the scene, will contribute to the lighting of the
patch. The image we see is weighted in accordance to the relative cosines and
relative distances of every pixel on the hemicube (this weighting is the same for
any hemicube). This weighting not only corrects for the direction of incoming
light, but also corrects for the "flatness" of the hemicube shape, which causes
some distortion in the sampled image. The sum of these weighted pixel values
throughout the image is the amount of light we see at this patch - it makes for a
very simple approximation of the double-integral above.
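As a sketch of what that per-pixel weighting looks like, these are the standard delta form factors for a unit hemicube (top face at height 1, side faces of height 1); the resolution and loop structure are left out, and the pixel coordinate convention here is my own:

    const float PI = 3.14159265358979f;

    // Delta form factor for a pixel on the TOP face of a unit hemicube.
    // (x, y) are the pixel's coordinates in [-1, 1]; dA is the pixel's area.
    float topFaceDeltaFF(float x, float y, float dA) {
        float d = x * x + y * y + 1.0f;
        return dA / (PI * d * d);
    }

    // Delta form factor for a pixel on a SIDE face.
    // z in [0, 1] is the height above the patch plane; x in [-1, 1] runs along the face.
    float sideFaceDeltaFF(float x, float z, float dA) {
        float d = x * x + z * z + 1.0f;
        return z * dA / (PI * d * d);
    }

Summed over every pixel of the hemicube, these weights add up to 1, which is why the weighted pixel sum behaves as an approximation of the form factor integral.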

[Renders of the test scene after 1 iteration, 32 iterations, and 320 iterations of progressive refinement]

After one iteration, the results look pretty much like ordinary direct lighting.
That's to be expected since only the surfaces that could see the emitting area light
in the ceiling receive any light in the first pass. After 32 iterations, we see some
color bleeding as a result of light reflecting off of the red and green walls. In
addition, the previously gray walls look a little closer to white. The ceiling,
which previously was dark, is now visible as a result of diffuse interreflections
from other walls. Also, the yellow box is more visible on all sides, and its shadow
has softened. However, with only 32 iterations, the edges and corners seem to
pose problems. This is because they are immediately adjacent to other patches
that would pretty much block out entire faces of the hemicube, thus preventing
you from getting a reasonable sampling of everything else in the environment,
but given enough iterations, it can self-heal. Also, at 1 as well as 32 iterations, we
see some noise near corners and shadows on the floor, as well as some banding
on the vertical walls. This is due to the fact that hemicubes can take up a good
deal of memory, so in general, they are very low resolution. Low resolution
implies that you are subject to a good deal of aliasing artifacts. Many of these
problems disappeared on their own after more iterations, but as a rule, it's not good to depend on luck; instead, apply some filtering to your hemicube input [Hugo Elias writes of an excellent hemicube filtering technique that not only reduces overall computation time, but doesn't affect the overall energy input from a sample], and perhaps also some smoothing passes on the patches themselves. In this particular scene, the reflectivities of the surfaces were sufficiently large (and the scene itself is sufficiently small) that a mere 320 iterations is fairly effective.
There are some shadowing artifacts still remaining, which are a result of the
model itself -- the colored walls are separate planes that clip into the existing
box, which has had faces deleted. Even so, JPEG artifacts aside, the result after 320 iterations is quite good (for a full resolution PNG file, click here). In a
normal scene without very strong lighting, though, one should really expect
several more iterations to be necessary before a good result is apparent. On a
side note, I would like to mention that if the images look dark to you, it's
probably because they've been gamma-adjusted for a setting of 2.20. I rendered
and viewed these on an LCD screen of pretty high contrast ratio.
Many people often start out in their quests to solve GI by implementing radiosity.
Why? Because it's quite simple to take the results, apply them to a very dense
polygon mesh and then render it in a simple OpenGL or D3D render window and
then view the result from any viewpoint. I did say earlier, however, that radiosity
only handles Lambertian surfaces. This isn't quite the whole truth. The fact is that
what it handles is light being reflected, absorbed, or scattered equally with
respect to all directions. In the case of surface properties, this means Lambertian
surfaces. But radiosity can also work with volumes and participating media.
Again, it does have to be an equal-scattering medium, or what is referred to as an
isotropically scattering medium. Note that with a surface, it is assumed that light
can only be reflected or absorbed, which limits the scattering range to a
hemisphere (or hemicube) about the normal to the surface. In a volume, it is
possible for light to pass straight through without being affected at all, or only be
slightly scattered. Since we're talking about an isotropic volume, we know that
all outcomes are equally likely. This means that for volumes, we don't sample
form factors with hemicubes, but with whole cubes. Also, there are variations on ordinary radiosity that allow for specular-diffuse interactions, but specular-specular is a bit more of an issue. Some of the earliest global illumination solvers actually used raytracing to solve specular components and radiosity to handle diffuse components. Even with that, it still isn't enough to say that we have a very comprehensive global illumination solver.
I left my heart in Monte Carlo
Those of us who have walked into graphics starting out with realtime rendering
techniques and knowing little of BRDFs and BSDFs outside of Phong and Blinn
models would probably have been quite taken aback at the sight of a double area
integral earlier on. Eeep!! What the hell is this?!? Scary math!! Frankly, we're
just getting started. The reason this scary math even exists is because GI in itself
is the process of evaluating an integral. Just wrap your mind around that for a moment. How is that the case? Well, first think about the term "global
illumination". Global, as in, from everywhere of every type, in every possible
way. In short, there is light bouncing around every which way, and so no matter
where you are in the world, there is light coming into that point from every
conceivable direction. Now in a perfect world where we have infinite memory
and infinite processor power, we could gather information about all the light in
every direction and add it all up including not only direct lighting, but also light
that has bounced around an infinite number of times... and every other source of
light in between.
Well, in practice, we don't have infinite memory or infinite processor power, so
we have no choice. First of all, we have to limit the number of bounces we can
consider (and only a few are really necessary in most cases anyway). We can't
really sample lighting in *every* direction. We can, however, sample lighting
information in many, many directions and come close to the same result. The
idea being that the more directions we sample along, the tighter the "grid" of
solid angles which we use to evaluate the total set of all lighting. Well, if we're
trying to sum values of a function which is evaluated at several sample points...
and we're trying to space those samples closer and closer together and take more
and more total samples... What do you know? That's the same as plain old
numerical integration.
However, numerical integration is hellacious in terms of computational
resources. It would take forever if we tried to evaluate actual integrals. There is
something to save us, though -- Monte Carlo integration. It may sound surprising, but Monte Carlo methods allow us to transform problems of numerical quadrature into statistical problems -- more specifically, numerical integration becomes an expected value problem. So before I go on to explain Monte Carlo integration in a rendering context, let me first try and give you an example with a simple integration problem.
Let's consider this example case. It doesn't matter what f(x) is as long as we can
evaluate it. It doesn't matter what the value of "a" is as long as it's larger than the
highest value reached by the function in the domain we're trying to integrate
within. In this case, the domain we're integrating along is x = 0 to b. What we've
done is outlined a box where x is ranged [0..b], and y is ranged [0..a]. Now with
normal numerical quadrature, I'd make a whole bunch of thin vertical rectangles
or trapezoids or some other figure with vertical sides going straight up from the
x-axis to whatever the curves value is at that point. I'd compute the area of all
these really thin rectangles or whatever they are and just sum them up to get an approximation of the area underneath f(x) from x = 0..b. If I want a really good approximation, I'd use really thin rectangles.

Now for the Monte Carlo solution. Consider that box we've marked out. We know the area of that box: it's simply a * b. How does that help? Well, let's say I pick a point p within that box. Any point, it doesn't matter where, as long as it's inside the box. What's the probability that p is underneath the curve? Well, we know that there exists some area, S, though we don't yet know how much, underneath the curve. And similarly, there is its converse, T, which is the amount of area above the curve within that box. Okay, we don't know what S and T are, but we do know this: the area under the curve plus the area above the curve should be equal to all the area that exists in the domain (the entire area of the box). That is, S + T = a*b. So out of all that area, a*b, the area under the curve occupies a fraction of it equal to S / (a * b). That's the answer to our probability question for p... What chance do we have of selecting a random point underneath the curve within the constraints of that arbitrary box? Simply, (area under the curve) / (area of the box). And this is the same probability for every point that we might want to check, since they're all essentially independent attempts.
As I said, though, Monte Carlo integration is an expected value problem. So
what we do is select a whole bunch of randomly selected points inside that box.
Some of them will be above the curve, and some will be below. With enough
iterations, we should expect that (# of points under curve) / (total # of points
tested) = (area under the curve) / (area of the box)... or some reasonable
approximation thereof. Sounds pretty neat? It is. Those of us who weren't really
fans of statistics are probably going to think that selecting points at random is
slow and doesn't make much sense. In theory, you could just sample on a regular
grid and it would be faster, and make a lot more sense. Well, that's fine for an
integration problem as small as this one. Where Monte Carlo sampling really
shows its strength is in its scaling. When moving up to larger problems and
higher dimensional problems, random sampling gives you better results with
fewer samples. Even more important than the scaling is the fact that by using random sampling, we actually decrease the risk of losing information: one, because structures themselves can be irregular, and two, because a regular grid is fixed, so any information that you miss will always be missed. The problem with a perfect grid is that you end up with axially-biased results -- and this bias does show itself in the form of aliasing artifacts. This same problem showed itself with hemicube form factors at a low resolution. As you might expect, though, random sampling is NOT biased. Which is why we love it so much.
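A minimal sketch of the hit-or-miss estimator just described, assuming only that we have some f(x) we can evaluate and a bound a that f never exceeds on [0, b]:

    #include <cstdlib>

    // Hit-or-miss Monte Carlo estimate of the area under f on [0, b],
    // given a bound a >= f(x) over that interval.
    double monteCarloArea(double (*f)(double), double a, double b, int numSamples)
    {
        int hits = 0;
        for (int i = 0; i < numSamples; ++i) {
            double x = b * (std::rand() / (double)RAND_MAX);  // random point in the box
            double y = a * (std::rand() / (double)RAND_MAX);
            if (y <= f(x)) ++hits;                            // under the curve?
        }
        // (hits / total) approximates S / (a*b), so scale back up by the box area.
        return (hits / (double)numSamples) * a * b;
    }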
While I'm on the subject of speed with Monte Carlo integration, I should bring up
one more thing. When we think of random in computer world, we really think
pseudo-random number generation. And in fact, for Monte Carlo methods, a
PRNG isn't really the best choice. In fact, pseudo-random numbers are actually...
a little too random.

On the left is a scatter plot of sample points from a pseudo-random number generator. On the right is the same number of sample points generated by a
quasi-random number generator. Notice that with the PRNG, we have many
regions where sample points "clump" together, and many areas where sampling is
very sparse. This would cause some undersampling in some areas and
oversampling in areas where it may not be needed. By contrast, the QRNG
sample space is fairly even all over the box, and almost every local region
throughout the sample space is well-covered. It should be quite obvious from inspection alone that using a QRNG over a PRNG provides a far better convergence rate for Monte Carlo methods (roughly 1/N convergence as opposed to 1/sqrt(N)).
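For a concrete example of a quasi-random sequence, here is a minimal sketch of the Halton sequence, one common low-discrepancy choice (the article doesn't name a specific QRNG, so take this purely as an illustration); pairing the radical inverse in base 2 with base 3 gives 2-D points spread out like the right-hand plot:

    // Radical inverse of index i in the given base: the core of the Halton sequence.
    double radicalInverse(int i, int base) {
        double result = 0.0;
        double f = 1.0 / base;
        while (i > 0) {
            result += f * (i % base);
            i /= base;
            f /= base;
        }
        return result;
    }

    // The n-th 2-D Halton sample in [0,1)^2, using bases 2 and 3.
    void halton2D(int n, double& u, double& v) {
        u = radicalInverse(n, 2);
        v = radicalInverse(n, 3);
    }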
The sight of that QRNG scatter plot should hopefully light a bulb in your head.
It looks almost like a grid, but it's uneven. Actually, the popular term is a grid
that's been "jittered." While it isn't really a jittered grid, jittered grid sampling or
"stratified" sampling is also an equally popular sampling method, and often a lot
faster to generate than QRNs... On the other hand, a spherical grid is easy to
generate and any old random-number generator can be used to jitter that grid.
Although, again, the preferred grid generator would be a geodesic grid, to
prevent oversampling at the poles.
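A minimal sketch of that jittered ("stratified") grid over the unit square, with a plain uniform random number doing the jittering:

    #include <vector>
    #include <cstdlib>

    struct Sample2D { double u, v; };

    // One random sample inside each cell of an n x n grid over the unit square.
    std::vector<Sample2D> jitteredGrid(int n) {
        std::vector<Sample2D> samples;
        samples.reserve(n * n);
        for (int j = 0; j < n; ++j) {
            for (int i = 0; i < n; ++i) {
                double ru = std::rand() / (double)RAND_MAX;  // jitter within the cell
                double rv = std::rand() / (double)RAND_MAX;
                samples.push_back({ (i + ru) / n, (j + rv) / n });
            }
        }
        return samples;
    }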
So how does this work in a rendering context? Well, the first difference is that
we don't have a simple yes/no question like we had with the 2-d area example.
Instead, we have lighting in so many places and in so many directions, and in
each of these directions, we're dealing with amounts of lighting. So we're
actually sampling values that are eventually summed together. Notably, we're not
performing a simple 1-d integration to compute area, but actually integrating over
a surface area in 3-d space (and maybe even in 4-d, although, that's a later topic).
The other big difference is that there is known importance to certain sampling
directions. We know that light striking a surface at very sheer angles is not likely
to have significant effect unless the source is of very high intensity. And that
raises the third difference -- the function we're integrating is not explicitly
known, which is why we use the word "sampling" as opposed to "evaluating".
Tracing rays again... and again... and again and again...
Just because a basic Whitted raytracer didn't solve anything but direct lighting,
doesn't mean that we must immediately dismiss raytracing altogether. The thing
about raytracing is that it is based on singular unidirectional tracing of a single ray
bounced around the scene for each and every pixel. Yes, I know, there's multiple
rays per pixel if we take antialiasing into account, but that is essentially not all
that different from having 1 ray per pixel in a very high resolution image. But
that's what Monte Carlo methods are for. Rather than considering 1 ray bounce
at every intersection between a ray and a surface, we consider a whole bunch. A
whole... quasi-random... importance-weighted bunch. And now, we have a new
name for it -- Distribution Ray Tracing. In older literature, you might have also
heard it referred to as "Distributed Ray Tracing", but the only thing I don't like
about that name is that it sounds like a ray tracing implementation for a distributed
computing platform. In other cases, it might also simply be called "Monte Carlo
Ray Tracing" or "Stochastic Ray Tracing."

Anyway, if we think about it, distribution raytracing and hemicube radiosity (or
one pass of each) aren't all that different. Hemicube radiosity is much like the
unrolling recursion phase of distribution raytracing -- when all the results we've
calculated come back to the point we're trying to solve for and get summed
together. The difference is that we're not assuming a single simple surface
model. Now that we have control over the sampling ranges, we can distribute
them according to an arbitrary BRDF for the surface. And this means we have a
nice simple solution for all kinds of light transport. Yes, indeedy. And if you'd
read my weighted color accumulation article, you'd also realize that if you don't
want to try and complicate your implementation with arbitrary sampling
distributions, you can instead weight your samples and simply reject samples that
are too low in
weight.
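In rough pseudo-C++, the recursion behind all of this looks something like the sketch below. None of these names come from a real renderer -- Ray, Hit, Color, intersect(), directLight(), sampleBRDFDirection() and brdfWeight() are all stand-ins for whatever your tracer provides; brdfWeight() is assumed to return the BRDF times cosine divided by the sampling pdf for that direction.

    // Distribution ray tracing sketch (hypothetical helpers, not a real API).
    Color traceDistribution(const Ray& ray, int depth) {
        Hit hit;
        if (depth <= 0 || !intersect(ray, hit))
            return Color(0);                        // recursion limit or nothing hit

        Color result = directLight(hit);            // direct lighting at this hit

        const int N = 16;                           // samples per bounce (arbitrary)
        for (int i = 0; i < N; ++i) {
            Ray bounce = sampleBRDFDirection(hit);  // direction drawn from the BRDF lobe
            result += hit.brdfWeight(bounce)
                    * traceDistribution(bounce, depth - 1) / N;
        }
        return result;
    }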
The diagram on the left should give you the general idea. We start from the camera... Shoot a ray from the eye through some pixel. It strikes a surface, and that surface has some BRDF to it, which means that we have some distribution of rays to sample along. The sum of the lighting computations that we acquired through all these rays will be the result for this pixel. Anyway, some of these rays we shot out will strike other surfaces, some of them will do nothing, and some of them may even go straight to a lightsource (which isn't really in the diagram). Those rays that struck another surface? Well, they spawn their own distribution of extra rays bouncing around the scene, which can in turn spawn another set of rays. As you can probably guess, this can ultimately amount to a lot of rays through the scene.
Those gray lobes represent our BRDF-weighted range of ray distribution
(although, in reality, lobes caused by specularity would be centered about the
reflection vector). Depending on how finely specular vs. diffuse a surface is, this
lobe may be tighter. For instance, a glossy surface would have a medium-width lobe to it. A diffuse surface would have a perfect hemisphere for its lobe. A
perfect mirror would have no lobe at all (or more accurately, a lobe so tight that
all samples go in one direction). Anisotropically specular surfaces would have a
lobe where the width may be wider in one axis than in another.
-- On a side note, for diffuse surfaces, we often like to have cosine-weighted distributions rather than a perfectly even hemisphere about the normal, because sheer-angled light is assumed not to contribute very much. A simple way to generate these at random is to first pick a uniformly random direction on the unit sphere (for example, by normalizing a vector of three Gaussian random components, or by rejection-sampling points inside the unit ball and normalizing). Then add the normal vector of the surface, and re-normalize the resulting vector. The only degenerate case is the one where the randomly-generated direction happens to be close to or equal to the opposite of the normal vector; those cases can just be thrown out and redrawn. --
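A minimal sketch of that trick, with a tiny Vec3 type of my own and rejection sampling in the unit ball to get the uniform sphere direction:

    #include <cmath>
    #include <cstdlib>

    struct Vec3 {
        double x, y, z;
        Vec3 operator+(const Vec3& o) const { return { x + o.x, y + o.y, z + o.z }; }
        double length() const { return std::sqrt(x * x + y * y + z * z); }
        Vec3 normalized() const { double l = length(); return { x / l, y / l, z / l }; }
    };

    static double rand01() { return std::rand() / (double)RAND_MAX; }

    // Uniformly random direction on the unit sphere (rejection sampling in the ball).
    Vec3 randomUnitVector() {
        for (;;) {
            Vec3 v = { 2 * rand01() - 1, 2 * rand01() - 1, 2 * rand01() - 1 };
            double len = v.length();
            if (len > 1e-6 && len <= 1.0) return v.normalized();
        }
    }

    // Cosine-weighted direction about the surface normal n (assumed unit length):
    // sphere direction + normal, renormalized; near-degenerate draws are retried.
    Vec3 cosineWeightedDirection(const Vec3& n) {
        for (;;) {
            Vec3 d = randomUnitVector() + n;
            if (d.length() > 1e-6) return d.normalized();
        }
    }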

Interestingly enough, this simple, albeit very computationally costly, variant on raytracing can model all types of light transport and can handle both area and
point lightsources. Obviously, though, some types of transport take more effort
than others. Glossy surfaces don't require very many sample directions at all to
get a good result. A Lambertian surface, on the other hand, might require a lot depending
on the scene. A mirror, of course, only requires one sample, but a foggy mirror is
another matter. Then of course, there's caustics which, in general, require
hideously many samples to get good results. Normally, refraction only involves
2 samples since it spawns a refracted ray and a reflected ray, but that will change
if one intends to implement wavelength-dependent refraction. While it does
involve a little more work than normal to handle participating media and
volumetric scattering, it is possible to handle this as well. But again, while
participating media does not require very many rays at each sampling location in
the volume, a great number of samples need be made to get a good result (i.e.
*very* high recursion depth), which in turn means that many total rays are
sampled in the scene. In practice, though, nobody likes to take the exhaustive
route to participating media and instead use bidirectional methods, which I'll get
into later.
Even more interesting is the fact that the distribution doesn't just have to be going
through space in accordance with the BRDF. For instance, depth-of-field effects
are generally achieved by starting with a distribution across an imaginary lens.
Yet another trick we can play with is motion blur. In a scene with a moving
camera or moving objects and so on, we can achieve an excellent motion blur
effect by not only distributing the rays in space, but also in time. Randomly
selected times also mean that objects are in a different position relative to the
viewer, which means that the same directions will behave differently. As you can probably expect, though, it's that much slower to do all of these things in addition to already doing Monte Carlo sampling in space at every intersection. That's just part of life.
Well, it sounds nice and all in spite of the very slow render times that plague the
process... but there is one far larger problem that plagues Monte Carlo methods
including distribution raytracing -- noise. We can't guarantee that adjacent pixels will sample similar regions of the scene with similar weighting, so there is always a little bit of variation. The obvious
answer is to simply sample a lot more directions at every intersection and perhaps
also recurse further. If you can afford the extra time to render, then by all means,
this is the best approach. But the thing to watch out for is the fact that this noise,
while it may be glaringly obvious to the eye, is often actually very small
numerically. And with very small numerical differences, that most likely means
that your sampling coverage is already such that the convergence is pretty far
along, and will therefore be in a range of very slow convergence rate. The
earliest samples (i.e., the primary rays) make the biggest difference. There are
also things we can do to the actual image to help smooth the result. Similarly, the
samples themselves can not only be directionally biased according to the BRDF,
but also be weighted when averaging the results so that every pixel has a
somewhat more even sampling with respect to direction -- sort of cheating away
the unevenness. In the end, though, while distribution raytracing can do just about everything, the real future lies in better-performing algorithms.
Entrusted to a hobbit -- One equation to handle it all.
Ummm... well, that may be pushing it, but it was still a really nice and useful
paper, albeit not completely embraced in a hurry by the academic community.
I'm referring to a classic written by James T. Kajiya, which appeared in the
SIGGRAPH proceedings in 1986. The famous "Rendering Equation" paper. The
rendering equation is not so much a magic universal equation that handles all
types of rendering, but rather a metric of rendering schemes. It's what rendering
schemes aspire to solve... It does not innately handle all optical phenomena, but
rather, is written and used under the assumption that said optical phenomena are
modeled by the algorithms. In short, we use it to measure up graphics models
against each other.
The rendering equation, using Kajiya's notation, is expressed as follows --

I(x, x') = g(x, x') \left[ \epsilon(x, x') + \int_S \rho(x, x', x'') \, I(x', x'') \, dx'' \right]
I(x,x') is the intensity of light transported between the arbitrary points x and x'.
g(x,x') is the visibility between x and x'. If the points are occluded from
each other's view, then this function will be zero-valued. Otherwise, it will vary
with the inverse square of the distance between the two points.
epsilon(x,x') is the emittance term -- the amount of light emitted from point x' towards point x.
rho(x,x',x") is the scattering term or the BRDF/BSDF term with respect to x' and
x". It is the intensity of light scattered towards point x from point x' which
arrived from point (or along the direction) x". That is, if there was light reaching
x' from x", how much of that light reaches x?
S is quite simply, the scene. The integration taking place over the space of the
scene means that we are evaluating this for all points on all surfaces in the
scene... or we'd like to, anyway.
If that all looks confusing, I'll simplify -- What this equation says is that the light
transport intensity from x' to x is equal to the sum of the light emitted from x'
towards x and the total light coming from everywhere else in the scene that was
scattered, reflected, or emitted towards point x', and then from x' to x. And in
fact, radiosity, for instance, can be derived from this by restricting it to the case where the scattering term is constant with respect to direction (i.e., Lambertian surfaces).
We all walk our own path
In Kajiya's paper, aside from the equation itself, he brings up a new method for
solving the rendering equation -- what he called "Path Tracing". If we consider
all that sampling we do with Monte Carlo methods as analogous to "searching"
through the entire space of light bounces that might have occurred, then we have
an interesting comparison here. Distribution raytracing can be thought of in
terms of a breadth-first search of illumination space. And this makes sense -- at
every bounce we're sampling in several directions; not unlike a search tree that
branches widely at each recursion depth. Path tracing, on the other hand, can be
thought of much like a depth-first search... almost, anyway. Kajiya describes it
with the following diagram --

The diagram on the left is representative of distribution raytracing. And the one
on the right is indicative of path tracing. Note that the selection of which path to follow is really arbitrary. And this is where I really find the simplicity in path tracing -- we
just select any old path to sample along. Okay, I'm going overboard, yes... We
can't forget that we are sampling -- we still have to remember that we can save a
lot of time and unnecessary effort by importance sampling.
Monte Carlo Path Tracing, in spite of everything else you might hear, is still pretty much the premier, state-of-the-art rendering scheme out there (albeit subject to many further optimizations beyond the normal). While Jensen's photon mapping is a great performer that produces great-looking results, scaling to larger scenes will put it noticeably behind MCPT in performance, results, and especially memory usage (of course, in Jensen's implementation, a distribution raytracer is used in the final gathering pass, but a path tracer can be used as well). This is the
case primarily because path tracing, like ray tracing, begins as a pixel-scanning
process, as opposed to photon mapping, which works in 3-d space. In the
process of MCPT, the first hit is really handled pretty much the same as with
distribution ray tracing in that we randomly select directions to sample for
indirect lighting. But that's where the similarities end. Remember that this is
path tracing. In a pure path tracer, we'd only consider one stochastically selected
sampling direction at any point (including the first hit) but shoot several initial
rays through each pixel, averaging all the results. In a more generalized Monte
Carlo path tracer meant to handle more situations of light transport, we shoot
fewer initial rays (enough to get some decent antialiasing), but sample several
stochastically selected directions at the first hit for indirect lighting. The main
idea behind path tracing is that secondary rays have less effect than primaries,
and the further rays have even less. So the difference is that as each of those random samples intersects another point in the scene, we only consider that point's direct lighting and one stochastic sampling path from it.
This means that when sampling, we consider two ray types. The "shadow rays"
and the "indirect rays." The shadow ray is the one with which, direct lighting for
that sample point is computed. This ray is sent to a randomly selected point on
an area light surface and then treated as if a point light. If the light happens to be
a point light, then of course, the shadow ray is a direct vector to that point. If
occluded or refracted away from the light, then we receive no direct lighting at
that point, but could receive indirect lighting. And that's what those "indirect
rays" are for. We send out quasi-random or jittered-grid rays to bounce around
the scene, and at each bounce we consider that point's direct lighting (with
shadowing) and one indirect ray and the overall sum of inputs from all these
paths is the indirect lighting.
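A rough sketch of that loop -- path tracing with direct lighting -- reusing the same kind of hypothetical helpers as before; LightSample, sampleLightPoint(), visible(), brdf(), cosine() and brdfWeight() are all assumptions, and the light-sampling pdf and geometry terms are folded into ls.radiance purely for brevity:

    // Path tracing with direct lighting (hypothetical helpers, not a real API).
    Color tracePath(const Ray& ray, int depth) {
        Hit hit;
        if (depth <= 0 || !intersect(ray, hit))
            return Color(0);

        Color result(0);

        // Shadow ray: direct lighting from a randomly chosen point on an area light.
        LightSample ls = sampleLightPoint();
        if (visible(hit.position, ls.position))
            result += hit.brdf(ls.direction) * ls.radiance * hit.cosine(ls.direction);

        // Indirect ray: ONE stochastically chosen bounce, weighted by the BRDF.
        Ray bounce = sampleBRDFDirection(hit);
        result += hit.brdfWeight(bounce) * tracePath(bounce, depth - 1);

        return result;
    }

At the first hit you would call this indirect step several times and average the results, as described above; every hit after that gets just the one indirect ray.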
Perhaps another visualization is in order.

Okay, so... trying to explain from the diagram. From the camera, we shoot a ray through a pixel... or rather, some part of a pixel. This is only a single sampling iteration, and we would actually consider a few stochastically chosen rays through the space of each pixel in order to prevent aliasing. That ray we've shot out hits a surface at point y1. At point y1, we calculate direct illumination using whatever BRDF that surface happens to have. Next we shoot a ray in a stochastically chosen direction from y1. Now as I said, we would actually choose many of these at y1, and only at y1, because it's our first hit. For the sake of clarity, though, we'll continue through the recursion of only one of these paths. This random ray happens to hit some other surface at point y2. So we calculate direct lighting at y2, checking if the ray to the light is occluded by anything. Since this ray we shot out is an indirect lighting sample, its actual lighting contribution with respect to y1 will be dependent on the BRDF of the surface we hit at y1.
Then from y2, we select another random sample... this time only one. And like
we did before, this random ray will probably strike some other surface (at y3, in
this case). And as expected, we calculate direct lighting with shadowing at y3.

And we can keep recursing as far as we feel like, doing direct lighting at point after point of ray hit. When we feel like stopping, the very last sample will be just direct lighting.
What we're essentially doing is tracing through the path of photons that get
reflected to the eye through the pixel y0. Starting from the eye instead of from
the lightsources, really means we're "back-tracking" along that path. At each point, we're considering the in-scattered contributions and adding them together to get an estimate of how much light would get scattered from the surface along the direction we came from. Since each surface has its own BRDF or BSDF, there is only so much probability that scattering can take place in a particular direction. This goes not only for in-scattering, but out-scattering as well. For example, the sample ray shot from y2 to y3 has a certain probability of actually being selected in the first place. This is how we weight the in-scattering. The further in-scattering from y2 to y1 is also weighted in that respect. However, the
probability that the direct lighting at y2 will scatter to y1 also needs to be taken
into account. That's the out-scattering factor, and it's simply handled by scaling
down the direct lighting factor according to its relative probability. Of course,
actually computing this weighted direct lighting is trivial in the sense that it
would be the same as computing direct lighting at y2 as if y1 is the eye. I only
bring it up in this manner because it is a more accurate way of looking at the
problem.
In a purely diffuse scene, like you'd simulate with radiosity, the computations are
quite simple. Every sampling direction is cosine weighted and every scattering
direction is of equal probability. At any given point, the out-scattering factor will
always be 1/pi.
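One way to read that 1/pi: it's the normalization of the Lambertian BRDF. As a quick worked check, with cosine-weighted sampling the per-sample weight reduces to just the reflectance --

    f_r = \frac{\rho}{\pi}, \qquad
    pdf(\omega) = \frac{\cos\theta}{\pi}, \qquad
    \text{weight} = \frac{f_r \cos\theta}{pdf(\omega)} = \rho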
I'm sure it still sounds pretty slow. And yes indeed, it is slow. This is not
realtime rendering here, but at the very least, there is a lot we can do to make it
faster... and I'll try to get into that, too. It's certainly faster than distribution
raytracing simply because you're not sampling countless rays at each sample
point. And as mentioned before, it scales very well with the scene size and
complexity.
In fact, in the most straightforward implementations of path tracing, you
would randomly decide whether to shoot an indirect ray or perform direct
lighting when you sample (and of course, stop and unroll recursion when you
happen to decide on direct lighting). What I've actually brought up is "path
tracing with direct lighting", which is one of the things that can improve the
convergence rate of path tracing with fewer total samples per pixel. If we refer
back to the rendering equation, you'll see that there is an emittance term as well as a scattering term within the integral. The emittance term is our direct lighting
factor. And the scattering term is the indirect lighting factor. Notably, in the
integral is another instance of I, which in turn has its own emittance and
scattering terms to take into account. Raw path tracing accounts for both because the recursion can stop randomly.
And for the part you've all been waiting for... a sample path tracing image --

This is one of the reference images Henrik Jensen used to compare his photon
mapping results with path tracing results. This particular image was apparently
rendered with 1000 samples per pixel, although it wasn't made clear how many samples were initial eye rays through the pixels and how many were stochastic
indirect lighting samples at the first hit. Notice there is some noise throughout
the image. While this is actually a pretty noticeable amount of noise, it's not
insurmountable with some good image post-processing either way. As you might
expect, the ceiling has the most noise of any part of the image, and this is
because it receives no direct lighting. All of its illumination is indirect and so the
smoothness of the result is dependent on convergence of the indirect sampling to
produce smooth results.
Knowing what's really important...
There's plenty we can do to speed up Monte Carlo methods. Probably the first
one that people want to implement (although most likely the most involved) is
importance sampling. The idea is simply to direct more sample rays towards
directions that will make a bigger difference. What makes it complicated is that
it's not always obvious when, where, and what is really important. Now,
normally, if you implement BSDF-weighted sampling, that is importance
sampling, but it's not all there is to it. BSDF-weighted sampling is still good enough
and general enough to cover all cases, though, so I (as well as the majority of
others) don't advise going further unless performance is really important to you.
There is such a thing, though, as specializing the importance to the scope of a
scene -- or scene-level importance (as opposed to point-level importance). In fact,
this is one of the places where photon mapping can be used to accelerate other
Monte Carlo methods. If you look back at the Jensen sample pic, you'll notice
that for the most part the lighting and shadows are smooth everywhere. The
exceptions are those two caustics on the floor and blue wall. Which suggests that
in areas near the objects that cause caustics (refractive and reflective), we need to
sample towards those objects more than we normally might. Caustics are a high-frequency lighting phenomenon for the most part, so tight sampling is necessary.
We could cover for it by just taking a lot of samples at the first hit, but that
would be too costly. Of course, then that raises the question of when we should
even bother biasing our samples. It seems like the larger brighter caustic happens
to fall within the umbra of the glass ball's shadow. That's a simple enough test to
determine, but then it doesn't explain the small spot caustic on the blue wall,
which is actually caused by reflections off the reflective ball that get refracted
through the glass ball. In fact, that caustic doesn't even appear to be in the
penumbra. The umbra test would prove rather fruitless in this case. For that
matter, if you can imagine filling the Cornell Box partway with turbulent water,
every point below the water surface would be in the umbra of its shadow, so
every point below would need some heavier sampling towards various points on
the water surface. Unfortunately, since we're talking about hypothetically filling
the box with water, the water surface would occupy the entire width of the box, so how tight the sampling can get is rather questionable. It also doesn't account
for the fact that there will also be reflective caustics above the water surface.
Similarly, a reflective caustic, like the famous cardioid caustics caused by ring
objects is a completely different case. We could say that points within the inner
radius would be subject to caustics, but that doesn't really work in the same case
if the reflective ring were raised above the ground surface. Again reflective
caustics can include more unique cases like a spot of light on a faraway wall that
comes from sunlight reflecting off of a watch. It's a lot tougher to say that there
is importance in those cases.
In short, importance sampling is one of those things that we can say is quite
obvious by human intuition, but not so obvious to quantify. Technically, using
BRDF/BSDF weighted sampling covers all the ground that classifies importance
sampling since it uses a probability distribution function to control the sampling
rather than using a purely unbiased sampling (in most cases, though,
BRDF/BSDF weighted sampling is all you really ought to do). In any case, it is
much easier to use an unbiased sampling, and moreover, it always takes more
total samples to make a biased sampling look like it is unbiased. In reality, a
biased sampling relative to a point can still be unbiased relative to the actual
scene. This is why, for instance, Photon Mapping separates irradiance, volume,
and caustics into separate passes and photon maps. In each pass, photons are
exclusively shot to the areas of the scene that need it.
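For reference, the estimator behind any of these pdf-driven sampling schemes is the same: each sample gets divided by the probability density with which it was drawn, which is what keeps a biased choice of directions from biasing the final estimate --

    L_o \approx \frac{1}{N} \sum_{k=1}^{N}
        \frac{f_r(\omega_k) \, L_i(\omega_k) \cos\theta_k}{pdf(\omega_k)}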

[Renders at 25, 125, and 625 samples/pixel]

All the images above have 5 initial eye rays for antialiasing, and so 25 samples/pixel has only 5 indirect sample rays at the first hit. Notably, because of the importance sampling, even the very low sample-count examples still look fairly decent, and caustics are already showing up as early as 25 samples per pixel.
For the most part, it doesn't really make much of a difference if we made all the
25 sample rays initial eye rays or if all the 25 were secondary samples -- the latter
will show aliasing artifacts, but that's about the only difference we might see on
the whole. The reason I chose, for this scene, to do only 5 eye rays per pixel is
simply because I only cared to have enough eye rays to battle aliasing. The other
thing was the camera angle was such that you can't really see very much direct
lighting, so I simply placed more weight on indirect lighting samples. Either
way, I would have gotten a pretty good sampling radius assuming the importance
sampling was working nicely. But having them all be eye rays means that every
ray has to have the first hit computed for it. With more indirect rays, a large
percentage of the rays bouncing through the scene trace back to the same first hit.
So if nothing else, I did bring down my render times.
If you want a full-quality render, here you go -- 1536 samples/pixel, render time 27:31 on a P4C 2.8 GHz, with 6 initial eye rays and 256 secondary samples. Slow, but satisfying. That tends to be the story of life when dealing with so-called photorealistic rendering. But the fact of the matter is that we'd be a lot
worse off in speed if we chose to be completely unbiased. Then of course, we
have to hope and pray that our biased sampling doesn't *look* biased. Since
importance changes the probability of sampling in a particular direction, it means
that samples that randomly fall out of that probability would ignore the biasing.
But this means that in order not to appear biased, we need to have the
probabilities set pretty much juuust right. Either that, or just use lots of samples.
Sometimes, it's good to shoot yourself
Aside from importance sampling, there are other techniques you can use to save
computation time and improve overall performance. Much as it might be
counterintuitive, Russian Roulette is one of those techniques. Russian Roulette is
also one of the much easier techniques to implement, and is often advised,
especially when working with techniques where more samples not only increases
computational cost, but also memory usage (e.g. photon mapping, irradiance
caches). Russian Roulette is a method by which you randomly decide whether or not to continue recursing further down a path. It can save you a lot of
computation time by reducing the average recursion depth of paths.
It especially proves valuable in photon mapping where you can greatly reduce the
number of photons shot and the number of photons stored. Which in turn saves
you a great deal of time when balancing the kd-trees and in final gathering.
In general, the most preferred method for Russian Roulette is derived from Eric
LaFortune's 1994 paper - "Using the Modified Phong Reflectance Model for
Physically Based Rendering." In this one, a random scalar (f) is generated, and
compared against the luminance. If f is greater than the sum of diffuse and
specular luminance, then the sampling is stopped. If f is greater than diffuse
alone, then a sample weighted by importance with the specular model alone is
generated. Otherwise, you generate a sample importance weighted by the diffuse
model alone. Of course, some basic depth limiting is also applied so that we
don't recurse infinitely. The main advantage of this particular method is how
much better it integrates with respect to the existing BRDF-weighted importance
sampling and adapts to local surface conditions. Overall variance is reduced as
compared to basic random yes/no.
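A minimal sketch of that decision, assuming we already have average diffuse and specular reflectances at the hit point to stand in for the luminances being compared against:

    #include <cstdlib>

    enum class Scatter { Absorb, Diffuse, Specular };

    // Russian Roulette in the style described above: one random number decides
    // whether the path is absorbed (stopped), bounces diffusely, or bounces specularly.
    Scatter russianRoulette(double diffuse, double specular) {
        double f = std::rand() / (double)RAND_MAX;            // random scalar in [0, 1]
        if (f > diffuse + specular) return Scatter::Absorb;   // stop recursing
        if (f > diffuse)            return Scatter::Specular; // sample the specular lobe
        return Scatter::Diffuse;                              // sample the diffuse lobe
    }

The surviving sample still has to be divided by the probability of the branch that was chosen, so that the estimate stays correct on average.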
Superman is not here
If unaware of some of the history behind the nomenclature, it can be kind of
amusing to notice how all these processes are named after cities. Monte Carlo
methods, Manhattan distance, Las Vegas algorithms, etc. And in this section, I'm
going to be talking about Metropolis light transport sampling. And in case you're
wondering, this is named after Nicholas Constantine Metropolis (who, in fact,
coined the term "Monte Carlo" methods, while working on the "Manhattan"
Project). Metropolis sampling can be thought of as a further step forward from
importance sampling. The idea being that you take some path that you're already
sampling and alter it, and if that altered path is important, you keep it as a formal
sample, otherwise, you forget it.
Now doesn't that sound easy? Well, that's because you're crazy. It's damn hard!
The idea of taking the probability that a path makes a significant contribution
and thereby choosing whether or not to keep the path is the easy part. The hard
part is developing a good mutation strategy. One of the key points of Monte
Carlo methods is that the variance keeps decreasing with more samples, but that's
because all the sample directions we look at are independent of each other. In the
case of Metropolis, you have a whole lot of correlated samples, which drives the variance up. So the important part of this is to choose your mutations judiciously. Bear in mind that you're not changing a path along which
to integrate, you're generating a variation of the same path integration.
If your acceptance probabilities are very low on average, then you don't end up
keeping many variations except maybe exceptionally long ones, which generally
leads to multiple paths affecting the same pixel on the image -- what does that
look like? Noise. Similarly, we also don't want to have our mutations be too
small. Otherwise, we end up oversampling a few lights or a few interreflections.
Ideally, we'd want to create mutations that result in, for example, changing from
a specular reflection to a diffuse reflection, or an increase or decrease in path
length. Let's take an example -- We have a path made up of a series of QMC ray
samples that happens to go through the points A, B, C, D, and E in the world (A is
the camera). Now we do a simple mutation by adding a vertex "z" in the path
between C and D. So now the new path is A->B->C->z->D->E. Now if this new
variation is accepted, it's another value summed up in the path integral up until
point C in the path. Similarly, it's possible to have another point in between
points A and B, which results in the same path integral being used for part of the
lighting solution for another pixel on screen, if such a path mutation is possible.
Other possibilities include bidirectional mutation strategies where subpath
sections are joined together to create a new path variation. While we cannot alter
path vertices when you are dealing with absolutes like perfectly specular
reflection or refraction, we can alter other sections. In a more naive sense, you
can attempt to alter a path at locations following a specular reflection, but you'd
find that the BSDF yields a single valid scattering direction, and find that the
acceptance probability of any variation is absolutely zero, so it gets rejected
anyway.
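For reference, the accept/reject test at the heart of all these mutation strategies is the standard Metropolis-Hastings rule: writing f for a path's image contribution and T for the probability of proposing a given mutation, a mutated path y replaces the current path x with probability

    a(x \to y) = \min\!\left(1, \; \frac{f(y)\, T(y \to x)}{f(x)\, T(x \to y)}\right)

which is exactly why a mutation into a zero-contribution path (like the blocked specular case just mentioned) has no chance of being accepted.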
The main thing that makes this modification of paths so fast is the fact that as you go through the path tracing steps, the path integral is known up to each point.
At any given point as you start unrolling the recursion down a sample path, you
have the path integral computed up to that point, so a variation in the path such as
adding or removing or replacing a vertex adds an extra term and/or removes an
existing one up to that point in the path. Because you only have a single
weighted lighting computation per mutation that you actually keep, it's fairly
cheap. In essence, what Metropolis Light Transport is, is an interesting and
inexpensive way to take path tracing... reuse paths as much as possible... and
turn it into distribution raytracing. In the end, this means that for each path we
do try to take, we get more than one effective path actually sampled, and more
paths traced means less noise... in theory, anyway. One of the weaknesses is that
it still has inherent noise issues in darker regions. The end result of having all
these correlated sample paths is that we don't go down in noise as quickly as we'd
like.

Diagram time, again

Now, consider the picture above. We've got our eye, our image plane, some
objects and a lightsource. The blue path represents a single path considered in a
simple MCPT renderer -- like the MCPT w/ direct lighting diagram, we'd
probably still be considering direct lighting at these sample points, but for now,
I'm concentrating exclusively on the path traced through the world... the last step
of the path would be the last point where direct lighting is done, so I included a
last-point-to-light vector. The dashed red arrows represent some sample
mutations. Let's say our original path is going through the vertices Eye->A->B->Light... now in one of the mutations, we've removed the vertex B. So it
becomes Eye->A->Light. In practice, this sort of path would be rejected because
it's really just a simple direct lighting sample for the pixel. Even if we were not
doing the direct lighting anyway, we'd reject this because the stochastic
choosing/not choosing of direct lighting introduces too much high-intensity noise
-- it's much better for the sake of noise to have explicit direct lighting than
implicit -- implicit direct lighting has its place from a theoretical standpoint for
extreme cases of indirect lighting and also for statistical studies of variance in
specific contexts, but not if you're trying to shoot for render quality above all
else. In another mutation, we considered adding a vertex v, so we now have Eye->A->v->B->Light. This is a much more likely one than the former, and assuming the BRDFs of these objects say it's okay, we'll most likely use this
one. The thing that makes it cheap is that we can take our original path and
remove a ray sample (the A->B sample) and insert two new ones (A->v and v>B). The sampling of Eye->A and B->Light are already done. Another
possibility is that I could take A out of my path, and instead end up with an Eye>v->B->Light path. Nice thing about doing this is, all these previous
calculations tracing back along that path from the light up to vertex v can now be
used as a contribution to another pixel in the image.
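To make the bookkeeping concrete, here's a tiny hypothetical sketch of why adding
the vertex v is cheap. If the path caches one throughput factor per segment, then
swapping A->B for A->v plus v->B only means evaluating those two new factors with
actual ray casts and BSDF lookups; everything else, including the Eye->A and
B->Light work, is reused as-is. The structures here are invented for illustration,
not anything out of a real MLT renderer.

    #include <cstddef>
    #include <vector>

    // Cached BSDF * geometry / pdf factor for one path segment.
    struct SegmentTerm { double value; };

    // Replace the single cached term for segment (i -> i+1) with two new ones,
    // (i -> v) and (v -> i+1); every other cached term is left untouched.
    void insertVertexTerms(std::vector<SegmentTerm>& terms, std::size_t i,
                           SegmentTerm iToV, SegmentTerm vToNext)
    {
        terms[i] = iToV;
        terms.insert(terms.begin() + i + 1, vToNext);
    }

    // The path contribution is just the product of the cached terms, so only
    // the two new factors had to be evaluated from scratch.
    double pathThroughput(const std::vector<SegmentTerm>& terms)
    {
        double f = 1.0;
        for (const SegmentTerm& t : terms) f *= t.value;
        return f;
    }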

Bear in mind that this removing and adding of vertices is just one of many
mutation processes described in Veach's thesis. The extra case I mentioned of
taking vertex A out of the path and using the altered path on a new pixel in the
viewport is another one (referred to as a lens subpath mutation). Among the others
are bidirectional mutations, which are good for further exploration of the sample
space, as well as lens perturbations, caustic perturbations, multi-chain
perturbations, and so on. There are even strategies that effectively replace the
whole path. Technically, if you limit the path segment replacement such that the
very last vertex always stays the very last vertex (and it is not occluded from the
light), this accomplishes the same goal as bidirectional mutations, and some
people implement this instead.
This is why, even though it's a relatively old paper, there are still hardly any full
implementations of MLT that achieve everything the original paper by
Veach portrayed. As of this writing, there are only 2 or 3 full
implementations (and that's including Veach's own), along with
dozens and dozens of partial implementations where someone handled a few
mutation strategies that cover a few key problems MLT is meant to solve in
less computational time. To date, though, nobody has really met the performance
levels that Eric Veach achieved in his renderer, although many come close. Many
people look at MLT and think it's a perfect solution for the performance issues,
but the actual time you'll spend on renders won't be anything miraculous. Okay,
you might spend a day to render an image that would otherwise take 2 days with
MCPT, but you're not going to go all the way down to minutes and seconds.
No! They're particles, not waves!
Ah, yes... here we come down to everybody's favorite. The thing that made
Henrik Wann Jensen a hero to us all -- Photon Mapping. Yes, it's a damn fast
method, and it's often believed that photon mapping is the end-all, be-all, catch-all
rendering algorithm that does everything short of helping you lose weight.
Photon mapping is a global illumination algorithm which is derived from the
older particle tracing technique, which itself is just a further extension from
forward raytracing. I should note that occasionally, people in various books get
forward and backward raytracing mixed up simply because doing forward
raytracing is not feasible in practice, so backwards raytracing is the only type we
usually perform.
The idea behind photon mapping is to store where light hits surfaces as it
bounces around through a scene and use density estimates to get a smooth
reconstruction of lighting. Now a direct visualization of these density estimates
will result in something with a lot of low-frequency variance. And while it's easy
to get rid of high-frequency noise with some post-processing, it's very hard to get
rid of lower-frequency noise. Well, in order to deal with this, we actually get
indirect illumination by going through a final gather pass, gathering irradiances
from many places in addition to the usual direct lighting. In Jensen's own
implementation, this final gather is a distribution raytracing pass, but a
pathtracing or MLT gather pass is perfectly feasible, and in fact, many 2-pass GI
algorithms use a final gather pass this way. Now everybody accepts that photon
mapping is very fast... and the fast part happens to be bouncing the light around
the scene. Unfortunately, we can't really say the same about the final gathering
and the density estimation. If not for these, a 25-minute photon map render would
instead take about 25 seconds.
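As a rough sketch of what that density estimate actually computes: once you've
gathered the k photons nearest to a shading point (the search itself is covered in
the next paragraph), the irradiance estimate is essentially the summed photon
power divided by the area of the disc those photons span. The struct and function
here are just illustrative stand-ins, not Jensen's code.

    #include <cstddef>

    struct PhotonPower { float r, g, b; };   // flux carried by one stored photon

    // flux: power of the k nearest photons; maxRadius2: squared distance to the
    // farthest of them. The photons are assumed to lie roughly on a flat patch,
    // so the covered area is approximated by pi * r^2.
    void estimateIrradiance(const PhotonPower* flux, std::size_t k,
                            float maxRadius2, float outE[3])
    {
        const float kPi = 3.14159265f;
        float sum[3] = {0.f, 0.f, 0.f};
        for (std::size_t i = 0; i < k; ++i) {
            sum[0] += flux[i].r;
            sum[1] += flux[i].g;
            sum[2] += flux[i].b;
        }
        float invArea = 1.0f / (kPi * maxRadius2);
        outE[0] = sum[0] * invArea;
        outE[1] = sum[1] * invArea;
        outE[2] = sum[2] * invArea;
    }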
In general, the density estimation is done by searching for the k nearest
neighbouring photons around a point and seeing how large a sphere it takes to
contain all k of them. Normally we store these photons in a kd-tree to make
this k-nearest-neighbours search easy. A kd-tree is not that different from a BSP
tree except that instead of using real planes from the world geometry, you use
imaginary axis-aligned planes, and the axis of the plane you choose is given as a
function of the depth in the tree (normally, anyway, though this isn't a
requirement). Well, to make sure that this search is also reasonably fast, we
also have to balance the kd-tree. This operation, as it so happens... does not run
reasonably fast at all. That kd-tree is storing hundreds of thousands if not
millions of photon strike locations. And because any planar split is
deterministically aligned with respect to its depth, we have to maintain that
feature of the tree when balancing. So it's not quite as easy as balancing a BSP
tree, because we effectively have to rotate three ways at a time, and it proves to
be incredibly slow in the end. While there are other options for the data structure
(e.g., loose octrees, BVTs), kd-trees are generally the most common choice simply
because that is what Jensen himself used. I have often found in practice that
normal octrees are infinitely easier to work with and they save a reasonable
amount of render time, but they are also very prone to aliasing issues, and they
don't scale quite as well with photon map resolution -- while you have the option
of only subdividing down to x photons per node, this means an exhaustive search
inside each node, and even if you allow the subdivision to go extremely deep, you
run into precision and memory issues.
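For what it's worth, here's a minimal sketch of the k-nearest-neighbour query
itself, assuming a left-balanced kd-tree stored as a flat array (node i has
children 2i+1 and 2i+2) and a made-up Photon struct; this is illustrative rather
than Jensen's actual code. A max-heap keeps the k best candidates found so far, so
the search radius can shrink as better photons turn up.

    #include <cstddef>
    #include <queue>
    #include <vector>

    struct Photon {
        float pos[3];        // strike position
        float power[3];      // flux carried
        int   splitAxis;     // 0, 1 or 2: the axis this node splits on
    };

    struct Found {
        float dist2;
        const Photon* p;
        bool operator<(const Found& o) const { return dist2 < o.dist2; }
    };

    static float dist2(const float a[3], const float b[3])
    {
        float dx = a[0] - b[0], dy = a[1] - b[1], dz = a[2] - b[2];
        return dx*dx + dy*dy + dz*dz;
    }

    // Gather the k (> 0) photons nearest to x, never looking beyond sqrt(maxDist2).
    void locatePhotons(const std::vector<Photon>& map, std::size_t node,
                       const float x[3], std::size_t k, float maxDist2,
                       std::priority_queue<Found>& heap)
    {
        if (node >= map.size()) return;
        const Photon& ph = map[node];

        // Visit the child on the query point's side of the splitting plane first.
        float delta = x[ph.splitAxis] - ph.pos[ph.splitAxis];
        std::size_t nearChild = 2*node + (delta < 0.f ? 1 : 2);
        std::size_t farChild  = 2*node + (delta < 0.f ? 2 : 1);
        locatePhotons(map, nearChild, x, k, maxDist2, heap);

        // Only cross the plane if the current search radius still reaches it.
        float radius2 = (heap.size() < k) ? maxDist2 : heap.top().dist2;
        if (delta*delta < radius2)
            locatePhotons(map, farChild, x, k, maxDist2, heap);

        // Finally, consider this node's own photon.
        float d2 = dist2(ph.pos, x);
        radius2 = (heap.size() < k) ? maxDist2 : heap.top().dist2;
        if (d2 < radius2) {
            heap.push({d2, &ph});
            if (heap.size() > k) heap.pop();   // drop the farthest, keep only k
        }
    }

The squared distance sitting at the top of the heap after the query is the r² that
goes into the πr² area of the density estimate sketched earlier.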
Then we come down to our final gathering, which is another slow part. Now
certain things we can directly visualize -- like caustics. This is partly because
caustics are only visible on primarily diffuse surfaces. But even more so because
caustics are such a high-frequency lighting phenomenon and the little bit of noise
we might see is something to expect even in the real world. Also, it's very hard
to get a good convergence rate on caustics with regular Monte Carlo sampling,
and that's why photon mapping tends to separate caustics into a separate caustics
photon map which is directly visualized with a raycast and density estimation.

That also means that in order to get decent coverage within a caustic,
you need a LOT of photons mapped within a small space to get some
halfway decent idea of the overall form. It's perfectly normal even to have the
caustics map carry more than twice the total number of photons as the global
photon map.
Now one of the bad things about having this separation into a caustics map, a
global map, a volume map, and so on is that you end up having to do that many
more passes to support more features, which greatly takes away from the overall
speed advantage of photon mapping in the first place. One of the really great
things about it, though, is that it makes photon mapping itself very flexible in
how you choose to use it. In fact, before Jensen published the idea of using
photon mapping by itself as a global illumination estimator, he had used it to
build importance maps for a scene when performing other Monte Carlo
illumination algorithms. One of the things that is done in the global photon map
is classification of the different types of photon strikes -- that being, direct
illumination, indirect illumination, and shadow photons. Direct illumination
photons are generally the obvious ones where a photon's strike on a surface
happens to be the first in a stochastically selected path from the light. Beyond
that point, any photon strikes along this path would be indirect illumination
photons. Shadow photons are not really true photon strikes per se, but a follow-up
along the direction in which a direct illumination photon struck an object. These
shadow photons hit the world at the "would-be" point where the photon would have
landed if the object it hit wasn't there. We have these shadow photons for a few
reasons. One, because we use density estimates to get irradiance estimates, a lack
of photons is not enough to say that we're in shadow, and the end result would be
artificially soft and slightly overbrightened shadows. The other (and this helps
us in the speed department) is that generally, large portions of a scene are either
entirely in shadow or entirely illuminated, so having the shadow photons helps
reduce the number of shadow ray samples we need in the final gather pass.
Well, going over with a diagram or two... We have a light and a scene. We trace
thousands of rays from the lightsource and bounce them around throughout the
scene. The blue rays would be our direct lighting rays. Each of these carries
some amount of energy (luminance, color) with them. These will be
stochastically bounced around within the scene, generating indirect ray samples --
the red rays in the diagram. Obviously, the red rays will carry somewhat different
energy based on the color and BRDF of the surfaces they hit. Wherever a ray hits,
we store info about position, incident direction, energy, surface normal, type, etc.
Actually, if using a kd-tree, the position can be determined through the tree
traversal. Also, one of the optimizations involving irradiance caching means we
can also store a local radiance estimate to speed things up, but that's another
matter. Of course, our tracing of indirect rays applies the usual Monte Carlo
sampling principles like BRDF/BSDF weighted sampling, Russian roulette, and
so on.
The black dashed rays are the shadow rays. Where the blue direct lighting rays hit
the sphere, we shot a shadow ray along that same direction until the ray hit some
other object. Obviously, we don't create any shadow photon points for shadow rays
that go outside the scene. The points labeled U and P represent shadow photons
within the umbra and penumbra (or at least close to the penumbra, if not in it).
The main idea is that
around that point U, chances are that all the nearby photons are going to be other
shadow photons, and knowing that, if a ray sample in the final gathering phase
hits the surface here at point U, we can safely say that the point is in shadow
and doesn't need any shadow samples or direct lighting samples. A reconstruction
of the indirect lighting alone would be enough. Point P, on the other hand, is
pretty near the edge of the sphere's shadow, which means that many of the nearby
photons could potentially include a mixture of shadow, direct, and indirect
lighting photons. Basically, only if the local area doesn't have enough shadow or
illumination photons, or if the nearby photons include a mixture of shadow and
direct illumination photons, do we actually take shadow ray samples during the
final gather.
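Here's a rough sketch of that decision, assuming each stored photon carries a type
tag; the structure and the exact rule are simplifications of what the text above
describes, not Jensen's code.

    #include <cstddef>
    #include <vector>

    enum class PhotonType { Direct, Indirect, Shadow };

    struct TaggedPhoton {
        PhotonType type;
        // position, power, incident direction, ... omitted for brevity
    };

    // Decide whether a final-gather hit point needs explicit shadow rays, based
    // on the photons found in its neighbourhood.
    bool needsShadowRays(const std::vector<TaggedPhoton>& neighbours)
    {
        std::size_t direct = 0, shadow = 0;
        for (const TaggedPhoton& p : neighbours) {
            if (p.type == PhotonType::Direct)  ++direct;
            if (p.type == PhotonType::Shadow)  ++shadow;
        }
        // Only shadow photons nearby -> fully occluded: skip direct light entirely.
        // Only direct photons nearby -> fully lit: no occlusion tests needed.
        // A mixture (penumbra), or too few of either to be sure -> sample shadow rays.
        if (shadow > 0 && direct == 0) return false;
        if (direct > 0 && shadow == 0) return false;
        return true;
    }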
Well, that's the easy part... And also the very fast part. Shooting hundreds of
thousands of rays through a scene is pretty fast. Granted, this doesn't include the
caustics, although we could simulate very simplistic caustics without the aid of a
specific caustics photon map. However, for the most part, it's safer to assume
that there is some complexity and high frequency components that would warrant
the use of a caustics map. There's a bit of setup work involved in that, though,
where you would want to create a few lights that direct all their photons toward
surfaces that could cause caustics. For more simplistic renderings of caustics, we
can consider the approach of simply putting all photons resulting from LS+D
paths directly into the caustics map. As a general rule, the global photon map
contains hits along L(S|D)*D paths, not including the LS+D paths.
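In code, sorting a photon strike into the right map can be as simple as checking
that path "grammar" against the bounce history. The history string here is just an
illustration device I'm using (one character per scattering event before the
current diffuse hit), not a real implementation detail.

    #include <string>

    // history holds one character per surface interaction before the current
    // diffuse hit: 'S' for a specular bounce, 'D' for a diffuse one.
    // (Photons are assumed to be stored only when they land on a diffuse surface.)

    bool belongsInCausticsMap(const std::string& history)
    {
        // LS+D: at least one prior bounce, and every prior bounce was specular.
        return !history.empty() && history.find('D') == std::string::npos;
    }

    bool belongsInGlobalMap(const std::string& history)
    {
        // L(S|D)*D minus the LS+D paths: everything else landing on a diffuse
        // surface, including direct (LD) strikes.
        return !belongsInCausticsMap(history);
    }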
The slow parts of photon mapping tend to be the organizing and ordering of the
kd-tree and the final gathering phase. Technically, it's not 100% necessary to use
a kd-tree, but Jensen considers it ideal for the speed of nearest neighbour
searches. There are cases, such as the limited GPU implementations of caustics
photon mapping, where a kd-tree is simply out of the question. Also, there are
studies going into realtime photon mapping using very rough radiance estimates
with very few photons, where again kd-trees are a poor choice because the act of
balancing the tree is incredibly slow. Then there's the final gathering phase,
which, in order to deal with the sampling noise, uses distribution raytracing or
Monte Carlo path tracing or some other raytracing scheme that uses the local
radiance estimates wherever secondary rays hit in order to compute the indirect
lighting component. Typically, these two portions put together can consume some
2-5 orders of magnitude more time than the initial photon tracing through the
scene (especially the final gather, which is resolution dependent).
Part of the reason photon mapping is so fast is because the initial phase is not
resolution dependent. Unfortunately, this is also the thing that makes it very hard
to scale up to large scene volumes. In cases where the size of a scene relative to
its lights is large, it takes many photons to get a detailed result. While a few
photons and some artificial increase in apparent brightness are sufficient for
low-frequency lighting, they're not sufficient for parts of a scene which exhibit
high-frequency lighting characteristics like caustics or reflective paths off of
glossy surfaces. It also means that anything within a scene that inherently
carries high-frequency surface detail demands more photons to get decent coverage.
For small scenes, this is usually fine. For large scenes with few
lights, it often doesn't scale well. Which is not to say that photon mapping isn't
of use outside small scenes, but photon mapping by itself is not a
guaranteed universal win.
Where it serves its purpose exceptionally well is in filling an irradiance cache for
further computation. An irradiance cache is, as the name suggests, a map over
the scene that caches irradiance information. The main purpose it serves is
to save explicit lighting samples by reducing them to data sampling problems.
While there are plenty of ways to fill in such a map, photon map sampling works
out as one of the better methods because it not only gives us direct
lighting, but also indirect lighting, caustics, and shadow information. While an
irradiance cache wasn't part of the original design of a photon mapper, it's now
more or less a given for any photon map renderer, as it's almost always a massive
performance win.

LS+D paths (direct visualization)

Global and caustics photon maps (direct visualization)

Scene after final gather (render time: 40:21 minutes, P3 1.2 GHz)

Final gathering shot -- caustics closeup.
This scene was rendered using ~820,000 photon strikes (200,000 photons shot)
stored for the global photon map and ~560,000 stored (485,000 photons shot) in
the caustics map. A high quality higher resolution (960x720) render is also
available. Unlike the renders above which have no gamma correction, this image
is gamma corrected for a gamma of 2.2. This render took 1:30:55 hours,
approximately 2.25x as long as the lower resolution (512x384) renders in the
table above, while being about 3.5x the resolution. This suggests that the biggest
chunk of time is in the construction of the irradiance cache and/or the final
gathering, both of which are resolution dependent portions of the renderer.
Indeed having somewhat over a million points in the kd-tree is probably mildly
above average for a scene of this size given a max recursion depth of 5 for every
photon's path. Notably, there are still errors on the far wall, and this is largely
due to the scene setup, which didn't account for reflective caustics, so they show
up as smatterings of high photon density. Since the search range for the global
map was the nearest 350 photons, the fact that we see small dots on the wall
suggests caustics-level density in that area, but not enough photons to cover the
problem. In practice, problems like this would probably be post-processed away and
treated as trash, rather than trying to rework the scene to suit the particular
problem. Only for academic demonstrations and studies do we really care about
things like that -- in most cases, the caustic on the floor provides enough
information to make a relatively convincing render, and it is the point that
inherently draws a viewer's attention.
In feature film cases, things like photon maps are often saved for specific
subtasks (usually caustics rendering more than anything else). Among the earlier
known cases was the first of the Final Fantasy movies: The Spirits Within
(though I guess some would argue it's more fair to call it the first CG movie to
carry the "Final Fantasy" name), which pretty much used only a caustics photon
map.
map. This has been changing little by little over the years. Where 5 or 6 years
ago, film CG would almost never resort to any form of GI and even those that did
might at most pre-sample a single indirect lighting bounce and live with the
precalculated result for several frames, more recent films will use limited GI
solutions almost exclusively. Computing power has gotten cheaper, but we're
still a ways off.
There has been a great deal of talk in recent years about a resurgence of
raytracing and the realtime execution of GI approximations. A lot of groups,
particularly the Saarland RPU group, have demonstrated hardware prototypes
which show promising results, especially given their limitations of very low
clock speeds and very poor memory bandwidth, while still performing on par
with mid-range single-core PC CPUs. In reality, we're probably a very long way
away, though I admit this rather begrudgingly as I am a raytracing psychopath.
In terms of raw computational power, a lot of existing GPUs are far enough
along to have some impressive results -- if the architectures were only suited to
the task. Raytracing by nature is recursive and relies a great deal on sampling
information from an entire scene, whereas GPUs are designed specifically to do
the exact opposite -- work on linear calculations with no self-dependencies
worrying only about a single scene element at a time. Even more importantly,
scene complexity is always climbing, and sampling data from the scene means
that a raytracer requires random access within large datasets, and modern
memory architectures are nowhere near good enough to handle this. This is
generally the limiting factor when large renderfarms do CG work for films -- the
computational power is there, but there is massive latency and hideously poor
bandwidth when moving data around. Even if a single PC acquires the
computational resources to do the work, we will basically be memory access-bound
rather than CPU-bound, and unfortunately, memory evolves at an
unacceptably pitiful rate.
This Man... is an Imposter!
In spite of everything, there is always a move to try and get global illumination
effects to be feasible in realtime. It's still the holy grail of graphics, even if
the holy grail itself is an incomplete item. Of course, it's silly to think we can
actually do anything that ranks in the same league as any of the aforementioned
techniques. So what do we do? We cheat.
The most classical and obvious of cheats, which I'm sure everyone has seen, is the
constant ambient lighting component. The idea is simply to say that light
bouncing around in the scene makes everything in the scene just a little brighter.
And if you were to look at things like the progressive radiosity renders, it's
probably the most obvious conclusion to draw. But as we all know, it's also very
wrong, or rather, greatly oversimplified -- it occasionally looks passable, or
whatever "flatness" it ends up creating can be overcome by other forces. The
problem is that, in reality these sorts of interreflections that we're trying to
simulate are the result of the objects in a scene and where they are, what shape
they are, what their reflectance characteristics are like, and so on. Among the
nicer tricks out there is directionally dependent ambient lighting. This is usually
done with a low-resolution cubemap, but it works just as well with 6 colors, one
per axial direction (which is more compact). Probe points are placed in a scene,
and each one precalculates samples and averages them to get a rough estimate of
the light bouncing in from every direction. The colors are blended using the
surface normal, and that gives you directionally dependent ambient lighting.
Various games have used this technique with varying success (Half-Life 2 being a
good example), and while it works nicely for dynamic objects moving about a scene
(e.g. characters), it doesn't help much for the scene itself.
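A rough sketch of that six-colour blend (an "ambient cube" in the Half-Life 2
sense) might look like the following; the data layout here is my own assumption,
but weighting each axial colour by the squared component of the normal pointing
toward it is the standard trick, since those squared components sum to one for a
unit normal.

    struct Color { float r, g, b; };

    // ambient[0..5] = +X, -X, +Y, -Y, +Z, -Z colours; n must be unit length.
    Color ambientCube(const Color ambient[6], const float n[3])
    {
        float nSq[3] = { n[0]*n[0], n[1]*n[1], n[2]*n[2] };  // weights sum to 1
        Color out = {0.f, 0.f, 0.f};
        for (int axis = 0; axis < 3; ++axis) {
            // Pick the face colour on the side the normal points toward.
            const Color& c = ambient[axis*2 + (n[axis] >= 0.f ? 0 : 1)];
            out.r += nSq[axis] * c.r;
            out.g += nSq[axis] * c.g;
            out.b += nSq[axis] * c.b;
        }
        return out;
    }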
The classical approach for scene lighting is basic lightmapping. This is handy
because higher quality lighting solutions can be applied and saved off in a
texture. If someone chooses to apply radiosity or photon mapping or lord knows
what on the scene, it can be saved off. Of course, among the obvious problems
with lightmapping include the fact that it is, by nature, low detail, and it is
secondly only applicable for static lighting and static scenery. While a good deal
of the detail aspects can be dealt with (for instance, using 3 lightmaps for each of
the tangent space directions and then blending between them based on the normal
map normal), the static lighting part is simply the nature of the beast, even
though there is nothing stopping you from additionally including dynamic
lighting on top of lightmaps. However, with technology constantly advancing,
more games tend to show that entirely dynamic lighting is technically feasible for
a lot of cases, and so the public at large tends to expect it of all titles (or at least
enough of it to fool them that the scene is dynamically lit). You might notice,
though, that you can't lightmap a character or some other dynamic object. Well,
nonetheless it works out nicely because the environment covers the majority of
the screen, which means that most of what the user looks at looks very nicely lit
(though not completely so).

Winning a great deal of interest these days tends to be the use of spherical
harmonics bases. Spherical harmonics are little more than a Fourier series
projected into spherical coordinates. Like the Fourier transform you're hopefully
familiar with, a sum of harmonics down the frequency range captures
progressively higher and higher frequency lighting components. While such a
transform can get pretty massive, the parts we're usually interested in are the
low-frequency components. The "constant ambient" term we're familiar with is
effectively equivalent to the constant term of the Fourier series here. In actual
implementation, spherical harmonics coefficients are captured by projecting
(convolving) the lighting results against the basis functions. In the realtime
application, lighting information is transformed into the same basis space. This
is what makes it very convenient for realtime use: the pre-process can apply
whatever offline solution you like, as with lightmaps, but you end up with
per-vertex or per-pixel evaluation of dynamic lighting given that lighting model
(or at least a few frequency bands of it). The most common choice is to use the
first 3 orders of the basis, which comes out to 9 coefficients to store either per
vertex or in a series of textures. Of course, as you might expect, this is by
nature low frequency, which means low detail, and because it depends on a prior
offline pre-process, it is limited to static geometry. Though it can be used
individually for geometry that moves through the scene, it still demands that the
object be rigid (i.e. it must not deform in any way), and the results are also
less complete by nature than a scene result, because you cannot simulate the full
solution for that object in the scene at every point. Of course, spherical
harmonics are only one of the precomputed radiance transfer solutions in
existence. There is recent research on wavelets in the PRT domain, as wavelets are
generally able to cover more of the frequency range with fewer coefficients.
However, PRT in general still has the same limitations as anything else that is
pre-calculated.
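Just to make the "9 coefficients" concrete, here's a minimal sketch of evaluating
a 3-band spherical harmonics signal at a unit normal. The basis constants are the
standard real SH normalization factors; the per-channel coefficient layout is my
own assumption, and an irradiance-style use would additionally fold the
cosine-lobe convolution factors into the stored coefficients ahead of time.

    // Evaluate the 9 real SH basis functions at unit direction n = (x, y, z).
    void shBasis3(const float n[3], float out[9])
    {
        float x = n[0], y = n[1], z = n[2];
        out[0] = 0.282095f;                       // Y(0, 0)
        out[1] = 0.488603f * y;                   // Y(1,-1)
        out[2] = 0.488603f * z;                   // Y(1, 0)
        out[3] = 0.488603f * x;                   // Y(1, 1)
        out[4] = 1.092548f * x * y;               // Y(2,-2)
        out[5] = 1.092548f * y * z;               // Y(2,-1)
        out[6] = 0.315392f * (3.f*z*z - 1.f);     // Y(2, 0)
        out[7] = 1.092548f * x * z;               // Y(2, 1)
        out[8] = 0.546274f * (x*x - y*y);         // Y(2, 2)
    }

    // Dot the 9 stored coefficients (one set per colour channel) with the basis.
    void evaluateSHLighting(const float coeffs[3][9], const float normal[3],
                            float outRGB[3])
    {
        float basis[9];
        shBasis3(normal, basis);
        for (int c = 0; c < 3; ++c) {
            float sum = 0.f;
            for (int i = 0; i < 9; ++i) sum += coeffs[c][i] * basis[i];
            outRGB[c] = sum;
        }
    }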
Other approaches in existence involve sampling PRT information at grid points in
space in a scene and interpolating between the coefficients as an object moves
about. This is the so-called "irradiance volume" approach, where you are
essentially storing (most likely spherical harmonics) samplings for various points
in the occupied volume rather than for specific objects. This is actually quite
effective in practice, but speed is still an issue when it is combined with
quality demands. Also, it's mostly useful strictly for the indirect lighting
component. Using it for shadows and direct lighting often doesn't work out very
well in practice, but that is still acceptable. It remains to be seen, though, how
well it integrates with all types of scenery. While such a technique fits quite
easily with indoor lighting problems, it is more difficult to effectively author
an ideal setup for outdoor lighting, particularly because outdoor scenery is, by
nature, open volumetric space. And this is the sort of problem that comes up even
if someone were to come up with an ideal solution that also happens to be very
esoteric: it will ultimately not prove useful if it requires too much of content
creators.
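The runtime side of an irradiance volume mostly amounts to finding the grid cell
containing a point and trilinearly blending the SH coefficients of its eight
corner probes. The sketch below assumes a dense, axis-aligned grid and a made-up
probe layout; real implementations add things like occlusion-aware weights, but
the interpolation core looks roughly like this.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    struct Probe { float sh[3][9]; };   // 9 SH coefficients per colour channel

    struct IrradianceVolume {
        int nx, ny, nz;                 // grid resolution
        float origin[3], cellSize;      // world-space placement of the grid
        std::vector<Probe> probes;      // nx*ny*nz probes, x varying fastest

        const Probe& at(int x, int y, int z) const {
            return probes[(std::size_t(z)*ny + y)*nx + x];
        }
    };

    void sampleVolume(const IrradianceVolume& vol, const float p[3],
                      float outSH[3][9])
    {
        // Convert to fractional grid coordinates and clamp to the last full cell.
        int i0[3]; float t[3];
        for (int a = 0; a < 3; ++a) {
            float g = (p[a] - vol.origin[a]) / vol.cellSize;
            int maxIdx = (a == 0 ? vol.nx : a == 1 ? vol.ny : vol.nz) - 2;
            i0[a] = std::min(std::max(int(std::floor(g)), 0), std::max(maxIdx, 0));
            t[a]  = std::min(std::max(g - i0[a], 0.f), 1.f);
        }
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 9; ++k) outSH[c][k] = 0.f;

        // Accumulate the 8 corner probes with trilinear weights.
        for (int dz = 0; dz < 2; ++dz)
          for (int dy = 0; dy < 2; ++dy)
            for (int dx = 0; dx < 2; ++dx) {
                float w = (dx ? t[0] : 1.f - t[0]) *
                          (dy ? t[1] : 1.f - t[1]) *
                          (dz ? t[2] : 1.f - t[2]);
                const Probe& pr = vol.at(i0[0]+dx, i0[1]+dy, i0[2]+dz);
                for (int c = 0; c < 3; ++c)
                    for (int k = 0; k < 9; ++k) outSH[c][k] += w * pr.sh[c][k];
            }
    }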
In general, though, whatever we choose to do to try and simulate diffuse
interreflections globally is a good thing for the end result. Direct illumination
models alone cannot produce images with convincing enough illumination
characteristics to be of any use as a route towards photorealism. They have been
used to evoke certain imagery or serve alternative functions (e.g. hard shadows in
Splinter Cell provided hiding space). In the long run, there may be some point at
which the raw computational power necessary to explicitly compute a lot of these
quantities is available, but even by that point, it is more than 100% unlikely
that memory architectures will have evolved far enough. An often missing point in
the talk of "realtime global illumination" is that the amount of data to be dealt
with and the random access patterns involved are not a trivial hurdle to overcome.
I am often left wondering whether it will ever happen. Even as architectures
evolve, so too will the content, and it's entirely possible that memory will never
catch up.
In Conclusion, Ladies and Gentlemen...
So I guess this is a very long-winded way of saying that Global Illumination is
about illumination from all sources: emitted, reflected, scattered, and so on. If I
wanted to be really concise about it, I'd just say Global Illumination is direct
lighting + indirect lighting, but then you wouldn't learn anything, would you?
More importantly, "indirect" demands some definition for those uninitiated in the
ways of the pixel. As I've said before, I'm personally a fan of Monte Carlo Path
Tracing, not so much for performance reasons, but because it's so utterly simple.
It's not without its flaws, of course, which is why you may differ in your
opinion. Whatever it is, at least this document is just one source of information
by which you can form an opinion (assuming you don't already have some
knowledge on the matter to begin with).
GI is not something to fear or revere, really. It's just another example of us trying
to work out how to play digital god.
