Basic theory and algorithms of ray intersections and discussion of generating secondary rays.
1 Overview
Raytracing tries to solve the same problem as rasterization (that is, create a set of pixel colors to represent a mathematical description of stuff) but goes about it in reverse. Where rasterization asks "which pixels are contained within this object?", raytracing instead asks "which objects are visible within this pixel?"
Raytracing is currently slower than rasterization, though it admits almost unlimited parallelism, leading some people to think it will become faster in years to come. Currently its primary advantage is that the process does not depend on raster structure, so it can easily model reflection, transparency, and similar optical properties.
The speed of a raytracing system depends largely on the quality of the spatial hierarchy used. Conceptually, the idea here is to collect a group of nearby objects and find a bounding box for them; if a box is not visible from a particular pixel, none of the objects within it are either. Such hierarchies are not a topic for this document.
1.1 Primary and Secondary Rays
A ray is a semi-infinite line; it is typically stored as a point called the ray origin, ro, and a direction vector rd. The ray itself is then the set of points {ro + t·rd ∣ t ∈ ℝ⁺}, and the goal of raytracing is to find the point on the ray lying on the surface of another object which is closest to the ray origin (that is, with minimal t).
Ray tracing creates one or more rays per pixel. Those rays are intersected with objects in the scene and then, generally, several secondary rays are generated from those intersection points, and intersected with the scene again, and then more are generated, etc., until you get tired of shooting rays, at which point you do some sort of direct illumination and call it good.
There are a number of ways to generate rays from pixels. For all of them we need the camera position and orientation, given by the eye position e and the forward, up, and right directions f, u, and r. The ray origin is generally just e. The most common choice of ray direction is designed to replicate the standard rasterization perspective look: given normalized device coordinates (x, y) for a pixel, its ray direction is f·cos(θ) + (x·r + y·u)·sin(θ), where θ is the field of view. Fish-eye (x·r + y·u + √(1 − x² − y²)·f) and cylindrical (y·u + cos(x)·f + sin(x)·r) projections are also used in some settings.
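To make the perspective formula concrete, here is a minimal sketch in Python with numpy (the function name, the argument order, and the assumption that x and y are normalized device coordinates in [−1, 1] are mine, not from the text above):

    import numpy as np

    def primary_ray(e, f, u, r, x, y, theta):
        """Perspective primary ray through NDC pixel (x, y): origin is the eye,
        direction is f*cos(theta) + (x*r + y*u)*sin(theta), normalized."""
        d = f * np.cos(theta) + (x * r + y * u) * np.sin(theta)
        return e, d / np.linalg.norm(d)

    # example: camera at the origin looking down -z with theta = 45 degrees
    ro, rd = primary_ray(np.zeros(3),
                         np.array([0.0, 0.0, -1.0]),  # forward
                         np.array([0.0, 1.0, 0.0]),   # up
                         np.array([1.0, 0.0, 0.0]),   # right
                         0.25, -0.5, np.radians(45))

The fish-eye and cylindrical variants substitute their direction formulas into the same structure.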
2 Rays and Ray-* Intersections
In general, the intersection of a ray ro+trd with some object g(p)=0 is found as the minimal t for which
g(ro+trd)=0. Constrained minimization of this type is a widely-studied problem in numerical analysis, optimization, and many branches of engineering and science. However, most raytracers use only three particular solutions: ray-plane, ray-AABB, and ray-sphere intersections.
(Figure: some of the geometry used in the ray-plane and ray-sphere intersection routines.)
2.1 Ray-Plane Intersection
Planes can be described in a number of ways; principal among them are implicit equations Ax + By + Cz + D = 0, point-and-normal form (n, p), and three-point form (p0, p1, p2). We will use the point-normal version, noting that
n ≡ (A, B, C) ≡ (p1 − p0) × (p2 − p0) and that the point (−D/A, 0, 0) is on the plane (assuming A ≠ 0).
The signed distance between the ray origin ro and the plane is ((ro − p)⋅n)/∥n∥. The distance the ray travels toward the plane per unit t is −(rd⋅n)/∥n∥. Setting the distance covered equal to the distance to travel, we get t = ((p − ro)⋅n)/(rd⋅n). If t is positive, we can use that t in the ray equation to find the intersection point p′ = t·rd + ro. If t is not positive there is no intersection.
Ray-Plane Intersection
Input
Ray with origin ro and direction rd
Plane with normal n through point p
Output
Either no intersection
Or intersection distance t and point p′
Process
Let t = ((p − ro)⋅n)/(rd⋅n)
If t > 0, intersection found at depth t is p′ = t·rd + ro.
Otherwise, no intersection
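A direct transcription of that procedure might look like the following sketch (Python with numpy; the function name and the parallel-ray tolerance are illustrative choices, not from the text):

    import numpy as np

    def ray_plane(ro, rd, n, p):
        """Intersect the ray ro + t*rd with the plane through p with normal n."""
        denom = np.dot(rd, n)
        if abs(denom) < 1e-12:      # ray is (nearly) parallel to the plane
            return None
        t = np.dot(p - ro, n) / denom
        if t <= 0:                  # plane is behind the ray origin
            return None
        return t, ro + t * rd       # intersection distance and point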
2.2 Ray-Sphere Intersection
Given a sphere with center c and radius r, we first check whether the ray originates inside the sphere (∥c − ro∥² < r²) or not. We then find the t value of the point where the ray comes closest to the center of the sphere,
tc = ((c − ro)⋅rd)/∥rd∥. If the ray origin is outside and tc is negative, there is no intersection. Otherwise we proceed to find the squared distance of closest approach
d² = ∥ro + tc·rd − c∥². If d² > r², which can only happen if the ray originates outside the sphere, then there is no intersection; otherwise we find how far (in t) the point of intersection lies from the point of closest approach as
toffset = √(r² − d²)/∥rd∥. If the origin is inside the sphere, the point of intersection is
ro + (tc + toffset)·rd; otherwise, it is
ro + (tc − toffset)·rd.
Ray-Sphere Intersection
Input
Ray with origin ro and unit-length direction rd
Sphere with center c and radius r
Output
Either no intersection
Or intersection distance t and point p′
Process
let inside be (∥c − ro∥² < r²)
let tc = ((c − ro)⋅rd)/∥rd∥
This is the distance along the ray to c′, the ray's point of closest approach to c, but we don't need c′ itself, only tc
if not inside and tc < 0, no intersection
let d² = ∥ro + tc·rd − c∥²
if not inside and r² < d², no intersection
let toffset = √(r² − d²)/∥rd∥
This is the difference between t and tc; rays intersect spheres twice (once entering, once exiting) so tc±toffset are both intersection points
if inside, intersection found at depth t = tc + toffset is
p′ = t·rd + ro
otherwise, intersection found at depth t = tc − toffset is
p′ = t·rd + ro
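A sketch of the same procedure in Python with numpy follows (the division by ∥rd∥ is omitted because, as in the pseudocode above, rd is assumed to be unit length; the function name is illustrative):

    import numpy as np

    def ray_sphere(ro, rd, c, r):
        """Intersect the ray ro + t*rd (rd unit length) with the sphere (c, r)."""
        inside = np.dot(c - ro, c - ro) < r * r
        tc = np.dot(c - ro, rd)              # t of the ray's closest approach to c
        if not inside and tc < 0:
            return None
        closest = ro + tc * rd
        d2 = np.dot(closest - c, closest - c)
        if not inside and d2 > r * r:
            return None
        t_offset = np.sqrt(max(r * r - d2, 0.0))
        t = tc + t_offset if inside else tc - t_offset
        return t, ro + t * rd                # intersection depth and point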
2.3 Ray-AABB Intersection
It is rare to want to create images of axis-aligned bounding boxes (AABBs), but it is easy to find ray-AABB intersections and easy to find the AABB that contains a set of objects, so most raytracers try AABB intersections for sets of nearby objects before checking for intersections with the objects within the AABB.
AABBs consist of six axis-aligned planes. For the axis-aligned case, the ray-plane intersection becomes quite simple because the normal has only one non-zero element; thus, for the plane x = a, for example, the t value of intersection is
tx=a = (a − rox)/rdx. The ray then intersects the AABB if and only if there is some positive t between all six planes; for minimum corner (a, b, c) and maximum corner (A, B, C), we need
[0, ∞) ∩ [tx=a, tx=A] ∩ [ty=b, ty=B] ∩ [tz=c, tz=C] ≠ ∅,
where each interval is written with its endpoints in increasing order.
It is usually not important to know the t-value for an AABB intersection, but if needed it is simply the smallest t in the interval defined above.
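One common way to implement the interval test above is the "slab" method sketched here (Python with numpy; illustrative only): intersect the per-axis t intervals with [0, ∞) one axis at a time and report whether anything remains.

    import numpy as np

    def ray_aabb(ro, rd, lo, hi):
        """True if the ray ro + t*rd (t >= 0) hits the box with corners lo, hi."""
        t_near, t_far = 0.0, np.inf          # start from the interval [0, inf)
        for axis in range(3):
            if rd[axis] == 0.0:              # ray parallel to this pair of planes
                if ro[axis] < lo[axis] or ro[axis] > hi[axis]:
                    return False
                continue
            t0 = (lo[axis] - ro[axis]) / rd[axis]
            t1 = (hi[axis] - ro[axis]) / rd[axis]
            if t0 > t1:
                t0, t1 = t1, t0              # order the endpoints of the interval
            t_near, t_far = max(t_near, t0), min(t_far, t1)
            if t_near > t_far:               # the running intersection is empty
                return False
        return True

If the t-value is needed, it is t_near at the point where the loop finishes.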
3 Inverse Mapping and Barycentric Coordinates
Once you have found an intersection with some object it is generally desirable to know where you hit it so that you can apply texture mapping, normal interpolation, or the like. This process is called inverse mapping.
3.1 Inverse Sphere Mapping
For a sphere, if the point of intersection is p then the normal simply points from the center to the point of intersection, n = (1/r)(p − c). From that it is easy to derive that the longitude is atan2(nx, nz) and the latitude is
atan2(ny, √(nx² + nz²)).
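As a small sketch (Python with numpy; the function name is illustrative), the inverse mapping is:

    import numpy as np

    def sphere_lonlat(p, c, r):
        """Longitude and latitude (in radians) of the point p on the sphere (c, r)."""
        n = (p - c) / r                           # unit surface normal
        lon = np.arctan2(n[0], n[2])              # atan2(nx, nz)
        lat = np.arctan2(n[1], np.hypot(n[0], n[2]))
        return lon, lat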
3.2 Inverse Triangle Mapping
For a triangle, the typical inverse mapping gives you the Barycentric coordinates of the point of intersection. Barycentric coordinates are three numbers, one per vertex of the triangle, which state how close to each of the three vertices the point in question is. In particular, given an intersection point of p and vertices p0, p1, and p2, the barycentric coordinates
(b0,b1,b2) satisfy the two properties that
they sum to 1, and
p=b0p0+b1p1+b2p2.
Every point in the same plane as the triangle has a unique set of Barycentric coordinates, and all three coordinates are positive if and only if the point is within the triangle’s bounds.
There are many techniques for inverse mapping a triangle. The one I present here is not the most common, but is easy and efficient as long as you compute and store the information only once. First, observe that since bi is the nearness to point pi, it is also the distance from the edge joining the other two points. This distance can be found directly by using a dot product with a vector perpendicular to this edge.
(Figure: finding Barycentric coordinates. If e2⋅(p2 − p0) = 1, then b2 = e2⋅(p − p0), and similarly for b1. Since b0 + b1 + b2 = 1, b0 is simply 1 − b1 − b2. Thus, assuming that correctly-scaled e1 and e2 are precomputed, we can compute the barycentric coordinates using just six multiplies and nine adds.)
In that image, b1 = e1⋅(p − p0) because e1 points directly away from the edge joining the two vertices other than p1 (the edge from p0 to p2). Similarly,
b2 = e2⋅(p − p0) and
b0 = 1 − b1 − b2. It thus suffices to find e1 and e2 in order to find the barycentric coordinates. This may be done as
a1 = (p2 − p0) × n
a2 = (p1 − p0) × n
e1 = a1 / (a1⋅(p1 − p0))
e2 = a2 / (a2⋅(p2 − p0))
These two ei vectors can be precomputed and stored along with the normal n in each triangle's data structure to allow rapid ray-triangle intersections. Note that this also suffices for the inside-outside test needed to turn a ray-plane intersection into a ray-triangle intersection: a point is inside a triangle if and only if all three barycentric coordinates are between zero and one.
Because p = b0·p0 + b1·p1 + b2·p2, we can use the barycentric coordinates to interpolate any information stored at the vertices to any point in the interior of the triangle: for a per-vertex value v, the interpolated value is b0·v0 + b1·v1 + b2·v2. You can then use that information just as you would interpolated fragment values during rasterization.
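The precomputation and per-point evaluation described above might look like this sketch (Python with numpy; splitting the work into a setup function and an evaluation function is my own packaging, not prescribed by the text):

    import numpy as np

    def triangle_setup(p0, p1, p2):
        """Precompute the plane normal and the correctly-scaled e1, e2 vectors."""
        n = np.cross(p1 - p0, p2 - p0)
        a1 = np.cross(p2 - p0, n)
        a2 = np.cross(p1 - p0, n)
        e1 = a1 / np.dot(a1, p1 - p0)    # scaled so that e1 . (p1 - p0) = 1
        e2 = a2 / np.dot(a2, p2 - p0)    # scaled so that e2 . (p2 - p0) = 1
        return n, e1, e2

    def barycentric(p, p0, e1, e2):
        """Barycentric coordinates of a point p lying in the triangle's plane."""
        b1 = np.dot(e1, p - p0)
        b2 = np.dot(e2, p - p0)
        return 1.0 - b1 - b2, b1, b2

The point is inside the triangle exactly when all three returned coordinates lie in [0, 1], and any per-vertex value v can then be interpolated as b0*v0 + b1*v1 + b2*v2.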
4 Secondary Rays
Shadows, reflection, and transparency are easily achieved using secondary rays: Once you find an intersection point p you then generate a new ray with p as its origin and intersect that ray with the scene.
Care should be taken that roundoff errors in storing p do not cause the secondary ray to intersect the object from which it originates. This cannot be fixed simply by using more precise numbers; instead you'll need to do one of the following:
Bias the secondary rays away from the object that emits them. Effectively this means ignoring ray intersections with t<ϵ for some small positive ϵ. If your ϵ is too small, you’ll still get some self-intersections causing shadow acne or the like. If your ϵ is too large, opaque objects will render as invisible when seen from up close.
Ignore intersections with the (part of the) object that emitted the ray. For sphere reflections and shadows this means ignoring the sphere. For sphere transparency it means ignoring the same entering/exiting intersection of the sphere. For triangle meshes it means ignoring the triangle and its immediate neighbors.
Require particular directional interfacing. If all of your geometry is in the form of non-intersecting closed surface meshes then you can classify each ray as internal or external and each intersection as either entering or exiting and ignore intersections that don’t match the ray type. The conditions on this make it unusual as-is, but a variant combining it with biasing can work well.
4.1 Shadows
A point p is in shadow relative to a particular light source at ℓ if the ray with origin p and direction ℓ − p intersects anything closer than the light source itself.
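A shadow test that uses the biasing approach described above might look like this sketch (Python with numpy; scene.intersect is a hypothetical routine that returns the nearest hit distance at least t_min away, or None):

    import numpy as np

    EPSILON = 1e-4   # bias: ignore intersections with t < EPSILON

    def in_shadow(p, light_pos, scene):
        """True if something blocks the segment from p to the light at light_pos."""
        to_light = light_pos - p
        dist = np.linalg.norm(to_light)
        rd = to_light / dist
        hit_t = scene.intersect(p, rd, t_min=EPSILON)   # hypothetical scene API
        return hit_t is not None and hit_t < dist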
4.2 Reflection
A mirrored object's color is given by the raytracing result of the ray with origin p and direction 2(n⋅e)n − e, where e is the unit vector from p back toward the origin of the incoming ray. A partially mirrored object mixes that color with a standard lighting computation at p. One reflection ray might generate another; a cutoff number of recursions is necessary to prevent infinite loops.
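In code the reflection direction is a one-liner; the recursion cutoff is just a depth counter passed along with each secondary ray. A sketch (Python with numpy; e and n are assumed to be unit vectors):

    import numpy as np

    def reflect(e, n):
        """Mirror direction 2(n.e)n - e for the unit view vector e and normal n."""
        return 2.0 * np.dot(n, e) * n - e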
4.3 Transparency
Transparency is somewhat more complicated, relying on Snell's Law. For it to make sense, every surface needs to be a boundary between two materials, which is not trivially true in the case of triangles or intersecting objects. However, assuming that we know that the ray is traveling from a material with index of refraction n1 into a material with index of refraction n2, we can derive a rule for finding the transmitted ray.
The cosine of the angle between the entering ray and the normal is (e⋅n), meaning that its sine is √(1 − (e⋅n)²), which is also ∥e − (e⋅n)n∥ (the length of the component of e perpendicular to n). The sine of the outgoing ray thus needs to be
(n1/n2)·√(1 − (e⋅n)²); if that is greater than 1, we have total internal reflection and use the reflection equation instead; otherwise, the cosine of the outgoing ray is
√(1 − (n1²/n2²)(1 − (e⋅n)²)). Putting this all together, we have
let a = e⋅n
let b = (n1/n2)·√(1 − a²)
if b ≥ 1, return 2(n⋅e)n − e
let c = √(1 − (n1²/n2²)(1 − a²))
return (n1/n2)((e⋅n)n − e) − c·n
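That rule, expressed as a sketch in Python with numpy (e is the unit vector from the surface point back along the incoming ray, n is the unit normal on the n1 side, and the function name is illustrative):

    import numpy as np

    def refract_or_reflect(e, n, n1, n2):
        """Outgoing direction for a ray crossing from index n1 into index n2."""
        a = np.dot(e, n)
        b = (n1 / n2) * np.sqrt(max(1.0 - a * a, 0.0))   # sine of the outgoing ray
        if b >= 1.0:                                     # total internal reflection
            return 2.0 * a * n - e
        c = np.sqrt(1.0 - (n1 * n1) / (n2 * n2) * (1.0 - a * a))
        return (n1 / n2) * (a * n - e) - c * n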
4.4 Global Illumination
The standard lighting models pretend that the world is divided into a small number of light emitters and a vast supply of things that emit no light. This is obviously not true; if nothing but light sources gave off photons, we could only see the light sources themselves. With global illumination, you try to capture the impact of light bouncing off of walls, floors, and other objects by creating a number of secondary rays that sample the diffuse reflection of each object.
Ideally, the distribution of rays should parallel the chosen model of diffuse lighting and should be so dense as to be intractable for any reasonable scene. Much of the research in photo-realistic rendering is devoted to finding shortcuts and techniques that make this process require fewer rays for the same visual image quality.
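A simple (and far from optimal) way to generate such secondary rays is to pick random directions in the hemisphere above the surface, as in this sketch (Python with numpy; uniform rejection sampling is used here only for brevity, and a real renderer would match the sampling to its diffuse lighting model):

    import numpy as np

    def diffuse_bounce_direction(n, rng=np.random.default_rng()):
        """Random unit direction in the hemisphere around the unit normal n."""
        while True:
            v = rng.uniform(-1.0, 1.0, size=3)      # random point in the cube
            len2 = np.dot(v, v)
            if 0.0 < len2 <= 1.0:                   # keep only points in the ball
                v = v / np.sqrt(len2)
                return v if np.dot(v, n) > 0 else -v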
5 Photon Mapping and Caustics
Photon mapping is raytracing run backwards: instead of shooting rays from the eye, you shoot them from the lights. The quantity of light reaching each point in the scene is recorded and used when the scene is rendered, either by raytracing or rasterizing. Since many photons leaving a light source never reach the eye, this is an inefficient way of creating a single picture; however, it allows lens and mirror-bounced light (called caustics) to be rendered, and it can be more efficient than viewer-centric global illumination if many images are to be made of the same static scene.
6 Sub-, Super-, and Importance-Sampling
Sub-sampling is shooting fewer rays than you have pixels, interpolating the colors to neighboring pixels. Super-sampling is shooting several rays per pixel, averaging colors to create an anti-aliased image. Importance sampling shoots fewer rays per pixel into boring areas and more into important areas. Image-space importance sampling shoots more rays into areas of the scene where neighboring pixels differ in color; scene-space importance sampling shoots more rays toward particular items.
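As an illustration of super-sampling, the following sketch averages several jittered rays per pixel (Python; trace and make_ray are hypothetical stand-ins for the rest of the raytracer, and the color is treated as a single number for brevity):

    import random

    def supersample_pixel(px, py, samples, make_ray, trace):
        """Average `samples` jittered rays within the pixel whose corner is (px, py)."""
        total = 0.0
        for _ in range(samples):
            x = px + random.random()     # jitter the sample within the pixel
            y = py + random.random()
            total += trace(make_ray(x, y))
        return total / samples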