The code of the vertex and fragment shaders is specified by the programmer. The vertex shader has built-in inputs `gl_VertexID` and `gl_InstanceID` and built-in outputs `gl_Position` and `gl_PointSize`. The fragment shader has built-in inputs `gl_FragCoord`, `gl_FrontFacing`, and `gl_PointCoord` and built-in output `gl_FragDepth`. The fragment shader also has a single programmer-named color output which is written to the frame buffer for display. In between these two shaders the primitives are assembled, clipped, projected, and rasterized, with values interpolated to each resulting fragment.
The vertex shader may take additional programmer-specified inputs called attributes.
The values of these attributes are pulled from special arrays in graphics memory called buffers.
The set of values to run the vertex shader on, together with how sets of vertices are to be assembled into primitives, is specified by the draw command used, as discussed below.
The vertex shader may produce additional outputs called varyings.
These are automatically interpolated by the rasterizer and their interpolated values (still called varyings) are provided as additional inputs to the fragment shader.
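As a minimal sketch of how attributes and varyings appear in shader source, the following JavaScript holds a WebGL2 (GLSL ES 3.00) shader pair as strings. The names `position`, `color`, `vColor`, and `fragColor` are illustrative choices, not names the API requires.

```javascript
// Vertex shader: `in` variables are attributes; `out` variables are varyings.
const vertexSource = `#version 300 es
in vec4 position;   // attribute, read from a buffer once per vertex
in vec4 color;      // attribute
out vec4 vColor;    // varying, interpolated by the rasterizer
void main() {
    gl_Position = position;  // built-in output
    gl_PointSize = 4.0;      // built-in output, only used when drawing points
    vColor = color;
}`;

// Fragment shader: `in` variables receive the interpolated varyings.
const fragmentSource = `#version 300 es
precision highp float;
in vec4 vColor;      // same name and type as the vertex shader's output
out vec4 fragColor;  // the single programmer-named color output
void main() {
    fragColor = vColor;
}`;
```

The varying is matched between the two shaders by name and type, which is why `vColor` is declared identically on both sides.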
Both shaders have access to global values called uniforms that are the same for all vertices and fragments in a given draw command. Sending values to the buffers tends to be significantly slower than rendering from buffers that are already there, so there’s a preference for making the buffers static, with values specified once and rendered many times; changing the uniforms each frame can create per-frame motion with static buffers. Uniforms are also used for large data like textures.
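The idea of animating static buffers by changing only a uniform can be sketched as follows. This assumes `gl` is a WebGL2 rendering context, that `program` and `vao` were set up elsewhere, and that the vertex shader declares a `mat4` uniform named `transform`; that name and the helper names are hypothetical.

```javascript
// Build a column-major 4x4 rotation about the z axis (the layout WebGL expects).
function rotationZ(radians) {
    const c = Math.cos(radians), s = Math.sin(radians);
    return new Float32Array([
         c, s, 0, 0,
        -s, c, 0, 0,
         0, 0, 1, 0,
         0, 0, 0, 1,
    ]);
}

// Draw one frame: only the uniform changes; the buffers behind `vao` are untouched.
// `transform` is a hypothetical uniform name used for illustration.
function drawFrame(gl, program, vao, indexCount, milliseconds) {
    gl.useProgram(program);
    // Looked up every frame for brevity; real code would likely cache this.
    const location = gl.getUniformLocation(program, 'transform');
    gl.uniformMatrix4fv(location, false, rotationZ(milliseconds / 1000));
    gl.bindVertexArray(vao);
    gl.drawElements(gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0);
}
```

A typical use would call `drawFrame` from a `requestAnimationFrame` callback, passing along the timestamp the callback receives.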
To run a shader program, the GPU needs to know
There are multiple ways to provide this data, each of which has multiple steps. We illustrate the first two with an example based on the following simple three-triangle bowl object:
Listed in rough order of likelihood to be what you want, from most likely to least likely, these are:
For most polygonal approximations of smooth surfaces:
For each scene object,
Array | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Position | 0 | 0 | -1 | ½ | -1 | 0 | ½ | 1 | 0 | -1 | 0 | 0 |
Normal | 0 | 0 | 1 | -⅓ | ⅔ | ⅔ | -⅓ | -⅔ | ⅔ | | 0 | ⅔ |
Index | 0 | 1 | 2 | 0 | 2 | 3 | 0 | 3 | 1 |
For each scene object, call `gl.drawElements`.
This works well for almost any object type. It is a bit less efficient than the next option for drawing points or for drawing other primitives that do not share vertex attributes (such as flat-shaded polyhedra).
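The indexed approach can be sketched for the bowl in the table above as follows. This assumes `gl` is a WebGL2 rendering context and that the compiled `program` has an attribute named `position` (a hypothetical name). Only the position attribute is shown; a normal attribute would be sent in the same way with its own buffer.

```javascript
// The bowl from the table above: 4 vertices, 3 triangles.
const bowl = {
    positions: new Float32Array([0, 0, -1,  0.5, -1, 0,  0.5, 1, 0,  -1, 0, 0]),
    indices: new Uint16Array([0, 1, 2,  0, 2, 3,  0, 3, 1]),
};

// One-time setup: copy the arrays into GPU buffers recorded by a vertex array object.
function setupIndexed(gl, program, geometry) {
    const vao = gl.createVertexArray();
    gl.bindVertexArray(vao);

    // Attribute values go in an array buffer, read 3 floats at a time.
    gl.bindBuffer(gl.ARRAY_BUFFER, gl.createBuffer());
    gl.bufferData(gl.ARRAY_BUFFER, geometry.positions, gl.STATIC_DRAW);
    const location = gl.getAttribLocation(program, 'position');
    gl.enableVertexAttribArray(location);
    gl.vertexAttribPointer(location, 3, gl.FLOAT, false, 0, 0);

    // Connectivity goes in an element array buffer.
    gl.bindBuffer(gl.ELEMENT_ARRAY_BUFFER, gl.createBuffer());
    gl.bufferData(gl.ELEMENT_ARRAY_BUFFER, geometry.indices, gl.STATIC_DRAW);

    gl.bindVertexArray(null);
    return { vao, count: geometry.indices.length };
}

// Per-frame drawing: bind the object's vertex array object and draw its indices.
function drawIndexed(gl, object) {
    gl.bindVertexArray(object.vao);
    gl.drawElements(gl.TRIANGLES, object.count, gl.UNSIGNED_SHORT, 0);
}
```

`gl.UNSIGNED_SHORT` matches the `Uint16Array` used for the indices; a mesh with more than 65,536 vertices would need `Uint32Array` and `gl.UNSIGNED_INT` instead.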
For most disconnected points or flat-shaded polygons:
For each scene object,
Array | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Position | 0 | 0 | -1 | ½ | -1 | 0 | ½ | 1 | 0 | 0 | 0 | -1 | ½ | 1 | 0 | -1 | 0 | 0 | 0 | 0 | -1 | -1 | 0 | 0 | ½ | -1 | 0 |
Normal | 0 | 0 | 1 | -⅓ | ⅔ | ⅔ | -⅓ | -⅔ | ⅔ | 0 | 0 | 1 | -⅓ | -⅔ | ⅔ | | 0 | ⅔ | 0 | 0 | 1 | | 0 | ⅔ | -⅓ | ⅔ | ⅔ |
For each scene object, call `gl.drawArrays`.
This works well for objects that do not share vertex attributes, such as points or flat-shaded polyhedra. If vertices and their attributes are used for multiple primitives, as is the case for virtually all polygonal approximations of smooth objects, the previous option is more efficient.
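To show how the non-indexed array relates to the indexed one, here is a small sketch that expands indexed data into the longer array that `gl.drawArrays` consumes. The helper names `expandIndexed` and `drawFlat` are hypothetical; the data is the bowl's position and index rows from the tables above.

```javascript
// Copy each indexed vertex's values out in index order, duplicating shared vertices.
function expandIndexed(values, indices, size) {
    const expanded = new Float32Array(indices.length * size);
    indices.forEach((vertex, i) => {
        for (let k = 0; k < size; k += 1) {
            expanded[i * size + k] = values[vertex * size + k];
        }
    });
    return expanded;
}

const bowlVertexPositions = [0, 0, -1,  0.5, -1, 0,  0.5, 1, 0,  -1, 0, 0];
const bowlTriangleIndices = [0, 1, 2,  0, 2, 3,  0, 3, 1];
// 9 vertices of 3 coordinates each: the 27-entry Position row of the second table.
const flatPositions = expandIndexed(bowlVertexPositions, bowlTriangleIndices, 3);

// Drawing then needs no element array buffer, only a vertex count.
// Assumes `vao` already has `flatPositions` attached as an array buffer.
function drawFlat(gl, vao, vertexCount) {
    gl.bindVertexArray(vao);
    gl.drawArrays(gl.TRIANGLES, 0, vertexCount);
}
```

The expanded array is more than twice as long as the indexed one here (27 values against 12), which is the cost the paragraph above refers to when vertices are shared.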
For many copies of the exact same object:
If you have multiple copies of the same scene object in the scene such that you can easily compute their placement using the same uniforms coupled with an integer telling you which copy you’re drawing, then use one of the previous two options but use the `gl.drawElementsInstanced` or `gl.drawArraysInstanced` methods instead of the non-instanced options.
This is generally much faster than using the non-instanced options repeatedly, but unless you have identical objects positioned in some kind of fixed grid or pseudo-random scattering it is unlikely to be useful.
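A minimal sketch of the fixed-grid case follows. The uniform names `columns` and `spacing`, the helper names, and the choice to offset directly in the coordinates of `gl_Position` are all assumptions made for illustration; a real scene would more likely apply the offset before a view and projection transform.

```javascript
// Vertex shader that places each copy using gl_InstanceID and shared uniforms.
const instancedVertexSource = `#version 300 es
in vec4 position;
uniform int columns;     // copies per row of the grid
uniform float spacing;   // distance between neighboring copies
void main() {
    // gl_InstanceID is 0 for the first copy, 1 for the second, and so on.
    float column = float(gl_InstanceID % columns);
    float row = float(gl_InstanceID / columns);
    gl_Position = position + vec4(column * spacing, row * spacing, 0.0, 0.0);
}`;

// The same placement rule in JavaScript, handy for checking the shader's arithmetic.
function gridOffset(instance, columns, spacing) {
    return [(instance % columns) * spacing, Math.floor(instance / columns) * spacing];
}

// One call draws every copy; the last argument is the number of instances.
function drawCopies(gl, vao, indexCount, copies) {
    gl.bindVertexArray(vao);
    gl.drawElementsInstanced(gl.TRIANGLES, indexCount, gl.UNSIGNED_SHORT, 0, copies);
}
```

For example, `gridOffset(5, 4, 2)` places the sixth copy in the second column of the second row.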
For very many distinct objects:
For a set of scene objects that will have the same set of vertex attributes,
Make an array of attribute values for each vertex of all scene objects, one after the other.
For example, if you have a 12-vertex sphere and a 30-vertex knob you’d put the vertices of the sphere in indices $0$ through $12n-1$ and of the knob in indices $12n$ through $42n-1$, where $n$ is the number of values per vertex. Technically you can interleave vertices of different objects, but doing so has no advantage and might impede cache performance.
Make an array of primitive connectivity for all scene objects, one after the other.
For example, if you have a 20-triangle sphere and a 50-triangle knob you’d put the vertex indices of the sphere’s triangles in entries 0 through 59 and those of the knob in entries 60 through 209. You cannot interleave the triangle indices: they have to be grouped by scene object.
Make a vertex array object on the GPU to collect the next steps
Send each attribute values array to the GPU as an array buffer
Send the connectivity array to the GPU as an element array buffer
Bind that vertex array object
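For each scene object, call `gl.drawElements` with `offset` locating the object’s first entry in the index array and `count` of the number of index values. Note that `offset` is measured in bytes, so it is the position of the first entry multiplied by the size of one index value.
For example, the 50-triangle knob above starts at entry 60, so with 2-byte (`gl.UNSIGNED_SHORT`) indices it would use `offset` of 120 and `count` of 150.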
This works well for almost any object type. It’s a bit more confusing to the programmer and makes for harder-to-maintain code, but it uses fewer buffers on the GPU and can be marginally faster and use slightly less GPU memory.
There’s also a similar shared-array, offset-and-count option for `gl.drawArrays`, `gl.drawElementsInstanced`, and `gl.drawArraysInstanced`.
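The bookkeeping for shared buffers can be sketched as below. The helper names and the object layout (`positions` and `indices` arrays with 3 values per vertex) are assumptions for illustration. Two details are worth noting: `gl.drawElements` takes its offset in bytes, so the sketch records both the entry position and the matching byte offset, and each object's indices are shifted so they refer to its vertices' positions in the shared attribute array.

```javascript
// Concatenate several objects into one attribute array and one index array.
// Each object is { positions: number[], indices: number[] } with 3 values per vertex.
function packObjects(objects) {
    const positions = [];
    const indices = [];
    const ranges = [];
    for (const object of objects) {
        const baseVertex = positions.length / 3;   // vertices already packed
        ranges.push({
            firstIndex: indices.length,
            byteOffset: indices.length * Uint16Array.BYTES_PER_ELEMENT,
            count: object.indices.length,
        });
        positions.push(...object.positions);
        // Indices are relative to the shared array, so shift them by baseVertex.
        for (const index of object.indices) indices.push(index + baseVertex);
    }
    return {
        positions: new Float32Array(positions),
        indices: new Uint16Array(indices),
        ranges,
    };
}

// Draw one packed object; assumes the shared vertex array object is already bound.
function drawRange(gl, range) {
    gl.drawElements(gl.TRIANGLES, range.count, gl.UNSIGNED_SHORT, range.byteOffset);
}
```

With a 12-vertex, 20-triangle sphere packed before a 30-vertex, 50-triangle knob, the knob's range has `firstIndex` 60, `byteOffset` 120, and `count` 150.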
For saving GPU memory given several small attributes:
WebGL assumes all attributes are 4-vectors and at least nominally expands smaller attributes to that size automatically. If you have multiple attributes that collectively take up less than 4 floats per vertex (for example, a 2D texture coordinate and a 1D shininess parameter) you can save some time and space by combining them into a larger vector when providing the buffer.
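For the texture-coordinate-plus-shininess example, the combining step might look like the sketch below. The function name `packTexShine` and the attribute name `texShine` are hypothetical.

```javascript
// Interleave a 2D texture coordinate and a 1D shininess into one 3-vector per vertex.
// texCoords holds 2 values per vertex; shininess holds 1 value per vertex.
function packTexShine(texCoords, shininess) {
    const vertexCount = shininess.length;
    const packed = new Float32Array(vertexCount * 3);
    for (let i = 0; i < vertexCount; i += 1) {
        packed[i * 3] = texCoords[i * 2];
        packed[i * 3 + 1] = texCoords[i * 2 + 1];
        packed[i * 3 + 2] = shininess[i];
    }
    return packed;
}

// The vertex shader would then declare a single attribute, for example
//     in vec3 texShine;
// and split it with swizzles: texShine.xy is the texture coordinate,
// texShine.z is the shininess.
```

The packed array is sent as one array buffer with a size of 3 in `gl.vertexAttribPointer`, using one attribute slot instead of two.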