25 May 2015

Game Performance: Geometry Instancing

Posted by Shanee Nishry, Games Developer Advocate

Imagine a beautiful virtual forest with countless trees, plants and vegetation, or a stadium packed with a cheering crowd. If you are feeling heroic, you might prefer an epic battle between armies.

Rendering a large number of meshes is what makes scenes like a forest, a cheering crowd or an army look rich, but it is costly and can lower the frame rate. Fortunately, a simple technique called Geometry Instancing can reduce that cost.

Geometry instancing can be used in 2D games to render a large number of sprites, or in 3D games for things like particles, characters and environment objects.

The NDK code sample More Teapots, which demonstrates the content of this article, can be found in the samples folder of the NDK and in the git repository.

Support and Extensions

Geometry instancing is available in OpenGL ES 3.0, and on OpenGL ES 2.0 devices that support the GL_NV_draw_instanced or GL_EXT_draw_instanced extension. More information on how to use the extensions can be found in the More Teapots sample.
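On an OpenGL ES 2.0 device you first need to find out whether either extension is present. The sketch below assumes the extension list has already been read with glGetString( GL_EXTENSIONS ), which returns one space-separated string; the helper names hasExtension and supportsInstancingExtension are hypothetical and are not part of OpenGL or the More Teapots sample. It matches whole extension names rather than substrings, so a longer name that merely starts with the same characters is not mistaken for the one you want.

```cpp
#include <cassert>
#include <cstring>

// Returns true if `name` appears as a complete, space-separated token in
// `extensions`. On an ES 2.0 context the list would typically come from
// reinterpret_cast<const char*>( glGetString( GL_EXTENSIONS ) ).
bool hasExtension(const char* extensions, const char* name)
{
    assert(name != nullptr && *name != '\0');
    if (extensions == nullptr) {
        return false;
    }
    const std::size_t len = std::strlen(name);
    const char* p = extensions;
    while ((p = std::strstr(p, name)) != nullptr) {
        // A plain strstr() hit is not enough: the match must start at the
        // beginning of the list or after a space, and end at a space or '\0'.
        const bool startsToken = (p == extensions) || (p[-1] == ' ');
        const bool endsToken = (p[len] == ' ') || (p[len] == '\0');
        if (startsToken && endsToken) {
            return true;
        }
        p += len; // keep searching after this partial match
    }
    return false;
}

// True if either instancing extension mentioned in the article is listed.
bool supportsInstancingExtension(const char* extensions)
{
    return hasExtension(extensions, "GL_EXT_draw_instanced") ||
           hasExtension(extensions, "GL_NV_draw_instanced");
}
```

If supportsInstancingExtension returns true, the instanced draw entry points still have to be loaded under their suffixed names (for example glDrawElementsInstancedEXT, via eglGetProcAddress); the More Teapots sample shows one way to wire this up.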


Every draw call makes OpenGL queue commands to be sent to the GPU. This carries a significant CPU overhead that can hurt performance, and the overhead grows when you change state between draws, such as the alpha blending function, the active shader, textures or buffers.

Geometry Instancing is a technique that combines multiple draws of the same mesh into a single draw call, reducing that overhead and potentially increasing performance. This works even when each instance needs a different transformation.

The algorithm

To explain how Geometry Instancing works, let’s first quickly review traditional drawing.

Traditional Drawing

To draw a mesh, you’d usually prepare a vertex buffer and an index buffer, bind your shader and buffers, set your uniforms such as a World View Projection matrix, and make a draw call.

To draw multiple instances using the same mesh you set new uniform values for the transformations and other data and call draw again. This is repeated for every instance.

Drawing with Geometry Instancing

Geometry Instancing reduces CPU overhead by collapsing the sequence described above into a single buffer update and a single draw call.
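To make the difference concrete, here is a minimal CPU-side model of the two approaches. The functions do not call OpenGL; they only count the calls a renderer would make, and the comments name the OpenGL calls (glUniformMatrix4fv, glUniform4fv, glDrawElements, glDrawElementsInstanced) they stand in for. CallCounter, Instance, drawTraditional and drawInstanced are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <vector>

// Counts how often the CPU would talk to the driver; this per-call cost is
// the overhead that instancing targets.
struct CallCounter {
    int uniformUpdates = 0;
    int bufferUploads = 0;
    int drawCalls = 0;
};

// Per-instance data: a 4x4 matrix and a color, as in the article.
struct Instance { float mvp[16]; float color[4]; };

// Traditional drawing: set uniforms and issue a draw call for every instance.
void drawTraditional(const std::vector<Instance>& instances, CallCounter& c)
{
    for (const Instance& inst : instances) {
        (void)inst;          // real code: glUniformMatrix4fv / glUniform4fv
        ++c.uniformUpdates;
        ++c.drawCalls;       // real code: glDrawElements
    }
}

// Instanced drawing: upload all per-instance data once, then draw once.
void drawInstanced(const std::vector<Instance>& instances, CallCounter& c)
{
    if (instances.empty()) {
        return;
    }
    ++c.bufferUploads;       // real code: map, copy and unmap the instance buffer
    ++c.drawCalls;           // real code: glDrawElementsInstanced( ..., instanceCount )
}
```

For 100 instances of one mesh, drawTraditional records 100 uniform updates and 100 draw calls, while drawInstanced records one buffer upload and one draw call regardless of the instance count.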

It works by using an additional buffer that contains the custom per-instance data your shader needs, such as transformations, colors or lighting data.

The first change to your workflow is to create this additional buffer during the initialization stage.

To put it into code, let’s define an example per-instance data structure that holds a World View Projection matrix and a color:


struct PerInstanceData
{
    Mat4x4 WorldViewProj;
    Vector4 Color;
};

You also need to declare the same structure in your shader. The easiest way is to create a uniform block containing an array of it:


#define MAX_INSTANCES 512

struct InstanceData {
    mat4      uMVP;
    vec4      uColor;
};

layout(std140) uniform PerInstanceData {
    InstanceData Data[ MAX_INSTANCES ];
};
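The CPU-side PerInstanceData must match the std140 layout of the GLSL data byte for byte. Under std140 a mat4 is stored as four vec4 columns (64 bytes) and a vec4 takes 16 bytes, so each array element occupies 80 bytes. A sketch with compile-time checks, assuming Mat4x4 and Vector4 are plain arrays of floats (the article does not define them, so the versions below are minimal stand-ins):

```cpp
#include <cassert>
#include <cstddef>

// Minimal stand-ins for the article's engine types.
struct Mat4x4  { float m[16]; };        // column-major, like GLSL mat4
struct Vector4 { float x, y, z, w; };   // like GLSL vec4

struct PerInstanceData
{
    Mat4x4  WorldViewProj; // std140 offset 0, 64 bytes (four vec4 columns)
    Vector4 Color;         // std140 offset 64, 16 bytes
};

// std140 gives the GLSL element an 80-byte array stride (already a multiple
// of 16), so the C++ struct must have exactly this layout for Data[ i ] in
// the shader to line up with element i of the mapped buffer.
static_assert(offsetof(PerInstanceData, WorldViewProj) == 0, "mat4 offset");
static_assert(offsetof(PerInstanceData, Color) == 64, "vec4 offset");
static_assert(sizeof(PerInstanceData) == 80, "std140 array stride");

constexpr std::size_t kMaxInstances = 512; // mirrors MAX_INSTANCES
constexpr std::size_t kInstanceBufferBytes = kMaxInstances * sizeof(PerInstanceData);
```

With MAX_INSTANCES at 512, the whole block needs 512 * 80 = 40,960 bytes, which is the size passed to glBufferData when the uniform buffer is created.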

Note that uniform blocks have limited sizes. You can find the maximum number of bytes you can use by querying for GL_MAX_UNIFORM_BLOCK_SIZE using glGetIntegerv.


GLint max_block_size = 0;
glGetIntegerv( GL_MAX_UNIFORM_BLOCK_SIZE, &max_block_size );
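OpenGL ES 3.0 only guarantees a GL_MAX_UNIFORM_BLOCK_SIZE of 16,384 bytes, which holds 204 entries of 80 bytes (one mat4 plus one vec4 under std140), fewer than the 512 that MAX_INSTANCES asks for. A small sketch for turning the queried limit into an instance count, and for working out how many instanced draw calls a scene needs when it has more objects than fit in one block; both helper names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// Number of per-instance entries that fit in one uniform block, given the
// value queried for GL_MAX_UNIFORM_BLOCK_SIZE (a GLint) and the std140
// array stride of one entry (80 bytes for a mat4 + vec4 entry).
std::size_t maxInstancesPerBlock(int maxBlockSize, std::size_t stride)
{
    assert(stride > 0);
    if (maxBlockSize <= 0) {
        return 0;
    }
    return static_cast<std::size_t>(maxBlockSize) / stride;
}

// Instanced draw calls needed for `objectCount` objects when each draw can
// use at most `perDraw` entries of the uniform block.
std::size_t batchesNeeded(std::size_t objectCount, std::size_t perDraw)
{
    assert(perDraw > 0);
    return (objectCount + perDraw - 1) / perDraw; // ceiling division
}
```

For example, maxInstancesPerBlock(16384, 80) gives 204, and 1,000 objects then need batchesNeeded(1000, 204) = 5 draw calls. If you size the shader array from the queried value, one option is to set the MAX_INSTANCES define in the shader source to that value before compiling it.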

Bind the uniform block on the CPU in your program’s initialization stage:


#define MAX_INSTANCES 512
#define BINDING_POINT 0   // any uniform buffer binding point not already in use
GLuint shaderProgram;     // Compiled shader program

// Bind Uniform Block
GLuint blockIndex = glGetUniformBlockIndex( shaderProgram, "PerInstanceData" );
glUniformBlockBinding( shaderProgram, blockIndex, BINDING_POINT );

And create a corresponding uniform buffer object:


// Create Instance Buffer
GLuint instanceBuffer;

glGenBuffers( 1, &instanceBuffer );
glBindBuffer( GL_UNIFORM_BUFFER, instanceBuffer );
glBindBufferBase( GL_UNIFORM_BUFFER, BINDING_POINT, instanceBuffer );

// Initialize buffer size
glBufferData( GL_UNIFORM_BUFFER, MAX_INSTANCES * sizeof( PerInstanceData ), NULL, GL_DYNAMIC_DRAW );

The next step is to update the instance data every frame to reflect changes to the visible objects you are going to draw. Once you have your new instance buffer you can draw everything with a single call to glDrawElementsInstanced.

You update the instance buffer using glMapBufferRange, which maps the buffer and returns a pointer to its memory so you can copy in your per-instance data. Unmap the buffer with glUnmapBuffer when you are done.

Here is a simple example for updating the instance data:

const int NUM_SCENE_OBJECTS = …; // number of objects visible in your scene which share the same mesh (at most MAX_INSTANCES)

// Bind the buffer
glBindBuffer( GL_UNIFORM_BUFFER, instanceBuffer );

// Retrieve pointer to map the data; we rewrite every entry, so the old contents can be discarded
PerInstanceData* pBuffer = (PerInstanceData*) glMapBufferRange( GL_UNIFORM_BUFFER, 0,
                NUM_SCENE_OBJECTS * sizeof( PerInstanceData ),
                GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT );

if ( pBuffer != NULL ) // glMapBufferRange returns NULL on failure
{
    // Iterate the scene objects
    for ( int i = 0; i < NUM_SCENE_OBJECTS; ++i )
    {
        pBuffer[ i ].WorldViewProj = ... // Copy World View Projection matrix
        pBuffer[ i ].Color = …               // Copy color
    }

    glUnmapBuffer( GL_UNIFORM_BUFFER ); // Unmap the buffer
}
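The elided lines above copy a World View Projection matrix and a color for each object. One way to produce them, sketched here assuming column-major matrices (the layout GLSL uses for mat4) and the column-vector convention implied by uMVP * inPosition in the shader, is to multiply a shared view-projection matrix by each object's world matrix. SceneObject, multiply, translation and writeInstances are hypothetical names for illustration:

```cpp
#include <cassert>
#include <cstddef>

struct Mat4x4  { float m[16]; };        // column-major: (row r, col c) is m[c * 4 + r]
struct Vector4 { float x, y, z, w; };
struct PerInstanceData { Mat4x4 WorldViewProj; Vector4 Color; };

// Returns a * b for column-major matrices.
Mat4x4 multiply(const Mat4x4& a, const Mat4x4& b)
{
    Mat4x4 r{};
    for (int c = 0; c < 4; ++c) {
        for (int row = 0; row < 4; ++row) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.m[k * 4 + row] * b.m[c * 4 + k];
            }
            r.m[c * 4 + row] = sum;
        }
    }
    return r;
}

// Translation matrix; the offset lives in the last column (m[12..14]).
Mat4x4 translation(float x, float y, float z)
{
    Mat4x4 t{};
    t.m[0] = t.m[5] = t.m[10] = t.m[15] = 1.0f;
    t.m[12] = x; t.m[13] = y; t.m[14] = z;
    return t;
}

// Hypothetical scene object: a world transform and a color.
struct SceneObject { Mat4x4 world; Vector4 color; };

// Fills `dst` (e.g. the pointer returned by glMapBufferRange) with one entry
// per object: WorldViewProj = viewProj * world.
void writeInstances(PerInstanceData* dst, const SceneObject* objects,
                    std::size_t count, const Mat4x4& viewProj)
{
    for (std::size_t i = 0; i < count; ++i) {
        dst[i].WorldViewProj = multiply(viewProj, objects[i].world);
        dst[i].Color = objects[i].color;
    }
}
```

In the update code, dst would be pBuffer and count would be NUM_SCENE_OBJECTS; writing straight into the mapped pointer avoids an extra staging copy.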

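And finally, you can draw everything with a single call to glDrawElementsInstanced or glDrawArraysInstanced (depending on whether you are using an index buffer):

// NUM_INDICES and GL_UNSIGNED_SHORT are placeholders for your mesh's index count and index type
glDrawElementsInstanced( GL_TRIANGLES, NUM_INDICES, GL_UNSIGNED_SHORT, 0,
                NUM_SCENE_OBJECTS );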

You are almost done! There is just one more step: your shader needs to use the new uniform buffer object for its transformations and colors. In your vertex shader’s main function:

void main()
{
    gl_Position = Data[ gl_InstanceID ].uMVP * inPosition; // per-instance transform
    outColor = Data[ gl_InstanceID ].uColor;               // per-instance color
}

You might have noticed the use of gl_InstanceID. This is a built-in vertex shader variable that holds the index of the instance currently being drawn, starting at 0 for each draw call. Your shader uses it to index the instance data and pick the correct transformation and color for every vertex.
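To see how gl_InstanceID selects the data, it can help to emulate one instanced draw on the CPU. This is only a model: a GPU runs vertex shader invocations in parallel and in no guaranteed order, and here a plain translation offset stands in for uMVP to keep the arithmetic readable. All names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec4 { float x, y, z, w; };
// Hypothetical per-instance entry: `offset` stands in for the uMVP matrix.
struct InstanceEntry { Vec4 offset; Vec4 color; };
struct VertexOut { Vec4 position; Vec4 color; };

// Models the vertex shader: reads Data[ instanceID ] for its transform and color.
VertexOut vertexShader(const Vec4& inPosition,
                       const std::vector<InstanceEntry>& Data, int instanceID)
{
    const InstanceEntry& e = Data[static_cast<std::size_t>(instanceID)];
    VertexOut out;
    out.position = { inPosition.x + e.offset.x, inPosition.y + e.offset.y,
                     inPosition.z + e.offset.z, inPosition.w };
    out.color = e.color;
    return out;
}

// Models glDrawArraysInstanced( mode, 0, vertexCount, instanceCount ):
// every vertex is processed once per instance, with gl_InstanceID = id.
std::vector<VertexOut> drawInstanced(const std::vector<Vec4>& vertices,
                                     const std::vector<InstanceEntry>& Data,
                                     int instanceCount)
{
    assert(instanceCount >= 0 &&
           static_cast<std::size_t>(instanceCount) <= Data.size());
    std::vector<VertexOut> result;
    for (int id = 0; id < instanceCount; ++id) {
        for (const Vec4& v : vertices) {
            result.push_back(vertexShader(v, Data, id));
        }
    }
    return result;
}
```

Drawing a three-vertex triangle with two instances produces six outputs: the first three read Data[ 0 ] and the last three read Data[ 1 ].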

That’s it! You are now ready to use Geometry Instancing. If you are drawing the same mesh multiple times in a frame, make sure to implement Geometry Instancing in your pipeline! This can greatly reduce CPU overhead and improve performance.