The OpenGL perspective transformation matrix

This document aims to clarify the mapping of eye space or view space (the two terms mean the same thing here) vertex positions to normalised device coordinates (NDCs). Once coordinates are in NDC, only values that lie in the range from -1 to +1 will end up being displayed. In other words, geometry that falls outside the NDC range of -1 to +1 in x, y or z must be clipped. Clipping is another topic entirely and is not covered here. NDC values still need scaling and translation to absolute screen pixel coordinates via a viewport. See glViewport for more information.
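As a rough illustration of that final viewport step, the mapping described in the glViewport reference can be sketched in Python. The function name ndc_to_window and the 800x600 viewport are made up for this example; they are not part of the OpenGL API.

```python
def ndc_to_window(x_ndc, y_ndc, vx, vy, width, height):
    """Map NDC x, y in [-1, +1] to window coordinates.

    vx, vy, width and height play the role of the glViewport arguments.
    """
    x_win = (x_ndc + 1.0) * (width / 2.0) + vx
    y_win = (y_ndc + 1.0) * (height / 2.0) + vy
    return (x_win, y_win)


# Hypothetical viewport covering an 800x600 window.
centre = ndc_to_window(0.0, 0.0, 0, 0, 800, 600)
bottom_left = ndc_to_window(-1.0, -1.0, 0, 0, 800, 600)
top_right = ndc_to_window(1.0, 1.0, 0, 0, 800, 600)
print(centre, bottom_left, top_right)
```

The centre of NDC space lands in the middle of the viewport, and the corners of the -1 to +1 square land on the viewport's corners.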

What assumptions are we using for this example?

Stating assumptions at the beginning helps clarify ambiguities later on, so here goes.
  1. The line going down the centre of the view frustum (the Direction of Projection, or DOP) is already the same as the Z axis because we aren't mad, nor do we have a squint. Only mad people and people modelling squints would want a DOP that wasn't the Z axis or a Centre of Projection (COP) that wasn't at the origin. A Z-axis-aligned DOP is specified by a view frustum whose front clip plane extends an equal distance to either side of, above and below the COP. In other words, left = -right and top = -bottom: the two values in each pair have the same magnitude but opposite signs, so if top = 0.5 then bottom must be -0.5. This means that the element of our OpenGL projection matrix in the first row, third column, (r+l)/(r-l), becomes zero, and the same happens to the element in the second row, third column, (t+b)/(t-b).

  2. Column major, post-multiplication notation is used. Post-multiplying a 4x4 matrix by an input 4-element column vector gives a 4-element result in which each element is the dot product of the corresponding matrix row with the input vector.
  3. The vantage point is at the origin in eye/view space, looking directly down the negative Z axis (also in eye/view space). This is because OpenGL uses a right-handed coordinate system. The viewing direction lies along the Z axis if assumption 1 is true.

  4. The far and near clipping plane values represent the distance to each plane in eye/view space. The near clip plane cannot be at zero, which is where our vantage point is in eye/view space. Also, to keep this example practical and sensible, far is greater than near and near is greater than zero, i.e. they are both positive. However, as we are looking down the negative Z axis, you will see near and far written below as -n and -f. This is only because they denote distances along the negative Z axis; n and f themselves are positive values.
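To make assumption 1 concrete, here is a minimal Python sketch that builds a glFrustum-style matrix and compares a symmetric frustum with a skewed one. The function name frustum and the sample values are illustrative only, and the matrix is held as a list of rows for readability rather than in OpenGL's column-major memory layout.

```python
def frustum(l, r, b, t, n, f):
    """Return a glFrustum-style projection matrix as a list of rows.

    Assumes r != l, t != b and 0 < n < f (see assumption 4).
    """
    return [
        [2.0 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2.0 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


# Symmetric frustum: left = -right and top = -bottom.
symmetric = frustum(-0.5, 0.5, -0.5, 0.5, 1.0, 3.0)
# Skewed frustum: left != -right, so the DOP is no longer the Z axis.
skewed = frustum(-0.25, 0.75, -0.5, 0.5, 1.0, 3.0)

print(symmetric[0][2], symmetric[1][2])  # both third-column terms vanish
print(skewed[0][2], skewed[1][2])        # the x term is now non-zero
```

Only the symmetric case is used in the rest of this document.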

And now the sums ...

Using the 4x4 homogeneous perspective projection matrix as specified by OpenGL 1.2 as shown...
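For reference, the matrix below is the one given in the glFrustum documentation, written with l, r, b, t, n and f for the left, right, bottom, top, near and far values. It is a reconstruction rather than the original figure, but it agrees with the elements quoted in assumption 1. The right-hand form applies that assumption (r + l = 0 and t + b = 0).

```latex
\[
P =
\begin{bmatrix}
\frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\
0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\
0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\
0 & 0 & -1 & 0
\end{bmatrix}
\;\longrightarrow\;
\begin{bmatrix}
\frac{2n}{r-l} & 0 & 0 & 0 \\
0 & \frac{2n}{t-b} & 0 & 0 \\
0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\
0 & 0 & -1 & 0
\end{bmatrix}
\]
```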



We are going to take a point on the near clip plane at (0, 0, -n, 1), whose distance in the negative Z direction from the view space origin is denoted by the variable 'n'. In other words, this is a point in the middle of the scene (and eventually in the middle of the viewport that gets rendered onto the screen). We'll be showing the workings for transforming it through the projection matrix into OpenGL's clip space. Once we have a 4-element clip space coordinate, we divide x, y and z by the fourth element, w. This division by w is what "stretches" out the truncated pyramidal canonical view frustum into the nice, easy-to-interpolate-in-hardware cube that ranges from -1 to +1 on all axes. Coordinates in this final space are called Normalised Device Coordinates (NDCs). NDCs still aren't ready for rasterising on the screen in absolute screen pixel coordinates, but the step to get there from NDCs is covered far more clearly in the OpenGL documentation than this harder-to-understand step.
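The two stages described above (projection to clip space, then division by w) can be sketched numerically in Python. The helper names symmetric_frustum, transform and perspective_divide are invented for this illustration, and n = 1, f = 3 are arbitrary sample values chosen so the arithmetic is exact.

```python
def symmetric_frustum(r, t, n, f):
    """Projection matrix (list of rows) for left = -r and bottom = -t."""
    return [
        [n / r, 0.0, 0.0, 0.0],  # 2n/(r-l) with l = -r
        [0.0, n / t, 0.0, 0.0],  # 2n/(t-b) with b = -t
        [0.0, 0.0, -(f + n) / (f - n), -2.0 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def transform(m, v):
    """Multiply a 4x4 matrix by a 4-element column vector (row dot vector)."""
    return [sum(m[i][j] * v[j] for j in range(4)) for i in range(4)]


def perspective_divide(clip):
    """Divide x, y and z by w to go from clip space to NDC."""
    x, y, z, w = clip
    return [x / w, y / w, z / w]


n, f = 1.0, 3.0
proj = symmetric_frustum(0.5, 0.5, n, f)
clip = transform(proj, [0.0, 0.0, -n, 1.0])  # point on the near plane
ndc = perspective_divide(clip)
print(clip)  # clip space: z equals -n and w equals n
print(ndc)   # NDC: the z component is -1
```

The remainder of this document works through the same calculation symbolically.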




The OpenGL transform order


Post-multiplying the projection matrix by the position coordinate column vector (0, 0, -n, 1) (first shown above) gives us this:
(That's essentially the dot product of the third and fourth rows of the matrix with the input vector, giving us the last two elements of our output vector (our answer). The first two elements of the output vector are zero because x and y of the input are zero and, by assumption 1, so are the third-column entries of the first two matrix rows.)
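Written out in LaTeX, and assuming the symmetric glFrustum matrix implied by assumption 1, the product is:

```latex
\[
\begin{bmatrix}
\frac{2n}{r-l} & 0 & 0 & 0 \\
0 & \frac{2n}{t-b} & 0 & 0 \\
0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\
0 & 0 & -1 & 0
\end{bmatrix}
\begin{bmatrix} 0 \\ 0 \\ -n \\ 1 \end{bmatrix}
=
\begin{bmatrix}
0 \\ 0 \\ \frac{n(f+n)}{f-n} - \frac{2fn}{f-n} \\ n
\end{bmatrix}
\]
```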
Now factor out n from the 3rd element.



Simplify the numerator. Hint: (n-f) = -1*(f-n).
Factor out -1 from the numerator of the third element.

Cancel the (f-n) term in the numerator and denominator of the third element. Then divide the first three elements by the last element to give a new 3-element NDC vector ...and voilà!
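The whole chain of simplifications for the third element, followed by the division by w, can be written in LaTeX as follows. Here x_c, y_c, z_c and w_c are labels introduced for the four clip space elements.

```latex
\[
\begin{aligned}
z_c &= \frac{n(f+n) - 2fn}{f-n}
     = \frac{n(f+n-2f)}{f-n}
     = \frac{n(n-f)}{f-n}
     = \frac{-n(f-n)}{f-n}
     = -n, \\
w_c &= n, \\
\left(\frac{x_c}{w_c},\ \frac{y_c}{w_c},\ \frac{z_c}{w_c}\right)
    &= \left(\frac{0}{n},\ \frac{0}{n},\ \frac{-n}{n}\right)
     = (0,\ 0,\ -1).
\end{aligned}
\]
```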
This proves that a point on the near plane, (0, 0, -n, 1) in eye/view space, maps exactly to NDC (0, 0, -1) after perspective projection to clip space and then w division to NDC space. It is left as an exercise for the reader to repeat these steps for a point in eye space on the far plane at (0, 0, -f, 1) and show that the result is (0, 0, 1).
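If you want to check your answer numerically before doing the algebra, a small Python sketch is enough. The function name ndc_depth is made up for this example; it uses only the third and fourth rows of the symmetric projection matrix, with sample values n = 1 and f = 3.

```python
def ndc_depth(z_eye, n, f):
    """NDC z for an eye space point at depth z_eye (negative in front of the eye)."""
    z_clip = -(f + n) / (f - n) * z_eye - 2.0 * f * n / (f - n)
    w_clip = -z_eye  # fourth matrix row is (0, 0, -1, 0)
    return z_clip / w_clip


n, f = 1.0, 3.0
near_depth = ndc_depth(-n, n, f)
far_depth = ndc_depth(-f, n, f)
print(near_depth, far_depth)  # prints: -1.0 1.0
```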