Why the clip space in OpenGL has 4 dimensions?

August 4, 2015

I will use this as a generic reference, but the more I browse online docs and books, the less I understand about this.

const float vertexPositions[] = {
    /* one vertex per row */
    0.75f, 0.75f, 0.0f, 1.0f,
    0.75f, -0.75f, 0.0f, 1.0f,
    -0.75f, -0.75f, 0.0f, 1.0f,
};

In this online book there is an example of how to draw the classic OpenGL “hello world”: a single triangle.

The vertex structure for the triangle is declared as stated in the code above.

The book, like all the other sources on this topic, stresses that clip space is a 4D structure that is basically used to decide what will be rasterized and rendered to the screen.

Here I have my questions:

  • I can’t picture anything in 4D, and I don’t think any human can. What does the fourth dimension of this clip space represent?
  • The most human-readable doc I have read talks about a camera, which is just an abstraction over the clipping concept, and I get that. The problem is: why not use the concept of a camera in the first place, since it is a more familiar 3D structure? The only problem with a camera is that you need to define the perspective some other way, so you basically have to add another statement about what kind of camera you wish to have.
  • How am I supposed to read 0.75f, 0.75f, 0.0f, 1.0f? All I get is that they are all float values, and I understand the first 3 values. What does the last one mean?

4 Responses to “Why the clip space in OpenGL has 4 dimensions?”

  1. TL;DR it’s not 4D space, it’s 3D plus a scaling number which is virtually always 1. If it is 1, you can ignore it and the first three numbers are x,y,z. If not it gets more complicated.

    Here’s a simple explanation. Vertices in 3D should only have three components:

        ⌈x⌉
    v = |y|
        ⌊z⌋

    If we want to manipulate them (e.g. rotation, scaling etc.) we use a matrix. The most common example of course is the Model-View-Projection (MVP) matrix, which transforms model coordinates into clip space. Like this:

        ⌈m11 m12 m13⌉   ⌈x⌉
    c = |m21 m22 m23| * |y|
        ⌊m31 m32 m33⌋   ⌊z⌋

    However, this has a big flaw: you can’t do translation. If [x,y,z] is zero then, no matter what m is, the result will always be zero, so we can’t have an MVP that includes translation. Obviously we’d like one. The solution is to add a 1 to the end of our vectors and expand the matrix to 4×4:

    ⌈cx⌉   ⌈m11 m12 m13 tx⌉   ⌈x⌉
    |cy| = |m21 m22 m23 ty| * |y|
    |cz|   |m31 m32 m33 tz|   |z|
    ⌊ 1⌋   ⌊  0   0   0  1⌋   ⌊1⌋

    (If you look at any orthographic MVP matrix – e.g. from glOrtho() – you’ll find the 4th row is 0 0 0 1. Sometimes it is even left implicit.) If you work through the maths you will see that this is the same as

    ⌈cx⌉   ⌈m11 m12 m13⌉   ⌈x⌉   ⌈tx⌉
    |cy| = |m21 m22 m23| * |y| + |ty|
    ⌊cz⌋   ⌊m31 m32 m33⌋   ⌊z⌋   ⌊tz⌋

    The 4th component is called w, and while it doesn’t have to be 1, it nearly always is (before a transformation anyway; afterwards it is usually re-homogenised by dividing the whole vector by w so it is 1 again). It’s kind of a hack to allow transformation matrices to include translation.


    I believe the original motivation was perspective projection, which cannot be written as a 3×3 matrix acting on 3D coordinates. There are other transformations you can only do with 4D vectors, but translation is the easiest to understand.
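
A minimal C sketch of the translation argument in the response above. The names Vec4, Mat4, mat4_mul_vec4 and mat4_translation are made up for illustration and are not part of OpenGL; storing the matrix row by row is also just an assumption of this sketch, chosen so the code reads like the matrices written out above.

```c
/* Hypothetical helper types for illustration; not part of OpenGL. */
typedef struct { float x, y, z, w; } Vec4;
typedef struct { float m[4][4]; } Mat4;   /* indexed as m[row][col] */

/* Multiply a 4x4 matrix by a 4-component column vector. */
Vec4 mat4_mul_vec4(const Mat4 *a, Vec4 v)
{
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            out[r] += a->m[r][c] * in[c];

    Vec4 result = { out[0], out[1], out[2], out[3] };
    return result;
}

/* Identity 3x3 part, (tx, ty, tz) in the fourth column, 0 0 0 1 in the fourth row. */
Mat4 mat4_translation(float tx, float ty, float tz)
{
    Mat4 t = {{
        { 1.0f, 0.0f, 0.0f, tx   },
        { 0.0f, 1.0f, 0.0f, ty   },
        { 0.0f, 0.0f, 1.0f, tz   },
        { 0.0f, 0.0f, 0.0f, 1.0f }
    }};
    return t;
}
```

With these helpers, multiplying the matrix from mat4_translation(2, 3, 4) by the origin {0, 0, 0, 1} gives {2, 3, 4, 1}. The trailing 1 in the vector is what picks up the fourth column; a 3×3 matrix applied to [0, 0, 0] could only ever return zero.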

  2. Read the introduction of books you read, you’ll be surprised ;)

    http://arcsynthesis.org/gltut/Basics/Intro%20Graphics%20and%20Rendering.html under Rasterization Overview

    The “w” value (where the first 3 values are x, y and z) basically says what the dimensions of the clip space are. Because this is 1 scalar value, all 3 dimensions of the clip space are equal (and that’s why the clip space is a cube). Every vertex has its own clip space in which it exists (and basically needs to “fit” in, otherwise it CLIPS :D); there is not 1 “world” that is the clip space (though all clip spaces are in the same “world” I think, even I’m having trouble with this ;P).

    So if your vertex has for example the coordinate [1,1,1], if the clip space is 1 then the vertex is in the top right near corner of the screen (when all is default, I don’t know if the directions can be altered). But if the vertex has a clip space of 2, then the coordinate [1,1,1] will be somewhere let’s say, 3 quarters across the screen to the right, 3 quarters across the screen to the top, and the third dimension you can guess yourself.

    I think having let’s say a clip space of 5 would mean the locations within that clip space range from -5 to 5 on every dimension, instead of the cube being 5x5x5. But that’s probably because simply put: all x y and z coordinates are divided by the clip space dimension, so basically your vertices undergo this:

    x = x / w

    y = y / w

    z = z / w

    And that’s what makes it all possible. I think the reason this exists is for easy comparisons. Once the coordinates have been divided by the clip space dimension, any coordinate with 1 or more components higher than 1 or lower than -1 lies outside the clip space. So if your clip space is let’s say 1024, but the coordinate is [2000,3,-100], then the x (2000) component is outside the clip space (which only ranges from -1024 to 1024).

    Computationally it’s easy to tell if something is inside the clip space if all you have to do is (very crudely put ofc): (x/w)<1 && (x/w)>-1 then render. Also, I suppose having all clip spaces of all vertices the same size (so every clip space cube ranging from -1 to 1 in every dimension) makes it easier for whatever comes after the normalisation process, seeing as from that moment on all coordinates are floats ranging from -1 to 1 (disregarding that which has been clipped off).
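
The crude test in the response above can be sketched in C roughly as follows. The type and function names (ClipPos, NdcPos, inside_clip_volume, perspective_divide) are hypothetical and only for illustration. One difference from the response: this sketch compares each component against w directly, which is how the OpenGL clip volume is usually stated (-w <= x, y, z <= w), so nothing has to be divided before deciding whether a vertex is inside. It assumes w is positive.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not part of OpenGL. */
typedef struct { float x, y, z, w; } ClipPos;  /* clip-space position      */
typedef struct { float x, y, z; } NdcPos;      /* after the divide by w    */

/* True when every component lies in [-w, w], i.e. the vertex is inside
   the clip volume. No division is needed for this check. */
bool inside_clip_volume(ClipPos p)
{
    return -p.w <= p.x && p.x <= p.w &&
           -p.w <= p.y && p.y <= p.w &&
           -p.w <= p.z && p.z <= p.w;
}

/* The x/w, y/w, z/w step: maps the [-w, w] cube onto the [-1, 1] cube.
   Assumes p.w is not zero. */
NdcPos perspective_divide(ClipPos p)
{
    NdcPos n = { p.x / p.w, p.y / p.w, p.z / p.w };
    return n;
}
```

Using the numbers from the response: {2000, 3, -100, 1024} fails the test because 2000 > 1024, while {1, 1, 1, 2} passes and divides down to (0.5, 0.5, 0.5), which is three quarters of the way from -1 to 1 on each axis.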

  3. There is also one more reason that I see and that was not mentioned in the previous answers.

    Translation matrices are 4×4 so that you can also translate the object around “the world”.
    With a 3×3 matrix you can rotate and scale a 3D coordinate, but you can only translate it with a 4×4 matrix; hence the need to express 3D coordinates as 4D vectors.

  4. The magic term is “Homogeneous coordinates”, which are used in systems where perspective is a factor. Check the wiki for an overview, but it’s a long course of study to really understand it (which I don’t).
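
To make the perspective remarks in the responses a little more concrete, here is a deliberately simplified C sketch of how w stops being 1. HVec, project_simplified and homogenise are hypothetical names for illustration, and the projection step is not a real projection matrix: it assumes the OpenGL-style convention that the camera looks down the negative z axis and only does the one thing that matters here, copying -z into w.

```c
/* Hypothetical homogeneous vector type for illustration. */
typedef struct { float x, y, z, w; } HVec;

/* Simplified perspective step: keeps x, y, z and copies -z into w, like a
   matrix whose fourth row is (0, 0, -1, 0). A real projection matrix would
   also scale x and y by the field of view and remap z to a depth range. */
HVec project_simplified(HVec eye)
{
    HVec clip = { eye.x, eye.y, eye.z, -eye.z };
    return clip;
}

/* Re-homogenise: divide the whole vector by w so that w is 1 again.
   Assumes v.w is not zero. */
HVec homogenise(HVec v)
{
    HVec r = { v.x / v.w, v.y / v.w, v.z / v.w, 1.0f };
    return r;
}
```

A point at {1, 1, -2, 1} ends up with x = 0.5 after homogenise(project_simplified(p)), while {1, 1, -4, 1} ends up with x = 0.25: the same offset from the view axis lands closer to the centre the farther away it is, which is the perspective effect. The division by a per-vertex w is what lets a matrix multiplication express it.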
