17 Aug 2013

Matrices are not Transforms

and Column-/Row-major Does Not Imply Pre-/Post-multiply

Confusion, or, at the very least, imprecise use of terminology, seems to occur often with regard to each of these four concepts and their inter-use:

Matrices
Transforms
Column-major and Row-major storage
Pre-multiplication and post-multiplication

Confusion is often the case among those who are simultaneously learning both theory (say computer graphics or computer vision) and application (say Direct3D or OpenGL, and C).

Let’s turn mud into water.

Linear Transformations

A linear transformation, a.k.a. a linear transform, a.k.a. a linear map is a class of mathematical object. Specifically it is a class of functions between vector spaces.

Function f is a linear transformation if and only if the following is true for any scalar value α, and vector values x and y, all sharing a common field (e.g. ℝ).

f (x + y) = f (x) + f (y)  
f (α * x) = α * f (x)

You do not need a matrix to define a linear transform, nor is it mentioned anywhere in the definition of one. In computer graphics, linear transforms in projective spaces are what translate and rotate objects, not matrices.

Matrices

A matrix is a collection of objects ordered over two dimensions. Most of the time these objects are real numbers or floating point approximations thereof.

3  0.0  2.8  1.0
8  1.0  5.5  5.7
8  1.0  5.5  5.7
1  3.9  4.5  2.2

This example of a collection of numbers (a 4-row by 4-column matrix) can be used for anything. The collection no semantics about it, other than to some extent an ordering, and any semantics inherent in each individual number. But the matrix as an object itself has no semantics or clear use without context.

1.0  0.0  3.0  9.7

Note that vectors can alternatively be represented as single-row or single-column matrices, such as seen above, and is useful in generalizing certain vector-matrix operations.

Defining Computable Transformations

There are fields of study devoted to computable numbers, computational complexity, computing machines, and so on, which amount to the bridge between pure mathematics and the physical world with concrete output.

As it turns out, matrices are a useful, machine representable form when the time comes to actually compute outputs of a linear transformation. How does one use a matrix to compute a linear transform? Well there is no unique answer to this question. In fact there are at least two unique answers.

If you represent your vector as a 1-row matrix, say v below, and have some matrix A, the following function f is a linear transformation using “standard” matrix-matrix multiplication.

f (v) = v * A

If you instead represent your vector as a 1-column matrix, say u, the following function g is also a linear transformation.

g (u) = A * u

Are f and g the same linear map? No! And this is precisely a counter-example showing why matrices are not the same as linear transformations. A linear transformation is a function. How you compute that function is not specified. If you want to use a matrix, you may, but a matrix by itself isn’t sufficient to define a linear map (nor always necessary).

Note that by convention, f is an example of “pre-multiplication” while g is an example of “post-multiplication”.

Machine Storage

Accessing the elements of a matrix have a standard convention in mathematics. In each of the following expressions, i is a row number and j is a column number.

A_ij
A[i][j]
A[i,j]
A_ij

For example, A[3,1] refers to the object at the 3rd row and 1st column of matrix A.

Computer memory is addressed in only one dimension, not two, thus compilers (or user-defined matrix types) must decide how to store the two-dimensional structure of a matrix in one dimension. For a matrix of size n*m there are (nm)! different ways to store the matrix in a one-dimensional list. However there are two different, common ways called “row-major” and “column-major”.

In row-major storage, each row is packed into a list row-at-a-time. In column-major storage, each column is packed into a list, column-at-a-time. Using the 4x4 matrix presented earlier, these storage formats result in the following lists.

Row-major

4.3  0.0  2.8  1.0  2.8  1.0  5.5  5.7  2.8  1.0  5.5  5.7  1.1  3.9  4.5  2.2

Column-major

4.3  2.8  2.8  1.1  0.0  1.0  1.0  3.9  2.8  5.5  5.5  4.5  1.0  5.7  5.7  2.2

If a programming language or user-defined matrix type adheres to the standard mathematical notation (or any other notation for that matter) for accessing elements in a matrix, then as far as correctness goes it is completely irrelevant whether or not your matrix is stored row-major or column-major. Assuming you’ve already adopted a high-level language (are not writing in assembly language), then the distinction is an implementation detail of your compiler or matrix type.

Because storage is an implementation detail, it is irrelevant to whether or not your matrix is used for transformation, and irrelevant to which side you multiply your vector on in your map’s matrix multiplication.

However, there are specific practices that arise in the course of optimizing for hardware caches.

Column-/Row-major Does Not Imply Pre-/Post-multiply

“OpenGL uses column-major matrices.”

This is often uttered. What can we infer from this statement?

Not much, by itself. A matrix is not a linear transformation. We need more context about how this matrix will be used if we want to pick it apart and identify acting elements of the map.

Imagine a column-major, 4x4 matrix A used to compute a transformation in a projective space. We need to know if standard matrix multiplication is used (it almost always is), and whether our homogeneous vectors are to be pre-multiplied or post-multiplied. OpenGL by default assumes (or suggests) vectors are 4x1 (column) matrices post-multiplied. From this we know we can inspect the 4th column of A to extract the translational effects of g in our projective space.

A Detour into Syntax in C

Where is the first element of the fourth column located?

In mathematical notation:

A₁₄

In C notation:

A[3][0]

What? The indices are switched! And my values are off by one!

C notation assumes row-major storage, but the OpenGL matrix is stored in column-major storage. C does not have syntax for accessing a column-major matrix, however the transpose of a column-major matrix is the same as if the original matrix were stored row-major matrix, thus we access the transpose of this matrix at the C-level by swapping the indices.

Lastly note that C uses 0-based indexing vs. 1-based indexing commonly used in mathematics, thus to convert from 1-based to 0-based you must subtract 1 from each index.

Flipping Switches

For a single linear map in a 3d projective space, where are the translational elements of a 4x4 matrix located in the low-level list?

Pre-multiply, row-major:      every fourth element.
Pre-multiply, column-major:   last four elements.
Post-multiply, row-major:     last four elements.
Post-multiply, column-major:  every fourth element.

Knowing whether the matrix is column-major or row-major is insufficient for knowing how to infer a linear map. Pre-multiplication and post-multiplication are both perfectly reasonable choices for building a map, yet each choice would result in generally a very different map.

When Cache is King

Is knowing whether a matrix is column-major or row-major helpful in deducing what the
linear map is? Sometimes. In practice, especially for large matrices, there is a right way and a wrong to store and compute linear maps, with respect to optimizing performance. General purpose hardware caches are generally more optimized for sequential access than non-sequential access (e.g. via the use of cache “lines” to exploit spatial locality).

If your linear map is defined using post-multiplication, computing the output of the map will be faster if the matrix is stored row-major. On the other hand, if you need to often read or write basis vectors for the map’s range (a.k.a. its column space), using column-major storage will be more performant.

Only with some context, assumptions, or prior knowledge, can you know how to interpret a matrix, with its storage format, as its intended linear map.

Direct3D and OpenGL Serialize Their Linear Maps the Same

There is some useful question & answer on this subject at StackOverflow: Confusion between C++ and OpenGL matrix order.

One of the answers given comes very close to accurately describing the difference (or lack thereof) between Direct3D and OpenGL. However, it is not that their matrices are serialized the same; rather, it is that the acting elements of their linear maps are serialized to the same locations.

At least there is agreement among some rivals.

la guarida de DuckMaestro