How Does cblas_dgemm Handle Matrix Operations in C?

  • Thread starter KStolen
In summary, BLAS dgemm is a routine that performs matrix multiplication of two double-precision real matrices. It is part of the Basic Linear Algebra Subprograms (BLAS) library and is commonly used in scientific computing and data analysis applications. The function is straightforward to call from C and offers efficient, optimized performance for large matrix operations. Understanding dgemm is essential for developers and researchers working with linear algebra and numerical computations.
  • #1
KStolen
Hi guys, I'm having trouble understanding how this routine works.

cblas_dgemm is a BLAS function that computes C <- alpha*A*B + beta*C,
where A, B, C are matrices and alpha and beta are scalars.

Code:
 void cblas_xgemm (
     const enum CBLAS_ORDER Order,
     const enum CBLAS_TRANSPOSE TransA,
     const enum CBLAS_TRANSPOSE TransB,
     const int M,
     const int N,
     const int K,
     const SCALAR alpha,
     const TYPE * A,
     const int lda,
     const TYPE * B,
     const int ldb,
     const SCALAR beta,
     TYPE * C,
     const int ldc)

It takes 14 parameters, listed at http://www.psatellite.com/matrixlib/api/lapack.html.

There's also an explanation at http://en.wikipedia.org/wiki/General_Matrix_Multiply on Wikipedia.

I don't understand what the "major stride" (lda,ldb,ldc) is or how it works, despite the explanations given on both sites.

Here are my example matrices:

[itex]A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}[/itex]

[itex]B = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix}[/itex]

[itex]C = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}[/itex]

I want to be able to take the bottom-right quarter (2x2) submatrices of A and B, multiply them, and add the result to the corresponding block of C such that

[itex]C = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 2 \\ 0 & 0 & 2 & 2 \end{bmatrix}[/itex]

Here's an excerpt from my code:
Code:
//where n is the matrix size, in this case 4
//a[0], b[0] and c[0] point to the first element of each contiguously stored n x n matrix
void Multiply(int n, int blockSize, double** a, double** b, double** c)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, n, n, n, 1.0, a[0], n, b[0], n, 1.0, c[0], n);
}

This code successfully multiplies A by B and gets C, a matrix filled with 4's.
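For a[0] to be a valid argument here, the rows of the double** matrices have to sit in one contiguous block. The allocation isn't shown in the thread, but a common pattern that satisfies this (an illustrative assumption, not the poster's code) is:

Code:
#include <stdlib.h>

/* One common way to build a double** whose rows live in a single contiguous
   block, so that a[0] can be passed to cblas_dgemm with lda = n.
   (Illustrative assumption about the setup; error checking omitted.) */
double **alloc_matrix(int n)
{
    double *data = calloc((size_t)n * n, sizeof *data);  /* contiguous n*n storage    */
    double **rows = malloc((size_t)n * sizeof *rows);    /* array of row pointers     */
    for (int i = 0; i < n; ++i)
        rows[i] = data + i * n;                          /* rows[i] -> start of row i */
    return rows;
}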

I'm thinking something like:

Code:
//where n is the matrix size, in this case 4
//a[1] points to the first element of row 1, i.e. element (1,0)
void Multiply(int n, int blockSize, double** a, double** b, double** c)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 2, 2, 1.0, a[1], n, b[1], n, 1.0, c[1], n);
}

but that returns

[itex]C = \begin{bmatrix} 0 & 0 & 0 & 0 \\ 2 & 2 & 2 & 2 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}[/itex]

instead.

I think it's because I don't understand the lda, ldb and ldc parameters. Is it possible to get what I want here using only cblas_dgemm? Obviously, I could iterate over the arrays with loops, but I'd prefer not to have to do that.
 
  • #2
Well I realized what I was doing wrong, so I thought I'd put the description here for anyone who needs to know.

It turns out I didn't actually need to change lda, ldb and ldc (the leading dimensions, or strides) to solve my problem, since they simply stay at n even for a submatrix, but here's an explanation of them:

The elements of a matrix (i.e. a 2D array) are stored contiguously in memory, in either row-major or column-major order. The stride (the leading dimension) is the distance in memory between the start of one row and the start of the next (row-major), or between the start of one column and the start of the next (column-major). For a matrix stored on its own, this is just the number of columns (row-major) or the number of rows (column-major); see the indexing sketch after the examples below.

Matrix A =
[1 2 3]
[4 5 6]
Row-major stores values as {1,2,3,4,5,6}
Stride here is 3

Col-major stores values as {1, 4, 2, 5, 3, 6}
Stride here is 2


Matrix B =
[1 2 3]
[4 5 6]
[7 8 9]

Col-major storage is {1, 4, 7, 2, 5, 8, 3, 6, 9}
Stride here is 3
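
To make the stride concrete, here is a small illustrative sketch (not from the thread) of how element (i, j) is located in memory under each storage order, where ld is the stride/leading dimension:

Code:
#include <stdio.h>

/* Element (i, j) of a matrix stored row-major with leading dimension ld. */
double get_row_major(const double *m, int ld, int i, int j)
{
    return m[i * ld + j];   /* ld = number of columns of the stored matrix */
}

/* Element (i, j) of a matrix stored column-major with leading dimension ld. */
double get_col_major(const double *m, int ld, int i, int j)
{
    return m[j * ld + i];   /* ld = number of rows of the stored matrix */
}

int main(void)
{
    /* The 2x3 matrix A = [1 2 3; 4 5 6] from the example above. */
    double row_major[] = {1, 2, 3, 4, 5, 6};   /* ld = 3 */
    double col_major[] = {1, 4, 2, 5, 3, 6};   /* ld = 2 */

    printf("%g %g\n", get_row_major(row_major, 3, 1, 2),   /* prints 6 */
                      get_col_major(col_major, 2, 1, 2));  /* prints 6 */
    return 0;
}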


However, my issue was very simple in the end: I just didn't understand pointers. In C, an array expression decays to a pointer to its first element, so a[2] is a pointer to the first element of row 2, and a[2] + 2 points to element (2,2), the top-left corner of the bottom-right 2x2 block. Hence, to perform the operation I required, simply do this:

Code:
//where n is the matrix size, in this case 4
//if blocking, replace M,N,K with blockSize
//a[2]+2 points to element (2,2), the top-left corner of the bottom-right 2x2 block;
//lda, ldb and ldc stay at n because each submatrix still lives inside an n x n array
void Multiply(int n, int blockSize, double** a, double** b, double** c)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 2, 2, 1.0, a[2]+2, n, b[2]+2, n, 1.0, c[2]+2, n);
}
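
For anyone who wants to run this end to end, here is a minimal self-contained sketch (not the poster's code) using a single flat row-major array instead of a double**; the offset 2*n + 2 plays the role of a[2]+2, and the cblas.h header name is an assumption that may vary between BLAS distributions:

Code:
#include <stdio.h>
#include <cblas.h>

int main(void)
{
    const int n = 4;
    double a[16], b[16], c[16];

    /* Fill A and B with ones and C with zeros, as in the example above. */
    for (int i = 0; i < n * n; ++i) {
        a[i] = 1.0;
        b[i] = 1.0;
        c[i] = 0.0;
    }

    /* Multiply the bottom-right 2x2 blocks of A and B and add the result into
       the bottom-right 2x2 block of C.  The offset 2*n + 2 addresses element
       (2,2); the leading dimensions stay at n = 4. */
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 2,
                1.0, a + 2 * n + 2, n,
                     b + 2 * n + 2, n,
                1.0, c + 2 * n + 2, n);

    /* Expected: zeros everywhere except a 2x2 block of 2s in the bottom-right corner. */
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j)
            printf("%4.1f ", c[i * n + j]);
        printf("\n");
    }
    return 0;
}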
 
  • #3
KStolen said:
The elements of a matrix (i.e a 2D array) are stored contiguously in memory. However, they may be stored in either column-major or row-major fashion. The stride represents the distance in memory between elements in adjacent rows (if row-major) or in adjacent columns (if column-major). This means that the stride is usually equal to the number of rows/columns in the matrix.
I'm reasonably sure that the elements of a two-dimensional array are stored in row-major order in C and other C-based languages. If I remember correctly, Fortran does things differently, storing elements of a matrix in column-major order.
 
  • #4
That's right, Mark. Because BLAS is written in Fortran, matrix row/column order is something you should be aware of when passing it matrices from C.
 
  • #5


First of all, it's great that you are trying to understand how the cblas_dgemm function works and how to use it effectively in your code. Let's break down the function and its parameters to better understand its purpose and how to use it.

The cblas_dgemm function is part of the Basic Linear Algebra Subprograms (BLAS) library, which is a collection of highly optimized functions for linear algebra operations. These functions are commonly used in scientific computing and are designed to work efficiently with different hardware architectures. The cblas_dgemm function specifically performs a matrix-matrix multiplication operation, where the result is stored in a third matrix.

Now, let's look at the parameters of the function. The first three parameters (Order, TransA, TransB) specify the layout and orientation of the input matrices A and B. The Order parameter determines whether the matrices are stored in row-major or column-major order, which affects how the elements are laid out in memory. TransA and TransB specify whether A and B should be treated as transposed in the multiplication; this lets you use a matrix's transpose without physically rearranging it in memory.
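
As an illustrative sketch (not from the thread), here is how TransA can be used to multiply by a transpose without rearranging the data; the dimensions chosen below and the cblas.h header name are assumptions:

Code:
#include <cblas.h>

/* Compute C = A^T * B, where A is 3x2 and B is 3x2, so C is 2x2 (row-major). */
void at_times_b(const double *A, const double *B, double *C)
{
    /* op(A) = A^T is 2x3, B is 3x2, C is 2x2: M = 2, N = 2, K = 3.
       lda is the leading dimension of A as stored (2 columns),
       ldb that of B (2 columns), ldc that of C (2 columns). */
    cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
                2, 2, 3,
                1.0, A, 2,
                     B, 2,
                0.0, C, 2);
}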

Next, the parameters M, N, and K specify the dimensions of the product: M is the number of rows of op(A) and of C, N is the number of columns of op(B) and of C, and K is the shared inner dimension (columns of op(A), which must equal rows of op(B)). In your example all three matrices are 4x4, so M = N = K = 4.

The scalars alpha and beta scale the product op(A)*op(B) and the existing contents of C, respectively: the routine computes C <- alpha*op(A)*op(B) + beta*C, so beta = 0 overwrites C while beta = 1 accumulates into it. This allows for more flexibility in the operation being performed.

The next four parameters (A, lda, B, ldb) specify the input matrices and their leading dimensions. The leading dimension is the number of elements between the start of one row and the start of the next (row-major) or between the start of one column and the start of the next (column-major); this is the "major stride" you asked about. In your example the matrices are stored contiguously in row-major order, so the leading dimension is simply the number of columns of the full matrix (here 4), and it stays 4 even when you operate on a 2x2 submatrix.

Finally, the last three parameters (beta, C, ldc) specify the output matrix C and its leading dimension; ldc follows the same rule as lda and ldb.
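
Putting the parameter list together for the 4x4 example from the first post, an annotated call might look like the following sketch, assuming a, b and c are contiguous row-major arrays of 16 doubles (an illustration, not the original poster's code):

Code:
#include <cblas.h>

/* C <- 1.0*A*B + 1.0*C for contiguous row-major 4x4 matrices a, b, c. */
void full_update(const double *a, const double *b, double *c)
{
    cblas_dgemm(CblasRowMajor,  /* Order:  row-major storage                 */
                CblasNoTrans,   /* TransA: use A as-is                       */
                CblasNoTrans,   /* TransB: use B as-is                       */
                4,              /* M: rows of op(A) and of C                 */
                4,              /* N: columns of op(B) and of C              */
                4,              /* K: columns of op(A) = rows of op(B)       */
                1.0,            /* alpha: scales the product A*B             */
                a, 4,           /* A and lda (length of a stored row of A)   */
                b, 4,           /* B and ldb                                 */
                1.0,            /* beta: scales the existing contents of C   */
                c, 4);          /* C and ldc                                 */
}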
 

FAQ: How Does cblas_dgemm Handle Matrix Operations in C?

What is BLAS dgemm in C?

BLAS (Basic Linear Algebra Subprograms) is a set of low-level matrix and vector operations widely used in scientific computing. BLAS dgemm (Double precision General Matrix Multiplication) is a specific function within the BLAS library that performs matrix multiplication for double-precision matrices in C.

Why is understanding BLAS dgemm important for scientists?

BLAS dgemm is a fundamental operation used in many scientific computing applications, such as machine learning, signal processing, and computational physics. Understanding how it works can help scientists write more efficient and accurate code.

What are the parameters of BLAS dgemm and how do they affect the output?

The parameters of BLAS dgemm are the storage order and transpose flags, the dimensions M, N and K, the scalars alpha and beta, pointers to the matrices A, B and C, and their leading dimensions lda, ldb and ldc. Together they determine which (sub)matrices are multiplied, how the product is scaled, and how it is combined with the existing contents of the output matrix C.

How does BLAS dgemm differ from other matrix multiplication functions?

BLAS dgemm is a low-level building block with a fixed, minimal interface: it performs no memory allocation and leaves storage layout entirely to the caller, which makes it lightweight and predictable compared with higher-level matrix libraries. The reference implementation is single-threaded, but optimized implementations such as OpenBLAS, ATLAS and Intel MKL provide the same interface with vectorized, cache-blocked and often multithreaded kernels, which is why calling dgemm usually outperforms a hand-written triple loop.

What are some tips for optimizing BLAS dgemm performance?

Some tips for optimizing BLAS dgemm performance include linking against an optimized BLAS implementation (such as OpenBLAS, ATLAS or Intel MKL) rather than the unoptimized reference library, keeping matrices contiguous in memory, reusing buffers to avoid unnecessary allocations, and choosing one storage order consistently so no extra transposition or copying is needed. On multi-core machines, a multithreaded BLAS build can greatly improve performance for large matrices.
