- #1
swartzism
- 103
- 0
Right now, I'm putting together a program to perform a lower-triangular matrix matrix multiply, which should be straightforward, but I'm getting crap results. I'm using BLAS's strmm (s for single precision) and the call is of the form:
The full documentation of the subroutine's arguments can be found here: http://software.intel.com/sites/pro...GUID-0E088C97-E200-44A2-AA53-7DDBD070F061.htm
For this, the main things you need to know are N (dimension of A and B: NxN), A and B. A and B are the lower-triangular square matrices. So I set those matrices correctly (I checked) and then I call strmm in the exact manner I listed above, and it returns the exact same B matrix.
Does anyone have any experience with using ?trmm or any other BLAS routines? If so, any help would be greatly appreciated.
Code:
strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, *A, &N, *B, &N)
The full documentation of the subroutine's arguments can be found here: http://software.intel.com/sites/pro...GUID-0E088C97-E200-44A2-AA53-7DDBD070F061.htm
For this, the main things you need to know are N (dimension of A and B: NxN), A and B. A and B are the lower-triangular square matrices. So I set those matrices correctly (I checked) and then I call strmm in the exact manner I listed above, and it returns the exact same B matrix.
Code:
Matrix dimension is set to 5 Ingoing matrix B:
2.000 0.000 0.000 0.000 0.000
2.000 2.000 0.000 0.000 0.000
2.000 2.000 2.000 0.000 0.000
2.000 2.000 2.000 2.000 0.000
2.000 2.000 2.000 2.000 2.000
Resulting matrix B:
2.000 0.000 0.000 0.000 0.000
2.000 2.000 0.000 0.000 0.000
2.000 2.000 2.000 0.000 0.000
2.000 2.000 2.000 2.000 0.000
2.000 2.000 2.000 2.000 2.000
Does anyone have any experience with using ?trmm or any other BLAS routines? If so, any help would be greatly appreciated.
Code:
// System headers
#include <stdio.h>
#include <stdlib.h>
#include <malloc.h>
#include <stdint.h>
#include <time.h>
// MKL header
#include "mkl.h"
int main(int argc, char **argv)
{
char side = 'L', // B := alpha*op(A)*B
uplo = 'L', // Matrix is lower triangular
transa = 'N', // op(A) = A
diag = 'U'; // Matrix is not unit triangular
MKL_INT N = 5, NP; // M, N, lda, ldb
float alpha = 1.0; // Scaling factor
float A[N][N]; // A matrix (NxN)
float B[N][N]; // B matrix (NxN)
int matrix_elements; // Matrix size in elements
int i, j; // Counters
time_t t0, t1; // Timers
printf("\nMatrix dimension is set to %d \n\n", (int)N);
t0 = time(NULL);
matrix_elements = N * N;
// Set A and B to 0.0
for (i = 0; i < N; i++)
for (j = 0; j <= i; j++)
{
A[i][j] = 0.0;
B[i][j] = 0.0;
}
// Initialize the matrices to be lower triangluar
for (i = 0; i < N; i++)
for (j = 0; j <= i; j++)
{
A[i][j] = 2.0;
B[i][j] = 2.0;
}
// Display the input
printf("\nIngoing matrix B:\n");
if (N>10)
{
printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
NP=10;
}
else
NP=N;
printf("\n");
for (i = 0; i < NP; i++)
{
for (j = 0; j < NP; j++)
printf("%7.3f ", B[i][j]);
printf("\n");
}
strmm(&side, &uplo, &transa, &diag, &N, &N, &alpha, *A, &N, *B, &N);
t1 = time(NULL);
// Display the result
printf("\nResulting matrix B:\n");
if (N>10)
{
printf("NOTE: B is too large, print only upper-left 10x10 block...\n");
NP=10;
}
else
NP=N;
printf("\n");
for (i = 0; i < NP; i++)
{
for (j = 0; j < NP; j++)
printf("%7.3f ", B[i][j]);
printf("\n");
}
printf ("\nElapsed wall clock time: %ld\n", (long)(t1 - t0));
return 0;
}