Is Transposing the Matrix Efficient for Fortran Matrix-Vector Multiplication?

In summary, the conversation discusses the efficiency of matrix-vector multiplication in Fortran. The original poster computes the product with nested loops, but, having read that Fortran stores arrays in column-major order, wonders whether there is a more efficient way to do the multiplication. Suggestions include storing the matrix transposed, restructuring the loops, and using the built-in matmul function, and the conversation also touches on data locality and its impact on efficiency. The poster plans to compare the different methods and share the results.
  • #1
Telemachus
Hi there. I wanted to ask this question, which is about efficiency in matrix-times-vector multiplication in Fortran. When I have some matrix ##\hat A## and a vector ##\vec{x}##, and I want to compute the product ##\hat A \vec{x}=\vec{b}## in Fortran, what I do is build the arrays ##A(i,j)## and ##x(j)##, and then I tell Fortran to do this:

Fortran:
      do i=1,msize
       rsumaux=rzero
       rsum=rzero
       do k=1,msize
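c       A(i,k) scans row i: successive k are msize apart in memory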
        rsumaux=A(i,k)*x(k)
        rsum=rsum+rsumaux
       enddo
       b(i)=rsum
      enddo

And this is just fine, it works, it gives the result I want. But the thing is that I have read that Fortran stores arrays in column-major order, which means that for an array like ##A(i,j)##, supposing A is a 3x3 matrix, the elements are ordered in memory like: A(1,1) A(2,1) A(3,1) A(1,2) A(2,2) A(3,2) A(1,3) A(2,3) A(3,3).
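To make that ordering concrete, here is a minimal sketch (just a toy example; the program name and the fill values are arbitrary) that fills a 3x3 array and then visits it in storage order, with the first index in the innermost loop:

Fortran:
      program order
      integer i, j
      double precision A(3,3)
c     fill A(i,j) = 10*i + j so each value encodes its own indices
      do j=1,3
       do i=1,3
        A(i,j) = 10*i + j
       enddo
      enddo
c     visit the array in memory (column-major) order: first index fastest
      do j=1,3
       do i=1,3
        write(*,*) 'A(', i, ',', j, ') = ', A(i,j)
       enddo
      enddo
      end

The elements come out exactly in the sequence listed above, which is why loops that vary the first index fastest walk through contiguous memory.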

So, the way I am doing the matrix multiplication isn't the most efficient way of doing it. Should I fix it by using ##\left(\hat A \vec{x}\right)^T=\vec{b}^T=\vec{x}^T\hat A^T##? Would this be the right way to do it? Should I store the matrix transposed in order to make my code more efficient, and then code it this way?

Fortran:
      do i=1,msize
       rsumaux=rzero
       rsum=rzero
       do k=1,msize
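c       A(k,i) scans column i: successive k are adjacent in memory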
        rsumaux=A(k,i)*x(k)
        rsum=rsum+rsumaux
       enddo
       b(i)=rsum
      enddo

Thanks in advance.
 
  • #2
jasonRF
I would try it both ways and see which one is faster. At least in 'modern' dialects of fortran, like fortran90, you can use the built-in matmul function and see how that compares.
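For example, a minimal free-form sketch (the program name, size and fill values below are arbitrary placeholders) might look like:

Fortran:
! minimal sketch: matrix-vector product with the MATMUL intrinsic
program mv_matmul
  implicit none
  integer, parameter :: msize = 3
  double precision :: A(msize,msize), x(msize), b(msize)
  A = 1.0d0            ! arbitrary fill, just to have something to multiply
  x = 2.0d0
  b = matmul(A, x)     ! rank-2 times rank-1 gives the rank-1 product
  print *, b
end program mv_matmul

For a rank-2 A and a rank-1 x, matmul returns the rank-1 product directly, so the whole multiplication is a single line.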

jason
 
  • #3
willem2
If the array is indeed stored in column-major order, accessing it column by column will be much faster.
The processor loads a cache line of 64 bytes, i.e. 8 double-precision numbers, at once. If you scan the matrix by columns, you can use all 8 numbers in the cache line. If you scan it by rows, you can use only one of the 8 numbers.
Of course, if everything fits in the first-level cache there won't be any difference. The L1 cache is 32 KB on modern Intel/AMD processors, which holds 4096 double-precision values, so a 64x64 matrix is roughly where you should start to see a difference.
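A crude way to see this effect (just a sketch: it assumes a compiler that provides the Fortran 95 cpu_time intrinsic, and the 2000x2000 size is arbitrary) is to time a column-wise sweep against a row-wise sweep of the same matrix:

Fortran:
! crude timing sketch: column-wise vs row-wise sweep over a large matrix
program sweep_timing
  implicit none
  integer, parameter :: n = 2000
  double precision :: A(n,n), s
  real :: t0, t1, t2
  integer :: i, j
  A = 1.0d0
  s = 0.0d0
  call cpu_time(t0)
  do j = 1, n            ! first index innermost: contiguous, cache-friendly
    do i = 1, n
      s = s + A(i,j)
    enddo
  enddo
  call cpu_time(t1)
  do i = 1, n            ! second index innermost: stride-n access
    do j = 1, n
      s = s + A(i,j)
    enddo
  enddo
  call cpu_time(t2)
  print *, 'column sweep:', t1 - t0, '  row sweep:', t2 - t1, s
end program sweep_timing

Once the matrix is much larger than the cache, the column-wise sweep should be noticeably faster.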

Another potential bottleneck is the rsum = rsum + rsumaux statement. The addition depends on the previous value of rsum, so it can't be started until the previous addition is done. This might run faster:
Code:
do k=1,msize
   b(k) = 0
enddo
do j=1,msize
  do k=1,msize
     b(k) = b(k) + A(k,j) * x(j)
  enddo
enddo
 
  • #4
Jim Mcnamara
What Willem2 is discussing in part is called data locality: memory is much slower than the CPU, which is why the x86 architecture has L1, L2, etc. caches to speed up access. However, if the cache has to be flushed and refilled on every iteration of the loop because the "next" item is never in the cache, the reverse is true: caching activity can actually degrade the overall efficiency of the loop, because the CPU idles much of the time waiting for cache flushes and re-fetches.

For large datasets the speedup can be dramatic when caching is present and used effectively: https://en.wikipedia.org/wiki/Locality_of_reference
If you want to pursue this further, consider googling for data locality algorithms.

For x86 this is a good read, and is still valid:
https://www.akkadia.org/drepper/cpumemory.pdf
 
  • #5
Telemachus
jasonRF said:
I would try it both ways and see which one is faster. At least in 'modern' dialects of fortran, like fortran90, you can use the built-in matmul function and see how that compares.

jason
I'm using FORTRAN 77; I've never tried Fortran 90, nor anything newer. I like the syntax of Fortran 77. I know it's pretty archaic, but one advantage is that I know exactly what the computer is doing. A question that arises is whether the same problem will show up when calling the matmul function; it would be nice to see how that works, and I expect to have the same kind of issue. I will certainly compare the different ways of computing the product and how much CPU time each takes, maybe this week (I am working on other things, but efficiency is important to me, so I will certainly do it).
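A skeleton of the comparison I have in mind might look something like this (just a sketch, written in free-form Fortran 90 style so it can call matmul and cpu_time; the size and the random fill are arbitrary):

Fortran:
! hypothetical skeleton: time the explicit loop against MATMUL
program compare_mv
  implicit none
  integer, parameter :: msize = 1000
  double precision :: A(msize,msize), x(msize), b1(msize), b2(msize)
  double precision :: rsum
  real :: t0, t1, t2
  integer :: i, k
  call random_number(A)
  call random_number(x)
  call cpu_time(t0)
  do i = 1, msize               ! the original row-oriented loop
    rsum = 0.0d0
    do k = 1, msize
      rsum = rsum + A(i,k)*x(k)
    enddo
    b1(i) = rsum
  enddo
  call cpu_time(t1)
  b2 = matmul(A, x)             ! intrinsic version for comparison
  call cpu_time(t2)
  print *, 'loop   :', t1 - t0, ' seconds'
  print *, 'matmul :', t2 - t1, ' seconds'
  print *, 'max difference:', maxval(abs(b1 - b2))   ! sanity check
end program compare_mv

The last line is just a check that both versions produce the same ##\vec{b}##.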

When I have some results I'll let you know. Thank you all for the enriching comments. I have some technical doubts, but I think I have to read a little from the links posted by Jim Mcnamara before asking. Thank you very much; I really enjoy being assisted by experts like you, and this help has tremendous value for me in many senses: intellectually, professionally, and at many other levels. So thanks, really.
 

FAQ: Is Transposing the Matrix Efficient for Fortran Matrix-Vector Multiplication?

What is the syntax for performing matrix times vector in Fortran?

In Fortran, the syntax for performing matrix times vector is:
MATMUL(A, x), where A is the matrix and x is the vector.

Can I multiply a matrix by a scalar in Fortran?

Yes, but not with MATMUL, which requires two array arguments. Multiplying by a scalar uses the ordinary multiplication operator:
B = scalar * A, where scalar is the scalar value and A is the matrix.

What happens if the dimensions of the matrix and vector do not match?

If the dimensions of the matrix and vector do not conform, the program is invalid: the compiler may catch the mismatch for fixed-size arrays, and with runtime checking enabled the program will stop with an error. The number of columns in the matrix must match the number of elements in the vector.

Can I store the result of matrix times vector in a new variable?

Yes, you can store the result of matrix times vector in a new variable by using the = symbol. For example:
C = MATMUL(A, x), where C is the new variable.

Is there a built-in function for performing matrix times vector in Fortran?

Yes, Fortran has a built-in intrinsic function called MATMUL that performs matrix-matrix and matrix-vector products. How fast it is depends on the compiler and its runtime library, so it is worth benchmarking against hand-written loops.
