OpenMP and Fortran90/C++ problems

In summary, the thread discusses calling a Fortran DLL from C++ inside an OpenMP loop. The speedup on the original test machine was only ~20% because it had a single physical core with hyper-threading; on a genuine multi-core CPU the speedup was roughly twofold. Passing a matrix across the language boundary also requires care, since C++ and Fortran lay out 2D arrays differently (row-major vs. column-major).
  • #1
ozzimandias
Hello,

I need to call a Fortran DLL from C++. My program simulates diffusion of some substances; I won't go into details.
I generate my data in C++ and pass it to the Fortran code, which does all the computation. I repeat this a number of times and try to parallelize the loop with OpenMP.
Everything runs correctly; what I'm actually testing is the execution speed of the Fortran subroutine.
My code runs on an Intel P4: 1 physical core, 2 virtual (hyper-threaded) cores available for multithreading.
Here is some C++ code:
Code:
#pragma omp parallel
    {
        #pragma omp for
        for (int i = 1; i <= 1000; i++)
        {
            // here I generate some random data: x, y, z, ...
            // here I call Fortran with that data: fortran_subroutine(x, y, z, ...)
        }
    }

I would like to get answers to the following questions:
1. The execution without OpenMP takes ~15 s and with OpenMP enabled ~12 s. Why is the speedup so low (a 20% improvement in execution speed)? I know it shouldn't necessarily be twice as fast, but still... I don't have any I/O operations inside the Fortran subroutine.

2. To pass an array by reference from C++ to Fortran you do the following:
Code:
double *array;
array = new double[some_size];
// initialize array
// use array[i] before the call
call_fortran(array);
// use array[i] after the call
The array gets modified inside the Fortran routine, and the changes are visible back in C++.
Now my question is: how do you do the same thing with a matrix (i.e., declaration and passing it by reference to Fortran)?
 
  • #2
I"m not sure how efficient hyperthreading is, depending on the type of instructions being peformed, such as integer versus floating point operations. If the program is memory bandwidth limited, then parallel processing won't help much. A lot of random accessing of memory can be an issue also, since caching won't be as effective.
 
  • #3
I'm not familiar with the syntax of OpenMP, but depending on how much work is done for each value of i, it could make a big difference whether you are creating 1000 separate threads, one for each value of i, or just 2 threads with 500 values of i processed by each thread.
 
  • #4
ozzimandias said:
Hello,
Now my question is: how do you do the same thing with a matrix (i.e., declaration and passing it by reference to Fortran)?

It's the same thing. You just pass a pointer (a machine word containing a memory address) to the routine.

The only thing you have to be careful about, especially in mixed language/platform environments, is that both implementations treat the data in exactly the same way. If that is not the case, you will have problems.

If the two platforms have different definitions of a float, or if for some reason one compiler added padding or optimized away something it shouldn't have, that is what you need to look out for.

Otherwise, it should work perfectly. Obviously, if your matrix is not statically sized, then you need to tell the DLL about its dimensions, but that is another issue.

Just to help us out, how are you declaring the matrix in C++?
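In the meantime, here is a rough sketch of one way it can be done with a single contiguous buffer instead of a pointer-to-pointer. The subroutine name and argument list are made up for illustration, and the actual exported name may need a trailing underscore or a different case depending on your compilers; it will only link once the real Fortran DLL is present.
Code:
#include <cstddef>

// Assumed Fortran-side interface (made-up name; Fortran receives
// everything by reference):
//   subroutine fortran_matrix_sub(a, rows, cols)
//     integer rows, cols
//     double precision a(rows, cols)
extern "C" void fortran_matrix_sub(double* a, int* rows, int* cols);

int main() {
    int rows = 3, cols = 4;

    // One contiguous block, filled in column-major order so the layout
    // matches what a Fortran a(rows, cols) array expects.
    double* a = new double[static_cast<std::size_t>(rows) * cols];
    for (int j = 0; j < cols; ++j)
        for (int i = 0; i < rows; ++i)
            a[j * rows + i] = 0.0;   // this is a(i+1, j+1) on the Fortran side

    fortran_matrix_sub(a, &rows, &cols);

    delete[] a;
    return 0;
}
The key point is that the C++ side indexes the buffer as a[j * rows + i], so it lines up with Fortran's column-major a(i, j).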
 
  • #5
I got the answer to one of my questions.
It seems that Intel's hyper-threading technology offers an improvement of up to about 30% in execution time for multi-threaded applications, so I think that's why I don't get a bigger performance boost. I tested the application on a multi-core CPU and the improvement is obvious: roughly twice as fast.
As for the problem of how to pass the matrix, it's still unsolved. Fortran stores matrices differently (column-major order), so that might be a problem, but that only matters once I successfully pass the matrix, which I haven't managed yet.
Just to help us out, how are you declaring the matrix in C++?
Code:
double **my_matrix;
Obviously, if your matrix is not statically sized, then you need to tell the DLL about its dimensions
That's taken care of.
Code:
!DEC$ ATTRIBUTES reference :: my_matrix
 
  • #6
ozzimandias said:
As for the problem of how to pass the matrix, it's still unsolved. Fortran stores matrices differently (column-major order)...

Code:
double **my_matrix;

Hey ozzimandias.

It's been a while since I've done any serious programming, but what I would recommend is a simple loop that prints out the memory address of each cell.

That should help you figure out the exact layout in memory, so you can guarantee the structure that is fed into your Fortran DLL.

What I think is that it stores each row contiguously in memory, but I could be wrong. The best way to clarify this is to print the actual memory address of each cell: I don't want to generalize from my own experience because I could be wrong.

Also, chances are the float representation is the same, because most compilers use the native CPU instructions, so regardless of whether the code was compiled from C or Fortran, the in-memory format of floats and doubles will be the same. If the size of the float is the same between platforms, the representation itself will most likely be the same as well.

This isn't guaranteed, of course, but it's common, simply because compilers would rather use native CPU instructions than something intermediate and slower.
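Something like this quick sketch is what I have in mind (the sizes are arbitrary):
Code:
#include <cstdio>

int main() {
    const int rows = 3, cols = 4;

    // A double** matrix: each row is a separate allocation, so the rows
    // are not necessarily contiguous with each other in memory.
    double** m = new double*[rows];
    for (int i = 0; i < rows; ++i)
        m[i] = new double[cols];

    // Print the address of every cell to see the actual layout.
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            std::printf("m[%d][%d] is at %p\n", i, j, (void*)&m[i][j]);

    for (int i = 0; i < rows; ++i)
        delete[] m[i];
    delete[] m;
    return 0;
}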
 

Related to OpenMP and Fortran90/C++ problems

1. What is OpenMP and how is it used in Fortran90 and C++?

OpenMP (Open Multi-Processing) is an application programming interface (API) for shared-memory parallel programming. In Fortran90 and C++ it is used to parallelize code, typically loops, so that it runs efficiently on multi-core processors.
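As a rough illustration in C++ (the Fortran directives are analogous; compile with an OpenMP flag such as -fopenmp for GCC/Clang or /openmp for MSVC):
Code:
#include <omp.h>
#include <cstdio>

int main() {
    const int n = 8;
    double data[n];

    // The iterations of this loop are divided among the available threads.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        data[i] = 2.0 * i;
        std::printf("iteration %d -> %.1f on thread %d\n",
                    i, data[i], omp_get_thread_num());
    }
    return 0;
}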

2. What are some common problems encountered when using OpenMP with Fortran90 and C++?

Some common problems include race conditions (where multiple threads are competing for the same shared resource), deadlocks (where threads are waiting for each other to finish), and load imbalances (where one thread is doing more work than others).
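For example, a race condition can be as simple as many threads updating one shared variable with no synchronization (a sketch; the printed result will typically be wrong and vary from run to run):
Code:
#include <omp.h>
#include <cstdio>

int main() {
    long sum = 0;

    // Race condition: every thread reads and writes `sum` with no
    // synchronization, so updates can be lost.
    #pragma omp parallel for
    for (int i = 0; i < 1000000; ++i)
        sum += 1;

    std::printf("sum = %ld (expected 1000000)\n", sum);
    return 0;
}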

3. How can race conditions be avoided when using OpenMP with Fortran90 and C++?

Race conditions can be avoided by using synchronization techniques such as critical sections, atomic operations, and locks to ensure that only one thread can access a shared resource at a time.
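Continuing the sketch above, the same loop can be made correct with an atomic update or, usually better, a reduction:
Code:
#include <omp.h>
#include <cstdio>

int main() {
    // Option 1: each update is atomic (correct, but serializes the update).
    long sum = 0;
    #pragma omp parallel for
    for (int i = 0; i < 1000000; ++i) {
        #pragma omp atomic
        sum += 1;
    }

    // Option 2: a reduction gives each thread a private copy and
    // combines them at the end, which is usually faster.
    long sum2 = 0;
    #pragma omp parallel for reduction(+:sum2)
    for (int i = 0; i < 1000000; ++i)
        sum2 += 1;

    std::printf("atomic: %ld, reduction: %ld\n", sum, sum2);
    return 0;
}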

4. Can OpenMP be used with other programming languages besides Fortran90 and C++?

OpenMP itself officially supports C, C++, and Fortran. Other languages, such as Python or Java, can use it only indirectly, for example by calling compiled C/C++/Fortran libraries that were built with OpenMP.

5. Is there a limit to the number of threads that can be used in an OpenMP parallel region?

Yes. The practical limit depends on the hardware, the operating system, and the OpenMP implementation. The number of threads actually used can be controlled with the OMP_NUM_THREADS environment variable or the omp_set_num_threads() function.
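For example, in C++:
Code:
#include <omp.h>
#include <cstdio>

int main() {
    // Request 2 threads for subsequent parallel regions; setting the
    // OMP_NUM_THREADS environment variable has the same effect.
    omp_set_num_threads(2);
    std::printf("up to %d threads will be used\n", omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp single
        std::printf("this parallel region is using %d threads\n",
                    omp_get_num_threads());
    }
    return 0;
}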
