Running a for loop in parallel via multiprocessing

  • #1
member 428835
Hi PF!

I want to run a function in parallel. It is called integrate and is defined on line 33 of the code below.
Python:
from math import *
from random import *
import math, scipy.special
import statistics
import multiprocessing as mp
# READ MATHEMATICA FUNCTIONS
with open("funcL.txt") as fileL:
    fL = fileL.readlines()
with open("funcR.txt") as fileR:
    fR = fileR.readlines()
# DEFINE MATH REQUIRED
Pi = math.pi
def Sqrt(x):
    return math.sqrt(x)
def Power(a,b):
    return a**b
def BesselJ(v,z):
    return scipy.special.jv(v,z)  # order v first, then argument z
def Cosh(x):
    return math.cosh(x)
def Cos(x):
    return math.cos(x)
def Csc(x):
    return 1/math.sin(x)
def Cot(x):
    return 1/math.tan(x)
# DEFINE ALPHA
with open("alpha.txt") as alpha:
    alpha1 = alpha.readlines()
    alpha = eval(alpha1[0])
# MCI INTEGRATION
K = [0] * len(fL)
def integrate(i):
    def funcL(x,y):
        # RETURN INTEGRAND AS FUNCTION
        return (eval(fL[i]))
    def funcR(x,y):
        # RETURN INTEGRAND AS FUNCTION
        return (eval(fR[i]))
    #  DEFINE DOMAINS
    def testRegion(pt):
        return (pt[0] > pt[1])
    def genpoint():
        # GENERATE COORDINATES IN A SQUARE
        x = (1 - math.sin(alpha))*random() + math.sin(alpha)
        y = (1 - math.sin(alpha))*random() + math.sin(alpha)
        return (x,y)
   
    # INITIALIZE
    SumL = 0.0
    SumR = 0.0
    Area = (1 - math.sin(alpha))**2
    NL = 0
    NR = 0
    # PARAMETERS OF MCI
    samp_pts = 100
    int_dist = []
    iterations = 100
    # INTEGRATION
    for _ in range(iterations):
        for _ in range(samp_pts):
            pt = genpoint()
            if testRegion(pt):
                SumR += funcR(pt[0],pt[1])
                NR += 1
            else:
                SumL += funcL(pt[0],pt[1])
                NL += 1
        solL = SumL*Area/NL
        solR = SumR*Area/NR
        sol = solL + solR
        int_dist.append(sol)
   
    K[i] = statistics.mean(int_dist)
    return K[i]
def main():
    pool = mp.Pool(mp.cpu_count())
    result = pool.map(integrate, range(len(fL)))
if __name__ == "__main__":
    main()
    print(K)
I'd like to run the integrate function over a given range, in this case range(len(fL)) = [0,1,2,3,4,5,6,7,8]. Notice I'm trying to store the results of integrate(i) in K. But in my output, K is a list of zeros. However, if I add print(K) immediately before line 75, I get the correct value. Something seems to be overwriting K. Any ideas?

I've attached the necessary .txt files so you can run it too.

Thanks so much!
 

Attachments

  • alpha.txt (9 bytes)
  • funcL.txt (7.6 KB)
  • funcR.txt (7.6 KB)
  • #2
joshmccraney said:
Something seems to be overwriting K,
Nothing is overwriting K. The results of the overall computation are being stored in "result", and you're not printing that anywhere. (You would have to either print it in "main" or return it from "main" to print where you are printing now.)

Note also that, because you are using multiprocessing, there is not just one "K". There is one "K" for each subprocess you run, that is separate from the K in the master process. So storing anything in the K in each subprocess doesn't do anything at all to the K in the master process, which is the only K you can print from the master process. That's why printing K from the master process is showing you all zeros: that's what you initialized K to in the master process, and nothing else in the master process changes it. When you print K inside the integrate function you are printing it inside the subprocesses, which is why that shows you the actual results.
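
A minimal sketch of that behaviour, using a hypothetical stand-in function `work` in place of `integrate` (the name `work`, the list length of 4, and the pool size of 2 are assumptions for illustration, not taken from the original code):

```python
import multiprocessing as mp

K = [0] * 4  # module-level list; every worker process gets its own copy


def work(i):
    # Hypothetical stand-in for integrate(i).
    K[i] = i * i   # modifies only the copy of K inside the worker process
    return K[i]    # the return value is what pool.map sends back


def main():
    with mp.Pool(2) as pool:
        # pool.map collects the return values in input order
        return pool.map(work, range(len(K)))


if __name__ == "__main__":
    result = main()
    print(result)  # [0, 1, 4, 9]
    print(K)       # [0, 0, 0, 0] -- the master process's K is untouched
```

In the original code, the analogous change would be to have main() return result and print that instead of K.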
 
  • #3
PeterDonis said:
Nothing is overwriting K. The results of the overall computation are being stored in "result", and you're not printing that anywhere. (You would have to either print it in "main" or return it from "main" to print where you are printing now.)

Got it, makes sense! Thanks so much, I was beating my head against a wall here.
 
  • #4
joshmccraney said:
I was beating my head against a wall here.
You're not the first to have that experience when trying to debug programs that use multiprocessing. It can be a very useful tool, but it takes a certain mental shift to get in sync with what it's doing, and (unfortunately IMO) the drive to make it so easy to use also makes it harder to make the mental shift, because code that uses it looks just like code that doesn't, so one is led to think that both types of code should work the same. But they don't.
 
  • #5
PeterDonis said:
You're not the first to have that experience when trying to debug programs that use multiprocessing. It can be a very useful tool, but it takes a certain mental shift to get in sync with what it's doing, and (unfortunately IMO) the drive to make it so easy to use also makes it harder to make the mental shift, because code that uses it looks just like code that doesn't, so one is led to think that both types of code should work the same. But they don't.
Good advice for me moving forward, and for others who may read this. I appreciate your insight.
 
  • #7
try changing line 32 to:

K = [0 for x in range(len(fL))]
 
  • #8
emacstheviking said:
try changing line 32 to:

K = [0 for x in range(len(fL))]
That will (a) give the same K initialization that's already there, and (b) not fix the problem described in the OP.
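
Point (a) can be checked directly. A quick sketch, assuming nine integrands as in the OP (the variable names `a` and `b` are just for illustration):

```python
n = 9  # len(fL) in the OP, since range(len(fL)) runs from 0 to 8

a = [0] * n                  # the initialization already in the code
b = [0 for x in range(n)]    # the suggested replacement

print(a == b)  # True
```

The two forms only differ when the repeated element is mutable (for example a nested list), which is not the case for the integer 0.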
 
  • #9
emacstheviking said:
try changing line 32 to:

K = [0 for x in range(len(fL))]
Thanks for the interest. As @PeterDonis explained earlier, the solution K is computed in the main() function and stored as result. I don't think any code needs to be changed.
 

FAQ: Running a for loop in parallel via multiprocessing

How do I run a for loop in parallel using multiprocessing?

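To run a for loop in parallel using multiprocessing, import the "multiprocessing" module, create a "Pool" object, and call its "map" method with the function you want to run and the iterable you want to run it over. "map" returns a list of the function's return values in the same order as the inputs. Collect results from that return value rather than from a global variable, because each worker process has its own copy of the program's globals. On platforms that start workers by spawning a fresh interpreter (such as Windows), the code that creates the pool should sit under an if __name__ == "__main__": guard.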
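
As a rough sketch of that recipe, here is a serial loop next to its Pool.map equivalent. The function `f` is a hypothetical stand-in for the loop body, and `serial` and `parallel` are names chosen for illustration:

```python
import multiprocessing as mp


def f(i):
    # Hypothetical stand-in for the body of the loop.
    return i * i


def serial(n):
    # Ordinary for loop, one iteration at a time.
    results = []
    for i in range(n):
        results.append(f(i))
    return results


def parallel(n, processes=None):
    # processes=None lets Pool choose the worker count (os.cpu_count()).
    with mp.Pool(processes) as pool:
        return pool.map(f, range(n))  # results come back in input order


if __name__ == "__main__":
    print(serial(5) == parallel(5))  # True
```

The function passed to map has to be defined at module level so that it can be pickled and sent to the workers.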

What is the advantage of using multiprocessing for a for loop?

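The main advantage of using multiprocessing for a for loop is shorter wall-clock time for CPU-bound work. Separate processes can work on different iterations simultaneously on different cores, and in standard CPython they are not limited by the global interpreter lock the way threads are. The gain is largest when each iteration does a substantial amount of work; for very short iterations, the cost of starting processes and passing data between them can outweigh it.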

How many processes should I use for parallelizing a for loop?

The ideal number of processes for parallelizing a for loop depends on the number of iterations, how much work each iteration does, and the cores and memory available on your system. For CPU-bound work, a common starting point is one process per core, which is what Pool() uses by default when no count is given (it falls back to os.cpu_count()). Beyond that, using more processes than cores rarely helps, so it is worth timing a few values on your own code.
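
One way to run that experiment is sketched below. The task `busy`, the helper `time_pool`, and the task sizes are all assumptions made for illustration; timings depend entirely on the machine, so no expected output is shown:

```python
import multiprocessing as mp
import os
import time


def busy(n):
    # Hypothetical CPU-bound task: sum of squares below n.
    total = 0
    for k in range(n):
        total += k * k
    return total


def time_pool(processes, tasks=16, n=200_000):
    # Time how long a pool of the given size takes to finish all tasks.
    start = time.perf_counter()
    with mp.Pool(processes) as pool:
        pool.map(busy, [n] * tasks)
    return time.perf_counter() - start


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1 in that case.
    for p in (1, 2, os.cpu_count() or 1):
        print(p, "processes:", round(time_pool(p), 3), "s")
```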

Can I use multiprocessing for any type of for loop?

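You can use multiprocessing for a for loop as long as the iterations are independent of each other, meaning that no iteration needs the result of a previous one, and as long as the function's arguments and return values can be pickled. If the iterations only share a resource, such as a counter or a file, you can use the "Lock" object from the "multiprocessing" module to ensure that only one process is working on the critical section at a time. A lock does not help when each iteration depends on the result of the one before it; a loop like that has to stay sequential or be restructured.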
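
For the shared-resource case, a minimal sketch using a shared counter and its lock. It uses multiprocessing.Value, which carries its own lock, rather than a standalone Lock object; the function names `add_many` and `run` are hypothetical:

```python
import multiprocessing as mp


def add_many(counter, n):
    # Increment the shared counter n times, one process at a time.
    for _ in range(n):
        with counter.get_lock():   # critical section
            counter.value += 1


def run(n_procs=4, n=1000):
    counter = mp.Value("i", 0)     # shared integer, initial value 0
    procs = [mp.Process(target=add_many, args=(counter, n))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # 4000
```

Without the lock, the read-modify-write in `counter.value += 1` can interleave between processes and lose updates.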

Are there any limitations or drawbacks of using multiprocessing for a for loop?

One limitation of using multiprocessing for a for loop is that it can consume a lot of system resources, especially if the number of processes is high. Each process has its own memory, so large data may be duplicated, and other programs can slow down if every core is kept busy. Starting processes and pickling arguments and results also takes time, so short loops can end up slower than the serial version. Finally, because processes do not share ordinary variables, code that relies on modifying globals will not work as it does in a plain loop. It is worth comparing the performance with the serial version, and with other methods, before committing to it.
