- #1
- 35,001
- 21,672
A colleague has a parallel algorithm problem. There is a similar, more familiar problem that also uses backtracking, so lets discuss that.
The problem is called n-queens. You have an nxn chessbaord and the goal is to place n queens on them so that no queen attacks another: no two queens are on the same row, the same column, or the same diagonal.
The classic solution involves backtracking. Given x queens have been placed, try all possible places for queen x+1 that satisfy the conditions. If you find one, go on to queen x+2. If you don't, go back and try another position for queen x.
The question is, "how do you best parallelize this?"
I have two possible solutions, neither of which I really like.
(1) Imagine you have n threads. The first thread places the queen on square 1, the second on square 2 and so on, and then the n threads continue on from that point. At the end, you collect all the solutions.
The first problem is that this is limited to n-way parallelim. If I am solving the 8 queens problem and my computer has 64 cores, this is not very efficient. "Ah!" you say. "Split this up using the 1st two queens!" This is better - that's n(n-1) threads, or 28 for 8 queens. But that exposes the second problem: load balancing. you might have some threads terminate quickly because they have only a few solutions and others go on and on. So its not very efficient.
(2) This comes about it from the other direction. Suppose I have all but two queens placed. I have two unused rows and two unused columns, so I have two possibilities. One thread tries one, and another tries the others. That's the idea, but for efficiency you want maybe "all but 5 queens" or some other large number.
The issue here is that you spend a lot of time figuring out what the permutations are - the checking of potential solutions goes quickly compared to this. (I have columns 2, 3,5,and 7 open, along with rows 1, 2, 4 and 5. Quick! What is the 11th permutation of queens?) So while the parallel part looks great, the price is a big serial part.
The best I can come up with is a variation on (1) - launch 1000 threads on the 64 cores. Way, way overcommit them and balance the load that way. You still have some long-running threads, but they have less work and balance the load that way.
This feels very brute force-ish. Can someone do better?
The problem is called n-queens. You have an nxn chessbaord and the goal is to place n queens on them so that no queen attacks another: no two queens are on the same row, the same column, or the same diagonal.
The classic solution involves backtracking. Given x queens have been placed, try all possible places for queen x+1 that satisfy the conditions. If you find one, go on to queen x+2. If you don't, go back and try another position for queen x.
The question is, "how do you best parallelize this?"
I have two possible solutions, neither of which I really like.
(1) Imagine you have n threads. The first thread places the queen on square 1, the second on square 2 and so on, and then the n threads continue on from that point. At the end, you collect all the solutions.
The first problem is that this is limited to n-way parallelim. If I am solving the 8 queens problem and my computer has 64 cores, this is not very efficient. "Ah!" you say. "Split this up using the 1st two queens!" This is better - that's n(n-1) threads, or 28 for 8 queens. But that exposes the second problem: load balancing. you might have some threads terminate quickly because they have only a few solutions and others go on and on. So its not very efficient.
(2) This comes about it from the other direction. Suppose I have all but two queens placed. I have two unused rows and two unused columns, so I have two possibilities. One thread tries one, and another tries the others. That's the idea, but for efficiency you want maybe "all but 5 queens" or some other large number.
The issue here is that you spend a lot of time figuring out what the permutations are - the checking of potential solutions goes quickly compared to this. (I have columns 2, 3,5,and 7 open, along with rows 1, 2, 4 and 5. Quick! What is the 11th permutation of queens?) So while the parallel part looks great, the price is a big serial part.
The best I can come up with is a variation on (1) - launch 1000 threads on the 64 cores. Way, way overcommit them and balance the load that way. You still have some long-running threads, but they have less work and balance the load that way.
This feels very brute force-ish. Can someone do better?