steveStevens
I am a 3rd-year engineering physics student who is in a bit of a pickle. I overestimated how realistic a dark-matter-inclusive galaxy collision simulator would be with a small (~1-5,000 particle) N-body simulation. I am now being pointed in the direction of CUDA/OpenCL to handle the ~N^2 + N computations per frame for an N that will actually give an accurate result (~200,000-1,000,000, or so I am told). I am doing N-body with cold dark matter (CDM) halos using NFW density distributions.
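For context, the NFW profile is rho(r) = rho0 / [(r/rs) * (1 + r/rs)^2]. A sketch of drawing halo particle positions from it by inverting the enclosed-mass fraction might look something like this (the truncation radius, grid size, and function names are just illustrative placeholders, not what my program actually uses):

```python
import numpy as np

def nfw_density(r, rho0, rs):
    """NFW density profile: rho(r) = rho0 / ((r/rs) * (1 + r/rs)**2)."""
    x = r / rs
    return rho0 / (x * (1.0 + x) ** 2)

def nfw_enclosed_mass(r, rho0, rs):
    """Analytic mass enclosed within radius r for an NFW profile."""
    x = r / rs
    return 4.0 * np.pi * rho0 * rs ** 3 * (np.log(1.0 + x) - x / (1.0 + x))

def sample_nfw_positions(n, rho0, rs, r_max, rng=None):
    """Draw n particle positions from a truncated NFW halo by
    inverse-transform sampling of the enclosed-mass fraction."""
    rng = np.random.default_rng() if rng is None else rng
    r_grid = np.linspace(1e-3 * rs, r_max, 4096)
    m_frac = nfw_enclosed_mass(r_grid, rho0, rs)
    m_frac /= m_frac[-1]                      # normalise to [0, 1]
    r = np.interp(rng.random(n), m_frac, r_grid)
    # isotropic angles
    cos_t = rng.uniform(-1.0, 1.0, n)
    sin_t = np.sqrt(1.0 - cos_t ** 2)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.column_stack((r * sin_t * np.cos(phi),
                            r * sin_t * np.sin(phi),
                            r * cos_t))
```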
I wrote the base program in Python and was planning on using some open-source graphics library or other to render everything as simple balls/point particles, but upon trying to punch out a position-coordinates matrix, I found it taking FOREVER for anything > ~5,000 particles. I have a strong CPU (4.4 GHz i7-950), but clearly that is not enough when it comes to doing 10M+ computations per frame. I have two GTX 460s that I'd like to take advantage of (~1,800 GFLOPS combined).
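For concreteness, this is roughly the all-pairs step that dominates each frame, vectorised with NumPy (the softening length and names are just placeholders, and it still scales as N^2 in both time and memory):

```python
import numpy as np

def direct_sum_accelerations(pos, mass, G=1.0, eps=0.05):
    """All-pairs gravitational accelerations (the O(N^2) step).

    pos : (N, 3) positions, mass : (N,) masses.
    eps is a Plummer softening length to avoid singular forces."""
    dx = pos[None, :, :] - pos[:, None, :]            # (N, N, 3) separations
    r2 = np.einsum('ijk,ijk->ij', dx, dx) + eps ** 2  # softened squared distances
    inv_r3 = r2 ** -1.5
    np.fill_diagonal(inv_r3, 0.0)                     # no self-interaction
    return G * np.einsum('ij,ijk->ik', mass[None, :] * inv_r3, dx)
```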
I have ZERO experience with CUDA/OpenCL. I have read through some introductory manuals and explored the included nbody app in the SDK. Needless to say, the 10+ C files that comprise the app are quite lengthy and very confusing for someone with no exposure to the CUDA API. I understand the high-level operation of the app and am familiar with the algorithmic theory in general, but the syntax is what is holding me back.
Let me state my goals:
Either (1) do the entire computation/rendering in CUDA, or (2) use my already-written Python program with the PyCUDA wrapper to offload the matrix computation to the GPUs and worry about rendering later.
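For option (2), a minimal PyCUDA sketch of a brute-force kernel might look something like the following: one thread per body, single precision (the GTX 460 is much faster in float than double). The block size, softening, and names are illustrative, not taken from the SDK nbody app:

```python
import numpy as np
import pycuda.autoinit            # noqa: F401  (creates a CUDA context)
import pycuda.driver as cuda
from pycuda.compiler import SourceModule

# One thread per body; each thread loops over all n bodies (brute-force O(N^2)).
mod = SourceModule("""
__global__ void accel(const float *pos, const float *mass, float *acc,
                      int n, float eps2)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float xi = pos[3*i], yi = pos[3*i+1], zi = pos[3*i+2];
    float ax = 0.f, ay = 0.f, az = 0.f;
    for (int j = 0; j < n; ++j) {
        float dx = pos[3*j]   - xi;
        float dy = pos[3*j+1] - yi;
        float dz = pos[3*j+2] - zi;
        float r2 = dx*dx + dy*dy + dz*dz + eps2;
        float inv_r3 = rsqrtf(r2) / r2;       // 1 / r^3 (softened), G = 1
        ax += mass[j] * dx * inv_r3;
        ay += mass[j] * dy * inv_r3;
        az += mass[j] * dz * inv_r3;
    }
    acc[3*i] = ax; acc[3*i+1] = ay; acc[3*i+2] = az;
}
""")
accel = mod.get_function("accel")

n = 8192
pos  = np.random.randn(n, 3).astype(np.float32)
mass = np.ones(n, dtype=np.float32)
acc  = np.zeros_like(pos)

block = 256
grid  = (n + block - 1) // block
accel(cuda.In(pos), cuda.In(mass), cuda.Out(acc),
      np.int32(n), np.float32(0.05 ** 2),
      block=(block, 1, 1), grid=(grid, 1))
```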
It is approaching the end of term, and I fear this will cost me either a lot of sleepless nights or a nice dent in my GPA :/. If anyone with relevant experience is willing, I would like someone to guide me through this, at least at a high level, over Skype or something. I would absolutely be willing to compensate for time spent.
Again, not sure if this is the place to post this, but it seemed like the best option.