MCNP6 with mpi failed with signal 11 (Segmentation fault)

In summary, one reported fix for MCNP segfaults on a UNIX system using MPI was to reserve entire servers for the MCNP runs.
  • #1
Albert ZHANG
I use Python scripts to run mcnp.mpi with a command like
mpirun -np 50 i=inp_...
and I encountered this error report:
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun noticed that process rank 31 with PID 0 on node Ubuntu exited on signal 11 (Segmentation fault).
The scripts had been running normally for a few hours. I extracted the input file from the failed run, and it runs normally on its own.
I searched the Internet and found that this seems to be a memory-related problem, but I checked the log and there was still 100+ GB available. So I don't really know how to solve the problem.
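Since the crash only shows up after hours of scripted runs, it can help to have the driver script record which run failed and on which rank. What follows is a minimal sketch, assuming the driver launches mpirun through Python's subprocess module. The function names are hypothetical, and the pattern is copied from the mpirun message quoted above.

```python
import re
import subprocess

# Pattern modelled on the mpirun message quoted in the post above.
SIGNAL_LINE = re.compile(
    r"process rank (\d+) with PID (\d+) on node (\S+) exited on signal (\d+)"
)


def parse_mpirun_failure(text):
    """Return rank, pid, node and signal found in mpirun output, or None."""
    match = SIGNAL_LINE.search(text)
    if match is None:
        return None
    rank, pid, node, signal_number = match.groups()
    return {
        "rank": int(rank),
        "pid": int(pid),
        "node": node,
        "signal": int(signal_number),
    }


def run_case(command):
    """Run one mpirun command (a list of strings).

    Returns (exit_code, failure_info), where failure_info is None
    when no "exited on signal" line was found in the output.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    failure = parse_mpirun_failure(result.stdout + result.stderr)
    return result.returncode, failure
```

The driver could then copy the offending input file and the parsed rank and signal into a log directory whenever the exit code is non-zero, so that a failure seen once in many hours can be examined afterwards.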
 
  • #2
So many things cause MCNP to segfault that it's difficult to say. Can you share the input file, or a cut-down version of the input file that causes the same error? A PID of 0 seems weird; a quick Google search says that is the paging process. Are you doing any big mesh tallies?

The only other thing I can think of is 50 seems like a lot of copies. How many cores does the machine have?
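The core-count question can also be checked from the driver script before launching. A small sketch, assuming Python's os.cpu_count(), which reports logical CPUs and may therefore be larger than the number of physical cores; the function name is hypothetical.

```python
import os


def check_rank_count(requested_ranks, available_cpus=None):
    """Compare the requested number of MPI ranks with the CPU count.

    Returns (available_cpus, oversubscribed). If available_cpus is not
    given, it falls back to os.cpu_count() (logical CPUs on this machine).
    """
    if available_cpus is None:
        available_cpus = os.cpu_count() or 1
    oversubscribed = requested_ranks > available_cpus
    return available_cpus, oversubscribed
```

For example, check_rank_count(50, 32) reports that 50 ranks on 32 CPUs is oversubscribed. Whether oversubscription is related to the segfault in this thread is not established.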
 
  • #3
Alex A said:
So many things cause MCNP to segfault that it's difficult to say. Can you share the input file, or a cut-down version of the input file that causes the same error? A PID of 0 seems weird; a quick Google search says that is the paging process. Are you doing any big mesh tallies?

The only other thing I can think of is 50 seems like a lot of copies. How many cores does the machine have?
Thanks for your reply, Alex. My script updates the input file between runs, so the input file itself may not be the problem. I searched further and found that PRDMP may help, but I am still not sure. The input file focuses on radiation shielding, so I didn't set up any mesh tallies.
I run MCNP on a server that has 32 cores and 128 processors.
 
  • #4
Sorry to do the old post thing. Maybe it helps somebody else.

I experienced similar things on a UNIX system running MCNP using MPI. The problem turned out to be that MCNP is not very clever about reserving the resources it needs: memory, connections between nodes, and such. So if some other process started and took one of those resources, MCNP might look around and fail to get it. And this might happen after many particles had been run, for example when nodes synced up their data batches and attempted to start a new batch of particles.

If that's the problem, how you fix it may depend on the details of your system.

We managed to solve the problem by reserving entire servers for MCNP, and not letting anything else run on those servers. Literally nothing else, not even the sys-admin logging in, was permitted on those servers during our runs. And we had to do some script hacking to make sure that MCNP only ran on our reserved servers and not on any of the rest of the system.
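The "script hacking" part, keeping MCNP on the reserved servers only, might look roughly like the sketch below. It assumes Open MPI, whose mpirun accepts a --hostfile option and hostfile lines of the form "name slots=N"; other MPI implementations use different options. The host names, the input file name, and the function names are hypothetical.

```python
def hostfile_text(hosts):
    """Render an Open MPI style hostfile from {hostname: slots}."""
    lines = [f"{name} slots={slots}" for name, slots in hosts.items()]
    return "\n".join(lines) + "\n"


def build_command(hostfile_path, hosts, mcnp_args):
    """Build an mpirun command restricted to the hosts in the hostfile.

    The rank count is the sum of the slots, so no host is asked to run
    more ranks than it was reserved for.
    """
    total_slots = sum(hosts.values())
    return [
        "mpirun",
        "--hostfile", hostfile_path,
        "-np", str(total_slots),
        *mcnp_args,
    ]


if __name__ == "__main__":
    # Hypothetical reserved servers and input file name.
    reserved = {"node01": 16, "node02": 16}
    with open("reserved_hosts.txt", "w") as handle:
        handle.write(hostfile_text(reserved))
    print(build_command("reserved_hosts.txt", reserved, ["mcnp.mpi", "i=inp_example"]))
```

This only controls where the MCNP ranks are started. Keeping other users and jobs off those servers still has to be arranged through the scheduler or the system administrator.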
 

FAQ: MCNP6 with mpi failed with signal 11 (Segmentation fault)

What is MCNP6 with mpi failed with signal 11 (Segmentation fault)?

"MCNP6 with mpi failed with signal 11 (Segmentation fault)" describes a crash of the Monte Carlo N-Particle Transport Code (MCNP) version 6, which is used for simulating particle transport in complex systems, while it is running under MPI. Signal 11 is SIGSEGV, the segmentation fault signal: one of the MCNP processes attempted to access a memory location that it was not allowed to, and mpirun then aborted the whole job.

What causes MCNP6 with mpi failed with signal 11 (Segmentation fault)?

There are several potential causes for this error, including bugs in the code, incompatible system configurations, or insufficient memory or resources. It can also be caused by user error, such as incorrect input files or parameters.

How can I fix MCNP6 with mpi failed with signal 11 (Segmentation fault)?

The first step in fixing this error is to check for any known bugs or compatibility issues with your system and the version of MCNP6 you are using. If there are no known issues, you can try increasing the available memory or resources for the program. You can also try running the program with different input files or parameters to see if the error persists. If all else fails, you may need to contact the developers for further assistance.

Can I prevent MCNP6 with mpi failed with signal 11 (Segmentation fault) from occurring?

While there is no guaranteed way to prevent this error from occurring, there are some steps you can take to minimize the chances of encountering it. These include regularly updating to the latest version of MCNP6, ensuring compatibility with your system, and carefully checking input files and parameters for errors.

Is there an alternative to MCNP6 if this error keeps occurring?

Yes, there are alternative programs and codes available for simulating particle transport, such as Geant4 and FLUKA. It may be worth exploring these options if you continue to encounter issues with MCNP6. However, keep in mind that each code has its own strengths and limitations, so it is important to research and choose the best option for your specific needs.
