Why Does -march=native Cause Crashes on My Zen2 Chip?

  • Thread starter Vanadium 50
  • Start date
  • #1
Vanadium 50
Staff Emeritus
Science Advisor
Education Advisor
2023 Award
34,230
20,957
I don't understand the march/mtune behavior.

This is Linux GCC, on a Zen2 chip.

Mtune seems to do nothing much. OK, sometimes your code is as tuned as its going to get out of the box. I am compiling with -O3 and -Ofast, so maybe it is so optimized there is little to tune.

Default march works fine. -march=znver1 works fine, but again, no faster. OK, again, maybe there's little to be done. -march=native and -march=znver2 should do the same thing. I guess they do. They both hang at pthread_create.

This is not really a problem - I don't really need to squeeze the last bin of performance out of the code - but it sure seems mysterious.

gcc -v gives

Code:
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap --enable-host-pie --enable-host-bind-now --enable-languages=c,c++,fortran,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugs.almalinux.org/ --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --enable-plugin --enable-initfini-array --without-isl --enable-multilib --with-linker-hash-style=gnu --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --enable-cet --with-tune=generic --with-arch_64=x86-64-v2 --with-arch_32=x86-64 --build=x86_64-redhat-linux --with-build-config=bootstrap-lto --enable-link-serialization=1
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.4.1 20231218 (Red Hat 11.4.1-3) (GCC)
 
Computer science news on Phys.org
  • #2
Vanadium 50 said:
Mtune seems to do nothing much. OK, sometimes your code is as tuned as its going to get out of the box. I am compiling with -O3 and -Ofast, so maybe it is so optimized there is little to tune.
If you leave out those optimization options, does mtune seem to do more?
 
  • #3
I have not played with that. I am more interested in my march crashes. The setting march=native should never generate code that the CPU cannot handle.

The way I got into this rabbit hole was noticing that the block of code that takes the longest has some multiply-and-adds. This was to see if the compiler could speed it up by using FMA instructions. What could be simpler than throwing a compiler switch?
 
Back
Top