[FORTRAN] How did I screw up my code?

In summary, the conversation is about a bug in the code that has been persisting for two weeks. The code is related to electron density parameters and uses parallel processing. The code contains parameters for solar mass, proton mass, plank mass, neutron star mass, entropy per baryon, neutrinosphere radius and temperature, electron fraction, and electron and baryon densities. The code also has a subroutine for initializing the mixing angles and masses, and another subroutine for initializing the electron density. The electron density can be read from a data file and is converted from ln(n_e*km^3) to ln(n_e*cm^3) using a spline.
  • #36
Could it be possible to see the code where you call varelec_init?
 
Technology news on Phys.org
  • #37
Here's a link to a man page on gdb, the debugger that jedishrfu suggested. http://man.yolinux.com/cgi-bin/man2html?cgi_command=gdb&cgi_section=&cgi_keyword=m
The page here says that "support for Fortran will be added when a GNU Fortran compiler is ready." The page was written in 2002, and I'm pretty sure that gfortran is a GNU Fortran compiler.

The best advice we can give you is to jump right in with some debugger -- pretty much any debugger. It looks pretty straightforward to set breakpoints - set one in any function that calculates a probability and see why the flaky numbers are showing up.
 
  • #38
And here's a tutorial to go along with Mark44's man page:

http://people.sc.fsu.edu/~jburkardt/f_src/gdb/gdb.html

However, in your case, I suspect you'll need logging of intermediate results so you can track through the code and verify that things are working as intended until you find the source of your error.

I recently had a similar problem with some Java code where my new version of a program's output didn't match the old version. I could swear I got it right and dreaded having to add extensive logging in both old and new programs to see where things varied and then I saw the problem. When mine launched it didn't get the last bearing value and so assumed zero degrees. Basically the programs were looking in different directions only.
 
  • #39
DrClaude said:
Could it be possible to see the code where you call varelec_init?

varelec_init is called in the main program, which is several thousand lines long. Do you want to see the call line itself, or like the whole code? o.o

To be clear, I can leave in the call to varelec_init, and as long as I comment out the 5 lines which actually read in ryegrid and yegrid inside varelec_init, the bug will disappear.

@Mark44 & @jedishrfu Thanks for the advice, I will take a look and try.
 
  • #40
Matterwave said:
varelec_init is called in the main program, which is several thousand lines long.
Not a reflection on you, Matterwave, but whoever wrote this code seems to have been completely unaware of one of the main principles of good programming practice - modularization, breaking a large program into smaller, more easily manageable chunks.
 
  • Like
Likes jedishrfu
  • #41
Mark44 said:
Not a reflection on you, Matterwave, but whoever wrote this code seems to have been completely unaware of one of the main principles of good programming practice - modularization, breaking a large program into smaller, more easily manageable chunks.

I mean, the code is comprised of 6-8 modules too...they are all a bit long XD
 
  • #42
Oh, I've been giving you guys the wrong information up to now...the bug is still there even if the read statements for ryegrid and yegrid are commented out. The only way I can remove the bug is to comment out the call to varelec_init totally. I thought I had commented that back in when I did the test runs with the ryegrid and yegrid read statements commented out...-.- Sorry. So the problem seems to be coming from simply calling varelec_init.

The call statement looks like this:

Fortran:
call varelec_init(varelecfrac,elecprofile,effilename,elecfracname)

I don't think I got this wrong...hopefully.
 
  • #43
Matterwave said:
So the problem seems to be coming from simply calling varelec_init.

The call statement looks like this:

Fortran:
call varelec_init(varelecfrac,elecprofile,effilename,elecfracname)
Can you show the lines where varelecfrac,elecprofile,effilename,elecfracname are declared in the part of the code where this call is?
 
  • #44
DrClaude said:
Can you show the lines where varelecfrac,elecprofile,effilename,elecfracname are declared in the part of the code where this call is?

Certainly. This stuff is read in from a file, the code there looks like this:

Fortran:
if(myid.eq.0) then
  read(5,*) irestart,isconsist,iprofile,spinco,varelecfrac,rhofilename,effilename
  read(5,*) dm21sq,dm32sq,theta12,theta13,theta23
  read(5,*) rsrad,drsinit,dr0,rfinal
  read(5,*) lum(1:nflavor,1:2)
  read(5,*) errtol,drwrite,itwrite
  read(5,*) drwvwrite,itwvwrite
  read(5,*) timemax
  read(5,*) ranrhoescale,ranrhodrscale

  dm21sq=dm21sq*1.d-12
  dm32sq=dm32sq*1.d-12

  do iphi=1,nphi
  do idr=1,nranrhodr
  ranrhoe(idr,iphi)=1.d0+ranrhoescale*rannyu(0)
  enddo
  enddo

!  convert maximum time to hours
  timemax(:)=timemax(:)*3600.d0
  endif

I didn't read in elecprofile because that's only used if varelecfrac=1...it is uninitialized. Maybe that's a problem? But it's not used right now anyways...elecfracname is just passed to the subroutine as a variable and it will be declared in the subroutine itself if certain conditionals are met (which they are not right now). I'm actually...not sure why this is... it was this way with helec_init, so I just kept the same structure as that chunk of code.

Uhm, some updates: editing out this whole chunk of code:

Fortran:
!  read data file
  iend=0
  iegrid=0
  do while (iend.eq.0)
  iegrid=iegrid+1
  read(21,*,iostat=ierr) ryegrid(iegrid), yegrid(iegrid)
  if(ierr.ne.0) then
  iend=1
  iegrid=iegrid-1
  close(21)
  endif
  enddo

  write(6,*) ' Electron fraction grid contains ',iegrid,'values'
  write(6,*) ' Beginning of Y_e grid '
  do i=1,min(5,iegrid)
  write(6,'(f15.7,e18.9)') ryegrid(i),yegrid(i)
  enddo
  write(6,*) ' End of Y_e grid '
  do i=max(1,iegrid-4),iegrid
  write(6,'(f15.7,e18.9)') ryegrid(i),yegrid(i)
  enddo

Does remove the bug... so it seems the bug is in these lines...maybe...I will keep testing.
 
  • #45
I was asking for the part of the code where the variables are declared, meaning where their type is defined.

Your bug is reminiscent of out-of-bounds array assignments. It seems that part of the memory is being modified when it shouldn't, so I would like to see what which are in the call varelec_init are actually like.
 
  • #46
DrClaude said:
I was asking for the part of the code where the variables are declared, meaning where their type is defined.

Your bug is reminiscent of out-of-bounds array assignments. It seems that part of the memory is being modified when it shouldn't, so I would like to see what which are in the call varelec_init are actually like.

Ah ok, they are declared in different places. varelecfrac is declared in module params:
Fortran:
integer :: varelecfrac
Elecprofile is declared in the main code (I might move it to params...):
Fortran:
integer :: elecprofile
effilename and elecfrac name are also decalred in the main code:
Fortran:
character*80 helname,rhofilename,effilename,elecfracname
 
  • #47
UPDATE: Ok, something funky is happening with iegrid. I can change the output of the code by changing iegrid from 0 (regular bug) to 30 (new bug) to 292 (works like usual). It must be messing around with helec_init...and this is probably another instance where an implicit variable is screwing me. :( I think that's what's messing the code up. I'm going to go ahead and rename this variable and see what happens.

EDIT UPDATE: Yes...renaming iegrid to iegrid2 has removed the bug...I don't know...what...the program was doing with it though... .____. I thought it was just an index for a do-loop like "i" or "j" that's frequently used throughout the code...
 
Last edited:
  • #48
Matterwave said:
UPDATE: Ok, something funky is happening with iegrid.
I'm not at all surprised.

Looking at the code again for the varelec_init() sub I see the following:
1. The elecprofilef parameter is never used in the subroutine. The question naturally arises, why is it a parameter?
2. A number of variables appear (as if by magic!) in varelec_init(). These are myvid, icomments, iend, iegrid, ryegrid, and yegrid. They are either local variables that aren't declared, or (much worse) variables that are declared in a module. Either way, these aren't good news.
Matterwave said:
I can change the output of the code by changing iegrid from 0 (regular bug) to 30 (new bug) to 292 (works like usual). It must be messing around with helec_init...and this is probably another instance where an implicit variable is screwing me. :( I think that's what's messing the code up. I'm going to go ahead and rename this variable and see what happens.
 
Last edited:
  • #49
iend and iegrid were never declared as far as I can tell, they appear like all other do loop indices out of thin air...I have since declared them at the top of varelec_init. ryegrid and yegrid are declared at the top of the module just like rhogrid,rhobgrid, rgrid, etc., which are used in helec_init, are declared at the top of the module... myid...I have to go look for where that is defined...

Oh wait, iegrid was declared at the top of the module...uh...
 
  • #50
Mark44 said:
2. A number of variables appear (as if by magic!) in varelec_init(). These are myvid, icomments, iend, iegrid, ryegrid, and yegrid.
I see ryegrid and yegrid declarations at the top of the helectron module. I can't find one for myvid. The helectron module uses two other modules, params and parallel, so maybe myvid is declared in one of these.
 
  • #51
It might be useful to run a fortran lint command against your source code to see what other odd things lurk therein.

Some fortran tools to check out:

http://www.dmoz.org/Computers/Programming/Languages/Fortran/Tools/Code_Analysis/
 
  • #52
Ok, after extensive testing, changing iegrid to iegrid2 (having declared it at the top of varelec_init) has solved the bug issue it seems!

myid was declared in the module parallel...I suppose it has something to do with parallelization of the code. I have not had to work with multiple processors as of yet.
 
Last edited:
  • #53
Working with multiple processors and/or threads can be very tricky, make sure you read up on it and look at simple examples and learn about ways to synchronize your code within a fortran environment. I found these examples:

http://people.sc.fsu.edu/~jburkardt/f_src/openmp/openmp.html

and this:

https://computing.llnl.gov/tutorials/parallel_comp/
 
  • #54
Just a thought, Matterwave. Maybe you could put all of your code in github so that anyone interested could look at the whole code. At least temporarily (it's pretty easy to delete the github repository). In the readme file you could put an example of the input and the expected output.
If you're interested let us know and we'll give you the commands to enter in the terminal.
 
  • #55
fluidistic said:
Just a thought, Matterwave. Maybe you could put all of your code in github so that anyone interested could look at the whole code. At least temporarily (it's pretty easy to delete the github repository). In the readme file you could put an example of the input and the expected output.
If you're interested let us know and we'll give you the commands to enter in the terminal.

This should be checked with his supervisor as some colleges and universities look dimly on posting proprietary code online.
 

Similar threads

Replies
16
Views
2K
Replies
6
Views
3K
Replies
1
Views
2K
Replies
2
Views
2K
Replies
6
Views
5K
Replies
2
Views
11K
Replies
2
Views
4K
Back
Top