FORTRAN- How to read a non-uniformly formatted text file

shitij · Jun 18, 2012

Hi all !

I am a little new to FORTRAN and I am sorry if the title is confusing but I couldn't come up with anything better.

I have a file of which I am showing a snippet below:

Code:

*>>>>>>>>CHARMM22 All-Hydrogen Topology File for Proteins <<<<<<<
*>>>>>>>>>>>>>>>>>>>> and Nucleic Acids <<<<<<<<<<<<<<<<<<<<<<<<<
*>>>>> Includes phi, psi cross term map (CMAP) correction <<<<<<<
*>>>>>>>>>>>>>>>>>>>>>>   July, 2004    <<<<<<<<<<<<<<<<<<<<<<<<<<
* All comments to ADM jr. via the CHARMM web site: www.charmm.org
*               parameter set discussion forum
*
31  1

! references
!
!PROTEINS
!
!MacKerell, A.D., Jr,. Feig, M., Brooks, C.L., III, Extending the
!treatment of backbone energetics in protein force fields: limitations
!of gas-phase quantum mechanics in reproducing protein conformational
!distributions in molecular dynamics simulations, Journal of
!Computational Chemistry, 25: 1400-1415, 2004.
!
!MacKerell, Jr., A. D.; Bashford, D.; Bellott, M.; Dunbrack Jr., R.L.;
!Evanseck, J.D.; Field, M.J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.;
!Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F.T.K.; Mattos,
!C.; Michnick, S.; Ngo, T.; Nguyen, D.T.; Prodhom, B.; Reiher, III,
!W.E.; Roux, B.; Schlenkrich, M.; Smith, J.C.; Stote, R.; Straub, J.;
!Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M.  All-atom
!empirical potential for molecular modeling and dynamics Studies of
!proteins.  Journal of Physical Chemistry B, 1998, 102, 3586-3616.
!
!IONS (see lipid and nucleic acid topology and parameter files for
!additional ions
!
!ZINC
!
!Roland H. Stote and Martin Karplus, Zinc Binding in Proteins and
!Solution: A Simple but Accurate Nonbonded Representation, PROTEINS:
!Structure, Function, and Genetics 23:12-31 (1995)
!
!NUCLEIC ACIDS
!
!Foloppe, N. and MacKerell, Jr., A.D. "All-Atom Empirical Force Field for
!Nucleic Acids: 2) Parameter Optimization Based on Small Molecule and
!Condensed Phase Macromolecular Target Data. 2000, 21: 86-104.
!
!and
!
!MacKerell, Jr., A.D. and Banavali, N. "All-Atom Empirical Force Field for
!Nucleic Acids: 2) Application to Molecular Dynamics Simulations of DNA
!and RNA in Solution. 2000, 21: 105-120.
!

MASS     1 H      1.00800 H ! polar H
MASS     2 HC     1.00800 H ! N-ter H
MASS     3 HA     1.00800 H ! nonpolar H
MASS     4 HT     1.00800 H ! TIPS3P WATER HYDROGEN
.
.
.
MASS    95 F3    18.99800 F ! Fluorine, trifluoro (see toppar_all22_prot_fluoro_alkanes.str)
MASS    99 DUM    0.00000 H ! dummy atom
!see NA section          --------------NOTICE THERE ARE COMMENTS B/W MASS RECORDS ALSO
!MASS   100 SOD  22.989770 NA ! Sodium Ion
!MASS   101 MG   24.305000 MG ! Magnesium Ion
!MASS   102 POT  39.102000 K  ! Potassium Ion! check masses
!MASS   103 CES 132.900000 CS ! Cesium Ion
!MASS   104 CAL  40.080000 CA ! Calcium Ion
!MASS   105 CLA  35.450000 CL ! Chloride Ion
!MASS   106 ZN   65.370000 ZN ! zinc (II) cation
!NA section
!MASS 101    HT    1.008000 H ! TIPS3P WATER HYDROGEN

From this file, I have to read the records starting with "MASS" (which end just before records starting with "DECL") and make a list out of it. I have to ignore all other lines. What is the best way to do this??

My approach:

Code:

        line_code='    '
        do while(line_code .ne. 'DECL')
            read(1000,*) line_code, temp_int1, temp_cname, temp_dbl1  ! ERROR IN THIS LINE, MOST PROBABLY
            if(line_code .eq. 'MASS')then
                atom_type_info(tot_atom_types)%type_code=temp_int1
                atom_type_info(tot_atom_types)%type_cname=temp_cname
                atom_type_info(tot_atom_types)%mass=temp_dbl1
                tot_atom_types=tot_atom_types+1
            endif
        enddo

The problem is that, because of the comment lines in the file, the read(1000,*) statement tries to read a string into an integer, which gives an I/O error.
I thought of another way, but it's too much work: first scan the lines with read(1000,*) line_code and keep a line_counter. When you encounter a MASS record, REWIND and read(1000,*) some_temp until line_counter-1, then start reading the MASS record. All this is because we can't REWIND one line back, we can only REWIND to the beginning of the file (right??)

Is there a better way? Like reading the whole line as a character array (with tabs and spaces), then reading from that array the first string. IF that is "MASS" go on and read from that line (stored as character array) rest of the values, otherwise ignore it. This is easy in C/C++ through getline and sscanf, but is there such a way in FORTRAN??

EDIT:
I found a way to read a line without advancing the read pointer to the next line, using ADVANCE='NO' in the READ statement, but now the problem is that when I read everything as a character and an integer comes up in between, it gives an error.
Even if someone could just tell me how to read a file in Fortran line by line, treating each line as a string (even if it is a number), that would be a lot of help.

Thank you in advance !

shitij · Jun 18, 2012

Never Mind. Got it.
For anyone else with the same problem, you can read each line of a text file as a string in Fortran as follows:

do
   read(1,'(a)',END=10) line   ! END= gives the label to jump to once all the lines have been read
   ! process line here
enddo
10 continue   ! your next statement here

Also, you can then read values out of the string you just read (an internal read). Just put the name of the string where you would normally put the unit number in read, as below:
read(line,*) line_code, temp_int,...
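
Putting the two pieces together, a minimal sketch might look like the following. The file name 'topology.inp', the unit number 10 and the 256-character buffer are assumptions made for illustration, not values from this thread. It uses IOSTAT= instead of END= so that a malformed line is skipped rather than aborting the run.

```fortran
program read_mass_records
   implicit none
   character(len=256) :: line       ! assumed long enough for any line in the file
   character(len=4)   :: line_code, temp_cname
   integer            :: temp_int, ios, n_mass
   double precision   :: temp_dbl

   n_mass = 0
   ! 'topology.inp' is a placeholder file name, not taken from the thread
   open(unit=10, file='topology.inp', status='old', action='read', iostat=ios)
   if (ios /= 0) stop 'cannot open input file'

   do
      read(10, '(a)', iostat=ios) line        ! whole line as a string
      if (ios /= 0) exit                      ! end of file or read error
      if (line(1:4) == 'DECL') exit           ! MASS records end before DECL
      if (line(1:4) /= 'MASS') cycle          ! skips comments, blanks and title lines
      ! internal read: decode the fields from the string, not from the file
      read(line, *, iostat=ios) line_code, temp_int, temp_cname, temp_dbl
      if (ios /= 0) cycle                     ! malformed MASS line: ignore it
      n_mass = n_mass + 1
      write(*,*) temp_int, temp_cname, temp_dbl
   end do
   close(10)
   write(*,*) 'MASS records read:', n_mass
end program read_mass_records
```

Commented-out records such as "!MASS   100 SOD ..." are skipped because their first four characters are "!MAS", not "MASS".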

gsal · Jun 18, 2012

Yes, the way to do it is to first read the line as a character string and, THEN, do what is called an internal read.

The one trick that makes this work is to read the line with a format long enough to cover everything you want to extract from it. Or you can specify a very long format that always reads the entire line.

Because spaces act as separators in list-directed input (read with *), you need an explicit format to read a string that contains spaces and is not enclosed in quotes. So the most important line in the code that follows is:

Code:

read(*,'(A26)') line

The following code does the trick; you can test it by compiling and running from the command line using re-direction (mass < inputfile) :

Code:

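program mass
   implicit none
   character line*26
   character code*4, temp_cname*4
   integer temp_int1, ios
   real temp_dbl1

   do
      read(*,'(A26)', iostat=ios) line
      if (ios /= 0) exit                   ! end of file (no DECL found) or read error
      if (line(1:4) .eq. 'DECL') exit      ! MASS records end just before DECL
      if (line(1:4) .eq. 'MASS') then
         ! internal read: decode the fields from the string
         read(line,*) code, temp_int1, temp_cname, temp_dbl1
         write(*,*) code, temp_int1, temp_cname, temp_dbl1 ! temporary line
         ! assign the values to your atom_type_info array here
      endif
   enddo
end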

The code above declares the variable "line" to be just long enough (26 characters) to reach the last value you are interested in, the mass. Alternatively, you could read the entire line every time by declaring "line" to be something like 130 characters long and widening the format to match (or using a plain '(A)', which takes its width from the variable). That is safer, just in case the layout of your input file changes a bit.
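
The longer-buffer variant could be sketched as below. The 130-character length is the figure suggested above; printing the trailing comment after the "!" is an extra added purely for illustration, to show what the wider buffer makes available.

```fortran
program mass_long_line
   implicit none
   character(len=130) :: line
   character(len=4)   :: code, temp_cname
   integer            :: temp_int1, ios, bang
   real               :: temp_dbl1

   do
      read(*, '(A)', iostat=ios) line       ! '(A)' takes its width from len(line)
      if (ios /= 0) exit                    ! end of file or read error
      if (line(1:4) == 'DECL') exit
      if (line(1:4) /= 'MASS') cycle
      read(line, *, iostat=ios) code, temp_int1, temp_cname, temp_dbl1
      if (ios /= 0) cycle                   ! malformed MASS line: skip it
      bang = index(line, '!')               ! position of the trailing comment, 0 if none
      if (bang > 0) then
         write(*,*) temp_int1, temp_cname, temp_dbl1, ' : ', trim(adjustl(line(bang+1:)))
      else
         write(*,*) temp_int1, temp_cname, temp_dbl1
      end if
   end do
end program mass_long_line
```

It can be run the same way, with the input file redirected to standard input. Lines longer than 130 characters are silently truncated, which is harmless here as long as the four fields fit in the buffer.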

shitij · Jun 18, 2012

Thank you for your reply !

My own code is very similar to yours.

Cheers !

LudusRex · Jun 25, 2012

Hi there,

As a scientist familiar with FORTRAN, I can add a few notes on this problem. An explicit FORMAT in your READ statement lets you state that the line starts with a character field followed by integer and real fields, but it does not skip the comment lines by itself: a comment line read with a numeric edit descriptor still raises an I/O error. You therefore still need to read the line as a string first and check how it starts before decoding the rest. ADVANCE='NO' is not needed for this; it is meant for reading part of a record, and a plain read with an '(A)' format already returns the whole line as a string. Once you have the line, you can use the "INDEX" function to search for the first occurrence of "MASS" in the string. Requiring the result to be 1 makes sure the line actually starts with "MASS", so commented-out records such as "!MASS   100 SOD ..." are rejected.

If you open the file yourself with the "OPEN" statement, FORM='FORMATTED' is already the default for sequential access, so a text file needs no special options. After reading each line as a string, you can use string functions such as INDEX, ADJUSTL and TRIM to extract the necessary information from the line.
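
A string-function check along these lines could look like the sketch below. The function name is_mass_record and the indented third sample line are made up for illustration; the first two samples are copied from the file snippet at the top of the thread.

```fortran
program index_check
   implicit none
   character(len=80) :: samples(3)
   integer :: i

   samples(1) = 'MASS     1 H      1.00800 H ! polar H'
   samples(2) = '!MASS   100 SOD  22.989770 NA ! Sodium Ion'
   samples(3) = '   MASS    99 DUM    0.00000 H ! dummy atom'   ! indented variant, made up
   do i = 1, 3
      write(*,*) is_mass_record(samples(i)), ' ', trim(samples(i))
   end do

contains

   ! Hypothetical helper: true when the first non-blank text on the line is MASS
   logical function is_mass_record(line)
      character(len=*), intent(in) :: line
      ! adjustl moves leading blanks to the end of the string, so a result
      ! of 1 from index means the line begins with MASS
      is_mass_record = index(adjustl(line), 'MASS') == 1
   end function is_mass_record

end program index_check
```

The first and third samples are accepted and the commented-out second one is rejected, because index returns 2 for it. Note that adjustl only strips blanks, so a line indented with tabs would need extra handling.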

I hope these suggestions help you in solving your problem. Good luck!
