How Can I Parse Non-Uniform Substrings in VBA Excel?

Saladsamurai · Aug 9, 2009

This is VBA Excel:

Here is what I am trying to do. I have a MathML file saved as a .txt file. It is similar to XML.

From the XML file, we have a bunch of text that looks something like:

<mname>Salad</mname><mrow>xyz</mrow>

I would ultimately like to have an array whose elements are the substrings:

Array(1) = <mname>
Array(2) = Salad
Array(3) = </mname>
...

So far, this is what I have done:

Load the entire text file into one continuous string.

I have then fed the string into an array called ElementaryArray such that each individual character of the giant string is an element of the array.

Now I would like to sweep through the array, element by element, and determine the start and end positions of each substring.

If it were simply a bunch of substrings like <mmm><mmm><rrr><rrr><ooo><ooo> it would be easy enough. I could simply find the first "<" and then find its corresponding closing ">" and then restart the loop at the ">" position.

The problem is that not all of the strings start and end with the "<" & ">" characters.

I need a way to determine if the character after a ">" is another "<" or not. If it is not another "<", I must mark that character's position and then find the next occurrence of "<", which will be the end position (+1) of the substring that does not start with a "<". Does that all make sense?

The tricky part is relating the two different cases via the counter such that I do not get any overlap.
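One way to keep the two cases from overlapping is to drive both from a single position counter that always jumps to the character right after the token just read. The following is only a minimal sketch of that idea, working directly on the string with InStr and Mid$ rather than on a character array. The names TokenizeMarkup and DemoTokenize are hypothetical, and the code assumes every "<" is eventually closed by a ">".

```vba
Option Explicit

' Split markup text into tag tokens ("<...>") and the text runs between them.
' TokenizeMarkup is a hypothetical helper name, not part of any library.
Public Function TokenizeMarkup(ByVal markup As String) As Collection
    Dim tokens As New Collection
    Dim pos As Long          ' 1-based index of the next unread character
    Dim stopPos As Long      ' index of the last character of the current token

    pos = 1
    Do While pos <= Len(markup)
        If Mid$(markup, pos, 1) = "<" Then
            ' Case 1: a tag. It ends at the next ">".
            stopPos = InStr(pos, markup, ">")
            If stopPos = 0 Then stopPos = Len(markup)    ' unterminated tag: take the rest
        Else
            ' Case 2: plain text. It ends just before the next "<".
            stopPos = InStr(pos, markup, "<") - 1
            If stopPos < pos Then stopPos = Len(markup)  ' no more tags: take the rest
        End If

        tokens.Add Mid$(markup, pos, stopPos - pos + 1)
        pos = stopPos + 1    ' resume right after this token, so nothing overlaps
    Loop

    Set TokenizeMarkup = tokens
End Function

' Prints each token on its own line in the Immediate window.
Public Sub DemoTokenize()
    Dim token As Variant
    For Each token In TokenizeMarkup("<mname>Salad</mname><mrow>xyz</mrow>")
        Debug.Print token
    Next token
End Sub
```

A Collection is 1-based, so the first item corresponds to Array(1) in the description above. Note that any line breaks or spaces between tags in the real file would come out as text tokens of their own and may need to be filtered.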

Any ideas?

Saladsamurai · Aug 9, 2009

I am of course also open to suggestions of alternative approaches to this problem.

mXSCNT · Aug 9, 2009

Don't try to hack up a parser by yourself. Use an XML parser library (MathML is XML).

Saladsamurai · Aug 9, 2009

mXSCNT said:

Don't try to hack up a parser by yourself. Use an XML parser library (MathML is XML).

Is that your suggestion? Because that was not what I had in mind when I said I was open to suggestions.

Also, what does that mean?

mXSCNT · Aug 9, 2009

What does what mean? Maybe a couple of links will help:
http://en.wikipedia.org/wiki/Parsing
http://msdn.microsoft.com/en-us/library/aa163921(office.10).aspx
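As a rough illustration of the XML-parser suggestion, here is a sketch using MSXML, which is one parser commonly available to VBA on Windows. Choosing MSXML is an assumption, since the thread does not name a specific library. The names PrintNode and DemoXmlParser are hypothetical, and the sample string is wrapped in a single <math> root element because an XML document needs exactly one root.

```vba
Option Explicit

' Recursively print element names and text values from a parsed XML tree.
Private Sub PrintNode(ByVal node As Object, ByVal depth As Long)
    Dim child As Object
    Select Case node.nodeType
        Case 1  ' NODE_ELEMENT
            Debug.Print Space$(depth * 2) & "<" & node.nodeName & ">"
            For Each child In node.childNodes
                PrintNode child, depth + 1
            Next child
        Case 3  ' NODE_TEXT
            Debug.Print Space$(depth * 2) & node.nodeValue
    End Select
End Sub

Public Sub DemoXmlParser()
    Dim doc As Object
    ' Late binding, so no project reference is needed (assumes MSXML 6 is installed).
    Set doc = CreateObject("MSXML2.DOMDocument.6.0")
    doc.async = False

    If Not doc.loadXML("<math><mname>Salad</mname><mrow>xyz</mrow></math>") Then
        Debug.Print "Parse error: " & doc.parseError.reason
        Exit Sub
    End If

    PrintNode doc.documentElement, 0
End Sub
```

To read from a file instead of a string, doc.Load with the file path can replace doc.loadXML. The parser then handles tag boundaries, nesting, and malformed input, so no manual position bookkeeping is needed.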
