The problem
I recently wondered whether Rattler could parse something as nasty as Fortran’s Hollerith constant. It turns out it can, quite easily.
A Hollerith constant is a crazy way of specifying a string literal: the literal is prefixed with the length of the string. For example, here’s a Hollerith constant for the string “HELLO”:
5HHELLO
It breaks down like this:
- “5” - There are five characters in this…
- “H” - … Hollerith constant
- “HELLO” - And they are “HELLO”
If you were trying to design a syntax to befuddle parsers, that’d be it.
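To see why, here is a small sketch in Python (not Rattler, and with a made-up input string) of a naive regular expression. The pattern has no way to tie the number of characters it consumes to the integer it just read, so it swallows everything after the “H”.

```python
import re

# Hypothetical input: a Hollerith constant followed by unrelated text.
source = "5HHELLO WORLD"

# Naive pattern: digits, "H", then whatever follows. Nothing here says
# "exactly as many characters as the digits announced".
naive = re.match(r"(\d+)H(.*)", source)

announced = int(naive.group(1))  # the length the constant claims: 5
captured = naive.group(2)        # what the pattern actually grabbed

print(announced, repr(captured))  # 5 'HELLO WORLD'
```

The count is data that the parser only learns halfway through the match, which is exactly what a context-free pattern cannot use.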
The solution
Here’s a parser demonstrating the technique:
The input string ends with the suffix “…”, which does not appear in the result. The parser stopped after five characters, just as it should.
How it works
Let’s break down the definition of hollerith.
hollerith <-
  ~(                        # Don't include this group in the parse tree
    integer ~{count = _}    # Parse an integer, then set variable
                            # count to that integer
    "H"                     # Parse an "H"
  )
  @(                        # Include this group in the parse tree
                            # as a single string
    .                       # Parse any character
    &{(count -= 1) >= 0}    # Decrement count. Succeed until
                            # it goes negative.
  )+                        # Repeat one or more times
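For comparison, the same three steps can be written out by hand. This is a minimal sketch in Python rather than Rattler; the function name parse_hollerith and the sample inputs are hypothetical, chosen just for illustration.

```python
def parse_hollerith(source, pos=0):
    """Return (text, next_pos) for a Hollerith constant at pos, or None."""
    # integer ~{count = _}: read one or more digits and remember the value.
    end = pos
    while end < len(source) and source[end] in "0123456789":
        end += 1
    if end == pos:
        return None  # no leading integer
    count = int(source[pos:end])

    # "H": the marker letter must follow the count.
    if end >= len(source) or source[end] != "H":
        return None
    end += 1

    # @( . &{(count -= 1) >= 0} )+: take one character at a time,
    # decrementing count, and stop as soon as it goes negative.
    start = end
    while end < len(source):
        count -= 1
        if count < 0:
            break
        end += 1
    if end == start:
        return None  # the + requires at least one character

    return source[start:end], end


print(parse_hollerith("5HHELLO WORLD"))  # ('HELLO', 7)
```

Like the grammar as I read it, this sketch stops early rather than failing when the input runs out before count reaches zero, so “5HHI” yields “HI”. A stricter version would reject that case.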
Rattler has passed my acid test for parsers with elegance and ease. I think the next thing to do is to convert the Fortran interpreter from my hand-rolled parser to Rattler and see how it holds up under fire.