Wayne Conrad's Blog


Parsing Fortran's Hollerith constant with Rattler

21 Jun 2014

The problem

I recently wondered whether Rattler could parse something as nasty as Fortran’s Hollerith constant. It turns out that it can, quite easily.

A Hollerith constant is a crazy way of specifying a string literal: the literal is prefixed with the length of the string. For example, here’s a Hollerith constant for the string “HELLO”:

      5HHELLO

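It breaks down like this: the leading 5 is the number of characters in the string, the H marks the constant as a Hollerith constant, and the five characters after the H, HELLO, are the string itself.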

If you were trying to design a syntax to befuddle parsers, that’d be it.
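To make the difficulty concrete, here is a minimal hand-rolled sketch in plain Ruby, with no Rattler involved. The method name read_hollerith and its return shape (the string plus the unconsumed input) are hypothetical, chosen only for illustration. What it shows is that the number read at the start decides how many characters get consumed afterwards, which is exactly what an ordinary grammar rule has no way to express.

```ruby
# Hypothetical helper for illustration; not part of Rattler or the
# Fortran interpreter.  Reads one Hollerith constant from the start of
# `source` and returns [string, remaining_input].
def read_hollerith(source)
  # The constant starts with a decimal length followed by "H".
  match = /\A(\d+)H/.match(source)
  raise ArgumentError, "not a Hollerith constant" unless match

  count = match[1].to_i
  body = match.post_match

  # The length prefix, not any delimiter, says where the string ends.
  if body.length < count
    raise ArgumentError, "string is shorter than its declared length"
  end

  [body[0, count], body[count..-1]]
end

p read_hollerith("5HHello...")   # ["Hello", "..."]
```

Note that nothing in the string itself marks its end; any character, including a quote or another H, can appear among the counted characters.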

The solution

Here’s a parser demonstrating the technique:

#!/usr/bin/env ruby

require "rattler"

class Hollerith < Rattler::Runtime::ExtendedPackratParser
  grammar %{
    hollerith <- ~(integer ~{count = _} "H") @(. &{(count -= 1) >= 0} )+
    integer <- @(DIGIT+) { _.to_i }
  }
end

p Hollerith.parse!("5HHello...")     # "Hello"

The input string ends with the suffix “...”, which does not appear in the result. The parser stopped after five characters, just as it should.

How it works

Let’s break down the definition of hollerith.

hollerith <-
  ~(                            # Don't include this group in the parse tree
    integer ~{count = _}        # Parse an integer, then set variable
                                #   count to that integer
    "H"                         # Parse an "H"
  )
  @(                            # Include this group in the parse tree
                                #   as a single string
    .                           # Parse any character
    &{(count -= 1) >= 0}        # Decrement count.  Succeed until
                                #   it goes negative.
  )+                            # Repeat one or more times
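The countdown in the repeated group can be imitated in plain Ruby. This is only a sketch of the counting logic, not of how Rattler evaluates the grammar: take_counted is a hypothetical name, and the loop stands in for the "+" repetition. The test inside the loop is the same expression used in the grammar.

```ruby
# Hypothetical stand-in for the repeated group in the grammar; this is
# not Rattler code.  Takes characters from `input` until `count`
# characters have been taken or the input runs out.
def take_counted(input, count)
  taken = []
  input.each_char do |char|
    # Same test as in the grammar: decrement first, then stop as soon
    # as the counter has gone negative.
    break unless (count -= 1) >= 0
    taken << char
  end
  taken.join
end

p take_counted("Hello...", 5)   # "Hello"
```

With a count of 5 the test succeeds five times (the counter goes 4, 3, 2, 1, 0) and fails on the sixth character, when the counter reaches -1. In this sketch a count of 0 simply yields an empty string.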

Rattler has passed my acid test for parsers with elegance and ease. I think the next thing to do is to convert the Fortran interpreter from my hand-rolled parser to Rattler and see how it holds up under fire.
