5.5 Parsing sequences

In the previous chapter we discussed various ways to sequentially apply two or more parsers. In this section we will explain how to repeatedly apply the same parser in order to parse a sequence with an arbitrary number of elements.

5.5.1 The many parser

In regular expressions and many grammar formalisms a Kleene Star marks a parser rule that can be repeatedly applied. For example, number* could represent a sequence of zero or more numbers.

In FParsec the many combinator takes the place of the Kleene Star:

val many: Parser<'a,'u> -> Parser<'a list, 'u>

With many the number example could be translated into the following FParsec code:

let ws = spaces
let number = pint32 .>> ws
> run (many number) "1 2 3 4";;
val it : ParserResult<int32 list,unit> = Success: [1; 2; 3; 4]

The parser many p repeatedly applies the parser p until p fails, i.e. it “greedily” parses as many occurrences of p as possible. The results of p are returned as a list in the order of occurrence.

At the end of a sequence parsed with many p the argument parser p must fail without consuming input (or changing the parser state in any other way). When p fails after consuming input, many p fails with the error returned by p.

The following example illustrates this behaviour:

let ws = spaces
let str s = pstring s
let numberInBrackets = str "[" >>. pint32 .>> str "]" .>> ws

The many numberInBrackets parser successfully parses the first two numbers in this test run:

> run (many numberInBrackets .>> str "(c)") "[1] [2] (c)";;
val it : ParserResult<int32 list,unit> = Success: [1; 2]

However, the same parser fails while trying to parse the 3rd number in this test run:

> run (many numberInBrackets >>. str "[c]") "[1] [2] [c]";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 10
[1] [2] [c]
         ^
Expecting: integer number (32-bit, signed)

The many parser failed here because the numberInBrackets parser failed after consuming input. In the chapter on looking ahead and backtracking we’ll come back to this example and discuss how you can modify the numberInBrackets parser such that it fails without consuming input if an opening bracket is not followed by a number.[1]

Since many p continues until p fails, you have to be a little careful not to supply an argument parser p that can succeed without consuming input. The following example shows what happens if you accidentally supply such an argument parser:

> run (many (many digit .>> ws)) "123 456";;
System.InvalidOperationException: (Ln: 1, Col: 8): The combinator 'many' was
applied to a parser that succeeds without consuming input and without changing
the parser state in any other way. (If no exception had been raised, the
combinator likely would have entered an infinite loop.)
   (... stack trace ...)
Stopped due to error

The problem here is that many digit .>> ws will succeed without changing the parser state if it can’t parse any digits or trailing whitespace. Thus, if the combined parser hadn’t have thrown an exception, it would have entered an infinite loop at the end of the input.

We can easily avoid the error in the last example by requiring the inner parser to consume at least one digit. Instead of many digit, which succeeds with an empty list if can’t parse any digits, we can use many1 digit, which fails if it can’t parse at least one digit:

> run (many (many1 digit .>> ws)) "123 456";;
val it : ParserResult<char list list,unit> =
  Success: [['1'; '2'; '3']; ['4'; '5'; '6']]

Before we continue, we should point out that an example like many1 digit is somewhat artificial, because you hardly ever want to parse digit chars into a list. If you want to parse numbers, one of the number parsers is usually the best way forward. If you actually need the individual chars, you normally need them as a string, not as a list.

Tip

If you want to parse a sequence of chars, you should generally prefer one of the specialized string parsers.

If you just want to skip over a sequence and don’t need the list of parser results, you can use the optimized combinators skipMany or skipMany1.

5.5.2 sepBy and sepEndBy

Often the elements of a sequence are separated by some separator. A convenient way to parse such sequences are the sepBy and sepEndBy combinators.

sepBy p sep parses a sequence of p separated by sep and returns the results in a list. sepEndBy parses a sequence of p separated and optionally ended by sep.

With these combinators you could for example define the following two parsers for a semicolon‐separated list of numbers in brackets:

let str s = pstring s
let sepList    = between (str "[") (str "]") (sepBy    pint32 (str ";"))
let sepEndList = between (str "[") (str "]") (sepEndBy pint32 (str ";"))

The sepList parser only accepts lists where the semicolons only occur between two numbers:

> run sepList "[]";;
val it : ParserResult<int32 list,unit> = Success: []
> run sepList "[1;2;3]";;
val it : ParserResult<int32 list,unit> = Success: [1; 2; 3]
> run sepList "[1;2;3;]";;
val it : ParserResult<int32 list,unit> = Failure:
Error in Ln: 1 Col: 8
[1;2;3;]
       ^
Expecting: integer number (32-bit, signed)

The sepEndList parser also accepts a terminating semicolon:

> run sepEndList "[1;2;3]";;
val it : ParserResult<int32 list,unit> = Success: [1; 2; 3]
> run sepEndList "[1;2;3;]";;
val it : ParserResult<int32 list,unit> = Success: [1; 2; 3]

Like for the many combinator, there are also variants of the sepBy and sepEndBy parsers that require at least one element in the sequence and/or skip over a sequence without returning the results. Have a look at the parser overview.

5.5.3 Parsing a sequence without creating a list

If you want to parse a sequence and you don’t need the results as an F# list, you can avoid the allocation of a temporary list by defing a custom sequence parser using the inline helper methods Inline.Many and Inline.SepBy.

For example, if you wanted to define a variant of many that parses the elements directly into a ResizeArray, i.e. a System.Collections.Generic.List, you could use the following definition:

let manyRA p =
  // the compiler expands the call to Inline.Many to an optimized sequence parser
  Inline.Many(elementParser = p,
              stateFromFirstElement = (fun x0 ->
                                         let ra = ResizeArray<_>()
                                         ra.Add(x0)
                                         ra),
              foldState = (fun ra x -> ra.Add(x); ra),
              resultFromState = (fun ra -> ra),
              resultForEmptySequence = (fun () -> ResizeArray<_>()))

A test run:

> run (manyRA (pint32 .>> spaces)) "1 2 3";;
val it : ParserResult<System.Collections.Generic.List<int32>,unit> =
  Success: seq [1; 2; 3]

The reference documentation for the Inline class contains some more examples.

Footnotes:
[1]

many doesn’t automatically backtrack when the argument parser fails after changing the parser state for two reasons:

  • In most situations automatic backtracking would only obscure error messages, because the reported input error was indeed severe and backtracking would only trigger secondary error messages that detract from the main error.
  • In the few instances where you rely on backtracking behaviour you can easily introduce it using the combinators detailed in section 5.7. Marking the occasions where you rely on backtracking with these combinators makes your parser implementations easier to debug and optimize.