5.6 Parsing alternatives

FParsec’s main operator for trying to parse input with alternative parsers is

val (<|>): Parser<'a,'u> -> Parser<'a,'u> -> Parser<'a,'u>

This operator implements a form of prioritized choice: it only tries to parse input with the second parser if the first parser fails.

The following example illustrates this behaviour:

type Char = AsciiChar of char
          | Char of char

let asciiLetter = asciiLetter |>> AsciiChar
let letter = letter |>> Char
> run (asciiLetter <|> letter) "a";;
val it : ParserResult<Char,unit> = Success: AsciiChar 'a'
> run (letter <|> asciiLetter) "a";;
val it : ParserResult<Char,unit> = Success: Char 'a'
> run (asciiLetter <|> letter) "ä";;
val it : ParserResult<Char,unit> = Success: Char 'ä'

The prioritized choice also implies that FParsec doesn’t enforce a longest‐match rule like in regular expressions:

> run (pstring "a" <|> pstring "ab") "ab";;
val it : ParserResult<string,unit> = Success: "a"

If you want to accept more than two alternatives, you can either chain multiple <|> operators, like in p1 <|> p2 <|> p3, or you can use the choice combinator, which accepts a sequence of parsers as the argument, like in choice [p1; p2; p3]. In both cases the argument parsers are tried from left to right until a parser succeeds.

A good understanding of the <|> operator is important for productively working with FParsec, so let’s have a look at its implementation:

let (<|>) (p1: Parser<'a,'u>) (p2: Parser<'a,'u>) : Parser<'a,'u> =
    fun stream ->
        let stateTag = stream.StateTag
        let mutable reply = p1 stream
        if reply.Status = Error && stateTag = stream.StateTag then
            let error1 = reply.Error
            reply <- p2 stream
            if stateTag = stream.StateTag then
                reply.Error <- mergeErrors reply.Error error1
        reply

As you can see, the parser p1 <|> p2 works as follows: First, it applies the parser p1 to the input stream. If p1 succeeds, the reply of p1 is returned. If p1 fails with a non‐fatal error (i.e. with the status Error, not FatalError) and without changing the parser state, the parser p2 is applied. If p2 does not change the parser state, the error messages from both parsers are merged. (We compare the StateTag values instead of the actual parser states for optimization reasons, see section 5.4.3.)

The most important point to note here is that p1 <|> p2 will always return with the reply of p1 if p1 changes the parser state, even if p1 eventually fails. Remember that the stream position is part of the parser state, so if p1 fails after consuming input, p2 will not be applied. Since a parser usually consumes input as soon as it can accept at least one atomic token from the input, this means that p1 <|> p2 by default implements backtracking with only a “one token look‐ahead”.

Consider the following example:

let parserA = spaces >>. pstring "a"
let parserB = spaces >>. pstring "b"
run (parserA <|> parserB) " b";;
> run (parserA <|> parserB) " b";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 2
 b
 ^
Expecting: 'a'

The combined parser fails because parserA fails after consuming the whitespace, so that parserB never gets tried.

Of course, this simple parser could be easily fixed by factoring out the common prefix:

> run (spaces >>. (pstring "a" <|> pstring "b")) " b";;
val it : ParserResult<string,unit> = Success: "b"

The restriction of the look‐ahead in p1 <|> p2 may strike you as odd at first, but it has two big advantages:

  1. The error reporting is simplified and error messages are easier to understand because terminal errors can only occur at one position at a time.
  2. Parser developers are guided towards more efficient grammar implementations because parsers requiring more than a one token look‐ahead need to be explicitly annotated with the attempt or >>? combinators (see the next chapter).[1]
Footnotes:
[1] In case you’re wondering: No, we’re not trying to sell a design limitation as a feature here. In Parsec, the Haskell library on which FParsec’s design was originally based, the limited look‐ahead is essential for the library design, because it allows Parsec to exploit Haskell’s laziness in order to ensure space efficiency. FParsec has a different implementation in which the limited look‐ahead has no effect on space efficiency. We stick to the limited look‐ahead because we think it’s the appropriate default behaviour for a parser combinator library like FParsec. Now, admittedly, if FParsec could automatically optimize the implementation of a parser in a way that minimized backtracking, e.g. by automatically left‐factoring grammars, then backtracking would be less of a problem and a different default behaviour might become more attractive.