5.6 Parsing alternatives
FParsec’s main operator for trying to parse input with alternative parsers is
val (<|>): Parser<'a,'u> -> Parser<'a,'u> -> Parser<'a,'u>
This operator implements a form of prioritized choice: it only tries to parse input with the second parser if the first parser fails.
The following example illustrates this behaviour:
type Char = AsciiChar of char | Char of char let asciiLetter = asciiLetter |>> AsciiChar let letter = letter |>> Char
> run (asciiLetter <|> letter) "a";; val it : ParserResult<Char,unit> = Success: AsciiChar 'a' > run (letter <|> asciiLetter) "a";; val it : ParserResult<Char,unit> = Success: Char 'a' > run (asciiLetter <|> letter) "ä";; val it : ParserResult<Char,unit> = Success: Char 'ä'
The prioritized choice also implies that FParsec doesn’t enforce a longest‐match rule like in regular expressions:
> run (pstring "a" <|> pstring "ab") "ab";; val it : ParserResult<string,unit> = Success: "a"
If you want to accept more than two alternatives, you can either chain multiple <|>
operators, like in p1 <|> p2 <|> p3
, or you can use the choice
combinator, which accepts a sequence of parsers as
the argument, like in choice [p1; p2; p3]
. In both cases the argument parsers are tried from left to right until a parser succeeds.
A good understanding of the <|>
operator is important for productively working with FParsec, so let’s have a look at its implementation:
let (<|>) (p1: Parser<'a,'u>) (p2: Parser<'a,'u>) : Parser<'a,'u> = fun stream -> let stateTag = stream.StateTag let mutable reply = p1 stream if reply.Status = Error && stateTag = stream.StateTag then let error1 = reply.Error reply <- p2 stream if stateTag = stream.StateTag then reply.Error <- mergeErrors reply.Error error1 reply
As you can see, the parser p1 <|> p2
works as follows: First, it applies the parser p1
to the input stream. If p1
succeeds, the reply of p1
is returned. If p1
fails with a non‐fatal
error (i.e. with the status Error
,
not FatalError
) and without
changing the parser state, the parser p2
is applied. If p2
does not change the parser state, the error messages from both parsers are merged. (We compare the StateTag
values instead of
the actual parser states for optimization reasons, see section 5.4.3.)
The most important point to note here is that p1 <|> p2
will always
return with the reply of p1
if p1
changes
the parser state, even if p1
eventually fails. Remember that the stream position is part of
the parser state, so if p1
fails after consuming input, p2
will not be applied. Since a parser usually consumes input as soon as it can accept at least one atomic token from
the input, this means that p1 <|> p2
by default implements backtracking with only a “one token
look‐ahead”.
Consider the following example:
let parserA = spaces >>. pstring "a" let parserB = spaces >>. pstring "b" run (parserA <|> parserB) " b";;
> run (parserA <|> parserB) " b";; val it : ParserResult<string,unit> = Failure: Error in Ln: 1 Col: 2 b ^ Expecting: 'a'
The combined parser fails because parserA
fails after consuming the whitespace, so that parserB
never gets tried.
Of course, this simple parser could be easily fixed by factoring out the common prefix:
> run (spaces >>. (pstring "a" <|> pstring "b")) " b";; val it : ParserResult<string,unit> = Success: "b"
The restriction of the look‐ahead in p1 <|> p2
may strike you
as odd at first, but it has two big advantages:
- The error reporting is simplified and error messages are easier to understand because terminal errors can only occur at one position at a time.
-
Parser developers are guided towards more efficient grammar implementations because parsers requiring more than a one token look‐ahead need to
be explicitly annotated with the
attempt
or>>?
combinators (see the next chapter).[1]
[1] | In case you’re wondering: No, we’re not trying to sell a design limitation as a feature here. In Parsec, the Haskell library on which FParsec’s design was originally based, the limited look‐ahead is essential for the library design, because it allows Parsec to exploit Haskell’s laziness in order to ensure space efficiency. FParsec has a different implementation in which the limited look‐ahead has no effect on space efficiency. We stick to the limited look‐ahead because we think it’s the appropriate default behaviour for a parser combinator library like FParsec. Now, admittedly, if FParsec could automatically optimize the implementation of a parser in a way that minimized backtracking, e.g. by automatically left‐factoring grammars, then backtracking would be less of a problem and a different default behaviour might become more attractive. |
---|