5.4 Applying parsers in sequence

Now that we have discussed how Parser functions work, we can start explaining how FParsec’s parser combinators work.

In this chapter we will discuss combinators that allow you to apply multiple parsers in sequence, i.e. parse the beginning of the input with the first parser, then parse the following input with the second parser, and so on.

5.4.1 The definition of >>.

The simplest combinators for sequentially applying two parsers are

val (>>.):   Parser<'a,'u> -> Parser<'b,'u> -> Parser<'b,'u>
val (.>>):   Parser<'a,'u> -> Parser<'b,'u> -> Parser<'a,'u>

Both operators take two parsers as arguments and return a combined parser that applies the two parsers in sequence. As you can infer from the type signatures, p1 >>. p2 returns the result of p2 and p1 .>> p2 the result of p1. In each case the point points to the parser whose result is returned.

In order to explain exactly what it means to apply two parser in sequence, we give a full definition of the >>. operator:

let (>>.) (p1: Parser<'a,'u>) (p2: Parser<'b,'u>) =
    fun stream ->
        let reply1 = p1 stream
        if reply1.Status = Ok then
            let stateTag = stream.StateTag
            let mutable reply2 = p2 stream
            if stateTag = stream.StateTag then                        // (1)
                reply2.Error <- mergeErrors reply1.Error reply2.Error // (2)
            reply2
        else // reconstruct error reply with new result type
            Reply(reply1.Status, reply1.Error)

The implementation of p1 >>. p2 should be quite self‐explanatory: First p1 is applied to the input stream. If p1 succeeds, i.e. if the status of reply1 is Ok, p2 is applied to stream and the reply of p2 then becomes the reply of the combined parser. However, if p1 fails, its reply is immediately propagated as the reply of the combined parser. Since reply1 has type Reply<'a,'u> but p1 >>. p2 needs to return a Reply<'b,'u>, the error reply needs to be reconstructed with a new result type before it can be returned.

5.4.2 Merging error messages

We mentioned earlier that the error messages returned in the Reply.Error field implicitly refer to the state of the CharStream at the time the parser returns. In particular the error messages refer to the then current stream position. Since the messages do not contain themselves a separate record of the error position they can only be interpreted together with the CharStream state.

When p2 does not change the parser state, the error messages from both replies refer to the state of the CharStream as it is when p1 >>. p2 returns. Thus, the combinator needs to merge the (immutable) ErrorMessageLists from both replies, so that the returned list contains all the relevant error messages (see the line marked with (2)).

In order to check whether the CharStream state has changed, the combinator does not compare the full states from before and after p2 is invoked. Instead it only compares the StateTag values (see line (1)). This improves performance and — for most practical purpose — is almost equivalent to comparing the full state, as we will discuss below.

Note

The way (>>.) handles errors and merges error messages is a template for all combinators in FParsec that perform multiple sequential parser invocations.

You may wonder why the error messages get merged even though p1 succeeded. The somewhat counterintuitive reason is that parsers can return nonempty error message lists even when they don’t fail. For example, a parser that skips over the optional string "str" will return Reply(Ok, (), expectedString "str") if it doesn’t find the string in the input. In this case the error message describes what further input the parser could have parsed at the current stream position. If subsequently a parser fails at the same position, all error messages for the same position can be aggregated to give the user as much information as possible about what went wrong and what alternative inputs could have been parsed at the given position.

The following sample demonstrates the helpful effect of this error handling behaviour:

let str s = pstring s

let oneOrTwoInts =
    str "(" >>. tuple2 pint32 (opt (str "," >>. spaces >>. pint32)) .>> str ")"
> run oneOrTwoInts "(1 2)";;
val it : ParserResult<(int32 * int32 option),unit> = Failure:
Error in Ln: 1 Col: 3
(1 2)
  ^
Expecting: ')' or ','

This error message wouldn’t mention the possibility of a missing comma if the .>> combinator did not merge error messages for the same position when the left‐hand side parser succeeds.

5.4.3 The StateTag

Parser combinators often need to check whether a parser has changed the CharStream state. In a typical FParsec application these checks are performed so frequently that an efficient implementation is important for the overall parser performance. Since a straightforward comparison of the complete CharStream states can be quite expensive, the CharStream class provides a shortcut for this purpose: the StateTag.

The StateTag is a simple integer counter that is incremented every time a CharStream method changes the state. Thus, if the StateTag hasn’t changed, you can safely infer that the state hasn’t changed either.[1] Except for some special cases, the opposite is also true: if the StateTag has changed, the state has changed too.

In the following special cases checking whether the StateTag has changed is not equivalent to checking whether the CharStream state has changed, because the tag may change even though the state doesn’t:

  • A parser calls the basic Skip or Read methods with a 0 offset or an empty argument string.
  • A parser seeks the CharStream to the current position or replaces the user state with the current value.
  • A parser makes several calls to CharStream methods and in later calls undoes the changes it made in earlier calls.

The first and second cases only have practical relevance for generic or parameterized parsers and can be simply avoided by checking the arguments before calling the respective CharStream methods. The third case only arises in the context of backtracking and it too can be easily avoided, either by using the BacktrackTo method for backtracking or by manually restoring the StateTag after the backtracking.

In practice these special cases are extremely rare, usually without consequences for the parser behaviour and always easily avoidable. Hence, FParsec combinators make free use of the StateTag to check whether a parser has changed the CharStream state.

5.4.4 Generalizing >>.

The parsers p1 .>> p2 and p1 >>. p2 only return the results of p1 and p2 respectively. If you want to combine the results from both p1 and p2, you could use the pipe2 combinator instead:

val pipe2: Parser<'a,'u> -> Parser<'b,'u> -> ('a -> 'b -> 'c) -> Parser<'c,'u>

The parser pipe2 p1 p2 f will apply p1 and p2 in sequence, exactly like >>., but instead of returning one of the result values of p1 and p2 it will return the result of the function application f x1 x2, where x1 and x2 are the results returned by p1 and p2.

There are also pipe3, pipe4 and pipe5 combinators, in case you need more than two arguments. Often these combinators are used to pass arguments to object constructors, like in the following example of a parser for a comma‐separated list of XYZ coordinates:

type Data = Point of float*float*float

let ws = spaces
let str s = pstring s .>> ws
let number = pfloat .>> ws
let point = pipe3 number (str "," >>. number) (str "," >>. number)
                  (fun x y z -> Point(x, y, z))
> run point "1, 2, 3";;
val it : ParserResult<Data,unit> = Success: Point (1.0,2.0,3.0)

If you just want to return the parsed values as a tuple, you can use the predefined tuple2-5 parsers. For example, tuple2 p1 p2 is equivalent to pipe2 p1 p2 (fun x1 x2 -> (x1, x2).

tuple2 is also available under the operator name .>>., so that you can write p1 .>>. p2 instead of tuple2 p1 p2.

There is no pipe1 combinator, but there is an operator for the same purpose:

val (|>>): Parser<'a,'u> -> ('a -> 'b) -> Parser<'b, 'u>

This operator is used similarly to the F#’s ubiquitous pipeline operator |>:

type Expression = Number of int
                | Identifier of string

let number = pint32 |>> Number
> run number "123";;
val it : ParserResult<Expression,unit> = Success: Number 123

5.4.5 The >>= combinator

All the sequencing and piping combinators we have discussed so far could be implemented with the help of the “bind” combinator:

val (>>=): Parser<'a,'u> -> ('a -> Parser<'b,'u>) -> Parser<'b,'u>

Instead of two parsers this combinator takes a parser and a function producing a parser as arguments. The combined parser p >>= f first applies the parser p to the input, then it applies the function f to the result returned by p and finally it applies the parser returned by f to the input. If we knew in advance that p returns x then p >>= f would be equivalent to p >>. (f x).

The >>= combinator is quite versatile. For example, the following code implements five of the previously discussed combinators in terms of >>= and the trivial preturn primitive:

let preturn x = fun stream -> Reply(x)

let (|>>) p  f    = p  >>= fun x -> preturn (f x)
let (.>>) p1 p2   = p1 >>= fun x -> p2 >>= fun _ -> preturn x
let (>>.) p1 p2   = p1 >>= fun _ -> p2 >>= fun y -> preturn y
let (.>>.) p1 p2  = p1 >>= fun x -> p2 >>= fun y -> preturn (x, y)
let pipe2 p1 p2 f = p1 >>= fun x -> p2 >>= fun y -> preturn (f x y)

In typical FParsec code >>= is only seldomly used, because in many situations where >>= could in principle be used one of the other specialized operators is more convenient to use and faster. However, on a conceptual level this combinator is important, because its generality allows us to define and test many combinators through their equivalence with a parser defined in terms of >>=. This combinator is also significant for the role it plays in the monadic parser construction syntax, see section 5.10.

Footnotes:
[1] Of course, this doesn’t apply if you manually set back the StateTag to the old value. There is also the purely theoretical possibility that the StateTag has overflown and was incremented exactly 264 times (or 232 if you define the SMALL_STATETAG conditional compiler symbol).