5.4 Applying parsers in sequence
Now that we have discussed how Parser
functions work, we can start explaining how FParsec’s parser combinators work.
In this chapter we will discuss combinators that allow you to apply multiple parsers in sequence, i.e. parse the beginning of the input with the first parser, then parse the following input with the second parser, and so on.
5.4.1 The definition of >>.
The simplest combinators for sequentially applying two parsers are
val (>>.): Parser<'a,'u> -> Parser<'b,'u> -> Parser<'b,'u> val (.>>): Parser<'a,'u> -> Parser<'b,'u> -> Parser<'a,'u>
Both operators take two parsers as arguments and return a combined parser that applies the two parsers in sequence. As you can infer from the
type signatures, p1 >>. p2
returns the result of p2
and p1 .>> p2
the result of p1
. In each
case the point points to the parser whose result is returned.
In order to explain exactly what it means to apply two parser in sequence, we give a full definition of the >>.
operator:
let (>>.) (p1: Parser<'a,'u>) (p2: Parser<'b,'u>) = fun stream -> let reply1 = p1 stream if reply1.Status = Ok then let stateTag = stream.StateTag let mutable reply2 = p2 stream if stateTag = stream.StateTag then // (1) reply2.Error <- mergeErrors reply1.Error reply2.Error // (2) reply2 else // reconstruct error reply with new result type Reply(reply1.Status, reply1.Error)
The implementation of p1 >>. p2
should be quite self‐explanatory: First p1
is applied to the input stream
. If p1
succeeds, i.e. if the status of reply1
is Ok
, p2
is applied to stream and the reply of p2
then becomes the reply
of the combined parser. However, if p1
fails, its reply is immediately propagated as the
reply of the combined parser. Since reply1
has type Reply<'a,'u>
but p1 >>. p2
needs to return a
Reply<'b,'u>
, the error reply needs to be
reconstructed with a new result type before it can be returned.
5.4.2 Merging error messages
We mentioned earlier that the error messages returned in the Reply.Error
field implicitly refer to the state of the CharStream
at the time the parser returns. In particular the error messages refer to the then current stream
position. Since the messages do not contain themselves a separate record of the error position they can only be interpreted together with
the CharStream
state.
When p2
does not change the parser state, the error messages from both replies refer to the
state of the CharStream
as it is when
p1 >>. p2
returns. Thus, the combinator needs to merge the (immutable) ErrorMessageList
s from both replies, so that the
returned list contains all the relevant error messages (see the line marked with (2)
).
In order to check whether the CharStream
state has changed, the combinator does not compare the full states from before and after p2
is invoked. Instead it only compares the StateTag
values (see line (1)
). This improves performance and — for most
practical purpose — is almost equivalent to comparing the full state, as we will discuss below.
The way (>>.)
handles errors and
merges error messages is a template for all combinators in FParsec that perform multiple sequential parser invocations.
You may wonder why the error messages get merged even though p1
succeeded. The somewhat
counterintuitive reason is that parsers can return nonempty error message lists even when they don’t fail. For example, a parser that skips over
the optional string "str"
will
return Reply(Ok, (), expectedString "str")
if it doesn’t find the string in the input. In this case the error message describes what further input the parser
could have parsed at the current stream position. If subsequently a parser fails at the same position, all error messages for the same position
can be aggregated to give the user as much information as possible about what went wrong and what alternative inputs could have been parsed at
the given position.
The following sample demonstrates the helpful effect of this error handling behaviour:
let str s = pstring s let oneOrTwoInts = str "(" >>. tuple2 pint32 (opt (str "," >>. spaces >>. pint32)) .>> str ")"
> run oneOrTwoInts "(1 2)";; val it : ParserResult<(int32 * int32 option),unit> = Failure: Error in Ln: 1 Col: 3 (1 2) ^ Expecting: ')' or ','
This error message wouldn’t mention the possibility of a missing comma if the .>>
combinator did not merge error messages for
the same position when the left‐hand side parser succeeds.
5.4.3 The StateTag
Parser combinators often need to check whether a parser has changed the CharStream
state. In a typical FParsec application these
checks are performed so frequently that an efficient implementation is important for the overall parser performance. Since a straightforward
comparison of the complete CharStream
states can be quite expensive, the CharStream
class provides a shortcut for this purpose: the StateTag
.
The StateTag
is a
simple integer counter that is incremented every time a CharStream
method changes the state. Thus, if the StateTag
hasn’t changed, you can safely infer
that the state hasn’t changed either.[1] Except for some
special cases, the opposite is also true: if the StateTag
has changed, the state has changed too.
In the following special cases checking whether the StateTag
has changed is not equivalent to
checking whether the CharStream
state
has changed, because the tag may change even though the state doesn’t:
-
A parser calls the basic
Skip
orRead
methods with a 0 offset or an empty argument string. -
A parser seeks the
CharStream
to the current position or replaces the user state with the current value. -
A parser makes several calls to
CharStream
methods and in later calls undoes the changes it made in earlier calls.
The first and second cases only have practical relevance for generic or parameterized parsers and can be simply avoided by checking the
arguments before calling the respective CharStream
methods. The third case only arises in the context of backtracking and it too can be easily avoided,
either by using the BacktrackTo
method for backtracking or by manually restoring the StateTag
after the backtracking.
In practice these special cases are extremely rare, usually without consequences for the parser behaviour and always easily avoidable. Hence,
FParsec combinators make free use of the StateTag
to check whether a parser has changed the CharStream
state.
5.4.4 Generalizing >>.
The parsers p1 .>> p2
and p1 >>. p2
only return the
results of p1
and p2
respectively. If
you want to combine the results from both p1
and p2
, you could use the pipe2
combinator instead:
val pipe2: Parser<'a,'u> -> Parser<'b,'u> -> ('a -> 'b -> 'c) -> Parser<'c,'u>
The parser pipe2 p1 p2 f
will apply p1
and p2
in sequence, exactly like >>.
, but instead of returning one of the result
values of p1
and p2
it will return the
result of the function application f x1 x2
,
where x1
and x2
are the results returned
by p1
and p2
.
There are also pipe3
, pipe4
and pipe5
combinators, in case you need more than two
arguments. Often these combinators are used to pass arguments to object constructors, like in the following example of a parser for a
comma‐separated list of XYZ coordinates:
type Data = Point of float*float*float let ws = spaces let str s = pstring s .>> ws let number = pfloat .>> ws let point = pipe3 number (str "," >>. number) (str "," >>. number) (fun x y z -> Point(x, y, z))
> run point "1, 2, 3";; val it : ParserResult<Data,unit> = Success: Point (1.0,2.0,3.0)
If you just want to return the parsed values as a tuple, you can use the predefined tuple2-5
parsers. For example, tuple2 p1 p2
is equivalent to pipe2 p1 p2 (fun x1 x2 -> (x1, x2)
.
tuple2
is also available under the
operator name .>>.
, so
that you can write p1 .>>. p2
instead of tuple2 p1 p2
.
5.4.5 The >>=
combinator
All the sequencing and piping combinators we have discussed so far could be implemented with the help of the “bind” combinator:
val (>>=): Parser<'a,'u> -> ('a -> Parser<'b,'u>) -> Parser<'b,'u>
Instead of two parsers this combinator takes a parser and a function producing a parser as arguments. The combined parser p >>= f
first applies the parser p
to the input, then it applies the
function f
to the result returned by p
and finally it applies the parser returned by f
to the input. If we knew in advance that
p
returns x
then p >>= f
would be equivalent to p >>. (f x)
.
The >>=
combinator is
quite versatile. For example, the following code implements five of the previously discussed combinators in terms of >>=
and the trivial preturn
primitive:
let preturn x = fun stream -> Reply(x) let (|>>) p f = p >>= fun x -> preturn (f x) let (.>>) p1 p2 = p1 >>= fun x -> p2 >>= fun _ -> preturn x let (>>.) p1 p2 = p1 >>= fun _ -> p2 >>= fun y -> preturn y let (.>>.) p1 p2 = p1 >>= fun x -> p2 >>= fun y -> preturn (x, y) let pipe2 p1 p2 f = p1 >>= fun x -> p2 >>= fun y -> preturn (f x y)
In typical FParsec code >>=
is only seldomly used, because in many situations where >>=
could in principle be used one of the
other specialized operators is more convenient to use and faster. However, on a conceptual level this combinator is important, because its
generality allows us to define and test many combinators through their equivalence with a parser defined in terms of >>=
. This combinator is also significant for
the role it plays in the monadic parser construction syntax, see section 5.10.
[1] |
Of course, this doesn’t apply if you manually set back the StateTag to the old value. There is also the
purely theoretical possibility that the StateTag has overflown and was incremented exactly 264 times (or 232 if you define the SMALL_STATETAG conditional compiler symbol).
|
---|