5.7 Looking ahead and backtracking
5.7.1 Backtracking
Sometimes you need more than the default one token look‐ahead of <|>
, either because it really can’t be
avoided or because avoiding it would be too inconvenient. In those instances you can use one of the combinators attempt
, >>?
, .>>?
or >>=?
to force a parser to backtrack after
an error.
The attempt
combinator
val attempt: Parser<'a,'u> -> Parser<'a,'u>
takes a parser as the argument and returns a wrapped parser that behaves exactly like the argument, except that if the argument parser fails with an output state different from the input state or with a fatal error, the wrapped parser will backtrack to the original input state and report a non‐fatal error.
You can observe the effect of the attempt
combinator in the following error message:
> run (attempt (pstring "a" >>. pstring "b")) "ac";; val it : ParserResult<string,unit> = Failure: Error in Ln: 1 Col: 1 ac ^ The parser backtracked after: Error in Ln: 1 Col: 2 ac ^ Expecting: 'b'
The next example demonstrates the effect of attempt
on the choice combinator.
Without attempt
the following
test produces an error:
> run (ab <|> ac) "ac";; val it : ParserResult<(string * string),unit> = Failure: Error in Ln: 1 Col: 2 ac ^ Expecting: 'b'
By introducing attempt
we allow
the <|>
combinator to
recover from the error in the first branch:
> run ((attempt ab) <|> ac) "ac";; val it : ParserResult<(string * string),unit> = Success: ("a", "c")
Sometimes it can be a disadvantage that attempt
will trigger backtracking after any error returned by the argument parser, no matter how much content the
parser has consumed. Consider for example a parser like prefix >>. expr
, where expr
is a parser for a potentially large and deeply nested expression. If you wrap this parser
with attempt
then the wrapped
parser will not only backtrack if an error occurs within the prefix or directly after the prefix, but also if it occurs anywhere in the
expression. However, in most cases you only want the parser to backtrack if the error occurs directly after the prefix, not if the error occurs
deeply inside the expression parser. For situations like this FParsec defines the >>?
, .>>?
, .>>.?
and >>=?
operators.
The >>?
combinator
val (>>?): Parser<'a,'u> -> Parser<'b,'u> -> Parser<'b,'u>
behaves like the >>.
operator, except that p1 >>? p2
will backtrack to the beginning if p2
fails with a non‐fatal error and without changing the parser state, even if p1
has changed the parser state. Similarly, .>>?
, .>>.?
and >>=?
behave like .>>
, .>>.
and >>=
, except that they will backtrack to the
beginning if the second parser fails with a non‐fatal error and without changing the parser state
The following tests illustrate the differences between backtracking implemented via attempt
and .>>.?
.
A test with attempt
on the left
side of <|>
:
> run ((attempt (str "a" .>>. bInBrackets)) <|> ac) "a[B]";; val it : ParserResult<(string * string),unit> = Failure: Error in Ln: 1 Col: 2 a[B] ^ Expecting: 'c'
A test with attempt
on both sides
of <|>
:
> run ((attempt (str "a" .>>. bInBrackets)) <|> attempt ac) "a[B]";; val it : ParserResult<(string * string),unit> = Failure: Error in Ln: 1 Col: 1 a[B] ^ The parser backtracked after: Error in Ln: 1 Col: 2 a[B] ^ Expecting: 'c' The parser backtracked after: Error in Ln: 1 Col: 3 a[B] ^ Expecting: 'b'
A test with .>>.?
instead of attempt
on the left
side of <|>
:
> run (str "a" .>>.? bInBrackets <|> ac) "a[B]";; val it : ParserResult<(string * string),unit> = Failure: Error in Ln: 1 Col: 3 a[B] ^ Expecting: 'b'
You can of course chain multiple of the >>?
and .>>?
operators to backtrack longer distances, like in prefix1 >>? prefix2 >>? p .>>? postfix
.
When implementing backtracking parsers you should generally prefer the >>?
, .>>?
and .>>.?
combinators to the attempt
combinator, because the
former combinators offer finer control over the exact backtracking behaviour and hence will often lead to better error reporting. Note
however that neither can completely replace the other.
Backtracking combinators can also be useful when parsing sequences. In the chapter “Parsing sequences” we briefly discussed the following example:
let ws = spaces let str s = pstring s let numberInBrackets = str "[" >>. pint32 .>> str "]" .>> ws
> run (many numberInBrackets >>. str "[c]") "[1] [2] [c]";; val it : ParserResult<string,unit> = Failure: Error in Ln: 1 Col: 10 [1] [2] [c] ^ Expecting: integer number (32-bit, signed)
The problem here is that the argument parser to many
fails after consuming input if it encounters a bracket that is not followed by a digit. If we decided that
this is a defect of the parser as opposed to the grammar, we could fix it by simply replacing a >>.
with >>?
.
let numberInBrackets = str "[" >>? pint32 .>> str "]" .>> ws
> run (many numberInBrackets .>> str "[c]") "[1] [2] [c]";; val it : ParserResult<int32 list,unit> = Success: [1; 2]
A similar example is the sepEndBy1
combinator for parsing a sequence of one or more elements separated and optionally ended by a separator.
If FParsec didn’t provide this combinator, you could define it yourself using many
and >>?
:
let sepEndBy1_ p sep = pipe2 p (many (sep >>? p)) (fun hd tl -> hd::tl) .>> opt sep
The following tests show that our sepEndBy1
replacement works as expected:
> run (sepEndBy1_ pint32 (str ";")) "1;2;3";; val it : ParserResult<int32 list,unit> = Success: [1; 2; 3] > run (sepEndBy1_ pint32 (str ";")) "1;2;3;";; val it : ParserResult<int32 list,unit> = Success: [1; 2; 3]
Note however that in contrast to sepEndBy1_
the version of sepEndBy1
provided by FParsec doesn’t need to parse the
separator twice when it terminates a sequence.
5.7.2 Parser predicates
The backtracking combinators allow you to “look ahead” by tentatively parsing input and then backtracking if an error occurs. However, they don’t allow you to conditionally parse the input with one parser depending on the success or failure of another parser. This is what the following two combinators are for:
val followedBy: Parser<'a,'u> -> Parser<unit,'u> val notFollowedBy: Parser<'a,'u> -> Parser<unit,'u>
The parser followedBy p
(notFollowedBy p
) succeeds without changing the parser state if p
succeeds (fails) when applied at the current position.
For example, both the following parser definitions only parse positive integer literals without a leeding zero:
let p1 = followedBy (satisfy ((<>) '0')) >>. pint32 let p2 = notFollowedBy (pstring "0") >>. pint32
Both definitions will correctly parse "123"
and fail to parse "01"
:
> run p1 "123";; val it : ParserResult<int32,unit> = Success: 123 > run p1 "01";; val it : ParserResult<int32,unit> = Failure: Error in Ln: 1 Col: 1 01 ^ Unknown Error(s) > run p2 "123";; val it : ParserResult<int32,unit> = Success: 123 > run p2 "01";; val it : ParserResult<int32,unit> = Failure: Error in Ln: 1 Col: 1 01 ^ Unknown Error(s)
While both parsers work as expected, the generated error messages aren’t very helpful. The problem is that followedBy
and notFollowedBy
can’t generate better error messages,
because they don’t know what kind of input their argument parsers accept.[1] To improve the error messages you can either use the “labeled” combinator variants
followedByL
and notFollowedByL
or you could use
the labelling operator <?>
that we will discuss in the next chapter.
For example:
> run (followedByL (satisfy ((<>) '0')) "positive int w/o leading 0" >>. pint32) "01";; val it : ParserResult<int32,unit> = Failure: Error in Ln: 1 Col: 1 01 ^ Expecting: positive int w/o leading 0 > run (followedBy (satisfy ((<>) '0')) >>. pint32 <?> "positive int w/o leading 0") "01";; val it : ParserResult<int32,unit> = Failure: Error in Ln: 1 Col: 1 01 ^ Expecting: positive int w/o leading 0 > run (notFollowedByL (pstring "0") "'0'" >>. pint32) "01";; val it : ParserResult<int32,unit> = Failure: Error in Ln: 1 Col: 1 01 ^ Unexpected: '0'
The parser notFollowedByL (pstring "0") "'0'"
from the last example could actually be simplified to notFollowedByString "0"
, which uses the specialized parser predicate notFollowedByString
. In table 6.1.9 you’ll find an overview of all available parser
predicates.
A frequent application for the notFollowedBy
predicate are sequence parsers similar to many (notFollowedBy pEnd >>. p) .>> pEnd
. If you are writing such a parser, you should check whether you can replace it with an application of one of the
manyTill
parsers.
Please consult the reference for more details.
Before we conclude this chapter we want to emphasize that you’re not limited to the built‐in (backtracking) combinators of FParsec. A great advantage of FParsec is the simplicity with which you can write custom combinators using the low‐level API.
For example, you could define a combinator that backtracks if the result of the argument parser doesn’t satisfy a predicate function:
let resultSatisfies predicate msg (p: Parser<_,_>) : Parser<_,_> = let error = messageError msg fun stream -> let state = stream.State let reply = p stream if reply.Status <> Ok || predicate reply.Result then reply else stream.BacktrackTo(state) // backtrack to beginning Reply(Error, error)
With this combinator you could conveniently define a parser for positive ints:
[1] |
In the case of notFollowedBy p the problem is clear: notFollowedBy p fails if p succeeds and
when p succeeds, p doesn’t generate an
error message that notFollowedBy could reuse. In the case of followedBy p the situation is
different: followedBy p fails if p fails, so followedBy could try to reuse the error messages
generated by p . However, the error messages generated by the argument parser will in
practice often not suffice to explain what kind of input is expected. So, for reasons of consistency and performance, followedBy doesn’t even try to reuse the error
messages generated by the argument parser.
|
---|