5.7 Looking ahead and backtracking

5.7.1 Backtracking

Sometimes you need more than the default one token look‐ahead of <|>, either because it really can’t be avoided or because avoiding it would be too inconvenient. In those instances you can use one of the combinators attempt, >>?, .>>? or >>=? to force a parser to backtrack after an error.

The attempt combinator

val attempt: Parser<'a,'u> -> Parser<'a,'u>

takes a parser as the argument and returns a wrapped parser that behaves exactly like the argument, except that if the argument parser fails with an output state different from the input state or with a fatal error, the wrapped parser will backtrack to the original input state and report a non‐fatal error.

You can observe the effect of the attempt combinator in the following error message:

> run (attempt (pstring "a" >>. pstring "b")) "ac";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 1
ac
^

The parser backtracked after:
  Error in Ln: 1 Col: 2
  ac
   ^
  Expecting: 'b'

The next example demonstrates the effect of attempt on the choice combinator.

let str s = pstring s
let ab = str "a" .>>. str "b"
let ac = str "a" .>>. str "c"

Without attempt the following test produces an error:

> run (ab <|> ac) "ac";;
val it : ParserResult<(string * string),unit> = Failure:
Error in Ln: 1 Col: 2
ac
 ^
Expecting: 'b'

By introducing attempt we allow the <|> combinator to recover from the error in the first branch:

> run ((attempt ab) <|> ac) "ac";;
val it : ParserResult<(string * string),unit> = Success: ("a", "c")

Sometimes it can be a disadvantage that attempt will trigger backtracking after any error returned by the argument parser, no matter how much content the parser has consumed. Consider for example a parser like prefix >>. expr, where expr is a parser for a potentially large and deeply nested expression. If you wrap this parser with attempt then the wrapped parser will not only backtrack if an error occurs within the prefix or directly after the prefix, but also if it occurs anywhere in the expression. However, in most cases you only want the parser to backtrack if the error occurs directly after the prefix, not if the error occurs deeply inside the expression parser. For situations like this FParsec defines the >>?, .>>?, .>>.? and >>=? operators.

The >>? combinator

val (>>?):  Parser<'a,'u> -> Parser<'b,'u> -> Parser<'b,'u>

behaves like the >>. operator, except that p1 >>? p2 will backtrack to the beginning if p2 fails with a non‐fatal error and without changing the parser state, even if p1 has changed the parser state. Similarly, .>>?, .>>.? and >>=? behave like .>>, .>>. and >>=, except that they will backtrack to the beginning if the second parser fails with a non‐fatal error and without changing the parser state

The following tests illustrate the differences between backtracking implemented via attempt and .>>.?.

let bInBrackets = str "[" >>. str "b" .>> str "]"

A test with attempt on the left side of <|>:

> run ((attempt (str "a" .>>. bInBrackets)) <|> ac) "a[B]";;
val it : ParserResult<(string * string),unit> = Failure:
Error in Ln: 1 Col: 2
a[B]
 ^
Expecting: 'c'

A test with attempt on both sides of <|>:

> run ((attempt (str "a" .>>. bInBrackets)) <|> attempt ac) "a[B]";;
val it : ParserResult<(string * string),unit> = Failure:
Error in Ln: 1 Col: 1
a[B]
^

The parser backtracked after:
  Error in Ln: 1 Col: 2
  a[B]
   ^
  Expecting: 'c'

The parser backtracked after:
  Error in Ln: 1 Col: 3
  a[B]
    ^
  Expecting: 'b'

A test with .>>.? instead of attempt on the left side of <|>:

> run (str "a" .>>.? bInBrackets <|> ac) "a[B]";;
val it : ParserResult<(string * string),unit> = Failure:
Error in Ln: 1 Col: 3
a[B]
  ^
Expecting: 'b'

You can of course chain multiple of the >>? and .>>? operators to backtrack longer distances, like in prefix1 >>? prefix2 >>? p .>>? postfix.

Note

When implementing backtracking parsers you should generally prefer the >>?, .>>? and .>>.? combinators to the attempt combinator, because the former combinators offer finer control over the exact backtracking behaviour and hence will often lead to better error reporting. Note however that neither can completely replace the other.

Backtracking combinators can also be useful when parsing sequences. In the chapter “Parsing sequences” we briefly discussed the following example:

let ws = spaces
let str s = pstring s
let numberInBrackets = str "[" >>. pint32 .>> str "]" .>> ws
> run (many numberInBrackets >>. str "[c]") "[1] [2] [c]";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 10
[1] [2] [c]
         ^
Expecting: integer number (32-bit, signed)

The problem here is that the argument parser to many fails after consuming input if it encounters a bracket that is not followed by a digit. If we decided that this is a defect of the parser as opposed to the grammar, we could fix it by simply replacing a >>. with >>?.

let numberInBrackets = str "[" >>? pint32 .>> str "]" .>> ws
> run (many numberInBrackets .>> str "[c]") "[1] [2] [c]";;
val it : ParserResult<int32 list,unit> = Success: [1; 2]

A similar example is the sepEndBy1 combinator for parsing a sequence of one or more elements separated and optionally ended by a separator. If FParsec didn’t provide this combinator, you could define it yourself using many and >>?:

let sepEndBy1_ p sep =
    pipe2 p (many (sep >>? p)) (fun hd tl -> hd::tl) .>> opt sep

The following tests show that our sepEndBy1 replacement works as expected:

> run (sepEndBy1_ pint32 (str ";")) "1;2;3";;
val it : ParserResult<int32 list,unit> = Success: [1; 2; 3]
> run (sepEndBy1_ pint32 (str ";")) "1;2;3;";;
val it : ParserResult<int32 list,unit> = Success: [1; 2; 3]

Note however that in contrast to sepEndBy1_ the version of sepEndBy1 provided by FParsec doesn’t need to parse the separator twice when it terminates a sequence.

5.7.2 Parser predicates

The backtracking combinators allow you to “look ahead” by tentatively parsing input and then backtracking if an error occurs. However, they don’t allow you to conditionally parse the input with one parser depending on the success or failure of another parser. This is what the following two combinators are for:

val followedBy:    Parser<'a,'u> -> Parser<unit,'u>
val notFollowedBy: Parser<'a,'u> -> Parser<unit,'u>

The parser followedBy p (notFollowedBy p) succeeds without changing the parser state if p succeeds (fails) when applied at the current position.

For example, both the following parser definitions only parse positive integer literals without a leeding zero:

let p1 = followedBy    (satisfy ((<>) '0')) >>. pint32
let p2 = notFollowedBy (pstring "0")        >>. pint32

Both definitions will correctly parse "123" and fail to parse "01":

> run p1 "123";;
val it : ParserResult<int32,unit> = Success: 123
> run p1 "01";;
val it : ParserResult<int32,unit> =  Failure:
Error in Ln: 1 Col: 1
01
^
Unknown Error(s)
> run p2 "123";;
val it : ParserResult<int32,unit> = Success: 123
> run p2 "01";;
val it : ParserResult<int32,unit> = Failure:
Error in Ln: 1 Col: 1
01
^
Unknown Error(s)

While both parsers work as expected, the generated error messages aren’t very helpful. The problem is that followedBy and notFollowedBy can’t generate better error messages, because they don’t know what kind of input their argument parsers accept.[1] To improve the error messages you can either use the “labeled” combinator variants followedByL and notFollowedByL or you could use the labelling operator <?> that we will discuss in the next chapter.

For example:

> run (followedByL (satisfy ((<>) '0')) "positive int w/o leading 0" >>. pint32)
      "01";;
val it : ParserResult<int32,unit> =  Failure:
Error in Ln: 1 Col: 1
01
^
Expecting: positive int w/o leading 0

> run (followedBy (satisfy ((<>) '0')) >>. pint32 <?> "positive int w/o leading 0")
      "01";;
val it : ParserResult<int32,unit> = Failure:
Error in Ln: 1 Col: 1
01
^
Expecting: positive int w/o leading 0

> run (notFollowedByL (pstring "0") "'0'" >>. pint32) "01";;
val it : ParserResult<int32,unit> = Failure:
Error in Ln: 1 Col: 1
01
^
Unexpected: '0'

The parser notFollowedByL (pstring "0") "'0'" from the last example could actually be simplified to notFollowedByString "0", which uses the specialized parser predicate notFollowedByString. In table 6.1.9 you’ll find an overview of all available parser predicates.

A frequent application for the notFollowedBy predicate are sequence parsers similar to many (notFollowedBy pEnd >>. p) .>> pEnd. If you are writing such a parser, you should check whether you can replace it with an application of one of the manyTill parsers. Please consult the reference for more details.

Before we conclude this chapter we want to emphasize that you’re not limited to the built‐in (backtracking) combinators of FParsec. A great advantage of FParsec is the simplicity with which you can write custom combinators using the low‐level API.

For example, you could define a combinator that backtracks if the result of the argument parser doesn’t satisfy a predicate function:

let resultSatisfies predicate msg (p: Parser<_,_>) : Parser<_,_> =
    let error = messageError msg
    fun stream ->
      let state = stream.State
      let reply = p stream
      if reply.Status <> Ok || predicate reply.Result then reply
      else
          stream.BacktrackTo(state) // backtrack to beginning
          Reply(Error, error)

With this combinator you could conveniently define a parser for positive ints:

let positiveInt = pint32 |> resultSatisfies (fun x -> x > 0)
                                            "The integer must be positive."
> run positiveInt "1";;
val it : ParserResult<int32,unit> = Success: 1
> run positiveInt "-1";;
Error in Ln: 1 Col: 1
-1
^
The integer must be positive.
Footnotes:
[1] In the case of notFollowedBy p the problem is clear: notFollowedBy p fails if p succeeds and when p succeeds, p doesn’t generate an error message that notFollowedBy could reuse. In the case of followedBy p the situation is different: followedBy p fails if p fails, so followedBy could try to reuse the error messages generated by p. However, the error messages generated by the argument parser will in practice often not suffice to explain what kind of input is expected. So, for reasons of consistency and performance, followedBy doesn’t even try to reuse the error messages generated by the argument parser.