5.8 Customizing error messages

Generating relevant and informative parser error messages is one of FParsec’s greatest strengths. The top‐down approach of recursive‐descent parsing guarantees that there is always enough context to describe the exact cause of a parser error and how it could be avoided. FParsec exploits this context to automatically generate descriptive error messages whenever possible. This chapter explains how you can ensure with minimal effort that your parser always produces understandable error messages.

As we already described in detail in section 5.4.2, error reporting in FParsec is based on the following two principles:

  • Parsers that fail or could have consumed more input return as part of their Reply an ErrorMessageList describing the input they expected or the reason they failed.
  • Parser combinators aggregate all error messages that apply to the same input position and then propagate these error messages as appropriate.

The various error messages in the previous chapters demonstrate that the built‐in error reporting usually works quite well even without any intervention by the parser author. However, sometimes FParsec lacks the information necessary to produce an informative error message by itself.

Consider for example the many1Satisfy f parser, which parses a string consisting of one or more chars satisfying the predicate function f. If this parser fails to parse at least one char, the generated error is not very helpful:

> run (many1Satisfy isLetter) "123";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 1
123
^
Unknown Error(s)

The problem here is that many1Satisfy can’t describe which chars the predicate function accepts. Hence, when you don’t use many1Satisfy as part of a combined parser that takes care of a potential error, you should replace it with many1SatisfyL, which allows you to describe the accepted input with a label (hence the “L”):

> run (many1SatisfyL isLetter "identifier") "123";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 1
123
^
Expecting: identifier

There are also labelled variants of other parsers and combinators, for example choiceL and notFollowedByL.
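As a small illustration of a labelled combinator, the following sketch uses choiceL to give a set of alternatives a single name. The parser name boolLiteral and the label text are made up for this example, and the code assumes the FParsec assembly is already referenced. According to the reference, choiceL ps label is an optimized implementation of choice ps <?> label.

```fsharp
open FParsec

// Hypothetical parser for this example: accepts "true" or "false".
// If no alternative matches at the current position, the error message
// should mention the label instead of listing the individual strings.
let boolLiteral : Parser<string, unit> =
    choiceL [pstring "true"; pstring "false"] "boolean literal"

// Fails at the first char, because "maybe" matches neither alternative.
let boolResult = run boolLiteral "maybe"
```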

If there is no labelled parser variant or you want to replace a predefined error message, you can always use the labelling operator

val (<?>): Parser<'a,'u> -> string -> Parser<'a,'u>

The parser p <?> label behaves like p, except that the error messages are replaced with expected label if p does not change the parser state (usually because p failed).

For example, if FParsec didn’t provide many1SatisfyL, you could define it yourself as

let many1SatisfyL f label = many1Satisfy f <?> label

The labelling operator is particularly useful for producing error messages in terms of higher‐level grammar productions instead of error messages in terms of lower‐level component parsers. Suppose you want to parse a string literal with the following parser:

let literal_ = between (pstring "\"") (pstring "\"")
                       (manySatisfy ((<>) '"'))

If this parser encounters input that doesn’t start with a double quote, it will fail with the error message produced by the parser for the opening quote:

> run literal_ "123";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 1
123
^
Expecting: '"'

In situations like these, an error message that mentions the aggregate thing you’re trying to parse will often be more helpful:

let literal = literal_ <?> "string literal in double quotes"

> run literal "123";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 1
123
^
Expecting: string literal in double quotes

Note that <?> only replaces the error message if the parser doesn’t consume input. For example, our literal parser won’t mention that we’re trying to parse a string literal if it fails after the initial double quote:

> run literal "\"abc def";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 9
"abc def
        ^
Note: The error occurred at the end of the input stream.
Expecting: '"'

With the compound labelling operator <??> you can make sure that the compound gets mentioned even if the parser fails after consuming input:

let literal = literal_ <??> "string literal in double quotes"

> run literal "\"abc def";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 1
"abc def
^
Expecting: string literal in double quotes

string literal in double quotes could not be parsed because:
  Error in Ln: 1 Col: 9
  "abc def
          ^
  Note: The error occurred at the end of the input stream.
  Expecting: '"'

Tip

If you don’t like the formatting of these error messages, you can write a custom formatter for your application. The data structure in which error messages are stored is easy to query and process. See the reference for the Error module.
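To give an idea of what such a custom formatter could look like, here is a minimal sketch. The helper name shortError and its one‐line output format are invented for this example. The sketch assumes the ParserError members Position and Messages, the static method ErrorMessageList.ToSortedArray and the Expected and ExpectedString active patterns behave as described in the reference for the Error module; please check the reference for your FParsec version before relying on it.

```fsharp
open FParsec

// Hypothetical helper (not part of FParsec): condenses a failed result into
// one line that lists only the top-level "expected" messages.
// All other message types (unexpected, nested, compound, ...) are ignored.
let shortError (result: ParserResult<'a, 'u>) : string option =
    match result with
    | Success _ -> None
    | Failure (_, error, _) ->
        let expectedItems =
            // Assumed to return the messages sorted and without duplicates.
            ErrorMessageList.ToSortedArray(error.Messages)
            |> Array.choose (fun msg ->
                match msg with
                | Expected label -> Some label
                | ExpectedString s -> Some (sprintf "'%s'" s)
                | _ -> None)
        Some (sprintf "%i:%i: expected %s"
                      error.Position.Line error.Position.Column
                      (String.concat " or " expectedItems))

// Usage with a labelled parser from earlier in this chapter:
let shortMessage = shortError (run (many1SatisfyL isLetter "identifier") "123")
```

A real formatter would probably also want to handle the nested and compound errors produced by operators like <??>, which this sketch simply drops.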

The parsers we discussed so far in this chapter only generated Expected error messages, but FParsec also supports other types of error messages. For example, the notFollowedByL parser generates Unexpected error messages:

> run (notFollowedByL spaces1 "whitespace") " ";;
val it : ParserResult<unit,unit> = Failure:
Error in Ln: 1 Col: 1

^
Unexpected: whitespace

Error messages that don’t fit into the Expected and Unexpected categories can be produced with the fail and failFatally primitives:

let theory =
    charsTillString "3) " true System.Int32.MaxValue
     >>. (pstring "profit" <|> fail "So much about that theory ... ;-)")

let practice = "1) Write open source library 2) ??? 3) lot's of unpaid work"

> run theory practice;;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 40
1) Write open source library 2) ??? 3) lot's of unpaid work
                                       ^
Expecting: 'profit'
Other error messages:
  So much about that theory ... ;-)
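
The example above only uses fail. The following sketch shows a typical use of failFatally: rejecting a value that was parsed successfully but is semantically invalid. The parser name percentage and the message text are made up for this example. According to the reference, failFatally reports its message with a FatalError status, so that no error recovery is attempted except via backtracking constructs.

```fsharp
open FParsec

// Hypothetical example parser: an integer that must lie between 0 and 100.
let percentage : Parser<int32, unit> =
    pint32 >>= fun n ->
        if 0 <= n && n <= 100 then preturn n
        // The number itself was parsed fine, so we report a custom message
        // instead of an "Expecting: ..." error.
        else failFatally (sprintf "%i is not a valid percentage." n)

let validPercentage = run percentage "42"
let invalidPercentage = run percentage "250"
```

Since the check runs after pint32 has consumed the number, the reported error position is the position after the number, not the position where the number starts.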

If you can’t get the built‐in operators and parsers to produce the error message you need, you can always drop down one API level and write a special‐purpose parser combinator.

The following example shows how you can define a custom between combinator that includes the position of the opening delimiter as part of the error message that gets generated when the closing delimiter cannot be parsed.

let betweenL (popen: Parser<_,_>) (pclose: Parser<_,_>) (p: Parser<_,_>) label =
  let expectedLabel = expected label
  let notClosedError (pos: Position) =
     messageError (sprintf "The %s opened at %s was not closed."
                           label (pos.ToString()))
  fun (stream: CharStream<_>) ->
    // The following code might look a bit complicated, but that's mainly
    // because we manually apply three parsers in sequence and have to merge
    // the errors when they refer to the same parser state.
    let state0 = stream.State
    let reply1 = popen stream
    if reply1.Status = Ok then
      let stateTag1 = stream.StateTag
      let reply2 = p stream
      let error2 = if stateTag1 <> stream.StateTag then reply2.Error
                   else mergeErrors reply1.Error reply2.Error
      if reply2.Status = Ok then
        let stateTag2 = stream.StateTag
        let reply3 = pclose stream
        let error3 = if stateTag2 <> stream.StateTag then reply3.Error
                     else mergeErrors error2 reply3.Error
        if reply3.Status = Ok then
          Reply(Ok, reply2.Result, error3)
        else
          Reply(reply3.Status,
                mergeErrors error3 (notClosedError (state0.GetPosition(stream))))
      else
        Reply(reply2.Status, error2)
    else
      let error = if state0.Tag <> stream.StateTag then reply1.Error
                  else expectedLabel
      Reply(reply1.Status, error)

The behaviour of the betweenL combinator differs from that of the standard between combinator in two ways:

  • If popen fails without changing the parser state, betweenL popen p pclose label fails with expected label, just like between popen p pclose <?> label would have.
  • If pclose fails without changing the parser state, betweenL additionally prints the opening position of the compound.

The following tests demonstrate this behaviour:

let stringLiteral = betweenL (pstring "\"") (pstring "\"")
                             (manySatisfy ((<>) '"'))
                             "string literal in double quotes"

> run stringLiteral "\"test\"";;
val it : ParserResult<string,unit> = Success: "test"

> run stringLiteral "\"test";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 6
"test
     ^
Note: The error occurred at the end of the input stream.
Expecting: '"'
Other error messages:
  The string literal in double quotes opened at (Ln: 1, Col: 1) was not closed.

> run stringLiteral "test";;
val it : ParserResult<string,unit> = Failure:
Error in Ln: 1 Col: 1
test
^
Expecting: string literal in double quotes