5.11 Debugging a parser

Debugging a parser implemented with the help of a combinator library has its special challenges. In particular, setting a breakpoint and stepping through the code is not as straightforward as in a regular recursive descent parser. Furthermore, stack traces can be difficult to decipher because of the ubiquitous use of anonymous functions.[1] However, with the help of the techniques we explain in this chapter, working around these issues should be easy.

5.11.1 Setting a breakpoint

Suppose you have a combined parser like

let buggyParser = pipe2 parserA parserB (fun a b -> ...)

and you would like to break into the debugger whenever buggyParser calls parserB. One thing you could try is to set a breakpoint at the beginning of parserB. However, that’s only possible if parserB is not itself a combined parser, and even then you still have the problem that your breakpoint is also triggered whenever parserB is called from any other place in your source. Similarly, a breakpoint you set in pipe2 will probably be triggered by many other parsers besides buggyParser.

Fortunately, there’s a simple workaround if you can modify and recompile the code. Just define a wrapper function like the following:

let BP (p: Parser<_,_>) stream =
    p stream // set a breakpoint here

Then redefine the buggy parser as

let buggyParser = pipe2 parserA (BP parserB) (fun a b -> ...)

If you now set a breakpoint at the body of the BP function, it will be triggered whenever parserB is called from buggyParser.

With such a wrapper it’s also easy to define a precise conditional breakpoint. For example, if you only want to break once the parser has reached line 100 of the input file, you could use the breakpoint condition stream.Line >= 100.

By the way, you don’t need to set the breakpoint in the debugger. You can also write it directly into the code:

let BP (p: Parser<_,_>) (stream: CharStream<_>) =
    // this will execute much faster than a
    // conditional breakpoint set in the debugger
    if stream.Line >= 100L then
        System.Diagnostics.Debugger.Break()
    p stream

Note

There are some issues with setting breakpoints in or stepping into anonymous or curried F# functions in Visual Studio 2008. In Visual Studio 2010 many of these issues have been fixed.

If you’re using Visual Studio, don’t forget to switch on the “Suppress JIT optimization on module load” option in the Tools – Options – Debugging – General dialog. And, when possible, use a debug build (of FParsec) for debugging.
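The hard-coded condition in the BP wrapper can be generalized so that the same helper serves several breakpoints. The following is only a sketch: the name breakIf and its condition parameter are hypothetical and not part of FParsec. The check of Debugger.IsAttached is an added assumption about what is convenient, so that a wrapper left in the code does nothing outside a debugging session.

```fsharp
open FParsec

// Hypothetical generalization of BP: `condition` decides when to break.
let breakIf (condition: CharStream<'u> -> bool) (p: Parser<'a,'u>) : Parser<'a,'u> =
    fun stream ->
        // Only break when a debugger is attached, so that the wrapper
        // is harmless if it is accidentally left in a normal run.
        if System.Diagnostics.Debugger.IsAttached && condition stream then
            System.Diagnostics.Debugger.Break()
        p stream

// Usage, with parserA and parserB as in the text:
// let buggyParser =
//     pipe2 parserA (breakIf (fun s -> s.Line >= 100L) parserB) (fun a b -> ...)
```

Like BP, the wrapper simply forwards to the wrapped parser, so it does not change the parsing result.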

5.11.2 Tracing a parser

Occasionally you have a parser that doesn’t work as expected, and neither experimenting with the input nor studying the code is enough to figure out what’s wrong. In such cases the best way to proceed is usually to trace the execution of the parser. Unfortunately, stepping through the parser under a debugger can be quite tedious, because it involves stepping through long sequences of nested invocations of parser combinators. A more convenient approach is often to output tracing information to the console or a logging service.

A simple helper function for printing trace information to the console could look like the following example:

let (<!>) (p: Parser<_,_>) label : Parser<_,_> =
    fun stream ->
        printfn "%A: Entering %s" stream.Position label
        let reply = p stream
        printfn "%A: Leaving %s (%A)" stream.Position label reply.Status
        reply
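
If the trace should go to a logging service rather than the console, the same idea works with a caller-supplied sink. The following is a sketch under that assumption: traceWith and its log parameter are hypothetical names, not FParsec functions, and the messages are formatted the same way as in the helper above.

```fsharp
open FParsec

// Hypothetical variant of the tracing helper: instead of printing to the
// console, it passes each message to a caller-supplied `log` function.
let traceWith (log: string -> unit) (label: string) (p: Parser<'a,'u>) : Parser<'a,'u> =
    fun stream ->
        log (sprintf "%A: Entering %s" stream.Position label)
        let reply = p stream
        log (sprintf "%A: Leaving %s (%A)" stream.Position label reply.Status)
        reply

// Usage: collect the trace messages in memory instead of printing them.
let messages = System.Collections.Generic.List<string>()

let tracedDigits : Parser<string,unit> =
    many1Satisfy isDigit |> traceWith (fun msg -> messages.Add msg) "digits"
```

Each call of the wrapped parser produces one "Entering" and one "Leaving" message, exactly as with the console version.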

To demonstrate how you could use such a tracing operator, let’s try to debug the following buggy (and completely silly) parser:

let number = many1Satisfy isDigit

let emptyElement = pstring "[]" : Parser<_,unit>
let numberElement = pstring "[" >>. number .>> pstring "]"
let nanElement = pstring "[NaN]"

let element = choice [emptyElement
                      numberElement
                      nanElement] .>> spaces

let elements : Parser<_,unit> = many element

The following test run shows that the above parser is indeed buggy:

> run elements "[] [123] [NaN]";;
val it : ParserResult<string list,unit> = Failure:
Error in Ln: 1 Col: 11
[] [123] [NaN]
          ^
Unknown Error(s)

You probably don’t need trace information to figure out why the "NaN" bit of the string doesn’t get parsed, but let’s pretend you do. Obviously, there’s something wrong with the element parser. To find out what’s wrong, let’s decorate the element parser and all subparsers with the <!> operator and an appropriate label:

let number = many1Satisfy isDigit <!> "number"

let emptyElement  = pstring "[]"                           <!> "emptyElement"
let numberElement = pstring "[" >>. number .>> pstring "]" <!> "numberElement"
let nanElement    = pstring "[NaN]"                        <!> "nanElement"

let element = choice [emptyElement
                      numberElement
                      nanElement] .>> spaces <!> "element"

let elements  : Parser<_,unit> = many element

If you now run the parser on the same input as before, you get the following output:

> run elements "[] [123] [NaN]";;
(Ln: 1, Col: 1): Entering element
(Ln: 1, Col: 1): Entering emptyElement
(Ln: 1, Col: 3): Leaving emptyElement (Ok)
(Ln: 1, Col: 4): Leaving element (Ok)
(Ln: 1, Col: 4): Entering element
(Ln: 1, Col: 4): Entering emptyElement
(Ln: 1, Col: 4): Leaving emptyElement (Error)
(Ln: 1, Col: 4): Entering numberElement
(Ln: 1, Col: 5): Entering number
(Ln: 1, Col: 8): Leaving number (Ok)
(Ln: 1, Col: 9): Leaving numberElement (Ok)
(Ln: 1, Col: 10): Leaving element (Ok)
(Ln: 1, Col: 10): Entering element
(Ln: 1, Col: 10): Entering emptyElement
(Ln: 1, Col: 10): Leaving emptyElement (Error)
(Ln: 1, Col: 10): Entering numberElement
(Ln: 1, Col: 11): Entering number
(Ln: 1, Col: 11): Leaving number (Error)
(Ln: 1, Col: 11): Leaving numberElement (Error)
(Ln: 1, Col: 11): Leaving element (Error)
val it : ParserResult<string list,unit> = Failure:
Error in Ln: 1 Col: 11
[] [123] [NaN]
          ^
Unknown Error(s)

This trace log clearly reveals that the element parser failed because the numberElement parser failed after consuming the left bracket, and thus the choice parser never got to try the nanElement parser. Of course, this issue could easily be avoided by factoring out the bracket parsers from the emptyElement, numberElement and nanElement parsers. Also, if we had used many1SatisfyL instead of many1Satisfy for the number parser, we would have gotten an error message more descriptive than “Unknown Error(s)” (see the chapter on customizing error messages).
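
One possible way to factor out the brackets is sketched below. It is an illustration rather than the only fix: the name elementContent and the label "number" are assumptions, and unlike the original parsers this version returns only the text between the brackets (an empty string for "[]").

```fsharp
open FParsec

// many1SatisfyL takes a label that is used in error messages;
// the label text "number" is an assumption.
let number : Parser<string,unit> = many1SatisfyL isDigit "number"

// The content between the brackets: a number, the literal NaN, or nothing.
// None of the alternatives consumes input before failing, so choice can
// always try the next one.
let elementContent : Parser<string,unit> =
    choice [number
            pstring "NaN"
            preturn ""]

// The brackets are parsed once, outside of the choice.
let element = between (pstring "[") (pstring "]") elementContent .>> spaces

let elements : Parser<string list,unit> = many element
```

With this definition, running elements on "[] [123] [NaN]" should succeed with the list [""; "123"; "NaN"].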

Footnotes:
[1] Although, debugging a parser written with a combinator library is often still easier than debugging one generated by an opaque parser generator tool.