5.10 Where is the monad?

If you have previously used Haskell’s Parsec library or an early version of FParsec you’re probably wondering by now where the “monadic syntax” has gone. There’s also a chance that you’ve stumbled upon FParsec while searching for a “monadic parser library” for F#/.Net and you’re now wondering whether FParsec actually is one.

To answer these questions right away: FParsec supports a monadic parser construction syntax, but this syntax is only an optional feature, not the foundation of the library design. FParsec doesn’t use the monadic syntax internally and we no longer recommend using it for new parser projects when performance is a concern.

5.10.1 An example using the monadic syntax

With the monadic syntax you can, for example, write a parser for a pair of floating‐point numbers as follows:

open FParsec

let ws = spaces // whitespace parser

let str_ws str = parse {do! skipString str
                        do! ws
                        return ()}

let number_ws = parse {let! number = pfloat
                       do! ws
                       return number}

let pairOfNumbers = parse {do! str_ws "("
                           let! number1 = number_ws
                           let! number2 = number_ws
                           do! str_ws ")"
                           return (number1, number2)}

We’ll explain how the F# compiler handles the parse {...} expressions in the next section. For now, just compare the previous implementation with the following one using the usual FParsec combinators:

open FParsec

let ws = spaces // whitespace parser

let str_ws str = skipString str >>. ws

let number_ws = pfloat .>> ws

let pairOfNumbers = between (str_ws "(") (str_ws ")")
                            (tuple2 number1 number2)

The latter implementation is obviously more concise, but – at least for users without prior exposure to FParsec – the first implementation is probably a bit more intuitive and self‐explanatory. What makes the first implementation so intuitive is that the syntax of the parse {...} expressions is a) very close to what developers are used to from their normal work with F# and b) expressive enough that it obviates the need for many of FParsec’s basic combinators. Unfortunately, the intuitiveness of the monadic syntax comes at the price of a large performance penalty.

5.10.2 How the monadic syntax works

To explain how the monadic syntax works, we need to take a look at how the F# compiler translates the parse {...} expressions.

The foundation for the monadic syntax is the >>= combinator introduced in section 5.4.5:

val (>>=): Parser<'a,'u> -> ('a -> Parser<'b,'u>) -> Parser<'b>

This operator takes a parser and a function returning a parser as arguments. The combined parser p >>= f first applies the parser p to the input, then it applies the function f to the result returned by p and finally it applies the parser returned by f to the input. As we exlained in section 5.4.5, this way to combine parsers is powerful enough that we can express many other sequencing combinators in terms of >>= and preturn.

For example, we could implement the pipe3 combinator for sequentially applying three parsers as follows:

let pipe3 p1 p2 p3 f =
    p1 >>= fun x1 ->
             p2 >>= fun x2 ->
                      p3 >>= fun x3 ->
                               preturn (f x1 x2 x3)

Directly using the >>= and preturn combinators obviously leads to somewhat unwieldy and unreadable expressions. Fortunately, F#’s computation expressions allow us to rewrite this expression in a more intuitive way:

let pipe3 p1 p2 p3 f =
    parse {let! x1 = p1
           let! x2 = p2
           let! x3 = p3
           return f x1 x2 x3}

The parse object that we reference in this and other code snippets of this chapter is a so‐called “builder” object for computation expressions. It is defined in FParsec’s Primitives module. Using the methods of this object, the F# compiler translates the computation expression in the curly braces to the following equivalent expression:

let pipe3 p1 p2 p3 f =
    parse.Delay(fun () ->
      parse.Bind(p1, fun x1 ->
        parse.Bind(p2, fun x2 ->
          parse.Bind(p3, fun x3 ->
            parse.Return(f (x1 x2 x3))))))

When we replace the parse object method calls with the respective method bodies, we will see that this definition is equivalent to our original definition using >>= and preturn.

The Bind, Return and Delay methods of the parse object are defined as:

    member t.Bind(p, f) = p >>= f
    member t.Return(x) = preturn x
    member t.Delay(f:(unit -> Parser<'a,'u>)) = fun stream -> (f ()) stream

Substituting these method bodies into the previous expression yields an expression that is very similar to the original one (except for the additional indirection introduced by the Delay method[1]):

let pipe3 p1 p2 p3 f =
    fun stream ->
      (p1 >>= fun x1 ->
                p2 >>= fun x2 ->
                         p3 >>= fun x3 ->
                                  preturn (f x1 x2 x3)) stream

In summary, the parse {...} syntax is syntactic sugar for defining parsers with the >>= operator. The expressiveness of this syntax stems from the power of the >>= operator.

5.10.3 The term “monad”

A function with a signature like the one of the >>= operator is often called “bind”. The above examples make it obvious why: the >>= combinator binds the result of the parser on the left‐hand side to the function argument on the right‐hand side.

The Parser type together with the >>= and preturn operations constitute a monad, which is an abstraction in type theory that denotes this kind of combination of a generic type with associated bind and return operations.

Discussing the theoretical background of monads would be outside the scope of this user’s guide. For our purposes it is enough to note that the monad abstraction is so useful for certain applications that F# comes with built‐in syntax support for monadic expressions. FParsec utilizes this language feature (computation expressions) to enable parse {...} expressions.

Be assured that you don’t need to know anything about monads in general in order to use FParsec’s parse {...} expressions. To fully understand this feature all you need to know to is how the F# compiler translates parse {...} expressions into normal code.

Besides let!, do! and return there are some more language constructs that are supported inside parse {...} expressions. Please refer to the reference documentation for more information.

5.10.4 Why the monadic syntax is slow

Compared to parsers implemented with only the usual FParsec operators and functions, parsers implemented with parse {...} expressions can be up to several times slower.

The relatively bad performance can be directly attributed to the way parse {...} expressions are compiled. As you have seen above, a parse {...} expression is simply translated into a series of nested closures that are chained through calls to the >>= operator. With the current compiler technology and the current implementation of FParsec this introduces some significant overhead.

Every time a Parser function constructed with the parse {...} syntax is called:

  • Two function closures get newly instantiated for each invocation of the >>= operator: the closure that is passed as the second argument to >>= and the closure that is returned by >>=.
  • Any parser created inside a parse {...} expression gets (re‐)created every time execution reaches that point in the expression.

In principle, you can avoid the overhead described in the second point by moving the construction of parser functions out of the parse {...} expression.

For example, you can avoid the repeated construction of the skipString parsers in

let numberInParens = parse {do! skipString "("
                            let! number = pfloat
                            do! skipString ")"
                            return number}

by rewriting the code as

let parenOpen = skipString "("
let parenClose = skipString ")"
let numberInParens = parse {do! parenOpen
                            let! number = pfloat
                            do! parenClose
                            return number}

However, if you wanted to factor out any parser construction from a parse {...} expression, you’d also have to factor out any use of parser combinators, which would take away a lot from the attractiveness of the syntax.

If performance is not that important for your application, you can just ignore that a parser like skipString "(" is repeatedly constructed, since its construction is relatively cheap. But if you do the same for parsers based on regex or anyOf, where the construction potentially involves some relatively expensive compilation or runtime code generation, you might be surprised just how slow your parsers can become.

Because of the described performance issues, we recommend not to use parse {...} expressions and instead work with FParsec’s rich set of operators and other combinators. Not only does the operator‐based notation (which is used everywhere else in FParsec’s documentation) lead to faster parsers, it also allows for more concise parser code with a higher signal‐to‐noise ratio.

Footnotes:
[1] The computation expression specification does not require a Delay method. So, we could avoid the overhead associated with the additional indirection by removing the Delay method from the ParserCombinator class. However, this would make the behaviour of parse expressions somewhat counter‐intuitive, as the behaviour would differ from the behaviour of F#’s seq and async expressions.