5.10 Where is the monad?
If you have previously used Haskell’s Parsec library or an early version of FParsec you’re probably wondering by now where the “monadic syntax” has gone. There’s also a chance that you’ve stumbled upon FParsec while searching for a “monadic parser library” for F#/.Net and you’re now wondering whether FParsec actually is one.
To answer these questions right away: FParsec supports a monadic parser construction syntax, but this syntax is only an optional feature, not the foundation of the library design. FParsec doesn’t use the monadic syntax internally and we no longer recommend using it for new parser projects when performance is a concern.
5.10.1 An example using the monadic syntax
With the monadic syntax you can, for example, write a parser for a pair of floating‐point numbers as follows:
open FParsec let ws = spaces // whitespace parser let str_ws str = parse {do! skipString str do! ws return ()} let number_ws = parse {let! number = pfloat do! ws return number} let pairOfNumbers = parse {do! str_ws "(" let! number1 = number_ws let! number2 = number_ws do! str_ws ")" return (number1, number2)}
We’ll explain how the F# compiler handles the parse {...}
expressions in the next
section. For now, just compare the previous implementation with the following one using the usual FParsec combinators:
open FParsec let ws = spaces // whitespace parser let str_ws str = skipString str >>. ws let number_ws = pfloat .>> ws let pairOfNumbers = between (str_ws "(") (str_ws ")") (tuple2 number_ws number_ws)
The latter implementation is obviously more concise, but – at least for users without prior exposure to FParsec – the first implementation is
probably a bit more intuitive and self‐explanatory. What makes the first implementation so intuitive is that the syntax of the parse {...}
expressions is a) very close to what developers are used to from their normal work with F#
and b) expressive enough that it obviates the need for many of FParsec’s basic combinators. Unfortunately, the intuitiveness of the monadic
syntax comes at the price of a large performance penalty.
5.10.2 How the monadic syntax works
To explain how the monadic syntax works, we need to take a look at how the F# compiler translates the parse {...}
expressions.
The foundation for the monadic syntax is the >>=
combinator introduced in section 5.4.5:
val (>>=): Parser<'a,'u> -> ('a -> Parser<'b,'u>) -> Parser<'b>
This operator takes a parser and a function returning a parser as arguments. The combined parser p >>= f
first applies the parser p
to the input, then it applies the
function f
to the result returned by p
and finally it applies the parser returned by f
to the input. As we exlained in section 5.4.5, this way to combine parsers is powerful enough that we can express
many other sequencing combinators in terms of >>=
and preturn
.
For example, we could implement the pipe3
combinator for sequentially applying three parsers as follows:
Directly using the >>=
and preturn
combinators obviously
leads to somewhat unwieldy and unreadable expressions. Fortunately, F#’s computation expressions allow us to rewrite this expression in a more
intuitive way:
The parse
object that we reference
in this and other code snippets of this chapter is a so‐called “builder” object for computation expressions. It is defined in
FParsec’s Primitives
module. Using the methods of
this object, the F# compiler translates the computation expression in the curly braces to the following equivalent expression:
let pipe3 p1 p2 p3 f = parse.Delay(fun () -> parse.Bind(p1, fun x1 -> parse.Bind(p2, fun x2 -> parse.Bind(p3, fun x3 -> parse.Return(f (x1 x2 x3))))))
When we replace the parse
object
method calls with the respective method bodies, we will see that this definition is equivalent to our original definition using >>=
and preturn
.
The Bind
, Return
and Delay
methods of the parse
object are defined as:
member t.Bind(p, f) = p >>= f member t.Return(x) = preturn x member t.Delay(f:(unit -> Parser<'a,'u>)) = fun stream -> (f ()) stream
Substituting these method bodies into the previous expression yields an expression that is very similar to the original one (except for the
additional indirection introduced by the Delay
method[1]):
5.10.3 The term “monad”
A function with a signature like the one of the >>=
operator is often called “bind”. The above examples make it obvious why: the >>=
combinator binds the
result of the parser on the left‐hand side to the function argument on the right‐hand side.
The Parser
type together with the
>>=
and preturn
operations constitute a monad, which is an abstraction in type theory that denotes this kind
of combination of a generic type with associated bind and return operations.
Discussing the theoretical background of monads would be outside the scope of this user’s guide. For our purposes it is enough to note that the
monad abstraction is so useful for certain applications that F# comes with built‐in syntax support for monadic expressions. FParsec utilizes
this language feature (computation expressions) to enable parse {...}
expressions.
Be assured that you don’t need to know anything about monads in general in order to use FParsec’s parse {...}
expressions. To fully understand this feature all you need to know to is how the F# compiler translates parse {...}
expressions into normal code.
Besides let!
, do!
and return
there are some more language constructs that are supported inside parse {...}
expressions. Please refer to the reference documentation for
more information.
5.10.4 Why the monadic syntax is slow
Compared to parsers implemented with only the usual FParsec operators and functions, parsers implemented with parse {...}
expressions can be up to several times slower.
The relatively bad performance can be directly attributed to the way parse {...}
expressions are compiled. As you have seen above, a parse {...}
expression is simply translated into a series of nested closures that are chained through calls to the >>=
operator. With the
current compiler technology and the current implementation of FParsec this introduces some significant overhead.
Every time a Parser
function constructed with the parse {...}
syntax is called:
-
Two function closures get newly instantiated for each invocation of the
>>=
operator: the closure that is passed as the second argument to>>=
and the closure that is returned by>>=
. -
Any parser created inside a
parse {...}
expression gets (re‐)created every time execution reaches that point in the expression.
In principle, you can avoid the overhead described in the second point by moving the construction of parser functions out of the parse {...}
expression.
For example, you can avoid the repeated construction of the skipString
parsers in
let numberInParens = parse {do! skipString "(" let! number = pfloat do! skipString ")" return number}
by rewriting the code as
let parenOpen = skipString "(" let parenClose = skipString ")" let numberInParens = parse {do! parenOpen let! number = pfloat do! parenClose return number}
However, if you wanted to factor out any parser construction from a parse {...}
expression, you’d also have to factor out any use of parser combinators, which would take away a lot from the
attractiveness of the syntax.
If performance is not that important for your application, you can just ignore that a parser like skipString "("
is repeatedly constructed,
since its construction is relatively cheap. But if you do the same for parsers based on regex
or anyOf
, where the construction potentially involves some
relatively expensive compilation or runtime code generation, you might be surprised just how slow your parsers can become.
Because of the described performance issues, we recommend not to use parse {...}
expressions and instead work with FParsec’s rich set of operators and other combinators. Not only does the
operator‐based notation (which is used everywhere else in FParsec’s documentation) lead to faster parsers, it also allows for more concise
parser code with a higher signal‐to‐noise ratio.
[1] |
The computation expression specification does not require a Delay method. So, we could avoid
the overhead associated with the additional indirection by removing the Delay method from
the ParserCombinator
class. However, this would make the behaviour of parse expressions somewhat counter‐intuitive, as the behaviour would differ from the behaviour of F#’s seq and async expressions.
|
---|