| Author: | Stephan Tolksdorf |
|---|---|
| Contact: | fparsec [at] quanttec [dot] com |
| Version: | Version 0.7.2 |
| Date: | 2008-11-17 |
| Address: | Geschwister-Scholl-Allee 253, 25524 Itzehoe, Germany |
Contents
FParsec is an F# adaptation of Parsec, the popular parser combinator library for Haskell by Daan Leijen. It can parse context-sensitive, infinite look-ahead grammars, has complete support for unicode input and large files (> 4 GB) and produces excellent error messages. Design and implementation of the library have been thoroughly optimized for speed. FParsec is distributed under a permissive open source license (BSD-style).
The basic idea behind combinator parsing is to compose parsers for higher-level grammar expressions from simple atomic parsers. A parser combinator library provides those predefined atomic parsers and the means to combine them into larger units. In the case of FParsec all parsers are F# functions, functions that can be combined with higher-level functions, so-called "combinators".
The parser combinator approach as implemented by FParsec has a number of advantages, in particular:
The top-down parsing approach of FParsec also has some disadvantages in comparison with bottom-up parsers generated with a tool like fsyacc:
This is a pre-release version of FParsec. Although it has been tested rather comprehensively, it likely still contains bugs. If you want to use FParsec in a production environment, you need to test your parsers thoroughly.
If you have any questions or comments on the implementation or documentation of FParsec, don't hesitate to contact me at the above email address. Any feedback is very much appreciated, especially bug reports!
Copyright (c) 2007-2008, Stephan Tolksdorf. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
This software is provided by the copyright holders "as is" and any express or implied warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose are disclaimed. In no event shall the copyright holders be liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) however caused and on any theory of liability, whether in contract, strict liability, or tort (including negligence or otherwise) arising in any way out of the use of this software, even if advised of the possibility of such damage.
FParsec is currently only available as a source code package. You can download the zip file from here.
The current version of FParsec requires F# version 1.9.6.2 or later. A C# compiler is also required.
FParsec is built as two libraries: FParsecCS.dll and FParsec.dll. The former DLL contains some types and helper functions implemented in C#, while the latter DLL contains the main library code written in F#. All programs using FParsec will need to reference both DLLs.
The Visual Studio solution FParsec.sln in the top-level directory can be used to build the library as well as all tests and samples. By default, the DLLs and the test executable will be built in the Build/ subfolder. The sample executables will be built in the bin/ subdirectories of the respective sample folders. Note that you don't need a full edition of Visual Studio, the free Shell Edition together with the F# plugin will build the complete solution even though it can't load the C# project. You can also build FParsec with an MSBuild compatible tool. On Windows, for example, the command c:\windows\Microsoft.Net\Framework\v3.5\MsBuild.exe FParsec.sln /p:Configuration=Release will build the complete solution.
In case you need to build FParsec manually: just compile FParsecCS.dll from the C# files CharStream.cs and FParsec.cs and compile FParsec.dll from the F# files error.fsi, error.fs, primitives.fsi, primitives.fs, charparser.fsi, charparser.fs (in that order).
The core design and basic algorithms of FParsec have their origin in Daan Leijen's Parsec library [Parsec01]. Many of the basic ideas can be traced back to Graham Hutton and Eric Meijer, the "inventors" of monadic parser combinators [Monadic96].
FParsec is not an attempt of a literal port of Parsec from Haskell to F#. Due to the fundamental language differences such a port would not be feasible. Neither would it be desirable, since it would preclude FParsec from making full use of F# and .Net specific features. Nonetheless, the public interfaces of both libraries are very similar, hence porting parsers from Parsec to FParsec should be rather easy. However, there are some differences between Parsec and FParsec you should be aware of:
The following features in Parsec are not implemented in FParsec:
The stated goal of FParsec is to provide a viable alternative to external parser generator tools - even for performance sensitive applications. In order to achieve this goal, FParsec employs some rather aggressive optimizations. Since functional programming techniques sometimes carry a certain handicap on .Net, the implementation (not the interface) of FParsec is at times a little more imperative and less idiomatic than is usually expected from a library written in a functional programming language. In this case the end (a powerful new tool for functional programming) should justify the means (some imperative programming, a little code duplication and slightly less readable code).
Should you require maximum performance, the following guidelines will help you:
If you're working on a 32-bit Windows platform, make sure you have the Service Pack 1 for .Net 3.5 installed. The updated JIT comes with several new optimizations for code involving structs that give FParsec a serious boost. Since there are no such improvements in the 64-bit JIT, FParsec now actually runs significantly faster under the 32-bit CLR.
Factor out constant parser expressions from inside parse { ... } expressions, especially those involving parsers with arguments. For example, rewrite
parse {let! s = regex "(...)"
do! skipManyChars (anyOf ".,;" <|> whitespace)
return s}
as
let pregex = regex "(...)"
let separator = skipManyChars (anyOf ".,;" <|> whitespace)
parse {let! s = pregex
do! whitespace
return s}
Where possible, use the chaining and pipelining primitives .>> , >>., >>? and |>>, pipe2, pipe3, pipe4 instead of parse { ... } expressions to combine parsers. For example, rewrite the above parser as pregex .>> separator.
Use manySatisfy, manyChars or manyCharsTill for parsing many chars. These special purpose parsers are heavily optimized and will be faster than parsers using general combinators like many or manyTill.
If you need to parse strings satisfying complex rules, it is often easier and more efficient to write a parser that skips over the interesting content and then use the skipped combinator to retrieve the actual content.
If you need to parse complex expressions char-by-char, you are likely to improve performance by replacing frequently used combined parsers with your own special purpose primitives. See the section on how to write a (char) parser function.
This tutorial assumes that you already have a working knowledge of F#. If you are new to F# and are looking for an introductory text that can bring you quickly up to speed, have a look at the excellent book Expert F#, by Don Syme, Adam Granicz and Antonio Cisternino. Expert F# covers all the language features required by FParsec in great detail and also has a complete chapter on parser writing (with a focus on the fslex and fsyacc tools). A comprehensive and in-depth description of F# can be found in the F# draft language specification.
The following text begins with an explanation of the basics of FParsec: underlying types, essential combinators and principle mechanics. It then gives an overview of the built-in char parsers and how to run them on input. The remaining sections discuss the customization of error messages, useful combinators and some advanced topics.
A good way to deepen your understanding of FParsec is to play with the four sample applications that ship with FParsec. One is the classical calculator example demonstrating how to parse simple arithmetic expressions. Another example implements a parser for Parsing Expression Grammars (PEGs). The third sample shows how to parse JSON input. The fourth sample is a port of the parsing sample that came with earlier versions of F#. Comparing the original fslex + fsyacc implementation with the FParsec implementation is a good way to get a quick impression of the differences between the two approaches to parsing (external parser generator tool vs. parser combinator library).
Always a good place to get help and advise on F# in general and FParsec in particular is the forum on www.hubfs.net. If you are stuck with a (parser) coding problem, just post a question in the forum and there's a good chance someone in the hubfs community will answer it quickly.
In FParsec a parser is a function of
type GenParser<'a,'s> when 's :> IState = 's -> Reply<'a,'s>
where 's denotes the type of the state and 'a denotes the type of the parser result. The state type 's is required to implement the (trivial) interface type IState. All the basic ("primitive") and derived combinators in FParsec's Primitives module take parsers of this type as arguments.
A simple example for a parser is the anyChar parser from the CharParser module. It has the following type:
val anyChar: CharParser.State<'u> -> Reply<char,CharParser.State<'u>>
As you can see, anyChar takes a CharParser.State object as the input and returns a Reply with the output state and the parsed char. Objects of type CharParser.State<'u> hold a reference to the input stream, keep track of the position in the input and contain a user-defined state of type u. When anyChar is called it will read a char from the input stream, increment a character counter within the state and return the modified state together with the parsed char.
All parsers in FParsec return their results in containers of type Reply<'a,'s>. If the parser was successful, the Reply value contains the result and the output state, if the parser failed, it contains an error message. You can think of Reply as a discriminated union type [1] with the following definition:
type Reply<'a, 's> = | ConsumedOk of 'a * 's * ParserError | ConsumedError of ParserError | EmptyOk of 'a * 's * ParserError | EmptyError of ParserError
The four distinct cases signal whether the parser was successful (Ok) or not (Error) and whether it consumed input (Consumed), i.e. changed the position in the input, or not (Empty). Differentiating between consuming and non-consuming parsers at this fundamental level simplifies error reporting and is a hallmark of the Parsec library design.
| [1] | For performance reasons the Reply type is not actually defined as a discriminated union type but as a struct type. However, since the Primitives module also defines appropriate constructor functions and an active pattern , one can use the Reply as if it was defined as given above. See the reference for more details. |
A parser combinator is a function that takes one or more parsers as arguments and returns a "combined" parser.
The most important parser combinator in FParsec is the sequencing operator
val (>>=): GenParser<'a,'s> -> ('a -> GenParser<'b,'s>) -> GenParser<'b,'s>
It takes a parser and a function returning a parser as arguments and returns a parser that applies both arguments in sequence. In the simplest case, when no error occurs, the parser returned by p >>= f applies the first argument p to the input, then uses the result of p as the argument for f to get another parser and finally applies that second parser to the input. Before we discuss the exact meaning of >>= we first want to show how this operator is used in actual code.
Employing >>= in its raw form looks like in the following definition of a parser which parses x and y and returns the values as a tuple
let parseXY = parseX >>= fun x ->
parseY >>= fun y ->
preturn (x, y)
Here parseX and parseY stand for some arbitrary parser functions and preturn is a basic primitive defined as:
let preturn x = fun state -> EmptyOk x state NoError
preturn takes an argument x and returns a parser function that will always return x without touching the input state. If you wonder why we use preturn instead of returning (x, y) directly, remember that the second argument to >>= must be a function of type 'a -> GenParser<'b,'s>.
Admittedly, the above code snippet doesn't look very intuitive. To improve the readability of such expressions involving higher-order functions, Haskell offers its "monadic do-notation". Fortunately, F# has its own brand of syntactic sugar: computation expressions. Thanks to this feature we can rewrite the above expression as
let parseXY = parse {let! x = parseX
let! y = parseY
return (x, y)}
Here parse is a so-called "builder" object defined in FParsec's Primitives module. Using the methods of this object, the F# compiler translates the computation expression in the curly braces back into the original form from above. Both forms are (almost) identical in meaning, yet the latter version makes the intent of the code much clearer: run parseX on the input and bind the result to x, then run parseY and bind the result to y, and finally return (x,y). See the ParserCombinator reference to find out what other constructs besides let! and return the parse object supports.
The easiest way to explain the precise meaning of >>= is to give a definition:
let (>>=) p f =
fun s0 ->
match p s0 with
| EmptyError e1 -> EmptyError e1
| ConsumedError e1 -> ConsumedError e1
| EmptyOk (r1, s1, e1) ->
let p2 = f r1
match p2 s1 with
| EmptyError e2 -> EmptyError (mergeErrors e2 e1)
| EmptyOk (r2, s2, e2) -> EmptyOk (r2, s2, mergeErrors e2 e1)
| consumed -> consumed
| ConsumedOk (r1, s1, e1) ->
let p2 = f r1
match p2 s1 with
| EmptyError e2 -> ConsumedError (mergeErrors e2 e1)
| EmptyOk (r2, s2, e2) -> ConsumedOk (r2, s2, mergeErrors e2 e1)
| consumed -> consumed
Apparently p >>= f returns a parser function which does the following: First, it applies the parser p to the input state s0. Then, if p returns an Ok reply with the result r1 and the output state s1, it applies f to r1 and thus gets a second parser p2, which subsequently is applied to s1 in order to obtain the reply for the complete parser. If instead p returns an Error reply, this reply becomes the reply of the complete parser. Every time both parsers are evaluated and the second parser does not consume input, i.e. returns an Empty reply, the error messages are merged.
The choice operator
val (<|>): GenParser<'a,'s> -> GenParser<'a,'s> -> GenParser<'a,'s>
is the second fundamental operator in FParsec. Its definition is given by [2]
let (<|>) p1 p2 =
fun s0 ->
match p1 s0 with
| EmptyError e1 ->
match p2 state with
| EmptyError e2 -> EmptyError (mergeErrors e2 e1)
| EmptyOk (r2, s2, e2) -> EmptyOk (r2, s2, (mergeErrors e2 e1))
| consumed -> consumed
| other -> other
| [2] | This definition deviates from the one given in the paper [Parsec01], though it is consistent with the actual implementation in Haskell's Parsec package. In the paper's version a consuming second parser takes precedence over a not consuming first parser, even if the first parser succeeds. |
Invoking p1 <|> p2 yields a parser function which first tries to run p1 on the input. If p1 consumes input or succeeds without consuming, its reply will become the reply of the combined parser. If instead p1 returns an EmptyError, p2 will be run on the original input and its reply will become the reply of the combined parser. In case the second parser does not consume, the error messages of both parsers are merged.
The most important point to note here is that p1 <|> p2 will always return with the reply of p1 in case p1 consumes input, even if p1 eventually fails. Since a parser usually consumes input as soon as it can accept at least one atomic token from the input, this means that p1 <|> p2 by default implements backtracking with only a one token look-ahead. Restricting the look-ahead this way has two advantages: First, error reporting is simplified and error messages are easier to understand because fatal errors can only occur at one position at a time. Second, parser developers are guided towards more efficient grammar implementations because parsers requiring more than one token look-ahead need to be explicitly annotated with the attempt combinator.
The attempt combinator
val attempt: GenParser<'a,'s> -> GenParser<'a,'s>
takes one parser as the argument and will return a parser that behaves exactly like the argument, except that all ConsumedError Reply values are turned into EmptyError values (with the error messages appropriately modified). Thus, introducing attempt in the definitions of parsers allows backtracking over more than one token. For example, attempt p1 <|> p2 will always execute p2 when p1 fails, even if p1 consumes input before failing.
Parser combinators alone will not parse anything, you also need parsers that actually consume input. The parsers and types in the Primitives module are too general for this, they don't assume any particular state or input type. More specific parsers that operate on text-based input can be found in the CharParser module. All parsers provided by that module have the type
type Parser<'a,'u> = GenParser<'a,Charparser.State<'u>>
where 'a denotes the result type and 'u takes the place of a user defined state type, which in many cases will simply be unit. CharParser.State<'u> contains a reference to a CharStream providing the input, keeps track of the input position (including line and column numbers) and also stores a user-defined state of type 'u. Internally all of FParsec's char parsers receive their input through CharStreams, but for the moment you don't need to know more about these details.
Let us define a simple parser that parses an upper case letter followed by a '#' char:
#light
open FParsec.Primitives
open FParsec.CharParser
// our simple parser has no user state
type Parser<'a> = Parser<'a,unit>
let simple: Parser<_> = // the ": Parser<_>" annotation avoids
parse {let! c = upper // problems with F#'s value restriction [3]
do! skipChar '#' // [4]
return c}
Both upper and skipChar are parsers defined in the CharParser module.
| [3] | The use of type annotations is one of three ways to deal with the value restriction in F#. Without the type annotation simple would be a value of type Parser<char,'u>, which would violate the value restriction because 'u is a generic type variable. One alternative to using type annotations is to use the value in the same source file in a context where the general type is restricted to a specific type. For example, if the definition of simple was in the same source file as the call to run simple below, 'u would have been automatically restricted to unit. The problem with this approach is that it makes it difficult to copy code snippets to the interactive compiler prompt. Another alternative would be to turn simple into an explicit function by writing let simple st = parse { ... } st. The problem with this approach is that in certain situations it might lead to code that unnecessarily duplicates initialization work every time the function is called. |
| [4] | In case you wonder: The expression do! p is equivalent to let! () = p and can be used with parsers that have a unit result. The names of parsers that skip over the text without returning any result (other than unit) usually start with "skip". Not all parsers have a skipping alternative, but instead of do! skipChar '#' you can always use let! _ = char '#'. You could also use do! char '#' |>> ignore, which is very similar to what you would do in normal 'F#' code, though this alternative carries a small runtime penalty. |
Char parsers can be run on input with one of the runParser functions, which all return a value of type
type ParserResult<'a> = Success of 'a
| Failure of string * ParserError
In case of Success the ParserResult contains the result of the parser, otherwise it contains a human-readable error message and the corresponding ParserError. Most convenient for testing parser without user state on string input is
val run: Parser<'a,unit> -> string -> ParserResult<'a>
If we run our simple parser on the string "F#"
run simple "F#"
we get the output
val it : ParserResult<char> = Success 'F'
If instead we try to
run simple "C++"
we get the error message:
> val it : ParserResult<char> = Failure Error in Ln: 1 Col: 2 C++ ^ Expecting: '#'
The command to run our simple parser on a file named "test.txt" is a little bit more verbose:
runParserOnFile simple () "test.txt" System.Encoding.Default
Here we also need to specify the initial user state (unit) along with the text encoding that is assumed if it can not be detected automatically (System.Encoding.Default).
The CharParser module offers a lot of predefined parsers, including:
Follow the links into the reference section to find out more!
It is important that only "newline-aware" parsers like newline, spaces, skipped or manyTillString are used to parse newline characters. If you parse a newline with other parsers, for instance anyChar or anyString, you need to register the new line with registerNL, otherwise the parser state looses track of the correct line and column numbers. For this reason you will typically use anyChar in constructs similar to
newline <|> anyChar
Incidentally, the built-in parser anyCharOrNL has an equivalent definition.
As you can see above, FParsec generates quite readable error messages without any effort by the developer. This is possible because parsers provided by the CharParser module have appropriate "labels" attached and combinators from the Primitives module propagate error messages as necessary.
Error reporting in FParsec is based on the following principle: Parsers that fail or could have consumed more input return in their reply a ParserError which describes the input they expected or the reason they failed. All error messages for the same position in the input are collected together until either an elementary parser consumes input or the combined parser runs out of alternatives to try and finally fails.
The most frequently generated error message is an ExpectedError. This message is generally used when there is a clear notion of what kind of input would have allowed the parser to consume (more) input. Take for example the parser
let ident = letter <|> digit <|> char '_' run ident ".test"
which parses any letter, digit or underscore. If you apply this parser on the string ".test" you get the following output
> val it : ParserResult<char> = Failure Error in Ln: 1 Col: 1 .test ^ Expecting: '_', digit or letter
Each of the three elemental parsers letter, digit or char '_' gets applied to the first char of the input and fails with an ExpectedError. The choice operator <|> collects these errors and eventually fails with a list of all errors.
Sometimes an error message in terms of a higher-level grammar production is more helpful than an error message in terms of elementary (char) parsers. In such cases you can use the combinator
val (<?>): GenParser<'a,'s> -> string -> GenParser<'a,'s>
to attach a string label to the higher-level parser. p <?> label generates an Unexpected error message with the given string label, when p fails without consuming input:
let identL = letter <|> digit <|> char '_' <?> "identifier" run identL ".test" > val it : ParserResult<char> = Failure Error in Ln: 1 Col: 1 .test ^ Expecting: identifier
Note that all operators that begin with the char '<' have the same precedence and are left associative, hence the label "identifier" applies to the whole expression on the left hand side of <?>.
p <?> label does not overwrite the lower level errors if p consumes input before failing:
let ident2 = parse {do! skipChar '_'
let! c = letter <|> digit <|> char '_'
return c} <?> "identifier"
run ident2 "_.test"
>val it : ParserResult<char>
= Failure
Error in Ln: 1 Col: 2
_.test
^
Expecting: '_', digit or letter
In such cases you can provide more context by using the compound labelling combinator:
val (<??>): GenParser<'a,'s> -> string -> GenParser<'a,'s>
p <??> label works like p <?> label in case p fails without consuming input. But if p fails after consuming input, both an ExpectedError with the string label and the lower-level error messages generated by p are displayed:
let ident2L = parse {do! skipChar '_'
let! c = letter <|> digit <|> char '_'
return c} <??> "identifier"
run ident2L "_.test"
> val it : ParserResult<char>
= Failure
Error in Ln: 1 Col: 1
_.test
^
Expecting: identifier
identifier could not be parsed because:
Error in Ln: 1 Col: 2
_.test
^
Expecting: '_', digit or letter
If you would like to have more control over the generated error message, you can use the fail primitive as in the following example:
let ident2F = parse {do! skipChar '_'
let! p = getPos
let! c = letter <|> digit <|> char '_' <|>
fail ("Can't accept the char at index: " + p.Index.ToString())
return c}
run ident2F "_.test"
> val it : ParserResult<char>
= Failure
Error in Ln: 1 Col: 2
_.test
^
Can't accept the char at index: 1
The Primitives module offers many useful combinators that we haven't yet discussed.
The most prominent example is the many combinator and its various variants, which play the role of the operators * and + in regular expressions or the EBNF syntax. For example:
run (many digit) "123abc" >val it : ParserResult<char list> = Success ['1'; '2'; '3']
You should always prefer parsing sequences with the many combinator instead of parsing them with "hand-rolled" recursive parsers. The reason is that many only uses constant stack space while a recursive parser's stack usage will grow at least linearly in the input. (In case you wonder: recursive parsers can not be implemented in a truly tail recursive way using standard parser sequencing.)
By the way, if you want to parse many chars as a string, you should also have a look at manyChars, manySatisfy and manyTillString. If you need to parse sequences that are separated by a separator, you will find the combinators sepBy, endBy and sepEndBy useful.
When parsing many alternatives you should prefer the choice combinator to long sequences of <|> expressions. Writing choice [p1; p2; p3; ...] instead of p1 <|> p2 <|> p3 <|> ... will yield a more efficient parser.
The chaining operators >>. and .>> can safe a lot of typing. Their definition is equivalent to
let (>>.) p q = parse {let! _ = p
return! q}
let (.>>) p q = parse {let! r = p
let! _ = q
return r}
You could for example use .>> to parse a sequence of digits followed by semicolons
run (many (digit .>> char ';')) "1;2;3;" > val it : ParserResult<char list> = Success ['1'; '2'; '3']
or use both operators to parse a digit in brackets
run (char '(' >>. digit .>> char ')') "(1)"
> val it : ParserResult<char> = Success '1'
If you find p >>. preturn x still too verbose, you could use p >>$ x instead. Similarly you can use p <|>$ x as an abbreviation for p <|> preturn x.
Of course, there is also a parser equivalent of F#'s beloved pipeline operator
let (|>>) p f = parse {let! x = parser
return f x}
You could use this operator for example to parse a digit as an integer
run (digit |>> fun c -> Char.code c - Char.code '0') "1" > val it : ParserResult<int> = Success 1
or to redefine the skipChar parser:
let skipChar c = char `c` |>> ignore
The combinators lookAhead, followedBy and notFollowedBy are helpful when you want to parse or check for input without consuming it. With notFollowedBy you could for example write a parser that parses any char sequence ended by ";;"
run (manyChars (notFollowedBy (string ";;") ";;" >>. anyCharOrNL) .>> string ";;") "123;456;;" > val it : ParserResult<string> = Success "123;456"
Although in this case it would be more efficient to use the CharParser primitive manyTillString.
If you need to parse expressions involving infix, prefix, postfix or ternary operators, you will fill the OperatorPrecedenceParser class useful. The calculator sample that is shipped with FParsec shows how to use this class to implement a parser for arithmetic expressions.
Many real world parsers require additional state, for instance to hold a symbol table in the case of a compiler or to hold a list of references in the case of a text processing tool. Where possible such user defined state should be explicitly incorporated into the parser state, not introduced through side-effects of parsers.
FParsec directly supports user definable state through the CharParser.State type and the getState, setState and updateState parsers. These parsers can be used as in the following example of a parser that skips over pairs of (nested) parenthesis and counts them
let rec skipParens = parse {do! skipChar '('
do! optional skipParens
do! skipChar ')'
let! count = getState
do! setState (count + 1)
return ()}
runParserOnString (skipMany skipParens >>. getState) 0 "" "((()))()()"
> val it : ParserResult<int> = Success 5
If the built-in parsers are too inconvenient or not efficient enough for a particular purpose, you can easily write your own parser primitives. Here are a few useful hints for writing your own (char) parser function:
The interface of FParsec has the following structure:
module FParsec.Primitives open FParsec.Error type IState = interface abstract member Pos: Pos end type ReplyFlags = None = 0 | Ok = 1 | Consumed = 2 // type Reply<'a, 's> = // | ConsumedOk of 'a * 's * ParserError // | ConsumedError of ParserError // | EmptyOk of 'a * 's * ParserError // | EmptyError of ParserError type Reply<'a,'s> when 's :> IState = struct val mutable State: 's val mutable Result: 'a val mutable Error: ParserError val mutable Flags: ReplyFlags member IsOk: bool member IsError: bool member IsConsumed: bool member IsEmpty: bool member IsConsumedOk: bool // = IsConsumed && isOk member IsConsumedError: bool // ... member IsEmptyOk: bool member IsEmptyError: bool interface System.IEquatable<Reply<'a,'s>> end
// constructor functions for Reply values val ConsumedOk: 'a * 's * ParserError -> Reply<'a,'s> val ConsumedOK: 'a -> 's -> Reply<'a,'s> val EmptyOk: 'a * 's * ParserError -> Reply<'a,'s> val EmptyOK: 'a -> 's -> Reply<'a,'s> val ConsumedError: ParserError -> Reply<'a,'s> val EmptyError: ParserError -> Reply<'a,'s> val (|ConsumedOk|ConsumedError|EmptyOk|EmptyError|): Reply<'a,'s> -> Choice<('a * 's * ParserError),ParserError, ('a * 's * ParserError),ParserError> type GenParser_<'a,'s> when 's :> IState = 's -> Reply<'a,'s> type ParserResult_<'a> = | Success of 'a | Failure of string * ParserError val applyParser : GenParser<'a,'s> -> 's -> ParserResult<'a> // the basic primitives val preturn: 'a -> GenParser<'a,'s> val (>>=): GenParser<'a,'s> -> ('a -> GenParser<'b,'s>) -> GenParser<'b,'s> val (|>>=): GenParser<'a,'s> -> 'a -> GenParser<'b,'s> -> GenParser<'b,'s> val (<|>): GenParser<'a,'s> -> GenParser<'a,'s> -> GenParser<'a,'s> val pzero: GenParser<'a,'s> // the "builder object" for parse expressions type ParserCombinator = class new : unit -> ParserCombinator member Delay: (unit -> GenParser<'a,'s>) -> GenParser<'a,'s> member Return: 'a -> GenParser<'a,'s> member Bind: GenParser<'a,'s> -> ('a -> GenParser<'b,'s>) -> GenParser<'b,'s> member Zero: unit -> GenParser<'a,'s> member TryWith: GenParser<'a,'s> * (exn -> GenParser<'a,'s>) -> GenParser<'a,'s> member TryFinally: GenParser<'a,'s> * (unit -> unit) -> ('s -> Reply<'a,'s>) end val parse : ParserCombinator // more primitives val attempt: GenParser<'a,'s> -> GenParser<'a,'s> val (<?>): GenParser<'a,'s> -> string -> GenParser<'a,'s> val (<??>): GenParser<'a,'s> -> string -> GenParser<'a,'s> val message: string -> GenParser<'a,'s> val fail: string -> GenParser<'a,'s> val lookAhead: GenParser<'a,'s> -> GenParser<'a,'s> val followedBy: GenParser<'a,'s> -> string -> GenParser<unit,'s> val notFollowedBy: GenParser<'a,'s> -> string -> GenParser<unit,'s> // some additional operators for convenience (and performance) val choice: GenParser<'a,'s> list -> GenParser<'a,'s> val (<|>$): GenParser<'a,'s> -> 'a -> GenParser<'a,'s> val (|>>): GenParser<'a,'s> -> ('a -> 'b) -> GenParser<'b,'s> val (>>$): GenParser<'a,'s> -> 'b -> GenParser<'b,'s> val (.>>): GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a,'s> val (>>.): GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'b,'s> val (>>?): GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'b,'s> val (.>>?): GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a,'s>
// various variants of the many combinator val many: GenParser<'a,'s> -> GenParser<'a list,'s> val many1: GenParser<'a,'s> -> GenParser<'a list,'s> val manyRev: GenParser<'a,'s> -> GenParser<'a list,'s> val many1Rev: GenParser<'a,'s> -> GenParser<'a list,'s> val skipMany: GenParser<'a,'s> -> GenParser<unit,'s> val skipMany1: GenParser<'a,'s> -> GenParser<unit,'s> val manyFoldLeft: ('a -> 'b -> 'a) -> 'a -> GenParser<'b,'s> -> GenParser<'a,'s> val many1FoldLeft: ('a -> 'b -> 'a) -> 'a -> GenParser<'b,'s> -> GenParser<'a,'s> val manyFoldRight: ('a -> 'b -> 'a) -> 'a -> GenParser<'b,'s> -> GenParser<'a,'s> val many1FoldRight: ('a -> 'b -> 'a) -> 'a -> GenParser<'b,'s> -> GenParser<'a,'s> val count: int -> GenParser<'a,'s> -> GenParser<'a list,'s> val skipCount: int -> GenParser<'a,'s> -> GenParser<unit,'s> // generalizations of |>> val pipe2: GenParser<'a,'s> -> GenParser<'b,'s> -> ('a -> 'b -> 'c) -> GenParser<'c,'s> val pipe3: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'c,'s> -> ('a -> 'b -> 'c -> 'd) -> GenParser<'c,'s> val pipe4: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'c,'s> -> GenParser<'d,'s> -> ('a -> 'b -> 'c -> 'd -> 'e) -> GenParser<'e,'s> // some derived combinators val manyTill: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a list,'s> val skipManyTill: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<unit,'s> val opt: GenParser<'a,'s> -> GenParser<'a option,'s> val option: 'a -> GenParser<'a,'s> -> GenParser<'a,'s> val optional: GenParser<'a,'s> -> GenParser<unit,'s> val trySkip: GenParser<'a,'s> -> GenParser<bool,'s> val pair: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<('a * 'b),'s> val triple: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'c,'s> -> GenParser<('a * 'b * 'c),'s> val quad: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'c,'s> -> GenParser<'d,'s> -> GenParser<('a * 'b * 'c * 'd),'s> val between: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'c,'s> -> GenParser<'c,'s> val sepBy: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a list,'s> val sepBy1: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a list,'s> val sepEndBy: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a list,'s> val sepEndBy1: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a list,'s> val endBy: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a list,'s> val endBy1: GenParser<'a,'s> -> GenParser<'b,'s> -> GenParser<'a list,'s> val chainl: GenParser<'a,'s> -> GenParser<('a -> 'a -> 'a),'s> -> 'a -> GenParser<'a,'s> val chainr: GenParser<'a,'s> -> GenParser<('a -> 'a -> 'a),'s> -> 'a -> GenParser<'a,'s> val chainl1: GenParser<'a,'s> -> GenParser<('a -> 'a -> 'a),'s> -> GenParser<'a,'s> val chainr1: GenParser<'a,'s> -> GenParser<('a -> 'a -> 'a),'s> -> GenParser<'a,'s> // a helper function for defining mutually recursive parser values val createParserForwardedToRef: unit -> GenParser<'a,'s> * GenParser<'a,'s> ref
IState is the interface type that any state type which is used with FParsec's Reply type has to support. It is currently defined as a type abbreviation for an interface type defined in FParsecCS.dll:
type IState = FParsec.IState
This definition is equivalent to
type IState = interface
abstract member Pos: Pos
end
The single member Pos is used by FParsec's error handling routines to read out position information from a state.
Reply is the return type of all parsers in FParsec. For optimization reasons it is currently defined as a type abbreviation for a struct type defined in FParsecCS.dll:
type Reply<'a,'s> when 's :> IState = FParsec.Reply<'a,'s,ParserError>
This declaration is equivalent to:
type Reply<'a,'s> when 's :> IState = struct
val mutable State: 's
val mutable Result: 'a
val mutable Error: ParserError
val mutable Flags: ReplyFlags
member IsOk: bool
member IsError: bool
member IsConsumed: bool
member IsEmpty: bool
member IsConsumedOk: bool // = IsConsumed && IsOk
member IsConsumedError: bool // ...
member IsEmptyOk: bool
member IsEmptyError: bool
interface System.IEquatable<Reply<'a,'s>>
end
ReplyFlags is an enumeration type defined as:
type ReplyFlags = None = 0
| Ok = 1
| Consumed = 2
The Flags member of the Reply type indicates whether a parser succeeded and whether it consumed input (i.e. has changed the input position). The properties IsOk, IsError and IsConsumed, IsEmpty provide a convenient way to evaluate whether or not the flags Ok and Consumed are set.
In case the parser has succeeded, the members result and state hold the respective output values, otherwise result and state hold null/zero values. The error member should always hold a proper value, though in case of a successful parser run that will usually be a NoError, which is represented as a null value.
The recommended practice for user code is to treat Reply as an immutable type and to construct Reply values with the respective constructor functions. The Primitives module also defines an active pattern, so that you can pattern match over Reply values as if they were defined as a discriminated union type:
type Reply<'a, 's> = | ConsumedOk of 'a * 's * ParserError | ConsumedError of ParserError | EmptyOk of 'a * 's * ParserError | EmptyError of ParserError
With the help of this active pattern one can pattern match over the Reply struct as if it was a discriminated union type:
match reply with | EmptyError error -> ... | ConsumedError error -> ... | EmptyOk (result, state, error) -> ... | ConsumedOk (result, state, error) -> ...
This class is defined as
type ParserCombinator() =
member t.Delay(f) = fun state -> (f ()) state
member t.Return(x) = preturn x
member t.Bind(p, f) = p >>= f
member t.Zero() = pzero
member t.TryWith(p, cf) = fun state -> try p state
with e -> (cf e) state
member t.TryFinally(p, ff) = fun state -> try p state
finally ff ()
Instances of this class can be used to build parsers using F#'s computation expression syntax. The default instance for this purpose is parse.
Some constructs supported by parse and their translations are:
let! pat = expr in cexpr ==> expr >>= (fun pat -> cexpr) let pat = expr in cexpr ==> let pat = expr in cexpr do! expr in cexpr ==> expr >>= (fun () -> cexpr) do expr in cexpr ==> expr; cexpr if expr then cexpr1 ==> if expr then cexpr1 else cexpr2 else cexpr2 if expr then cexpr ==> if expr then cexpr1 else pzero return exp ==> preturn rexpr return! expr ==> expr
where expr is any F# expression and cexpr stands for a computational expression, which in our case always is an expression of type GenParser<_,_>. You need to use the '!'-constructs whenever you have a right hand side expression that results in a parser.
These constructs can be used as in the following example
parse {do! skipChar `(`
let! d = digit
let n = Char.code d - Char.code '0'
do! skipChar `)`
if n > 0 then
return n
else
return -1}
which F# translates into something equivalent to
fun state ->
(skipChar `(`
>>= fun () ->
digit
>>= fun d ->
let n = Char.code d - Char.code '0'
skipChar `)`
>>= fun () ->
if n > 0 then preturn n
else preturn 0
) state
The parser attempt p behaves like p but pretends it hasn't consumed any input when p fails. This is done by converting all ConsumedError Reply values into EmptyError values and wrapping the attached error messages into a BacktrackPoint.
Introducing attempt in the definitions of parsers allows backtracking over more than one token. For example, attempt p1 <|> p2 will always execute p2 when p1 fails, even if p1 consumes input before failing.
The parser p <?> "label" behaves like p, but when p fails without consuming input, the error message is replaced with an ExpectedError.
The label combinator is used to return error messages in terms of higher level grammar productions rather than lower level elementary parsers. For example,
let identifier = letter <|> digit <|> char '_' run identifier ".test"
gives the error message
> val it : ParserResult<char> = Failure Error in Ln: 1 Col: 1 .test ^ Expecting: '_', digit or letter
while
let identifier = letter <|> digit <|> char '_' <?> "identifier" run identifier ".test"
gives the error message
> val it : ParserResult<char> = Failure Error in Ln: 1 Col: 1 .test ^ Expecting: identifier
The parser p <??> "label" behaves like p <?> "label", but when p fails after consuming input, a CompoundError is generated with the error that occured and the label that was given to the compound.
For example,
let identifier = char '_' >>. (letter <|> digit) <?> "identifier" run identifier "_.test"
gives the error message
> val it : ParserResult<char> = Failure Error in Ln: 1 Col: 2 _.test ^ Expecting: digit or letter
while
let identifier = char '_' >>. (letter <|> digit) <??> "identifier" run identifier "_.test"
gives the error message
> val it : ParserResult<char> = Failure Error in Ln: 1 Col: 1 _.test ^ Expecting: identifier identifier could not be parsed because: Error in Ln: 1 Col: 2 _.test ^ Expecting: digit or letter
p1 .>> p2 is equivalent to
parse {let! x = p1
let! _ = p2
return x}
p1 >>. p2 is equivalent to
parse {let! _ = p1
let! y = p2
return y}
p1 >>? p2 behaves like p1 >>. p2 but will backtrack to the beginning if p2 fails without consuming input even if p1 has consumed input.
This combinator is useful in situations where parsers are only partially left-factored, for example:
let tagsA = char '<' >>? choice [string "tag1";
string "tag2";
string "tag3";] >>. '>'
let tagsB = char '<' >>? choice [string "tag4";
string "tag5";
string "tag6";] >>. '>'
let tags = spaces >>? (tagsA <|> tagsB)
many p repeatedly applies the parser p for until p fails. It returns a list of the results returned by p. In case p returns a result without consuming input a FailureException is raised to prevent many p from entering an infinite loop.
Ignoring efficiency issues and the infinite recursion case, many could be defined as
let rec many p = parse {let! x = p
let! xs = many p
return x::xs} <|>$ []
many1 p repeatedly applies the parser p until p fails, but for a minimum of one times. It returns a list of the results returned by p. In case p returns a result without consuming input a FailureException is raised to prevent many1 p from entering an infinite loop.
many1 p is equivalent to
parse {let! x = p
let! xs = many p
return x::xs}
count n p applies the parser p n times. It returns a list of the results returned by p.
The definition of count is equivalent to
let rec count n p =
parse {if n > 0 then
let! x = p
let! xs = count (n - 1) p
return x::xs
else
return []}
pipe2 p1 p2 f is equivalent to
parse {let! a = p1
let! b = p2
return f a b}
pipe3 p1 p2 p3 f is equivalent to
parse {let! a = p1
let! b = p2
let! c = p3
return f a b c}
pipe4 p1 p2 p3 p4 f is equivalent to
parse {let! a = p1
let! b = p2
let! c = p3
let! d = p4
return f a b c d}