5.9 Parsing with user state
Each CharStream<'u>
holds a value of the freely definable user state type 'u
. In previous chapters we just ignored the user state and always assumed 'u
to be unit
. In this section we finally get to discuss the
purpose of the user state and how you can use it in your parsers.
5.9.1 Overview
The user state allows you to introduce additional variables into the state tracked by FParsec parsers. It has the following two important properties:
-
The user state is stored in the
CharStream<'u>
instance and hence associated with the input. It is not shared globally and not associated with particular parser instances. The same parser instances can be concurrently applied to differentCharStream<'u>
instances with different user state instances. - The user state is tracked by FParsec parsers together with the input stream position. This means in particular that a parser restores the previous user state value when it backtracks.
If you want changes to the user state to be undone during backtracking, you must change the user state by assigning a new value to the user state, not by mutating an existing user state value.
With the help of the user state you can implement context sensitive parsers, i.e. parsers whose behaviour not only depends on the immediate input but also on the context of the input. In general this works as follows:
- You establish a context by defining variables in the user state.
- You update the context depending on the input by letting parsers update the user state.
- You parse input depending on the context by making the parser behaviour dependent on the user state variables.
The user state is exposed through the UserState
property of the CharStream<'u>
. You can implement parsers using the low‐level API that directly access this property, or you can use the
following parser primitives from the CharParsers
module:
The next section contains an example employing updateUserState
to change the user state and userStateSatisfies
to check for parser
preconditions.
5.9.2 Recursive grammars with nesting restrictions
An important area of application for context sensitive parsers are recursive grammars where certain grammar elements cannot nest within others or where grammar elements need to be parsed differently depending on the nesting context.
Consider for example a textual markup languages like HTML. Many such markup languages support various “inline tags” to annotate text in a paragraph. Usually these inline tags can nest arbitrarily, except for a few tags with special restrictions. One of these restrictions often is that hyperlinks must not contain hyperlinks, even though they can contain any other inline content. Other restrictions may apply to elements allowed in superscript text or footnotes. A convenient way to enforce such restrictions during parsing is to introduce variables into the user state that keep track of the nesting context. The following example demonstrates this approach.[1]
The following parser for a tiny markup‐language employs the user state
- to ensure that nested hyperlinks are not accepted and
-
to parse potentially nested quotations between matching pairs of
'\''
or'\"'
chars.
open FParsec type Element = Text of string | Bold of Element list | Italic of Element list | Url of string * Element list | Quote of char * Element list type UserState = {InLink: bool QuoteStack: char list} with static member Default = {InLink = false; QuoteStack = []} let ws = spaces let ws1 = spaces1 let str s = pstring s let elements, elementsR = createParserForwardedToRef() let text = many1Satisfy (isNoneOf "<>'\"\\") |>> Text let escape = str "\\" >>. (anyChar |>> (string >> Text)) let quote (q: char) = let pq = str (string q) let pushQuote = updateUserState (fun us -> {us with QuoteStack = q::us.QuoteStack}) let popQuote = updateUserState (fun us -> {us with QuoteStack = List.tail us.QuoteStack}) let isNotInQuote = userStateSatisfies (fun us -> match us.QuoteStack with | c::_ when c = q -> false | _ -> true) isNotInQuote >>. between pq pq (between pushQuote popQuote (elements |>> fun ps -> Quote(q, ps))) // helper functions for defining tags let tagOpenBegin tag = str ("<" + tag) >>? nextCharSatisfiesNot isLetter // make sure tag name is complete <?> "<" + tag + "> tag" let tagOpen tag = tagOpenBegin tag >>. str ">" let tagClose tag = str ("</" + tag + ">") let tag t p f = between (tagOpen t) (tagClose t) (p |>> f) let attributeValue = ws >>. str "=" >>. ws >>. between (str "\"") (str "\"") (manySatisfy (isNoneOf "\n\"")) let attribute s = str s >>. attributeValue let nonNestedTag tag pAttributesAndClose pBody f isInTag setInTag setNotInTag = tagOpenBegin tag >>. ((fun stream -> if not (isInTag stream.UserState) then stream.UserState <- setInTag stream.UserState Reply(()) else // generate error at start of tag stream.Skip(-tag.Length - 1) Reply(FatalError, messageError ("Nested <" + tag + "> tags are not allowed."))) >>. pipe2 pAttributesAndClose pBody f .>> (tagClose tag >>. updateUserState setNotInTag)) // the tags let bold = tag "b" elements Bold let italic = tag "i" elements Italic let url = nonNestedTag "a" (ws >>. attribute "href" .>> (ws >>. str ">")) elements (fun url phrases -> Url(url, phrases)) (fun us -> us.InLink) (fun us -> {us with InLink = true}) (fun us -> {us with InLink = false}) let element = choice [text escape quote '\'' quote '\"' bold italic url] do elementsR:= many element let document = elements .>> eof
> runParserOnString document UserState.Default "" "A \"'text' with 'nested \"<b>quotes</b>\"'.\"";; val it : ParserResult<Element list,UserState> = Success: [Text "A "; Quote ('"', [Quote ('\'',[Text "text"]); Text " with "; Quote ('\'',[Text "nested "; Quote ('"',[Bold [Text "quotes"]])]); Text "."])] > runParserOnString document UserState.Default "" @"<b>Text <i></i>with</b> <a href=""url"">'link' but no \<blink\></a>";; val it : ParserResult<Element list,UserState> = Success: [Bold [Text "Text "; Italic []; Text "with"]; Text " "; Url("url", [Quote ('\'',[Text "link"]); Text " but no "; Text "<"; Text "blink"; Text ">"])] > runParserOnString document UserState.Default "" "<a href=\"url\"><a href=\"nested\">test</a></a>";; val it : ParserResult<Element list,UserState> = Failure: Error in Ln: 1 Col: 15 <a href="url"><a href="nested">test</a></a> ^ Nested <a> tags are not allowed.
5.9.3 Parameterizing a parser through the user state
The user state is also a good place to store parser configuration data that is specific to a “parser job”. For example, a compiler
that processes multiple compilation units could put configuration data that is specific to the compilation unit, e.g. include paths, into the
user state and then parse different compilation units with the same Parser
instance, like in the following code:
type CompilationUnitAST = (* ... *) type UserState = { IncludePaths = string list (* ... *) } let parser : Parser<CompilationUnitAST, UserState> = (* ... *) let parseCompilationUnit file encoding includePaths (* ... *) = let initialUserState = {IncludePaths = includePaths; (* ... *)} runParserOnFile parser initialUserState file encoding
[1] | An alternative way to handle such restrictions at the parser level would be to define separate instances of the parser for each possible combination of restrictions, e.g. separate parsers for inline elements at the top level, for inline elements within hyperlinks, for elements within hyperlinks within superscript text and so on. However, with an increasing number of restrictions this approach quickly falls victim to the combinatorial explosion caused by the recursive nature of the involved parsers. |
---|