6.11 FParsec.CharStream
6.11.1 CharStream
Provides read‐access to a sequence of UTF‐16 chars.
6.11.1.1 Interface
// FParsecCS.dll

namespace FParsec

type CharStream =
  interface System.IDisposable

  new: chars: string * index: int * length: int -> CharStream
  new: chars: string * index: int * length: int * streamBeginIndex: int64 -> CharStream

  new: chars: char[] * index: int * length: int -> CharStream
  new: chars: char[] * index: int * length: int * streamBeginIndex: int64 -> CharStream

  new: chars: NativePtr<char> * length: int -> CharStream
  new: chars: NativePtr<char> * length: int * streamBeginIndex: int64 -> CharStream

  new: path: string * encoding: System.Text.Encoding -> CharStream
  new: path: string * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool -> CharStream
  new: path: string * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool * blockSize: int * blockOverlap: int * minRegexSpace: int * byteBufferLength: int -> CharStream

  new: stream: System.IO.Stream * encoding: System.Text.Encoding -> CharStream
  new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding -> CharStream
  new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool -> CharStream
  new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool * blockSize: int * blockOverlap: int * minRegexSpace: int * byteBufferLength: int -> CharStream

  member Dispose: unit -> unit

  member BlockOverlap: int

  member IndexOfFirstChar: int64
  member IndexOfLastCharPlus1: int64

  member IsBeginOfStream: bool
  member IsEndOfStream: bool

  member Index: int64
  member IndexToken: CharStreamIndexToken
  member Line: int64
  member LineBegin: int64
  member Column: int64
  member Name: string with get, set
  member Position: Position

  val mutable StateTag: uint64

  member Seek: index: int64 -> unit
  member Seek: indexToken: CharStreamIndexToken -> unit

  static val EndOfStreamChar: char

  member Peek: unit -> char
  member Peek2: unit -> TwoChars
  member Peek: utf16Offset: int -> char
  member Peek: utf16Offset: uint32 -> char

  member PeekString: length: int -> string
  member PeekString: buffer: char[] * bufferIndex: int * length: int -> int
  member PeekString: buffer: NativePtr<char> * length: int -> int

  member Match: char -> bool
  member Match: chars: string -> bool
  member Match: chars: char[] * charsIndex: int * length: int -> bool
  member Match: chars: NativePtr<char> * length: int -> bool

  member MatchCaseFolded: caseFoldedChar: char -> bool
  member MatchCaseFolded: caseFoldedChars: string -> bool
  member MatchCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool

  member Match: System.Text.RegularExpressions.Regex -> System.Text.RegularExpressions.Match
  member MinRegexSpace: int with get, set

  member RegisterNewline: unit -> bool
  member RegisterNewlines: lineOffset: int -> newColumnMinus1: int -> bool
  member RegisterNewlines: lineOffset: int64 -> newColumnMinus1: int64 -> bool

  // The following methods require manual registration of skipped newlines

  member Skip: unit -> unit
  member Skip: utf16Offset: int -> unit
  member Skip: utf16Offset: uint32 -> unit
  member Skip: utf16Offset: int64 -> unit

  member SkipAndPeek: unit -> char
  member SkipAndPeek: utf16Offset: int -> char
  member SkipAndPeek: utf16Offset: uint32 -> char

  member Skip: char -> bool
  member Skip: TwoChars -> bool
  member Skip: chars: string -> bool
  member Skip: chars: char[] * charsIndex: int * length: int -> bool
  member Skip: chars: NativePtr<char> * length: int -> bool

  member SkipCaseFolded: caseFoldedChar: char -> bool
  member SkipCaseFolded: caseFoldedChars: string -> bool
  member SkipCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool

  member Read: unit -> char
  member Read: length: int -> string
  member Read: buffer: char[] * bufferIndex: int * length: int -> int
  member Read: buffer: NativePtr<char> * length: int -> int

  member ReadFrom: indexOfFirstChar: CharStreamIndexToken -> string

  // The following methods automatically register skipped newlines

  member SkipWhitespace: unit -> bool
  member SkipUnicodeWhitespace: unit -> bool

  member SkipNewline: unit -> bool
  member SkipUnicodeNewline: unit -> bool

  member SkipNewlineThenWhitespace: powerOf2TabStopDistance: int * allowFormFeed: bool -> int

  member SkipRestOfLine: skipNewline: bool -> unit
  member ReadRestOfLine: skipNewline: bool -> string

  member ReadCharOrNewline: unit -> char

  member SkipCharsOrNewlines: maxCount: int -> int
  member ReadCharsOrNewlines: maxCount: int * normalizeNewlines: bool -> string

  member SkipCharsOrNewlinesWhile: predicate: (char -> bool) -> int
  member SkipCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) -> int
  member SkipCharsOrNewlinesWhile: predicate: (char -> bool) * minCount: int * maxCount: int -> int
  member SkipCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * minCount: int * maxCount: int -> int

  member ReadCharsOrNewlinesWhile: predicate: (char -> bool) * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile: predicate: (char -> bool) * minCount: int * maxCount: int * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * minCount: int * maxCount: int * normalizeNewlines: bool -> string

  member SkipCharsOrNewlinesUntilString: str: string * maxCount: int * foundString: out<bool> -> int
  member SkipCharsOrNewlinesUntilString: str: string * maxCount: int * normalizeNewlines: bool * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int

  member SkipCharsOrNewlinesUntilCaseFoldedString: caseFoldedString: string * maxCount: int * foundString: out<bool> -> int
  member SkipCharsOrNewlinesUntilCaseFoldedString: caseFoldedString: string * maxCount: int * normalizeNewlines: bool * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int
6.11.1.2 Remarks
The CharStream class provides a unified interface for efficiently reading UTF‐16 chars from
a binary stream or an in‐memory char buffer (e.g. a string). It is optimized for use in backtracking parser applications and supports
arbitrary char‐based seeking, even for streams larger than the addressable memory (on 32‐bit platforms).
The CharStream class is the base class of CharStream<'TUserState>, which adds a user‐definable state component and some convenience methods for working with the state
of a CharStream instance.
A CharStream constructed from a System.IO.Stream or a file path reads the stream
block‐wise and only holds the most recently accessed block in memory. The blocks overlap in order to provide efficient access on the
boundary between blocks.
If the char content is already available as a string or a char array, a CharStream can be
directly constructed from the char buffer (without needing to copy the buffer). The overhead of accessing an in‐memory char buffer through a
CharStream is minimal.
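As a minimal sketch of the in‐memory case described above, the following F# script constructs a CharStream directly over a string and reads it char by char. It assumes that FParsec is referenced as a NuGet package from an F# script (the `#r` line); `readAllChars` is a hypothetical helper name that is not part of FParsec.

```fsharp
#r "nuget: FParsec"
open FParsec

/// Hypothetical helper: reads every char of `text` through a CharStream.
let readAllChars (text: string) : char list =
    // The stream references the chars of the string directly.
    use stream = new CharStream(text, 0, text.Length)
    let chars = ResizeArray<char>()
    while not stream.IsEndOfStream do
        // Read() returns the next char and advances the stream by one char.
        chars.Add(stream.Read())
    List.ofSeq chars

printfn "%A" (readAllChars "abc")
```

The `use` binding disposes the stream when the function returns, which matches the advice in the remarks on the disposable interface.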
- Position information
The position of the next char in the stream is described by the following 4 properties: Index, Line, LineBegin and Column.
Among these properties the char index is the most important one, as the CharStream uses it to uniquely identify a UTF‐16 char in the stream.
The other 3 properties further describe the text location of the char identified by the index, but they are not necessary for the core functionality of the CharStream class. The CharStream class keeps track of this additional position information to provide a more convenient interface to higher‐level library functions, in particular to assist debugging and error reporting.
- Newlines
For performance reasons the most basic stream operations do not automatically recognize newlines (end‐of‐line markers) in the stream content. If you skip any newline with these methods, you have to manually register the newline afterwards with one of the RegisterNewline methods (otherwise the line and column counts become incorrect).
In order to provide a convenient interface for parser routines, the CharStream class also provides some more advanced methods that automatically register any skipped standard newline ("\n", "\r\n" and "\r"). Additionally, it provides two methods that automatically register any Unicode newline (SkipUnicodeWhitespace and SkipUnicodeNewline).
It should be obvious from the method names which methods automatically register newlines and which don’t.
- Case‐insensitive matching
The MatchCaseFolded and SkipCaseFolded members match the content of the stream “case‐insensitively” with a reference string. In this instance “case‐insensitive” means that before the chars are matched with the reference string they are mapped to a canonical form where case differences are erased. For performance reasons MatchCaseFolded only applies the (non‐Turkic) 1‐to‐1 case folding mappings (v. 8.0.0) for Unicode code points in the Basic Multilingual Plane, i.e. code points below 0x10000. These mappings are sufficient for many case‐insensitive parser grammars encountered in practice, but they are not appropriate for matching arbitrary natural language content. Please also note that the CharStream class performs no Unicode normalization.
- Non‐sequential access
This note does not apply to the Low‐Trust version of FParsec.
If you construct a CharStream from a System.IO.Stream or a file path and you backtrack over a distance long enough to require the CharStream to reread a previous block, then the underlying byte stream needs to support seeking, otherwise a NotSupportedException is thrown. Furthermore, the Decoder for the input Encoding must be serializable if you backtrack to a block other than the first in the stream. Note that file streams created for regular disk files are always seekable and all the .NET standard decoders are serializable. In order to support non‐seekable streams for applications which don’t require extensive backtracking, no exception will be thrown before an operation actually requires backtracking and the necessary capabilities of the stream or decoder are not available.
- Decoder errors
A CharStream constructed from a binary input stream decodes the input data with the help of a Decoder instance obtained via the Encoding’s GetDecoder method. Depending on the configuration of the encoding the decoder might throw an exception if it encounters invalid byte sequences, usually a System.Text.DecoderFallbackException or a System.ArgumentException. [1]
- Disposable interface
This note does not apply to the Low‐Trust version of FParsec.
A CharStream holds managed and unmanaged resources that need to be explicitly released. Hence, it is very important that CharStream objects are promptly disposed after use. Where possible, CharStream objects should only be used within a “using” block (C#), a “use” expression (F#) or similar constructs in other languages.
- Thread safety
CharStream members are not thread‐safe.
- Low‐Trust version
If you compile FParsec with the LOW_TRUST conditional compiler symbol, the CharStream class differs from the normal version as follows:
  - No unverifiable code involving pointers is used. This allows FParsec to be executed in an environment with reduced trust, such as medium trust ASP.NET applications or Silverlight applications.
  - A CharStream that is constructed from a System.IO.Stream or a file path reads the complete file into a single string during construction. This severely limits the maximum practical stream size.
  - Although the CharStream class still supports the IDisposable interface, disposing the CharStream instances is no longer necessary, since no resources are held that need to be explicitly released.
See also section 3.5.
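The remarks on newlines above can be illustrated with a small sketch that skips the same "\n" in two ways: once with the basic Skip method followed by a manual RegisterNewline call, and once with SkipNewline, which registers the newline itself. The script assumes FParsec is referenced as a NuGet package; `linesAfterNewline` is a hypothetical name used only for this illustration.

```fsharp
#r "nuget: FParsec"
open FParsec

/// Hypothetical helper: returns the Line value
/// (after Skip only, after Skip + RegisterNewline, after SkipNewline).
let linesAfterNewline () : int64 * int64 * int64 =
    let text = "a\nb"

    // Variant 1: basic Skip calls do not recognize newlines.
    use manual = new CharStream(text, 0, text.Length)
    manual.Skip()                        // skips 'a'
    manual.Skip()                        // skips '\n' without registering it
    let lineBeforeRegistering = manual.Line
    manual.RegisterNewline() |> ignore   // registers the newline that was just skipped

    // Variant 2: SkipNewline registers the skipped newline automatically.
    use automatic = new CharStream(text, 0, text.Length)
    automatic.Skip()                     // skips 'a'
    automatic.SkipNewline() |> ignore    // skips and registers '\n'

    lineBeforeRegistering, manual.Line, automatic.Line

printfn "%A" (linesAfterNewline ())
```

After the manual registration both streams should report the same line number, whereas the value captured before calling RegisterNewline still refers to the first line.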
6.11.1.3 I/O exceptions
If you construct a CharStream from a System.IO.Stream or a file path, the constructor and any CharStream operation that requires reading chars from the underlying byte stream may throw one of
the following exceptions.
In the Low‐Trust version, the constructor decodes the complete byte stream and hence only the constructor may throw one of these exceptions.
Doing actual work in a constructor and potentially throwing exceptions seems to be a somewhat controversial design. We think it’s the right
choice for the CharStream class, because this way you can have a reasonable
expectation that the CharStream actually works after you’ve successfully constructed it.
In general it is not safe to continue to use a CharStream instance after one of
these exceptions was thrown, though calling Dispose() is always safe.
- NotSupportedException
Seeking of the underlying byte stream is required, but the byte stream does not support seeking or the Encoding’s Decoder is not serializable. See also the remarks above on non‐sequential access.
- IOException
An I/O error occurred while reading data from the underlying byte stream.
- ArgumentException
The underlying byte stream contains invalid bytes and the Encoding was constructed with the throwOnInvalidBytes option.
- DecoderFallbackException
The underlying byte stream contains invalid bytes for which the decoder fallback threw this exception.
The byte index of the invalid bytes in the stream is stored as a boxed System.Int64 in the "Stream.Position" entry of the Data member of the exception instance. The precision of the index depends on the precision of the DecoderFallbackException’s Index member. If the underlying System.IO.Stream is not seekable, the byte index only takes into account the bytes read by the CharStream, but not any bytes read before the CharStream was constructed.
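The following sketch shows one way the decoder exceptions described above might be handled. It decodes a byte array as strict UTF‐8 and, if decoding fails, looks up the "Stream.Position" entry in the exception's Data dictionary. The script assumes FParsec is referenced as a NuGet package; `tryDecodeAll` is a hypothetical helper, and whether the entry is present may depend on the FParsec build in use, so the code treats it as optional.

```fsharp
#r "nuget: FParsec"
open System.IO
open System.Text
open FParsec

/// Hypothetical helper: decodes `bytes` as strict UTF-8 through a CharStream.
/// Returns Ok () on success, or Error with the "Stream.Position" value if available.
let tryDecodeAll (bytes: byte[]) : Result<unit, int64 option> =
    // throwOnInvalidBytes = true: invalid input raises an exception.
    let strictUtf8 = UTF8Encoding(false, true)
    try
        use byteStream = new MemoryStream(bytes)
        use stream = new CharStream(byteStream, strictUtf8)
        // Walk to the end of the stream so that all input gets decoded.
        while not stream.IsEndOfStream do
            stream.Skip()
        Ok ()
    with
    | :? System.ArgumentException as e ->
        // DecoderFallbackException derives from ArgumentException,
        // so this handler covers both exception types.
        match e.Data.["Stream.Position"] with
        | :? int64 as position -> Error (Some position)
        | _ -> Error None

printfn "%A" (tryDecodeAll [| 0x61uy; 0xFFuy; 0x62uy |])
```

Since it is in general not safe to keep using a CharStream after such an exception, the sketch only disposes the stream (through `use`) and reports the failure.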
6.11.1.4 Members
new: chars: string * index: int * length: int -> CharStream
Is equivalent to new CharStream(chars, index, length, 0L).
new: chars: string * index: int * length: int * streamBeginIndex: int64 -> CharStream
Constructs a CharStream from the chars in the string argument between the indices index (inclusive) and index + length (exclusive). By directly referencing the chars in the string this constructor
avoids any copy of the string content.
The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance
that only contains a sub‐segment of another char stream but is accessible through the same char indices.
chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:
- index ≥ 0, length ≥ 0, index + length ≤ chars.Length and
- 0 ≤ streamBeginIndex < 2^60.
This note does not apply to the Low‐Trust version of
FParsec.
The given string is “pinned” until the CharStream is disposed. Pinning the string prevents the GC from moving it around in memory during garbage
collection. On .NET (at least in versions up to and including 4.0) the pinning has no effect if the string is large enough to be
allocated on the Large Object Heap, i.e. has a length of about 42500 chars or more. However, pinning smaller strings does constrain the
normal operations of the GC. Thus, to minimize the negative impact on the GC, you should dispose CharStream instances constructed from small strings as soon as you’re done parsing them. If you keep a
large number of CharStream instances constructed from small strings around for an
extended period of time, you risk fragmenting the heap.
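A minimal sketch of the substream use case mentioned above: the stream below exposes only the two chars "42" of a longer string, but passes the segment's start index as streamBeginIndex so that the chars keep the indices they have in the full text. The script assumes FParsec is referenced as a NuGet package; `describeSubstream` and the sample text are made up for this illustration.

```fsharp
#r "nuget: FParsec"
open FParsec

let fullText = "let x = 42"   // "42" starts at index 8

/// Hypothetical helper: returns (IndexOfFirstChar, Index, next char, IsBeginOfStream).
let describeSubstream () : int64 * int64 * char * bool =
    // Segment: index = 8, length = 2; the first char gets the stream index 8.
    use sub = new CharStream(fullText, 8, 2, 8L)
    sub.IndexOfFirstChar, sub.Index, sub.Peek(), sub.IsBeginOfStream

printfn "%A" (describeSubstream ())
```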
new: chars: char[] * index: int * length: int -> CharStream
This constructor is not available in the Low‐Trust version of FParsec.
Is equivalent to new CharStream(chars, index, length, 0L).
new: chars: char[] * index: int * length: int * streamBeginIndex: int64 -> CharStream
This constructor is not available in the Low‐Trust version of FParsec.
Constructs a CharStream from the chars in the char array argument between the indices
index (inclusive) and index + length (exclusive). By directly referencing the chars in the char array this constructor
avoids any copy of the char array content.
The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance
that only contains a sub‐segment of another char stream but is accessible through the same char indices.
chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:
- index ≥ 0, length ≥ 0, index + length ≤ chars.Length and
- 0 ≤ streamBeginIndex < 2^60.
A CharStream constructed from a char array does not support .NET regex matching via
the Match method.
The given char array is “pinned” until the CharStream is disposed. Pinning the char array prevents the GC from moving it around in
memory during garbage collection. On .NET (at least in versions up to and including 4.0) the pinning has no effect if the char array is
large enough to be allocated on the Large Object Heap, i.e. has a length of about 42500 chars or more. However, pinning smaller char
arrays does constrain the normal operations of the GC. Thus, to minimize the negative impact on the GC, you should dispose CharStream instances constructed from small char arrays as soon as you’re done parsing
them. If you keep a large number of CharStream instances constructed from
small char arrays around for an extended period of time, you risk fragmenting the heap.
new: chars: NativePtr<char> * length: int -> CharStream
This constructor is not available in the Low‐Trust version
of FParsec.
Is equivalent to new CharStream(chars, length, 0L).
new: chars: NativePtr<char> * length: int * streamBeginIndex: int64 -> CharStream
This constructor is not available in the Low‐Trust version of FParsec.
Constructs a CharStream from the length chars at the pointer address. By directly referencing the chars at the pointer address this constructor
avoids any copy of the char buffer.
The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance
that only contains a sub‐segment of another char stream but is accessible through the same char indices.
chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:
- length ≥ 0, chars + length must not overflow and
- 0 ≤ streamBeginIndex < 2^60.
A CharStream constructed from a pointer does not support .NET regex matching via the
Match method.
new: path: string * encoding: System.Text.Encoding -> CharStream
Is equivalent to new CharStream(path, encoding, true).
new: path: string * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool -> CharStream
Is equivalent to
  new CharStream(
      path, encoding, detectEncodingFromByteOrderMarks,
      blockSize = DefaultBlockSize (* = 3*2^16 ≈ 200k *),
      blockOverlap = DefaultBlockSize/3,
      minRegexSpace = ((DefaultBlockSize/3)*2)/3,
      byteBufferLength = DefaultByteBufferLength
  )
new: path: string * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool * blockSize: int * blockOverlap: int * minRegexSpace: int * byteBufferLength: int -> CharStream
Constructs a CharStream from a FileStream as if by calling
  new CharStream(
      new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                     4096, FileOptions.SequentialScan),
      leaveOpen = false,
      encoding = encoding,
      detectEncodingFromByteOrderMarks = detectEncodingFromByteOrderMarks,
      blockSize = blockSize,
      blockOverlap = blockOverlap,
      minRegexSpace = minRegexSpace,
      byteBufferLength = byteBufferLength
  )
If an exception occurs after the FileStream is constructed but before the CharStream
constructor is finished, the FileStream is disposed.
The FileStream constructor might throw an exception, too.
new: stream: System.IO.Stream * encoding: System.Text.Encoding -> CharStream
Is equivalent to new CharStream(stream, false, encoding, true).
new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding -> CharStream
Is equivalent to new CharStream(stream, leaveOpen, encoding, true).
new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool -> CharStream
Is equivalent to
  new CharStream(
      stream, leaveOpen, encoding, detectEncodingFromByteOrderMarks,
      blockSize = DefaultBlockSize (* = 3*2^16 ≈ 200k *),
      blockOverlap = DefaultBlockSize/3,
      minRegexSpace = ((DefaultBlockSize/3)*2)/3,
      byteBufferLength = DefaultByteBufferLength
  )
new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool * blockSize: int * blockOverlap: int * minRegexSpace: int * byteBufferLength: int -> CharStream
Constructs a CharStream from a System.IO.Stream.
The normal version of the CharStream class supports stream sizes up to approximately
(2^31/p)×(blockSize ‐ blockOverlap) chars, where p is 4 on a 32‐bit CLR and 8 on a 64‐bit CLR.
The Low‐Trust version only supports streams small enough that the complete content can be read into a
single string.
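To get a feeling for the size limit of the normal version, the formula above can be evaluated for the default block parameters. This is plain arithmetic and does not call into FParsec; `approxMaxStreamChars` is a hypothetical helper name.

```fsharp
/// Hypothetical helper: evaluates (2^31 / p) * (blockSize - blockOverlap),
/// where p is 4 on a 32-bit CLR and 8 on a 64-bit CLR.
let approxMaxStreamChars (p: int) (blockSize: int) (blockOverlap: int) : int64 =
    ((1L <<< 31) / int64 p) * int64 (blockSize - blockOverlap)

// Default block parameters quoted in the constructor descriptions above.
let blockSize = 3 * (1 <<< 16)      // 196608 chars
let blockOverlap = blockSize / 3    // 65536 chars

printfn "64-bit CLR: about %d chars" (approxMaxStreamChars 8 blockSize blockOverlap)
printfn "32-bit CLR: about %d chars" (approxMaxStreamChars 4 blockSize blockOverlap)
```

With the defaults, blockSize ‐ blockOverlap is 2^17, so the formula gives 2^45 chars for p = 8 and 2^46 chars for p = 4.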
This constructor reads the first block of chars from the input stream and hence can throw any of the I/O related exceptions detailed in the exceptions section above.
Arguments:
- stream
The byte stream providing the input. If stream.CanRead returns false, an ArgumentException is thrown.
- leaveOpen
Indicates whether the stream should be left open when the CharStream has finished reading it.
- encoding
The default Encoding used for decoding the byte stream into chars.
If the preamble returned by encoding.GetPreamble() is present at the beginning of the stream, the CharStream will skip over it.
- detectEncodingFromByteOrderMarks
Indicates whether the constructor should detect the encoding from a Unicode byte‐order mark at the beginning of the stream. An encoding detected from a byte‐order mark overrides the default encoding. The standard byte‐order marks for the following encodings are supported: UTF‐8, UTF‐16 LE/BE and UTF‐32 LE/BE.
- blockSize
The number of chars per block. The value is rounded up to the first positive multiple of 1536. The default is 3×2^16 ≈ 200k.
- blockOverlap
The number of chars at the end of a block that are preserved when reading the next block into its internal char buffer. If this value is less than encoding.GetMaxCharCount(1) or not less than blockSize/2, the default value is used instead. The default is blockSize/3.
- byteBufferLength
The size of the byte buffer used for decoding purposes. The default is 2^12 = 4KB.
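The block size rule and the default values listed above amount to a few lines of integer arithmetic. The sketch below is an illustration only: it is not taken from the FParsec sources, and `roundUpBlockSize` encodes one assumed reading of "rounded up to the first positive multiple of 1536" (values of 1536 or less give 1536).

```fsharp
let blockSizeUnit = 1536

/// Hypothetical helper: rounds `requested` up to a positive multiple of 1536.
let roundUpBlockSize (requested: int) : int =
    if requested <= blockSizeUnit then blockSizeUnit
    else ((requested + blockSizeUnit - 1) / blockSizeUnit) * blockSizeUnit

// Defaults as stated in the constructor descriptions above.
let defaultBlockSize = 3 * (1 <<< 16)                      // 196608 = 128 * 1536
let defaultBlockOverlap = defaultBlockSize / 3             // 65536
let defaultMinRegexSpace = (defaultBlockOverlap * 2) / 3   // 43690 (integer division)
let defaultByteBufferLength = 1 <<< 12                     // 4096

printfn "%d %d %d" (roundUpBlockSize 1) (roundUpBlockSize 1537) (roundUpBlockSize defaultBlockSize)
```

Note that the default block size is already a multiple of 1536, so rounding leaves it unchanged.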
member Dispose: unit -> unit
Releases all resources used by the CharStream. If the CharStream was constructed from a System.IO.Stream or a file path and the constructor was not called
with leaveOpen = true, the byte stream
is closed.
member BlockOverlap: int
The number of chars at the end of a block that are preserved when the CharStream reads
the next block into its internal char buffer.
This value is only relevant for optimization purposes and as the maximum value for MinRegexSpace.
This value can only be set at construction time with the respective constructor parameter.
If the CharStream is constructed from a string, char array or char pointer or only
contains 1 block, then this value is 0. In the Low‐Trust version this value is always 0.
member IndexOfFirstChar: int64
The index of the first char in the stream. This value is determined by the streamBeginIndex argument of some of the CharStream
constructors. By default this value is 0.
member IndexOfLastCharPlus1: int64
The index of the last char of the stream plus 1, or Int64.MaxValue if the end of the stream has not yet been detected.
member IsBeginOfStream: bool
Indicates whether the next char in the stream is the first char, i.e. whether Index equals IndexOfFirstChar.
If the stream is empty, this value is always true.
member IsEndOfStream: bool
Indicates whether there is no char remaining in the stream, i.e. whether Index equals IndexOfLastCharPlus1.
If the stream is empty, this value is always true.
member Index: int64
The stream index of the next char.
member IndexToken: CharStreamIndexToken
A CharStreamIndexToken value representing the
current Index value.
member Line: int64
The line number for the next char. (The line count starts with 1.)
member LineBegin: int64
The stream index of the first char of the line that also contains the next char.
member Name: string with get, set
This string is used in error messages to describe the input stream.
If the CharStream is constructed from a file path, the constructor initializes the
Name value with the file path value. Otherwise, Name is initialized to null.
If the stream content is the concatenated content of multiple input files, you can improve error messages and help debugging by setting the name and resetting the line and column count at the transitions between the different content pieces.
Setting the Name value increments the StateTag by 1, independent of whether the new value is different
from the previous one.
val mutable StateTag: uint64
The StateTag’s purpose is to provide an efficient way to determine whether the
publicly visible state of the CharStream has changed after a series of method calls.
For the purpose of this property, the state is defined as the aggregate of the Index, Line, LineBegin and Name
values. The UserState value of CharStream<'TUserState> instances is also part of the CharStream state. If a
method or property setter changes one or more of these state values, it increments the StateTag by 1. Thus, to determine whether a series of method calls has changed the CharStream, it is often enough to compare the StateTag values
from before and after the method calls.
The StateTag property is primarily meant for use in the implementation of parser
combinators. If you directly call CharStream methods, you normally don’t need the StateTag to determine whether the state has changed, because that is usually obvious from
either the method’s return value or the context in which it was called. Please see section 5.4.3 for more details on the design rationale behind the
StateTag.
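As a small illustration of the comparison described above, the sketch below records the StateTag before and after a Peek call (which does not change the state) and after a Skip call (which changes the Index). It assumes FParsec is referenced as a NuGet package; `stateTagDemo` is a hypothetical name.

```fsharp
#r "nuget: FParsec"
open FParsec

/// Hypothetical helper: returns (tag unchanged by Peek, tag changed by Skip).
let stateTagDemo () : bool * bool =
    use stream = new CharStream("abc", 0, 3)
    let tagAtStart = stream.StateTag
    stream.Peek() |> ignore            // only looks at the next char
    let tagAfterPeek = stream.StateTag
    stream.Skip()                      // advances Index by one char
    let tagAfterSkip = stream.StateTag
    tagAtStart = tagAfterPeek, tagAtStart <> tagAfterSkip

printfn "%A" (stateTagDemo ())
```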
member Seek: index: int64 -> unit
Seeks the CharStream to the char with the specified index in the stream.
If you pass an index larger than the index of the last char in the stream, this method seeks the stream to the end of the stream, i.e. to one char past the last char in the stream.
The index is zero‐based, except if the CharStream was constructed with a positive streamBeginIndex argument, in which case the index of the first char equals the value of the
streamBeginIndex argument (and the IndexOfFirstChar value).
When this method changes the stream position, it increments the StateTag by 1. When it does not change the position, it may or may not increment the StateTag by 1.
An ArgumentOutOfRangeException is thrown if the index is less than the IndexOfFirstChar. This method may also throw any of the I/O related exceptions detailed above.
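The seeking rules above can be sketched as follows: seeking past the last char ends up at the end of the stream, seeking to a valid index makes that char the next char, and seeking before IndexOfFirstChar throws. The script assumes FParsec is referenced as a NuGet package; `seekDemo` is a hypothetical name.

```fsharp
#r "nuget: FParsec"
open FParsec

/// Hypothetical helper: returns
/// (IsEndOfStream after Seek(100L), Index after Seek(100L), char at index 1, Seek(-1L) threw).
let seekDemo () : bool * int64 * char * bool =
    use stream = new CharStream("abc", 0, 3)

    // An index beyond the last char seeks to the end of the stream.
    stream.Seek(100L)
    let atEnd = stream.IsEndOfStream
    let endIndex = stream.Index

    // Backtrack to the second char.
    stream.Seek(1L)
    let secondChar = stream.Peek()

    // An index below IndexOfFirstChar (0 here) is rejected.
    let threw =
        try
            stream.Seek(-1L)
            false
        with :? System.ArgumentOutOfRangeException -> true

    atEnd, endIndex, secondChar, threw

printfn "%A" (seekDemo ())
```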
member Seek: indexToken: CharStreamIndexToken -> unit
static val EndOfStreamChar: char
member Peek: unit -> char
Returns the next char without changing the state of the CharStream.
At the end of the CharStream the EndOfStreamChar ('\uFFFF') is returned.
member Peek: utf16Offset: int -> char
Returns the char at the stream index Index + utf16Offset, without changing the state of the CharStream.
If Index + utf16Offset is smaller than the index of the first char in the stream or larger than the index of the last char in
the stream, the EndOfStreamChar ('\uFFFF') is returned.
This method may throw any of the I/O related exceptions detailed above.
member Peek: utf16Offset: uint32 -> char
This method is an optimized implementation of Peek(int) for uint32 arguments.
member PeekString: length: int -> string
Returns a string with the next length stream chars, without changing the state of the
CharStream.
If less than length chars are remaining in the stream, only the remaining chars are
returned.
This note does not apply to the Low‐Trust version of
FParsec.
If length is greater than the number of remaining chars
in the stream, a temporary string with length chars may be allocated. For very large
length values this might lead to an OutOfMemoryException even though a string with only the remaining chars in the stream would comfortably fit
into memory.
Please also note that the maximum length of a string on .NET is less than 2^30. Allocating a string larger than the maximum
length will always yield an OutOfMemoryException, even on 64‐bit systems with enough
physical memory.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member PeekString: buffer: char[] * bufferIndex: int * length: int -> int
Copies the next length stream chars into buffer, without changing the state of the CharStream. Returns
the number of chars copied.
The chars are written into buffer beginning at the index bufferIndex. If less than length chars are remaining in the
stream, only the remaining chars are copied.
An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: bufferIndex ≥ 0, length ≥ 0 and bufferIndex + length ≤ buffer.Length. This method may also throw any
of the I/O related exceptions detailed above.
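A short sketch of the two managed PeekString overloads described above: the first returns only the remaining chars when fewer than length chars are left, the second copies into a caller‐supplied buffer and returns the number of chars copied. Neither call moves the stream. The script assumes FParsec is referenced as a NuGet package; `peekStringDemo` is a hypothetical name.

```fsharp
#r "nuget: FParsec"
open FParsec

/// Hypothetical helper: returns
/// (peeked string, number of chars copied, buffer content, Index afterwards).
let peekStringDemo () : string * int * string * int64 =
    use stream = new CharStream("abc", 0, 3)

    // Only 3 chars remain, so only those are returned.
    let peeked = stream.PeekString(10)

    // Copy 2 chars into the buffer, starting at buffer index 1.
    let buffer = Array.create 5 '.'
    let copied = stream.PeekString(buffer, 1, 2)

    peeked, copied, System.String(buffer), stream.Index

printfn "%A" (peekStringDemo ())
```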
member PeekString: buffer: NativePtr<char> * length: int -> int
This method is not available in the Low‐Trust version of FParsec.
Copies the next length stream chars into the buffer at the specified pointer address,
without changing the state of the CharStream. Returns the number of chars copied.
If less than length chars are remaining in the stream, only the remaining chars are
copied.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member Match: char -> bool
Returns true if the next char in the stream matches the specified char. At the end of
the stream Match always returns false.
This method does not change the state of the CharStream.
This method may throw any of the I/O related exceptions detailed above.
member Match: chars: string -> bool
Returns true if the passed string chars matches the next chars.Length stream chars.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If chars is empty, true is returned. chars must not be null.
This method does not change the state of the CharStream.
This method may throw any of the I/O related exceptions detailed above.
member Match: chars: char[] * charsIndex: int * length: int -> bool
Returns true if the next length stream chars match the chars in the array chars at the indices charsIndex to charsIndex + length - 1.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If length is 0, true is returned. chars must not be null.
This method does not change the state of the CharStream.
An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: charsIndex ≥ 0, length ≥ 0 and charsIndex + length ≤ chars.Length. This method may also throw any
of the I/O related exceptions detailed above.
member Match: chars: NativePtr<char> * length: int -> bool
This method is not available in the Low‐Trust version of FParsec.
Returns true if the next length
stream chars match the chars at the specified pointer address.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If length is 0, true is returned.
This method does not change the state of the CharStream.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member MatchCaseFolded: caseFoldedChar: char -> bool
Behaves like Match(caseFoldedChar), except that the next char in the stream is case‐folded before it is
compared with caseFoldedChar.
While the char in the stream is case‐folded before it is matched, the char caseFoldedChar is assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member MatchCaseFolded: caseFoldedChars: string -> bool
Behaves like Match(caseFoldedChars), except that the chars in the stream are
case‐folded before they are compared with caseFoldedChars.
While the chars in the CharStream are case‐folded before they are matched, the chars
in the string argument caseFoldedChars are assumed to already be case‐folded (e.g.
with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member MatchCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool
This method is not available in the Low‐Trust version of FParsec.
Behaves like Match(caseFoldedChars, length), except that the chars in the stream are case‐folded before they are compared with the chars at the pointer
address caseFoldedChars.
While the chars in the CharStream are case‐folded before they are matched, the chars
at the pointer address caseFoldedChars are assumed to already be case‐folded (e.g.
with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member Match: System.Text.RegularExpressions.Regex -> System.Text.RegularExpressions.Match
Applies the given regular expression to the stream chars beginning with the next char. Returns the resulting Match object.
For performance reasons you should specify the regular expression such that it can only match at the beginning of a string, for example by
prepending "\\A".
For CharStream instances constructed from strings the regular expression is applied to
a string containing all the remaining chars in the stream.
For CharStream instances constructed from large binary streams (with more than 1 block)
the regular expression is not applied to a string containing all the remaining chars in the stream. Here the MinRegexSpace value determines the minimum number of
chars that are guaranteed to be visible to the regular expression (assuming there are still enough chars remaining in the stream). The
exact number of chars visible to the regular expression may be affected even by calls to CharStream methods like Peek or Match that otherwise guarantee to not change the (outwardly visible) state of the CharStream.
This method may throw any of the I/O related exceptions detailed above.
This method is not supported by CharStream instances constructed directly from char arrays or pointers. A NotSupportedException is thrown if this method is called on such a CharStream instance. (This note does not apply to the Low‐Trust version of FParsec.)
If the CharStream was constructed from a System.IO.Stream or a file path, the regular expression is applied to an internal mutable buffer. Since the Match object may work lazily, i.e. compute return values not before they are needed, you need to retrieve all the required information from the Match object before you continue to access the CharStream, otherwise you might get back invalid match results. Note that all strings returned by the Match object are, of course, immutable. (This note does not apply to the Low‐Trust version of FParsec.)
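The following sketch illustrates the advice above for a CharStream constructed from a string: the regular expression is anchored with \A, and everything needed from the Match object is copied out before the stream is accessed again. The regex, the input and the helper name `demo` are assumptions made for illustration.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open System.Text.RegularExpressions
open FParsec

// Anchored with \A so that it can only match at the current position.
let identifierRegex = Regex(@"\A[a-z]+[0-9]*")

// Hypothetical helper for illustration; not part of FParsec.
let demo () =
    let input = "id42 rest"
    use stream = new CharStream<unit>(input, 0, input.Length)
    let m = stream.Match(identifierRegex)
    // Copy the required values out of the Match object first ...
    let success, value, length = m.Success, m.Value, m.Length
    // ... and only then continue to work with the stream.
    if success then stream.Skip(length)
    success, value, stream.Index
```

Match(Regex) itself does not advance the stream, which is why the sketch skips the matched chars explicitly.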
member MinRegexSpace: int with get, set
The number of chars that are guaranteed to be visible to a regular expression when it is matched by Match (assuming there are enough chars remaining in the stream).
The value must be non‐negative and not greater than BlockOverlap. The default value is 2/3 of BlockOverlap.
If the CharStream is constructed from a string, char array or char pointer or has only
1 block, then this value has no relevance and calling the property setter has no effect. (No Low‐Trust version CharStream instance has more
than 1 block.)
The MinRegexSpace value is not recorded in CharStreamState instances and setting its value does not affect the StateTag.
An ArgumentOutOfRangeException is thrown if you try to set the property on a multi‐block CharStream instance to a negative value or a value larger than the BlockOverlap.
member RegisterNewline: unit -> bool
Registers a newline (an end‐of‐line character) at the previous stream char, i.e. increments the Line value by 1 and sets the LineBegin to Index.
The previous LineBegin value must not equal
Index. (For performance reasons this condition
is only checked by an assert check in the debug build).
This method also increments the StateTag by
1.
member RegisterNewlines: lineOffset: int -> newColumnMinus1: int -> bool
Increments the Line value by lineOffset and sets the LineBegin value to Index - newColumnMinus1 (so that the Column value becomes newColumnMinus1 + 1).
The lineOffset must not be 0, the new Line value must be greater than 0 and the new LineBegin value must be different from the previous one. (For performance reasons these conditions are only checked by assert checks in the debug build.)
This method also increments the StateTag by
1.
member RegisterNewlines: lineOffset: int64 -> newColumnMinus1: int64 -> bool
This method is a variant of RegisterNewlines for int64 arguments.
member Skip: unit -> unit
Advances the position within the stream by 1 char, except at the end of the stream, where it does nothing.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member Skip: utf16Offset: int -> unit
Advances the position within the stream by utf16Offset chars.
The new position within the stream will be min(Index + utf16Offset, IndexOfLastCharPlus1). This means you can’t move past the end of the stream, because any position beyond the last char in the stream is interpreted as precisely one char beyond the last char.
An ArgumentOutOfRangeException is thrown if the new position would lie before the beginning of the CharStream, i.e. if the new index would be less than IndexOfFirstChar. This method may also throw any of the I/O related exceptions detailed above.
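A small sketch of both boundary cases, assuming a three-char stream created from a string: skipping too far forward stops at IndexOfLastCharPlus1, while skipping to a position before the beginning throws. The helper name `demo` is hypothetical.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open FParsec

// Hypothetical helper for illustration; not part of FParsec.
let demo () =
    let input = "abc"
    use stream = new CharStream<unit>(input, 0, input.Length)
    stream.Skip(10) // more than the 3 available chars
    let stoppedAtEnd = (stream.Index = stream.IndexOfLastCharPlus1)
    let isEnd = stream.IsEndOfStream
    // A position before the beginning of the stream is rejected.
    let threw =
        try
            stream.Skip(-10)
            false
        with :? System.ArgumentOutOfRangeException -> true
    stoppedAtEnd, isEnd, threw
```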
member Skip: utf16Offset: uint32 -> unit
This method is an optimized implementation of Skip for uint32 offsets.
member Skip: utf16Offset: int64 -> unit
This method is a variant of Skip for int64 offsets.
member SkipAndPeek: unit -> char
member SkipAndPeek: utf16Offset: int -> char
c <- SkipAndPeek(utf16Offset) is an optimized implementation of Skip(utf16Offset); c <- Peek(), with the following exception for
negative offsets n:
If the new position would lie before the beginning of
the CharStream, i.e. if the new index would be less than IndexOfFirstChar, then SkipAndPeek(n) does not throw an exception
like stream.Skip(n) would do. Instead it sets the
position of the stream to IndexOfFirstChar and returns the EndOfStreamChar ('\uFFFF').
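The difference to Skip for negative offsets can be sketched as follows. The example assumes a short string stream; `demo` is an illustrative name, and '\uFFFF' is the EndOfStreamChar value stated above.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open FParsec

// Hypothetical helper for illustration; not part of FParsec.
let demo () =
    let input = "abc"
    use stream = new CharStream<unit>(input, 0, input.Length)
    // Equivalent to Skip(1) followed by Peek().
    let next = stream.SkipAndPeek(1)
    // The target position lies before the first char: no exception is thrown.
    let c = stream.SkipAndPeek(-5)
    next, (c = '\uFFFF'), (stream.Index = stream.IndexOfFirstChar)
```

Since the EndOfStreamChar is returned in this case even though the stream is not at its end, callers that use negative offsets may want to check Index or IsBeginOfStream instead of relying on the returned char alone.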
member SkipAndPeek: utf16Offset: uint32 -> char
member Skip: char -> bool
Skips over the next char in the stream if this char matches the passed argument char. Returns true if the chars match; otherwise, false. At the end of the
stream this method always returns false.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member Skip: TwoChars -> bool
Skips over the next two chars in the stream if these chars match the two chars in the passed TwoChars value. Returns true if the chars match.
If not both chars match or if there are fewer than 2 chars remaining in the stream, no char is skipped and false is returned.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member Skip: chars: string -> bool
Skips over the next chars.Length chars
in the stream if these chars match the passed string chars. Returns true if the chars match.
If not all the chars match or if there are not enough chars remaining in the stream, no char is skipped and false is returned. If chars is empty, true is returned. chars must not be null.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if chars is empty, in which case it may or may not increment the StateTag by 1.
This method may throw any of the I/O related exceptions detailed above.
member Skip: chars: char[] * charsIndex: int * length: int -> bool
Skips over the next length chars in the stream if these chars match the chars in the passed array chars at the indices charsIndex to charsIndex + length - 1. Returns true if the chars match.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned and the position within the CharStream is
not changed. If length is 0, true
is returned. chars must not be null.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: charsIndex ≥ 0, length ≥ 0 and charsIndex + length ≤ chars.Length. This method may also throw any
of the I/O related exceptions detailed above.
member Skip: chars: NativePtr<char> * length: int -> bool
This method is not available in the Low‐Trust version of FParsec.
Skips over the next length chars in the stream if these chars match the chars at the
pointer address chars. Returns true
if the chars match.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned and the position within the CharStream is
not changed. If length is 0, true
is returned.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member SkipCaseFolded: caseFoldedChar: char -> bool
Behaves like Skip(caseFoldedChar), except that the next char in the stream is case‐folded before it is
compared with caseFoldedChar.
While the char in the stream is case‐folded before it is matched, the char caseFoldedChar is assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member SkipCaseFolded: caseFoldedChars: string -> bool
Behaves like Skip(caseFoldedChars), except that the chars in the stream are case‐folded before they are
compared with caseFoldedChars.
While the chars in the CharStream are case‐folded before they are matched, the chars
in the string argument caseFoldedChars are assumed to already be case‐folded (e.g.
with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
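As an illustration of the division of labour described above, the sketch below case-folds a keyword once with FParsec.Text.FoldCase and then uses it with SkipCaseFolded. The keyword, the input and the helper name `demo` are assumptions chosen for the example.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open FParsec

// Hypothetical helper for illustration; not part of FParsec.
let demo () =
    // Fold the argument once, up front; the method does not do this for us.
    let keyword = FParsec.Text.FoldCase("SELECT")
    let input = "Select * from t"
    use stream = new CharStream<unit>(input, 0, input.Length)
    let caseSensitive = stream.Skip(keyword)             // compares chars as they are
    let caseInsensitive = stream.SkipCaseFolded(keyword) // folds the stream chars first
    keyword, caseSensitive, caseInsensitive, stream.Index
```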
member SkipCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool
This method is not available in the Low‐Trust version of FParsec.
Behaves like Skip(caseFoldedChars, length), except that the chars in the stream are
case‐folded before they are compared with the chars at the pointer address caseFoldedChars.
While the chars in the CharStream are case‐folded before they are matched, the chars
at the pointer address caseFoldedChars are assumed to already be case‐folded (e.g.
with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member Read: unit -> char
Skips over the next char in the stream. Returns the skipped char.
At the end of the stream Read() does not change the stream
position and returns the EndOfStreamChar ('\uFFFF').
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member Read: length: int -> string
Skips over the next length chars in the stream. Returns the skipped chars as a string.
If less than length chars are remaining in the stream, only the remaining chars are
skipped and returned.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member Read: buffer: char[] * bufferIndex: int * length: int -> int
Skips over the next length stream chars and copies the skipped chars into buffer. Returns the number of copied and skipped chars.
The chars are written into buffer beginning at the index bufferIndex. If less than length chars are remaining in the
stream, only the remaining chars are copied and skipped.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: bufferIndex ≥ 0, length ≥ 0 and bufferIndex + length ≤ buffer.Length. This method may also throw any
of the I/O related exceptions detailed above.
member Read: buffer: NativePtr<char> * length: int -> int
This method is not available in the Low‐Trust version of FParsec.
Skips over the next length stream chars and copies the skipped chars into the buffer at
the given pointer address. Returns the number of copied and skipped chars.
If less than length chars are remaining in the stream, only the remaining chars are
copied and skipped.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member ReadFrom: indexOfFirstChar: CharStreamIndexToken -> string
Returns a string with the chars between the stream index indexOfFirstChar (inclusive)
and the current Index of the stream
(exclusive).
This method throws
- an ArgumentOutOfRangeException, if Index < indexOfFirstChar, and
- an ArgumentException, if the CharStreamIndexToken is a zero‐initialized instance (i.e. constructed with the default value type constructor).
It may also throw any of the I/O related exceptions detailed above.
You may only pass CharStreamIndexToken values that were retrieved from the CharStream instance on which you’re calling ReadFrom. Passing a CharStreamIndexToken value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
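A typical use of this overload is to remember a start position, advance the stream and then extract the chars in between. The following sketch assumes a string stream; `demo` and the input are illustrative.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open FParsec

// Hypothetical helper for illustration; not part of FParsec.
let demo () =
    let input = "key=value"
    use stream = new CharStream<unit>(input, 0, input.Length)
    let start = stream.IndexToken    // remember where the key begins
    stream.Skip(3)                   // move past "key"
    // Chars from `start` (inclusive) up to the current Index (exclusive).
    let key = stream.ReadFrom(start)
    key, stream.Index
```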
member SkipWhitespace: unit -> bool
Skips over any sequence of space (' '), tab ('\t') or newline ('\r', '\n') chars. Returns true if it
skips at least one char, otherwise false.
This method registers any skipped standard newline ("\n", "\r\n" or "\r").
When this method skips at least one char, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipUnicodeWhitespace: unit -> bool
Skips over any sequence of unicode whitespace chars (as identified by System.Char.IsWhiteSpace). Returns true if it skips at least one char, otherwise false.
This method registers any skipped Unicode newline ("\n", "\r\n", "\r", "\u0085", "\u2028" or "\u2029").
This method recognizes the form feed char '\f' ('\u000C') as a Unicode whitespace character, but not as a newline character.
When this method skips at least one char, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipNewline: unit -> bool
Skips over a standard newline ("\n", "\r\n" or "\r"). Returns true if a
newline is skipped, otherwise false.
When this method skips a newline, it also registers it.
When this method skips a newline, it increments the StateTag by 1, otherwise it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipUnicodeNewline: unit -> bool
Skips over a unicode newline ("\n", "\r\n", "\r", "\u0085", "\u2028", or "\u2029"). Returns true if a
newline is skipped, otherwise false.
This method does not recognize the form feed char '\f' ('\u000C') as a newline character.
When this method skips a newline, it also registers it.
When this method skips a newline, it increments the StateTag by 1, otherwise it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipNewlineThenWhitespace: powerOf2TabStopDistance: int * allowFormFeed: bool -> int
Skips over a newline ("\n", "\r\n" or "\r") followed by any (possibly empty) sequence of whitespace chars (' ', '\t', '\r', '\n' and optionally '\f').
If this method skips no chars because the next stream char is no newline char, it returns ‒1. Otherwise it returns the indentation of the first line with non‐whitespace characters.
The indentation is calculated as follows:
- Any newline char ('\r' or '\n') or form feed char ('\f') resets the indentation to 0.
- Any space char (' ') increments the indentation by 1.
- Any tab char ('\t') increments the indentation by powerOf2TabStopDistance - (indentation modulo powerOf2TabStopDistance).
The maximum indentation is 2^31 - 1. If skipping a whitespace char would cause the indentation to overflow, the char is not skipped and the method returns the indentation up to that char.
An ArgumentOutOfRangeException is thrown if powerOf2TabStopDistance is not a positive power of 2.
The value of the allowFormFeed argument determines whether this method accepts the form
feed char '\f' as a whitespace char.
This method registers all skipped standard newlines ("\n", "\r\n" or "\r").
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
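The indentation rule given above can be modelled in a few lines of plain F#. This is only a sketch of the documented arithmetic, not FParsec's implementation: the function name `indentationOf` is hypothetical, overflow handling is ignored, and the argument is assumed to contain only the whitespace chars that follow the initial newline.

```fsharp
// Hypothetical model of the documented indentation rule; not part of FParsec.
let indentationOf (powerOf2TabStopDistance: int) (whitespace: string) =
    let mutable indentation = 0
    for c in whitespace do
        match c with
        | '\r' | '\n' | '\f' ->
            // a newline or form feed starts the count again
            indentation <- 0
        | ' ' ->
            indentation <- indentation + 1
        | '\t' ->
            // advance to the next tab stop
            indentation <-
                indentation
                + (powerOf2TabStopDistance - indentation % powerOf2TabStopDistance)
        | _ -> () // other chars are outside the scope of this model
    indentation
```

With a tab stop distance of 4, two spaces followed by a tab give an indentation of 4, and a tab followed by a space gives 5, because a tab always advances to the next multiple of the tab stop distance.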
member SkipRestOfLine: skipNewline: bool -> unit
Skips over any chars before the next newline ("\n", "\r\n" or "\r") or the end of the stream. If skipNewline is true and a newline is
present, the newline is also skipped.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member ReadRestOfLine: skipNewline: bool -> string
ReadRestOfLine(skipNewline) behaves like SkipRestOfLine(skipNewline), except
that it returns a string with the skipped chars (without a newline).
member ReadCharOrNewline: unit -> char
Skips over any single char or standard newline ("\n", "\r\n" or "\r").
This method returns '\n' when it skips a newline. Otherwise, it returns the skipped char, except at the end of the stream, where
it returns the EndOfStreamChar ('\uffff').
When this method skips a newline, it also registers it.
When this method skips a char or newline, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipCharsOrNewlines: maxCount: int -> int
Skips over up to maxCount chars. Returns the number of skipped chars.
The number of actually skipped chars is less than maxCount if the end of the stream is
reached after less than maxCount chars.
This method counts standard newlines ("\n", "\r\n" or "\r") as single chars. When this method skips a newline, it also registers it.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
An ArgumentOutOfRangeException is thrown if maxCount is
negative. This method may also throw any of the I/O related exceptions detailed above.
member ReadCharsOrNewlines: maxCount: int * normalizeNewlines: bool -> string
Behaves like SkipCharsOrNewlines(maxCount), except that it returns a string with the skipped
chars.
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form
they are encountered in the input.
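The effect of normalizeNewlines can be sketched as follows, assuming an input that contains a "\r\n" newline. The helper name `readThree` and the input are illustrative.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open FParsec

// Hypothetical helper for illustration; not part of FParsec.
let readThree (normalizeNewlines: bool) =
    let input = "a\r\nb\rc"
    use stream = new CharStream<unit>(input, 0, input.Length)
    // "\r\n" counts as a single char, so 3 chars or newlines
    // cover 4 UTF-16 chars of the input.
    let text = stream.ReadCharsOrNewlines(3, normalizeNewlines)
    text, stream.Index, stream.Line
```

Both calls advance the stream by the same amount and register the same newline; only the form of the newline in the returned string differs.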
member SkipCharsOrNewlinesWhile: predicate: (char -> bool) -> int
Skips over a sequence of chars that satisfy the predicate function. Stops at the first
char for which predicate returns false. Returns the number of skipped chars.
This method counts standard newlines ("\n", "\r\n" or "\r") as single chars and passes them to the predicate function as single '\n' chars. When
this method skips a newline, it also registers it.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
The predicate function must not access the CharStream instance itself, because SkipCharsOrNewlinesWhile relies on predicate not having any
side‐effect on the internal state of the stream.
This method may throw any of the I/O related exceptions detailed above.
member SkipCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) -> int
Behaves like SkipCharsOrNewlinesWhile(predicate),
except that the first char to be skipped must satisfy predicateForFirstChar instead of
predicate.
member SkipCharsOrNewlinesWhile: predicate: (char -> bool) * minCount: int * maxCount: int -> int
Skips over a sequence of up to maxCount chars that satisfy the predicate function, but backtracks to the start if it can only skip less than minCount chars. Returns the number of skipped chars.
This method counts standard newlines ("\n", "\r\n" or "\r") as single chars and passes them to the predicate function as single '\n' chars. When
this method skips a newline, it also registers it.
An ArgumentOutOfRangeException is thrown if maxCount is
negative. This method may also throw any of the I/O related exceptions detailed above.
The predicate function must not access the CharStream instance itself, because SkipCharsOrNewlinesWhile relies on predicate not having any
side‐effect on the internal state of the stream.
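The backtracking behaviour for minCount can be sketched like this, assuming a stream that starts with only two digits. The predicate `isDigit` and the helper `demo` are hypothetical names introduced for the example.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open FParsec

// A pure predicate that does not touch the stream, as required above.
let isDigit (c: char) = c >= '0' && c <= '9'

// Hypothetical helper for illustration; not part of FParsec.
let demo () =
    let input = "12ab"
    use stream = new CharStream<unit>(input, 0, input.Length)
    // Only 2 digits are available, so a minimum of 3 cannot be met
    // and the stream backtracks to where it started.
    stream.SkipCharsOrNewlinesWhile(isDigit, 3, 10) |> ignore
    let indexAfterFailure = stream.Index
    // With a minimum of 1 the two digits are skipped.
    let skipped = stream.SkipCharsOrNewlinesWhile(isDigit, 1, 10)
    indexAfterFailure, skipped, stream.Index
```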
member SkipCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * minCount: int * maxCount: int -> int
Behaves like SkipCharsOrNewlinesWhile(predicate, minCount, maxCount), except that the first char to
be skipped must satisfy predicateForFirstChar instead of predicate.
member ReadCharsOrNewlinesWhile: predicate: (char -> bool) * normalizeNewlines: bool -> string
Behaves like SkipCharsOrNewlinesWhile(predicate),
except that it returns a string with the skipped chars.
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form
they are encountered in the input.
member ReadCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * normalizeNewlines: bool -> string
Behaves like ReadCharsOrNewlinesWhile(predicate, normalizeNewlines), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.
member ReadCharsOrNewlinesWhile: predicate: (char -> bool) * minCount: int * maxCount: int * normalizeNewlines: bool -> string
Behaves like SkipCharsOrNewlinesWhile(predicate, minCount, maxCount), except that it
returns a string with the skipped chars.
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form
they are encountered in the input.
member ReadCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * minCount: int * maxCount: int * normalizeNewlines: bool -> string
Behaves like ReadCharsOrNewlinesWhile(predicate, minCount, maxCount, normalizeNewlines), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.
member SkipCharsOrNewlinesUntilString: str: string * maxCount: int * foundString: out<bool> -> int
Skips over all stream chars before the first occurrence of the specified string or the end of the stream, but not over more than maxCount chars. Assigns true to the
output parameter if the string is found, otherwise false.
This method registers skipped newlines ("\n", "\r\n" or "\r") and counts them as single chars. However, no newline normalization takes
place when the argument string str is matched with the stream chars. Hence, str should either contain no newlines or only in the form they occur in the stream. If str starts with '\n', then SkipCharsOrNewlinesUntilString will not find occurrences of str
in the stream that start in the middle of an "\r\n" newline.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method throws
- an ArgumentException, if the string argument is empty, and
- an ArgumentOutOfRangeException, if maxCount is negative.
It may also throw any of the I/O related exceptions detailed above.
member SkipCharsOrNewlinesUntilString: str: string * maxCount: int * normalizeNewlines: bool * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int
Behaves like SkipCharsOrNewlinesUntilString(str, maxCount, outBool), except that its output
parameter is a string instead of a boolean. If str is found, a string with the skipped
chars is assigned to this output parameter; otherwise, null is assigned to the output
parameter.
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the output string are normalized to '\n' or are preserved in the original form they are
encountered in the input.
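A brief sketch of this overload, assuming an input in which the search string "-->" occurs after a "\r\n" newline. The helper name `demo` and the input are illustrative; the output parameter is passed with F#'s `&` syntax on a mutable local.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open FParsec

// Hypothetical helper for illustration; not part of FParsec.
let demo () =
    let input = "a\r\nb-->tail"
    use stream = new CharStream<unit>(input, 0, input.Length)
    let mutable skipped : string = null
    let count =
        stream.SkipCharsOrNewlinesUntilString(
            "-->", System.Int32.MaxValue, true, &skipped)
    // The stream now stands directly before "-->"; the search string
    // itself is not skipped.
    count, skipped, stream.Index
```

A null value in `skipped` after the call would indicate that the string was not found within maxCount chars.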
member SkipCharsOrNewlinesUntilCaseFoldedString: caseFoldedString: string * maxCount: int * foundString: out<bool> -> int
Behaves like SkipCharsOrNewlinesUntilString(caseFoldedString, maxCount, foundString), except that the chars in the stream are case‐folded before they are compared with caseFoldedString.
While the chars in the CharStream are case‐folded before they are matched, the chars
in the string argument caseFoldedString are assumed to already be case‐folded (e.g.
with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member SkipCharsOrNewlinesUntilCaseFoldedString: caseFoldedString: string * maxCount: int * normalizeNewlines: bool * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int
Behaves like SkipCharsOrNewlinesUntilString(caseFoldedString, maxCount, normalizeNewlines, skippedCharsIfStringFoundOtherwiseNull), except that the chars in the stream are case‐folded before they are compared with caseFoldedString.
While the chars in the stream are case‐folded before they are matched, the chars in the string argument caseFoldedString are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
6.11.2 CharStream<TUserState>
Provides read‐access to a sequence of UTF‐16 chars.
6.11.2.1 Interface
    [<Sealed>]
    type CharStream<'TUserState> =
        inherit CharStream
        // has the same constructors as CharStream
        member UserState: 'TUserState with get, set
        member State: CharStreamState<'TUserState>
        member BacktrackTo: CharStreamState<'TUserState> -> unit
        member ReadFrom: stateWhereStringBegins: CharStreamState<'TUserState> * normalizeNewlines: bool -> string
        member CreateSubstream<'TSubStreamUserState>: stateWhereSubstreamBegins: CharStreamState<'TUserState> -> CharStream<'TSubStreamUserState>
6.11.2.2 Remarks
The CharStream<'TUserState> class adds a user definable state component to its base class CharStream.
The user state is accessible through the property UserState. It has the type 'TUserState.
You can retrieve a snapshot of the complete stream state, including the user state, from the State property. The value returned from the State property has the type CharStreamState<'TUserState>. You can pass a CharStreamState value to the BacktrackTo method in order to restore a previous state of the CharStream.
'TUserState must be an immutable type or at least be treated as an immutable type if
you want BacktrackTo to completely
restore old values of the user state. Hence, when you need to change the user state, you should set a new 'TUserState value to the UserState property of the CharStream instance, not mutate the existing 'TUserState value.
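The recommendation above can be sketched with an immutable F# list as user state: each change assigns a new list to UserState, so a previously saved state still refers to the old value. The helper name `demo` and the state contents are assumptions made for illustration.

```fsharp
#r "nuget: FParsec" // assumption: run as a script with `dotnet fsi`

open FParsec

// Hypothetical helper for illustration; not part of FParsec.
let demo () =
    let input = "abc"
    use stream = new CharStream<string list>(input, 0, input.Length)
    stream.UserState <- ["outer"]
    let saved = stream.State
    stream.Skip()
    // Assign a new immutable value instead of mutating the old one.
    stream.UserState <- "inner" :: stream.UserState
    let during = stream.UserState
    stream.BacktrackTo(saved)
    during, stream.UserState, stream.Index
```

Had the user state been a mutable collection that was modified in place, the saved state would share the modified object and BacktrackTo could not bring back its earlier contents.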
6.11.2.3 Members
member UserState: 'TUserState with get, set
The current user state value.
Setting the UserState value increments the StateTag by 1, independent of whether the new value is different
from the previous one.
member State: CharStreamState<'TUserState>
member BacktrackTo: CharStreamState<'TUserState> -> unit
Restores the stream to the state represented by the given CharStreamState value.
For example:
    fun (stream: CharStream<'u>) ->
        let state = stream.State
        // ... (do something with stream that might change the state)
        stream.BacktrackTo(state) // restores stream to previous state
        // ...
This method throws an ArgumentException if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor). It
may also throw any of the I/O related exceptions detailed above.
You may only pass CharStreamState values that were
retrieved from the CharStream instance on which you’re
calling BacktrackTo. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined
behaviour.
member ReadFrom: stateWhereStringBegins: CharStreamState<'TUserState> * normalizeNewlines: bool -> string
Returns a string with the chars between the index of the stateWhereStringBegins
(inclusive) and the current Index of the stream
(exclusive).
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form
they are encountered in the input. (If stateWhereStringBegins.Line equals the current Line, this method will never normalize any newlines in the returned
string.)
This method throws
- an ArgumentOutOfRangeException, if Index < GetIndex(stateWhereStringBegins), and
- an ArgumentException, if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).
It may also throw any of the I/O related exceptions detailed above.
You may only pass CharStreamState values that were
retrieved from the CharStream instance on which you’re
calling ReadFrom. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined
behaviour.
member CreateSubstream<'TSubStreamUserState>: stateWhereSubstreamBegins: CharStreamState<'TUserState> -> CharStream<'TSubStreamUserState>
Creates a new CharStream<'TSubStreamUserState> instance with the stream chars between the index of the stateWhereSubstreamBegins (inclusive) and the current Index of the stream (exclusive).
The state of the substream is initialized to stateWhereSubstreamBegins, so that the stream and the substream will report the same position (Index, Line, LineBegin and Name) for corresponding chars. However, the beginning and end will normally differ between stream and substream, in particular the IndexOfFirstChar and IndexOfLastCharPlus1 values will normally differ between stream and substream.
An example:
    open FParsec
    open FParsec.Primitives
    open FParsec.CharParsers
    open FParsec.Error

    let embeddedBlock (beginDelim: string) (endDelim: string) : Parser<_,_> =
        let expectedEmbeddedBlock = expected "embedded block"
        fun stream ->
            if stream.Skip(beginDelim) then
                let stateAtBegin = stream.State
                let mutable foundString = false
                let maxChars = System.Int32.MaxValue
                stream.SkipCharsOrNewlinesUntilString(endDelim, maxChars, &foundString) |> ignore
                if foundString then
                    // create substream with content between beginDelim and endDelim
                    use substream = stream.CreateSubstream<unit>(stateAtBegin)
                    // here we would normally work with the substream,
                    // in this example we will just extract the string content
                    let str = substream.ReadCharsOrNewlines(System.Int32.MaxValue, true)
                    Reply(str)
                else
                    Reply(Error, expectedString endDelim)
            else
                Reply(Error, expectedEmbeddedBlock)

    > run (embeddedBlock "/*" "*/") "/*substream content*/";;
    val it : ParserResult<string,unit> = Success: "substream content"
This note does not apply to the Low‐Trust version of FParsec.
If you create a substream for a CharStream instance with
more than one block, the content of the substream needs to be copied. Thus, you can minimize the overhead associated with creating a
substream by ensuring that the CharStream has only one
block, either by choosing a sufficiently large blockSize, or by creating the CharStream from a string or char buffer.
You may use a stream and its substreams concurrently. However, notice the following warning:
This note does not apply to the Low‐Trust version of FParsec.
You may not dispose a stream before all of its substreams are disposed. Disposing a stream before all its substreams are disposed
triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
This method throws
- an ArgumentOutOfRangeException, if Index < GetIndex(stateWhereSubstreamBegins), and
- an ArgumentException, if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).
It may also throw any of the I/O related exceptions detailed above.
You may only pass CharStreamState values that were
retrieved from the CharStream instance on which you’re
calling CreateSubstream. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will
otherwise lead to undefined behaviour.
6.11.3 CharStreamIndexToken
An opaque representation of a CharStream char index.
    type CharStreamIndexToken = struct
        member GetIndex: CharStream -> int64
    end
CharStream methods can handle CharStreamIndexToken values more efficiently than integer char indices.
You can retrieve CharStreamIndexToken values from the CharStream.IndexToken and CharStreamState<_>.IndexToken properties.
You can get the char index corresponding to a given CharStreamIndexToken value by calling
its GetIndex method with the CharStream instance from which the token was retrieved.
Zero‐initialized CharStreamIndexToken values constructed with the default value type
constructor are not valid and trying to call a CharStream
method with such an instance will trigger an exception.
A CharStreamIndexToken instance may only be used together with the CharStream instance it was created for.
member GetIndex: CharStream -> int64
Returns the stream index represented by the CharStreamIndexToken instance.
The CharStream instance passed as the argument must be the
CharStream instance from which the CharStreamIndexToken was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined
behaviour.
An InvalidOperationException is thrown if the CharStreamIndexToken value is zero‐initialized (i.e. constructed with the default value type constructor).
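The token workflow described above (retrieve a token from IndexToken, later convert it back with GetIndex on the same stream) can be sketched as follows. This is a minimal illustration, not part of FParsec: the combinator name withLength is hypothetical, and the sketch assumes that only the members listed in this reference are available.

```fsharp
open FParsec
open FParsec.Primitives
open FParsec.CharParsers

// Hypothetical combinator (not part of FParsec): pairs the result of `p`
// with the number of UTF-16 chars that `p` consumed.
let withLength (p: Parser<'a,'u>) : Parser<'a * int64,'u> =
    fun stream ->
        // Remember the starting position as an opaque token.
        let startToken = stream.IndexToken
        let reply = p stream
        if reply.Status = ReplyStatus.Ok then
            // Convert the token back to an index, using the same stream
            // instance from which the token was retrieved.
            let length = stream.Index - startToken.GetIndex(stream)
            Reply(ReplyStatus.Ok, (reply.Result, length), reply.Error)
        else
            Reply(reply.Status, reply.Error)
```

For instance, run (withLength (pstring "abc")) "abcdef" should succeed with the pair ("abc", 3L). Note that the token is only ever passed back to the stream it came from, as required above.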
6.11.4 CharStreamState
An immutable value type representation of the state of a CharStream.
    type CharStreamState<'TUserState> = struct
        member Tag: int64
        member IndexToken: CharStreamIndexToken
        member Line: int64
        member LineBegin: int64
        member Name: string
        member UserState: 'TUserState
        member GetIndex: CharStream<'TUserState> -> int64
        member GetPosition: CharStream<'TUserState> -> Position
    end
You can retrieve CharStreamState values from the CharStream<_>.State property. By passing a CharStreamState value to the BacktrackTo method of a CharStream<_> instance, you can restore the stream to the state
represented by the CharStreamState value.
Zero‐initialized CharStreamState values constructed with the default value type constructor
are not valid and trying to call a CharStream method with
such an instance will trigger an exception.
A CharStreamState instance may only be used together with the CharStream instance it was created for.
member GetIndex: CharStream<'TUserState> -> int64
state.GetIndex(stream) is an optimized implementation of state.IndexToken.GetIndex(stream).
The CharStream<'TUserState> instance passed as the argument must be the CharStream instance from which the CharStreamState was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined
behaviour.
An InvalidOperationException is thrown if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).
member GetPosition: CharStream<'TUserState> -> Position
state.GetPosition(stream) is an optimized implementation of new Position(state.Name, state.GetIndex(stream), state.Line, state.Column).
The CharStream<'TUserState> instance passed as the argument must be the CharStream instance from which the CharStreamState was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined
behaviour.
An InvalidOperationException is thrown if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).
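The save-and-restore pattern described in this section (read State, run some parsing code, call BacktrackTo on failure) might look like the following sketch. The combinator name optionalWithBacktrack is hypothetical, and the error handling is deliberately simplified: it treats every non-Ok status the same way, which a production combinator would probably not do.

```fsharp
open FParsec
open FParsec.Primitives
open FParsec.CharParsers

// Hypothetical combinator (not part of FParsec): runs `p` and, if it fails,
// restores the stream to the state it had before `p` was applied.
let optionalWithBacktrack (p: Parser<'a,'u>) : Parser<'a option,'u> =
    fun stream ->
        // Snapshot of the stream state, taken from this very stream instance.
        let state = stream.State
        let reply = p stream
        if reply.Status = ReplyStatus.Ok then
            Reply(ReplyStatus.Ok, Some reply.Result, reply.Error)
        else
            // Undo whatever `p` consumed before it failed.
            // Simplification: fatal errors are swallowed here as well.
            stream.BacktrackTo(state)
            Reply(ReplyStatus.Ok, None, reply.Error)
```

With this sketch, optionalWithBacktrack (pstring "ab" >>. pstring "cd") .>>. pstring "abx" applied to "abx" should yield (None, "abx"): the inner parser consumes "ab" before failing, and BacktrackTo returns the stream to the beginning so that the second parser sees the full input.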
6.11.5 TwoChars
An immutable value type representation of two chars:
    type TwoChars = struct
        new: char0: char * char1: char -> TwoChars
        val Char0: char
        val Char1: char
    end
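As a small illustration of how a TwoChars value is typically consumed, the sketch below inspects the two fields returned by the Peek2 method listed in the CharStream interface. The parser name startsWithLineComment is hypothetical and the "//" delimiter is just an assumed example.

```fsharp
open FParsec
open FParsec.Primitives
open FParsec.CharParsers

// Hypothetical parser (not part of FParsec): reports whether the next two
// chars are "//" without consuming any input.
let startsWithLineComment : Parser<bool,unit> =
    fun stream ->
        // Peek2 returns the next two chars as a single TwoChars value.
        let next = stream.Peek2()
        Reply((next.Char0 = '/' && next.Char1 = '/'))
```

Because Peek2 only looks ahead, the stream position is unchanged after this parser returns.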
[1] The detection of invalid byte sequences by the .NET decoders is not entirely reliable. For example, System.Text.UnicodeEncoding (UTF‐16) has an alignment related bug in .NET versions prior to 4.0 that sometimes leads to invalid surrogate pairs not being detected. The implementations of more complicated encodings, like GB18030, ISO‐2022 and ISCII, also have several issues with regard to the detection of invalid input data.