6.11 FParsec.CharStream
6.11.1 CharStream
Provides read‐access to a sequence of UTF‐16 chars.
6.11.1.1 Interface
// FParsecCS.dll

namespace FParsec

type CharStream =
  interface System.IDisposable

  new: chars: string * index: int * length: int -> CharStream
  new: chars: string * index: int * length: int * streamBeginIndex: int64 -> CharStream

  new: chars: char[] * index: int * length: int -> CharStream
  new: chars: char[] * index: int * length: int * streamBeginIndex: int64 -> CharStream

  new: chars: NativePtr<char> * length: int -> CharStream
  new: chars: NativePtr<char> * length: int * streamBeginIndex: int64 -> CharStream

  new: path: string * encoding: System.Text.Encoding -> CharStream
  new: path: string * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool -> CharStream
  new: path: string * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool * blockSize: int * blockOverlap: int * minRegexSpace: int * byteBufferLength: int -> CharStream

  new: stream: System.IO.Stream * encoding: System.Text.Encoding -> CharStream
  new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding -> CharStream
  new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool -> CharStream
  new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool * blockSize: int * blockOverlap: int * minRegexSpace: int * byteBufferLength: int -> CharStream

  member Dispose: unit -> unit

  member BlockOverlap: int

  member IndexOfFirstChar: int64
  member IndexOfLastCharPlus1: int64

  member IsBeginOfStream: bool
  member IsEndOfStream: bool

  member Index: int64
  member IndexToken: CharStreamIndexToken
  member Line: int64
  member LineBegin: int64
  member Column: int64
  member Name: string with get, set
  member Position: Position

  val mutable StateTag: uint64

  member Seek: index: int64 -> unit
  member Seek: indexToken: CharStreamIndexToken -> unit

  static val EndOfStreamChar: char

  member Peek: unit -> char
  member Peek2: unit -> TwoChars
  member Peek: utf16Offset: int -> char
  member Peek: utf16Offset: uint32 -> char

  member PeekString: length: int -> string
  member PeekString: buffer: char[] * bufferIndex: int * length: int -> int
  member PeekString: buffer: NativePtr<char> * length: int -> int

  member Match: char -> bool
  member Match: chars: string -> bool
  member Match: chars: char[] * charsIndex: int * length: int -> bool
  member Match: chars: NativePtr<char> * length: int -> bool

  member MatchCaseFolded: caseFoldedChar: char -> bool
  member MatchCaseFolded: caseFoldedChars: string -> bool
  member MatchCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool

  member Match: System.Text.RegularExpressions.Regex -> System.Text.RegularExpressions.Match
  member MinRegexSpace: int with get, set

  member RegisterNewline: unit -> bool
  member RegisterNewlines: lineOffset: int -> newColumnMinus1: int -> bool
  member RegisterNewlines: lineOffset: int64 -> newColumnMinus1: int64 -> bool

  // The following methods require manual registration of skipped newlines

  member Skip: unit -> unit
  member Skip: utf16Offset: int -> unit
  member Skip: utf16Offset: uint32 -> unit
  member Skip: utf16Offset: int64 -> unit

  member SkipAndPeek: unit -> char
  member SkipAndPeek: utf16Offset: int -> char
  member SkipAndPeek: utf16Offset: uint32 -> char

  member Skip: char -> bool
  member Skip: TwoChars -> bool
  member Skip: chars: string -> bool
  member Skip: chars: char[] * charsIndex: int * length: int -> bool
  member Skip: chars: NativePtr<char> * length: int -> bool

  member SkipCaseFolded: caseFoldedChar: char -> bool
  member SkipCaseFolded: caseFoldedChars: string -> bool
  member SkipCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool

  member Read: unit -> char
  member Read: length: int -> string
  member Read: buffer: char[] * bufferIndex: int * length: int -> int
  member Read: buffer: NativePtr<char> * length: int -> int

  member ReadFrom: indexOfFirstChar: CharStreamIndexToken -> string

  // The following methods automatically register skipped newlines

  member SkipWhitespace: unit -> bool
  member SkipUnicodeWhitespace: unit -> bool

  member SkipNewline: unit -> bool
  member SkipUnicodeNewline: unit -> bool

  member SkipNewlineThenWhitespace: powerOf2TabStopDistance: int * allowFormFeed: bool -> int

  member SkipRestOfLine: skipNewline: bool -> unit
  member ReadRestOfLine: skipNewline: bool -> string

  member ReadCharOrNewline: unit -> char

  member SkipCharsOrNewlines: maxCount: int -> int
  member ReadCharsOrNewlines: maxCount: int * normalizeNewlines: bool -> string

  member SkipCharsOrNewlinesWhile: predicate: (char -> bool) -> int
  member SkipCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) -> int
  member SkipCharsOrNewlinesWhile: predicate: (char -> bool) * minCount: int * maxCount: int -> int
  member SkipCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * minCount: int * maxCount: int -> int

  member ReadCharsOrNewlinesWhile: predicate: (char -> bool) * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile: predicate: (char -> bool) * minCount: int * maxCount: int * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * minCount: int * maxCount: int * normalizeNewlines: bool -> string

  member SkipCharsOrNewlinesUntilString: str: string * maxCount: int * foundString: out<bool> -> int
  member SkipCharsOrNewlinesUntilString: str: string * maxCount: int * normalizeNewlines: bool * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int

  member SkipCharsOrNewlinesUntilCaseFoldedString: caseFoldedString: string * maxCount: int * foundString: out<bool> -> int
  member SkipCharsOrNewlinesUntilCaseFoldedString: caseFoldedString: string * maxCount: int * normalizeNewlines: bool * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int
6.11.1.2 Remarks
The CharStream class provides a unified interface for efficiently reading UTF‐16 chars from a binary stream or an in‐memory char buffer (e.g. a string). It is optimized for use in backtracking parser applications and supports arbitrary char‐based seeking, even for streams larger than the addressable memory (on 32‐bit platforms).
The CharStream class is the base class of CharStream<'TUserState>, which adds a user‐definable state component and some convenience methods for working with the state of a CharStream instance.
A CharStream constructed from a System.IO.Stream or a file path reads the stream block‐wise and only holds the most recently accessed block in memory. The blocks overlap in order to provide efficient access on the boundary between blocks.
If the char content is already available as a string or a char array, a CharStream can be directly constructed from the char buffer (without needing to copy the buffer). The overhead of accessing an in‐memory char buffer through a CharStream is minimal.
- Position information
The position of the next char in the stream is described by the following 4 properties: Index, Line, LineBegin and Column.
Among these properties the char index is the most important one, as the CharStream uses it to uniquely identify a UTF‐16 char in the stream.
The other 3 properties further describe the text location of the char identified by the index, but they are not necessary for the core functionality of the CharStream class. The CharStream class keeps track of this additional position information to provide a more convenient interface to higher‐level library functions, in particular to assist debugging and error reporting.
- Newlines
For performance reasons the most basic stream operations do not automatically recognize newlines (end‐of‐line markers) in the stream content. If you skip any newline with these methods, you have to manually register the newline afterwards with one of the RegisterNewline methods (otherwise the line and column count becomes incorrect).
In order to provide a convenient interface for parser routines, the CharStream class also provides some more advanced methods that automatically register any skipped standard newline ("\n", "\r\n" and "\r"). Additionally, it provides two methods that automatically register any Unicode newline (SkipUnicodeWhitespace and SkipUnicodeNewline).
It should be obvious from the method names which methods automatically register newlines and which don’t.
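The difference between the two groups of methods can be sketched with a short F# script. This is only an illustration under a few assumptions: the FParsec NuGet package is referenced with the `#r` directive, and the stream is created through the derived CharStream<unit> type, which is assumed here to offer the same string constructor. The names `manual` and `automatic` are made up for the example.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "ab\ncd"

// Basic Skip: the newline is skipped but not registered.
let manual = new CharStream<unit>(text, 0, text.Length)
manual.Skip(3)                              // skips 'a', 'b' and '\n'
let lineBeforeRegistering = manual.Line     // still 1
manual.RegisterNewline() |> ignore          // registers the newline that was just skipped
let lineAfterRegistering = manual.Line      // now 2
let lineBegin = manual.LineBegin            // 3, the index of 'c'

// SkipNewline: the skipped newline is registered automatically.
let automatic = new CharStream<unit>(text, 0, text.Length)
automatic.Skip(2)                           // skips 'a' and 'b'
let skippedNewline = automatic.SkipNewline()
let lineAfterSkipNewline = automatic.Line   // 2

manual.Dispose()
automatic.Dispose()
```

Forgetting the RegisterNewline call in the first half would leave Line at 1 for the rest of the input, which is exactly the incorrect line count the paragraph above warns about.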
- Case‐insensitive matching
The MatchCaseFolded and SkipCaseFolded members match the content of the stream “case‐insensitively” with a reference string. In this instance “case‐insensitive” means that before the chars are matched with the reference string they are mapped to a canonical form where case differences are erased. For performance reasons MatchCaseFolded only applies the (non‐Turkic) 1‐to‐1 case folding mappings (v. 8.0.0) for Unicode code points in the Basic Multilingual Plane, i.e. code points below 0x10000. These mappings are sufficient for many case‐insensitive parser grammars encountered in practice, but they are not appropriate for matching arbitrary natural language content. Please also note that the CharStream class performs no Unicode normalization.
- Non‐sequential access
This note does not apply to the Low‐Trust version of FParsec.
If you construct a CharStream from a System.IO.Stream or a file path and you backtrack over a distance long enough to require the CharStream to reread a previous block, then the underlying byte stream needs to support seeking, otherwise a NotSupportedException is thrown. Furthermore, the Decoder for the input Encoding must be serializable if you backtrack to a block other than the first in the stream. Note that file streams created for regular disk files are always seekable and all the .NET standard decoders are serializable. In order to support non‐seekable streams for applications which don’t require extensive backtracking, no exception will be thrown before an operation actually requires backtracking and the necessary capabilities of the stream or decoder are not available.
- Decoder errors
A CharStream constructed from a binary input stream decodes the input data with the help of a Decoder instance obtained via the Encoding’s GetDecoder method. Depending on the configuration of the encoding the decoder might throw an exception if it encounters invalid byte sequences, usually a System.Text.DecoderFallbackException or a System.ArgumentException. [1]
- Disposable interface
This note does not apply to the Low‐Trust version of FParsec.
A CharStream holds managed and unmanaged resources that need to be explicitly released. Hence, it is very important that CharStream objects are promptly disposed after use. Where possible CharStream objects should only be used within a “using” block (C#), a “use” expression (F#) or similar constructs in other languages.
- Thread safety
CharStream members are not thread‐safe.
- Low‐Trust version
If you compile FParsec with the LOW_TRUST conditional compiler symbol, the CharStream class differs from the normal version as follows:
  - No unverifiable code involving pointers is used. This allows FParsec to be executed in an environment with reduced trust, such as medium trust ASP.NET applications or Silverlight applications.
  - A CharStream that is constructed from a System.IO.Stream or a file path reads the complete file into a single string during construction. This severely limits the maximum practical stream size.
  - Although the CharStream class still supports the IDisposable interface, disposing the CharStream instances is no longer necessary, since no resources are held that need to be explicitly released.
See also section 3.5.
6.11.1.3 I/O exceptions
If you construct a CharStream from a System.IO.Stream or a file path, the constructor and any CharStream operation that requires reading chars from the underlying byte stream may throw one of the following exceptions.
In the Low‐Trust version, the constructor decodes the complete byte stream and hence only the constructor may throw one of these exceptions.
Doing actual work in a constructor and potentially throwing exceptions seems to be a somewhat controversial design. We think it’s the right choice for the CharStream class, because this way you can have a reasonable expectation that the CharStream actually works after you’ve successfully constructed it.
In general it is not safe to continue to use a CharStream instance after one of these exceptions was thrown, though calling Dispose() is always safe.
- NotSupportedException
Seeking of the underlying byte stream is required, but the byte stream does not support seeking or the Encoding’s Decoder is not serializable. See also the remarks above on non‐sequential access.
- IOException
An I/O error occurred while reading data from the underlying byte stream.
- ArgumentException
The underlying byte stream contains invalid bytes and the Encoding was constructed with the throwOnInvalidBytes option.
- DecoderFallbackException
The underlying byte stream contains invalid bytes for which the decoder fallback threw this exception.
The byte index of the invalid bytes in the stream is stored as a boxed System.Int64 in the "Stream.Position" entry of the Data member of the exception instance. The precision of the index depends on the precision of the DecoderFallbackException’s Index member. If the underlying System.IO.Stream is not seekable, the byte index only takes into account the bytes read by the CharStream, but not any bytes read before the CharStream was constructed.
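Reading the "Stream.Position" entry back out of a caught exception could look roughly like the following sketch. The helper name `tryGetInvalidBytePosition` is hypothetical, and the exception is built by hand here purely as a stand‐in, since a real one would be thrown by a CharStream operation on invalid input.

```fsharp
open System.Text

// Key under which the byte index is stored, as described above.
let positionKey : obj = box "Stream.Position"

/// Hypothetical helper: returns the stored byte index, if the entry is present.
let tryGetInvalidBytePosition (e: DecoderFallbackException) : int64 option =
    if e.Data.Contains(positionKey) then
        Some (unbox<int64> e.Data.[positionKey])
    else
        None

// Stand-in exception for illustration only.
let simulated = DecoderFallbackException("invalid bytes")
simulated.Data.[positionKey] <- box 1234L

let position = tryGetInvalidBytePosition simulated               // Some 1234L
let missing = tryGetInvalidBytePosition (DecoderFallbackException("no entry"))   // None
```

Returning an option keeps the caller from assuming the entry is always there, for example when the exception did not originate from a CharStream.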
6.11.1.4 Members
new: chars: string * index: int * length: int -> CharStream
Is equivalent to new CharStream(chars, index, length, 0L).
new: chars: string * index: int * length: int * streamBeginIndex: int64 -> CharStream
Constructs a CharStream from the chars in the string argument between the indices index (inclusive) and index + length (exclusive). By directly referencing the chars in the string this constructor avoids any copy of the string content.
The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance that only contains a sub‐segment of another char stream but is accessible through the same char indices.
chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:
- index ≥ 0, length ≥ 0, index + length ≤ chars.Length and
- 0 ≤ streamBeginIndex < 2^60.
This note does not apply to the Low‐Trust version of FParsec.
The given string is “pinned” until the CharStream is disposed. Pinning the string prevents the GC from moving it around in memory during garbage collection. On .NET (at least in versions up to and including 4.0) the pinning has no effect if the string is large enough to be allocated on the Large Object Heap, i.e. has a length of about 42500 chars or more. However, pinning smaller strings does constrain the normal operations of the GC. Thus, to minimize the negative impact on the GC, you should dispose CharStream instances constructed from small strings as soon as you’re done parsing them. If you keep a large number of CharStream instances constructed from small strings around for an extended period of time, you risk fragmenting the heap.
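As a rough illustration of the streamBeginIndex argument, the following script builds a substream over part of a string so that it keeps the char indices of the full text. It assumes the FParsec NuGet package is referenced via `#r` and uses the derived CharStream<unit> type, which is assumed to expose the same four‐argument constructor. The sample text and variable names are made up.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "hello world"

// A stream over "world" only, addressed with the indices the chars have in `text`.
let sub = new CharStream<unit>(text, 6, 5, 6L)

let first = sub.IndexOfFirstChar     // 6L
let startIndex = sub.Index           // 6L
let content = sub.Read(5)            // "world"
let endIndex = sub.Index             // 11L
let atEnd = sub.IsEndOfStream        // true

sub.Dispose()
```

Because the indices line up with those of the full text, a position obtained from the substream can be reported or compared as if it came from a stream over the whole string.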
new: chars: char[] * index: int * length: int -> CharStream
This constructor is not available in the Low‐Trust version of FParsec.
Is equivalent to new CharStream(chars, index, length, 0L).
new: chars: char[] * index: int * length: int * streamBeginIndex: int64 -> CharStream
This constructor is not available in the Low‐Trust version of FParsec.
Constructs a CharStream from the chars in the char array argument between the indices index (inclusive) and index + length (exclusive). By directly referencing the chars in the char array this constructor avoids any copy of the char array content.
The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance that only contains a sub‐segment of another char stream but is accessible through the same char indices.
chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:
- index ≥ 0, length ≥ 0, index + length ≤ chars.Length and
- 0 ≤ streamBeginIndex < 2^60.
A CharStream constructed from a char array does not support .NET regex matching via the Match method.
The given char array is “pinned” until the CharStream is disposed. Pinning the char array prevents the GC from moving it around in memory during garbage collection. On .NET (at least in versions up to and including 4.0) the pinning has no effect if the char array is large enough to be allocated on the Large Object Heap, i.e. has a length of about 42500 chars or more. However, pinning smaller char arrays does constrain the normal operations of the GC. Thus, to minimize the negative impact on the GC, you should dispose CharStream instances constructed from small char arrays as soon as you’re done parsing them. If you keep a large number of CharStream instances constructed from small char arrays around for an extended period of time, you risk fragmenting the heap.
new: chars: NativePtr<char> * length: int -> CharStream
This constructor is not available in the Low‐Trust version of FParsec.
Is equivalent to new CharStream(chars, length, 0L).
new: chars: NativePtr<char> * length: int * streamBeginIndex: int64 -> CharStream
This constructor is not available in the Low‐Trust version of FParsec.
Constructs a CharStream from the length chars at the pointer address. By directly referencing the chars at the pointer address this constructor avoids any copy of the char buffer.
The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance that only contains a sub‐segment of another char stream but is accessible through the same char indices.
chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:
- length ≥ 0, chars + length must not overflow and
- 0 ≤ streamBeginIndex < 2^60.
A CharStream constructed from a pointer does not support .NET regex matching via the Match method.
new: path: string * encoding: System.Text.Encoding -> CharStream
Is equivalent to new CharStream(path, encoding, true).
new: path: string * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool -> CharStream
Is equivalent to

new CharStream(
    path, encoding, detectEncodingFromByteOrderMarks,
    blockSize = DefaultBlockSize (* = 3*2^16 ≈ 200k *),
    blockOverlap = DefaultBlockSize/3,
    minRegexSpace = ((DefaultBlockSize/3)*2)/3,
    byteBufferLength = DefaultByteBufferLength
)
new: path: string * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool * blockSize: int * blockOverlap: int * minRegexSpace: int * byteBufferLength: int -> CharStream
Constructs a CharStream from a FileStream as if by calling

new CharStream(
    new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, FileOptions.SequentialScan),
    leaveOpen = false,
    encoding = encoding,
    detectEncoding = true,
    blockSize = DefaultBlockSize (* = 3*2^16 ≈ 200k *),
    blockOverlap = DefaultBlockSize/3,
    minRegexSpace = ((DefaultBlockSize/3)*2)/3,
    byteBufferLength = DefaultByteBufferLength
)

If an exception occurs after the FileStream is constructed but before the CharStream constructor is finished, the FileStream is disposed.
The FileStream constructor might throw an exception, too.
new: stream: System.IO.Stream * encoding: System.Text.Encoding -> CharStream
Is equivalent to new CharStream(stream, false, encoding, true).
new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding -> CharStream
Is equivalent to new CharStream(stream, leaveOpen, encoding, true).
new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool -> CharStream
Is equivalent to

new CharStream(
    stream, leaveOpen, encoding, detectEncodingFromByteOrderMarks,
    blockSize = DefaultBlockSize (* = 3*2^16 ≈ 200k *),
    blockOverlap = DefaultBlockSize/3,
    minRegexSpace = ((DefaultBlockSize/3)*2)/3,
    byteBufferLength = DefaultByteBufferLength
)
new: stream: System.IO.Stream * leaveOpen: bool * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool * blockSize: int * blockOverlap: int * minRegexSpace: int * byteBufferLength: int -> CharStream
Constructs a CharStream from a System.IO.Stream.
The normal version of the CharStream class supports stream sizes up to approximately (2^31/p)×(blockSize − blockOverlap) chars, where p is 4 on a 32‐bit CLR and 8 on a 64‐bit CLR.
The Low‐Trust version only supports streams small enough that the complete content can be read into a single string.
This constructor reads the first block of chars from the input stream and hence can throw any of the I/O related exceptions detailed in the exceptions section above.
Arguments:
- stream
The byte stream providing the input. If stream.CanRead returns false, an ArgumentException is thrown.
- leaveOpen
Indicates whether the stream should be left open when the CharStream has finished reading it.
- encoding
The default Encoding used for decoding the byte stream into chars.
If the preamble returned by encoding.GetPreamble() is present at the beginning of the stream, the CharStream will skip over it.
- detectEncodingFromByteOrderMarks
Indicates whether the constructor should detect the encoding from a Unicode byte‐order mark at the beginning of the stream. An encoding detected from a byte‐order mark overrides the default encoding. The standard byte‐order marks for the following encodings are supported: UTF‐8, UTF‐16 LE/BE and UTF‐32 LE/BE.
- blockSize
The number of chars per block. The value is rounded up to the first positive multiple of 1536. The default is 3×2^16 ≈ 200k.
- blockOverlap
The number of chars at the end of a block that are preserved when reading the next block into its internal char buffer. If this value is less than encoding.GetMaxCharCount(1) or not less than blockSize/2, the default value is used instead. The default is blockSize/3.
- byteBufferLength
The size of the byte buffer used for decoding purposes. The default is 2^12 = 4KB.
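The default values and the stream size limit quoted for this constructor can be worked out with plain arithmetic. The following sketch does not call FParsec at all. The function names are hypothetical, and `roundUpBlockSize` is only one reading of “rounded up to the first positive multiple of 1536”, not the library’s actual code.

```fsharp
// Assumed rounding rule: the smallest positive multiple of 1536 that is >= the requested size.
let roundUpBlockSize (requested: int) =
    if requested <= 1536 then 1536
    else ((requested + 1535) / 1536) * 1536

let defaultBlockSize = 3 * (1 <<< 16)                         // 196608, i.e. about 200k
let defaultBlockOverlap = defaultBlockSize / 3                // 65536
let defaultMinRegexSpace = ((defaultBlockSize / 3) * 2) / 3   // 43690

// Approximate maximum stream size in chars: (2^31/p) × (blockSize − blockOverlap)
let maxStreamChars (p: int64) (blockSize: int) (blockOverlap: int) =
    ((1L <<< 31) / p) * int64 (blockSize - blockOverlap)

let max32 = maxStreamChars 4L defaultBlockSize defaultBlockOverlap   // 2^46 chars
let max64 = maxStreamChars 8L defaultBlockSize defaultBlockOverlap   // 2^45 chars
```

With the defaults the difference blockSize − blockOverlap is 2^17 chars, so even the smaller of the two limits is far beyond the size of any practical input file.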
member Dispose: unit -> unit
Releases all resources used by the CharStream. If the CharStream was constructed from a System.IO.Stream or a file path and the constructor was not called with leaveOpen = true, the byte stream is closed.
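In F# the usual way to make sure Dispose is called is a “use” binding inside a function. The following is a minimal sketch, assuming the FParsec NuGet package is referenced via `#r` and that the derived CharStream<unit> type offers the same string constructor. `countChars` is a made‐up helper.

```fsharp
#r "nuget: FParsec"
open FParsec

/// Hypothetical helper: counts the chars in `text` by skipping through a stream.
let countChars (text: string) =
    // `use` disposes the stream when the function returns, even if an exception is thrown.
    use stream = new CharStream<unit>(text, 0, text.Length)
    let mutable count = 0
    while not stream.IsEndOfStream do
        stream.Skip()
        count <- count + 1
    count

let n = countChars "abc"   // 3
```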
member BlockOverlap: int
The number of chars at the end of a block that are preserved when the CharStream reads the next block into its internal char buffer.
This value is only relevant for optimization purposes and as the maximum value for MinRegexSpace.
This value can only be set at construction time with the respective constructor parameter.
If the CharStream is constructed from a string, char array or char pointer or only contains 1 block, then this value is 0. In the Low‐Trust version this value is always 0.
member IndexOfFirstChar: int64
The index of the first char in the stream. This value is determined by the streamBeginIndex argument of some of the CharStream constructors. By default this value is 0.
member IndexOfLastCharPlus1: int64
The index of the last char of the stream plus 1, or Int64.MaxValue if the end of the stream has not yet been detected.
member IsBeginOfStream: bool
Indicates whether the next char in the stream is the first char, i.e. whether Index equals IndexOfFirstChar.
If the stream is empty, this value is always true.
member IsEndOfStream: bool
Indicates whether there is no char remaining in the stream, i.e. whether Index equals IndexOfLastCharPlus1.
If the stream is empty, this value is always true.
member Index: int64
The stream index of the next char.
member IndexToken: CharStreamIndexToken
A CharStreamIndexToken value representing the current Index value.
member Line: int64
The line number for the next char. (The line count starts with 1.)
member LineBegin: int64
The stream index of the first char of the line that also contains the next char.
member Name: string with get, set
This string is used in error messages to describe the input stream.
If the CharStream is constructed from a file path, the constructor initializes the Name value with the file path value. Otherwise, Name is initialized to null.
If the stream content is the concatenated content of multiple input files, you can improve error messages and help debugging by setting the name and resetting the line and column count at the transitions between the different content pieces.
Setting the Name value increments the StateTag by 1, independent of whether the new value is different from the previous one.
val mutable StateTag: uint64
The StateTag’s purpose is to provide an efficient way to determine whether the publicly visible state of the CharStream has changed after a series of method calls.
For the purpose of this property, the state is defined as the aggregate of the Index, Line, LineBegin and Name values. The UserState value of CharStream<'TUserState> instances is also part of the CharStream state. If a method or property setter changes one or more of these state values it increments the StateTag by 1. Thus, to determine whether a series of method calls has changed the CharStream, it is often enough to compare the StateTag values from before and after the method calls.
The StateTag property is primarily meant for use in the implementation of parser combinators. If you directly call CharStream methods, you normally don’t need the StateTag to determine whether the state has changed, because that is usually obvious from either the method’s return value or the context in which it was called. Please see section 5.4.3 for more details on the design rationale behind the StateTag.
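The before/after comparison described above can be sketched as follows. The script assumes the FParsec NuGet package is referenced via `#r` and creates the stream through the derived CharStream<unit> type, which is assumed to offer the same string constructor. The variable names are illustrative only.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("abc", 0, 3)

let tagBefore = stream.StateTag

// Peek does not change the stream state, so the tag stays the same.
stream.Peek() |> ignore
let unchangedByPeek = (stream.StateTag = tagBefore)

// Skip advances the Index, so the tag changes.
stream.Skip()
let changedBySkip = (stream.StateTag <> tagBefore)

stream.Dispose()
```

A combinator can use the same comparison to decide whether a parser that failed has consumed input, without having to compare the individual position values.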
member Seek: index: int64 -> unit
Seeks the CharStream to the char with the specified index in the stream.
If you pass an index larger than the index of the last char in the stream, this method seeks the stream to the end of the stream, i.e. to one char past the last char in the stream.
The index is zero‐based, except if the CharStream was constructed with a positive streamBeginIndex argument, in which case the index of the first char equals the value of the streamBeginIndex argument (and the IndexOfFirstChar value).
When this method changes the stream position, it increments the StateTag by 1. When it does not change the position, it may or may not increment the StateTag by 1.
An ArgumentOutOfRangeException is thrown if the index is less than the IndexOfFirstChar. This method may also throw any of the I/O related exceptions detailed above.
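A simple backtracking step with Seek might look like the following sketch. It assumes the FParsec NuGet package is referenced via `#r` and uses the derived CharStream<unit> type, assumed to expose the same string constructor. The sample input and names are made up.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("abcdef", 0, 6)

let start = stream.Index              // 0L
stream.Skip(4)
let afterSkip = stream.Peek()         // 'e'

stream.Seek(start)                    // backtrack to the saved index
let afterSeek = stream.Peek()         // 'a'

stream.Seek(1000L)                    // larger than any index: seeks to the end of the stream
let atEnd = stream.IsEndOfStream      // true
let endChar = stream.Peek()           // '\uFFFF', the EndOfStreamChar

stream.Dispose()
```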
member Seek: indexToken: CharStreamIndexToken -> unit
static val EndOfStreamChar: char
member Peek: unit -> char
Returns the next char without changing the state of the CharStream.
At the end of the CharStream the EndOfStreamChar ('\uFFFF') is returned.
member Peek: utf16Offset: int -> char
Returns the char at the stream index Index + utf16Offset, without changing the state of the CharStream.
If Index + utf16Offset is smaller than the index of the first char in the stream or larger than the index of the last char in the stream, the EndOfStreamChar ('\uFFFF') is returned.
This method may throw any of the I/O related exceptions detailed above.
member Peek: utf16Offset: uint32 -> char
This method is an optimized implementation of Peek(int) for uint32 arguments.
member PeekString: length: int -> string
Returns a string with the next length stream chars, without changing the state of the CharStream.
If fewer than length chars are remaining in the stream, only the remaining chars are returned.
This note does not apply to the Low‐Trust version of FParsec.
If length is greater than the number of remaining chars in the stream, a temporary string with length chars may be allocated. For very large length values this might lead to an OutOfMemoryException even though a string with only the remaining chars in the stream would comfortably fit into memory.
Please also note that the maximum length of a string on .NET is less than 2^30. Allocating a string larger than the maximum length will always yield an OutOfMemoryException, even on 64‐bit systems with enough physical memory.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
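The truncation near the end of the stream can be seen in a small script. It assumes the FParsec NuGet package is referenced via `#r` and uses the derived CharStream<unit> type, assumed to expose the same string constructor. The variable names are illustrative.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("abc", 0, 3)
stream.Skip()                          // move past 'a'

let two = stream.PeekString(2)         // "bc"
let rest = stream.PeekString(10)       // "bc": only 2 chars remain
let index = stream.Index               // 1L: PeekString did not move the stream

stream.Dispose()
```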
member PeekString: buffer: char[] * bufferIndex: int * length: int -> int
Copies the next length stream chars into buffer, without changing the state of the CharStream. Returns the number of chars copied.
The chars are written into buffer beginning at the index bufferIndex. If fewer than length chars are remaining in the stream, only the remaining chars are copied.
An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: bufferIndex ≥ 0, length ≥ 0 and bufferIndex + length ≤ buffer.Length. This method may also throw any of the I/O related exceptions detailed above.
member PeekString: buffer: NativePtr<char> * length: int -> int
This method is not available in the Low‐Trust version of FParsec.
Copies the next length stream chars into the buffer at the specified pointer address, without changing the state of the CharStream. Returns the number of chars copied.
If fewer than length chars are remaining in the stream, only the remaining chars are copied.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member Match: char -> bool
Returns true if the next char in the stream matches the specified char. At the end of the stream Match always returns false.
This method does not change the state of the CharStream.
This method may throw any of the I/O related exceptions detailed above.
member Match: chars: string -> bool
Returns true if the passed string chars matches the next chars.Length stream chars.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If chars is empty, true is returned. chars must not be null.
This method does not change the state of the CharStream.
This method may throw any of the I/O related exceptions detailed above.
member Match: chars: char[] * charsIndex: int * length: int -> bool
Returns true if the next length stream chars match the chars in the array chars at the indices charsIndex to charsIndex + length - 1.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If length is 0, true is returned. chars must not be null.
This method does not change the state of the CharStream.
An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: charsIndex ≥ 0, length ≥ 0 and charsIndex + length ≤ chars.Length. This method may also throw any of the I/O related exceptions detailed above.
member Match: chars: NativePtr<char> * length: int -> bool
This method is not available in the Low‐Trust version of FParsec.
Returns true if the next length stream chars match the chars at the specified pointer address.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If length is 0, true is returned.
This method does not change the state of the CharStream.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member MatchCaseFolded: caseFoldedChar: char -> bool
Behaves like Match(caseFoldedChar), except that the next char in the stream is case‐folded before it is compared with caseFoldedChar.
While the char in the stream is case‐folded before it is matched, the char caseFoldedChar is assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member MatchCaseFolded: caseFoldedChars: string -> bool
Behaves like Match(caseFoldedChars), except that the chars in the stream are case‐folded before they are compared with caseFoldedChars.
While the chars in the CharStream are case‐folded before they are matched, the chars in the string argument caseFoldedChars are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
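A small sketch of the intended calling pattern: the pattern string is folded once up front, and the stream chars are folded by MatchCaseFolded. It assumes an F# script with the FParsec NuGet package; the helper name `matchesKeywordIgnoringCase` is made up for this illustration.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: case-insensitive lookahead for a keyword.
let matchesKeywordIgnoringCase (keyword: string) (input: string) =
    // Fold the pattern once; MatchCaseFolded only folds the stream chars.
    let foldedKeyword = FParsec.Text.FoldCase(keyword)
    use stream = new CharStream<unit>(input, 0, input.Length)
    stream.MatchCaseFolded(foldedKeyword)
```

Passing an unfolded string such as "SELECT" directly to MatchCaseFolded would fail to match, since the folded stream chars are compared with the argument as given.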
member MatchCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool
This method is not available in the Low‐Trust version of FParsec.
Behaves like Match(caseFoldedChars, length), except that the chars in the stream are case‐folded before they are compared with the chars at the pointer address caseFoldedChars.
While the chars in the CharStream are case‐folded before they are matched, the chars at the pointer address caseFoldedChars are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member Match: System.Text.RegularExpressions.Regex -> System.Text.RegularExpressions.Match
Applies the given regular expression to the stream chars beginning with the next char. Returns the resulting Match object.
For performance reasons you should specify the regular expression such that it can only match at the beginning of a string, for example by prepending "\\A".
For CharStream instances constructed from strings the regular expression is applied to a string containing all the remaining chars in the stream.
For CharStream instances constructed from large binary streams (with more than 1 block) the regular expression is not applied to a string containing all the remaining chars in the stream. Here the MinRegexSpace value determines the minimum number of chars that are guaranteed to be visible to the regular expression (assuming there are still enough chars remaining in the stream). The exact number of chars visible to the regular expression may be affected even by calls to CharStream methods like Peek or Match that otherwise guarantee to not change the (outwardly visible) state of the CharStream.
This method may throw any of the I/O related exceptions detailed above.
This note does not apply to the Low‐Trust version of FParsec.
This method is not supported by CharStream instances constructed directly from char arrays or pointers. A NotSupportedException is thrown if this method is called on such a CharStream instance.
This note does not apply to the Low‐Trust version of FParsec.
If the CharStream was constructed from a System.IO.Stream or a file path, the regular expression is applied to an internal mutable buffer. Since the Match object may work lazily, i.e. compute return values not before they are needed, you need to retrieve all the required information from the Match object before you continue to access the CharStream, otherwise you might get back invalid match results. Note that all strings returned by the Match object are, of course, immutable.
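The following sketch illustrates the two recommendations above: anchoring the pattern with \A, and copying the needed data out of the Match object before touching the stream again. It assumes an F# script with the FParsec NuGet package and a string‐backed stream; `readNumber` is a hypothetical helper name.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open System.Text.RegularExpressions
open FParsec

// Hypothetical helper: reads a run of ASCII digits at the current position.
// Returns the digits (if any) together with the resulting stream index.
let readNumber (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    let digits = Regex(@"\A[0-9]+") // \A: only match at the current position
    let m = stream.Match(digits)
    if m.Success then
        let text = m.Value      // retrieve the result before using the stream again
        stream.Skip(text.Length) // Match itself does not advance the stream
        (Some text, stream.Index)
    else
        (None, stream.Index)
```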
member MinRegexSpace: int with get, set
The number of chars that are guaranteed to be visible to a regular expression when it is matched by Match (assuming there are enough chars remaining in the stream).
The value must be non‐negative and not greater than BlockOverlap. The default value is 2/3 of BlockOverlap.
If the CharStream is constructed from a string, char array or char pointer or has only 1 block, then this value has no relevance and calling the property setter has no effect. (No Low‐Trust version CharStream instance has more than 1 block.)
The MinRegexSpace value is not recorded in CharStreamState instances and setting its value does not affect the StateTag.
An ArgumentOutOfRangeException is thrown if you try to set the property on a multi‐block CharStream instance to a negative value or a value larger than the BlockOverlap.
member RegisterNewline: unit -> bool
Registers a newline (an end‐of‐line character) at the previous stream char, i.e. increments the Line value by 1 and sets the LineBegin to Index.
The previous LineBegin value must not equal Index. (For performance reasons this condition is only checked by an assert check in the debug build).
This method also increments the StateTag by 1.
member RegisterNewlines: lineOffset: int -> newColumnMinus1: int -> bool
Increments the Line value by lineOffset and sets the LineBegin value to Index - newColumnMinus1 (so that the Column value becomes newColumnMinus1 + 1).
The lineOffset must not be 0, the new Line value must be greater than 0 and the new LineBegin value must be different from the previous one. (For performance reasons these conditions are only checked by assert checks in the debug build).
This method also increments the StateTag by 1.
member RegisterNewlines: lineOffset: int64 -> newColumnMinus1: int64 -> bool
This method is a variant of RegisterNewlines for int64 arguments.
member Skip: unit -> unit
Advances the position within the stream by 1 char, except at the end of the stream, where it does nothing.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member Skip: utf16Offset: int -> unit
Advances the position within the stream by utf16Offset chars.
The new position within the stream will be min(Index + utf16Offset, IndexOfLastCharPlus1). This means you can’t move past the end of the stream, because any position beyond the last char in the stream is interpreted as precisely one char beyond the last char.
An ArgumentOutOfRangeException is thrown if the new position would lie before the beginning of the CharStream, i.e. if the new index would be less than IndexOfFirstChar. This method may also throw any of the I/O related exceptions detailed above.
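To make the clamping at the end of the stream concrete, here is a minimal sketch. It assumes an F# script with the FParsec NuGet package and a string‐backed stream whose first char has index 0; `skipBy` is a hypothetical helper name.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: skips `offset` chars from the start of `input`
// and reports the resulting index and end-of-stream flag.
let skipBy (offset: int) (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    stream.Skip(offset) // an offset past the end stops at IndexOfLastCharPlus1
    (stream.Index, stream.IsEndOfStream)
```

For a three‐char input, skipping by 10 leaves the stream at index 3, one char beyond the last char, rather than at index 10.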
member Skip: utf16Offset: uint32 -> unit
This method is an optimized implementation of Skip for uint32 offsets.
member Skip: utf16Offset: int64 -> unit
This method is a variant of Skip for int64 offsets.
member SkipAndPeek: unit -> char
member SkipAndPeek: utf16Offset: int -> char
c <- SkipAndPeek(utf16Offset) is an optimized implementation of Skip(utf16Offset); c <- Peek(), with the following exception for negative offsets n:
If the new position would lie before the beginning of the CharStream, i.e. if the new index would be less than IndexOfFirstChar, then SkipAndPeek(n) does not throw an exception like stream.Skip(n) would do. Instead it sets the position of the stream to IndexOfFirstChar and returns the EndOfStreamChar ('\uFFFF').
member SkipAndPeek: utf16Offset: uint32 -> char
member Skip: char -> bool
Skips over the next char in the stream if this char matches the passed argument char. Returns true if the chars match; otherwise, false. At the end of the stream this method always returns false.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member Skip: TwoChars -> bool
Skips over the next two chars in the stream if these chars match the two chars in the passed TwoChars value. Returns true if the chars match.
If not both chars match or if there are fewer than 2 chars remaining in the stream, no char is skipped and false is returned.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member Skip: chars: string -> bool
Skips over the next chars.Length chars in the stream if these chars match the passed string chars. Returns true if the chars match.
If not all the chars match or if there are not enough chars remaining in the stream, no char is skipped and false is returned. If chars is empty, true is returned. chars must not be null.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if chars is empty, in which case it may or may not increment the StateTag by 1.
This method may throw any of the I/O related exceptions detailed above.
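The relationship between the return value, the stream position and the StateTag can be sketched like this. The example assumes an F# script with the FParsec NuGet package; `trySkip` is a hypothetical helper, and the StateTag is only compared for equality so that the sketch does not depend on its exact integer type.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: tries to skip `prefix` at the start of `input`.
// Returns (skipped, newIndex, stateTagUnchanged).
let trySkip (prefix: string) (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    let tagBefore = stream.StateTag
    let skipped = stream.Skip(prefix)
    (skipped, stream.Index, (stream.StateTag = tagBefore))
```

A failed Skip leaves both the index and the StateTag as they were, which is what lets calling code detect "no input consumed" by comparing state tags.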
member Skip: chars: char[] * charsIndex: int * length: int -> bool
Skips over the next length chars in the stream if these chars match the chars in the passed array chars at the indices charsIndex to charsIndex + length - 1. Returns true if the chars match.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned and the position within the CharStream is not changed. If length is 0, true is returned. chars must not be null.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: charsIndex ≥ 0, length ≥ 0 and charsIndex + length ≤ chars.Length. This method may also throw any of the I/O related exceptions detailed above.
member Skip: chars: NativePtr<char> * length: int -> bool
This method is not available in the Low‐Trust version of FParsec.
Skips over the next length chars in the stream if these chars match the chars at the pointer address chars. Returns true if the chars match.
If not all the chars match or if there are not enough chars remaining in the stream, false is returned and the position within the CharStream is not changed. If length is 0, true is returned.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member SkipCaseFolded: caseFoldedChar: char -> bool
Behaves like Skip(caseFoldedChar), except that the next char in the stream is case‐folded before it is compared with caseFoldedChar.
While the char in the stream is case‐folded before it is matched, the char caseFoldedChar is assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member SkipCaseFolded: caseFoldedChars: string -> bool
Behaves like Skip(caseFoldedChars), except that the chars in the stream are case‐folded before they are compared with caseFoldedChars.
While the chars in the CharStream are case‐folded before they are matched, the chars in the string argument caseFoldedChars are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member SkipCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool
This method is not available in the Low‐Trust version of FParsec.
Behaves like Skip(caseFoldedChars, length), except that the chars in the stream are case‐folded before they are compared with the chars at the pointer address caseFoldedChars.
While the chars in the CharStream are case‐folded before they are matched, the chars at the pointer address caseFoldedChars are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member Read: unit -> char
Skips over the next char in the stream. Returns the skipped char.
At the end of the stream Read() does not change the stream position and returns the EndOfStreamChar ('\uFFFF').
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member Read: length: int -> string
Skips over the next length chars in the stream. Returns the skipped chars as a string.
If fewer than length chars are remaining in the stream, only the remaining chars are skipped and returned.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member Read: buffer: char[] * bufferIndex: int * length: int -> int
Skips over the next length stream chars and copies the skipped chars into buffer. Returns the number of copied and skipped chars.
The chars are written into buffer beginning at the index bufferIndex. If fewer than length chars are remaining in the stream, only the remaining chars are copied and skipped.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: bufferIndex ≥ 0, length ≥ 0 and bufferIndex + length ≤ buffer.Length. This method may also throw any of the I/O related exceptions detailed above.
member Read: buffer: NativePtr<char> * length: int -> int
This method is not available in the Low‐Trust version of FParsec.
Skips over the next length stream chars and copies the skipped chars into the buffer at the given pointer address. Returns the number of copied and skipped chars.
If fewer than length chars are remaining in the stream, only the remaining chars are copied and skipped.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.
If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
member ReadFrom: indexOfFirstChar: CharStreamIndexToken -> string
Returns a string with the chars between the stream index indexOfFirstChar (inclusive) and the current Index of the stream (exclusive).
This method throws
- an ArgumentOutOfRangeException, if Index < indexOfFirstChar, and
- an ArgumentException, if the CharStreamIndexToken is a zero‐initialized instance (i.e. constructed with the default value type constructor).
It may also throw any of the I/O related exceptions detailed above.
You may only pass CharStreamIndexToken values that were retrieved from the CharStream instance on which you’re calling ReadFrom. Passing a CharStreamIndexToken value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
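A typical use of this overload is to remember a starting point, skip ahead, and then extract everything in between. The sketch below assumes an F# script with the FParsec NuGet package; `readLeadingLetters` is a hypothetical helper name.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: returns the run of letters at the start of `input`.
let readLeadingLetters (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    let start = stream.IndexToken // token taken from this very stream
    stream.SkipCharsOrNewlinesWhile(fun (c: char) -> System.Char.IsLetter c) |> ignore
    stream.ReadFrom(start)        // chars from `start` up to the current Index
```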
member SkipWhitespace: unit -> bool
Skips over any sequence of space (' '), tab ('\t') or newline ('\r', '\n') chars. Returns true if it skips at least one char, otherwise false.
This method registers any skipped standard newline ("\n", "\r\n" or "\r").
When this method skips at least one char, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
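The effect of newline registration on the reported position can be observed through the Line and Column properties. A minimal sketch, assuming an F# script with the FParsec NuGet package; `positionAfterWhitespace` is a hypothetical helper name.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: skips leading whitespace and reports
// (skippedAnything, line, column) at the first non-whitespace char.
let positionAfterWhitespace (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    let skipped = stream.SkipWhitespace()
    (skipped, stream.Line, stream.Column)
```

For the input " \r\n  x" the "\r\n" pair is registered as a single newline, so the 'x' is reported on line 2 at column 3.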
member SkipUnicodeWhitespace: unit -> bool
Skips over any sequence of unicode whitespace chars (as identified by System.Char.IsWhiteSpace). Returns true if it skips at least one char, otherwise false.
This method registers any skipped unicode newline ("\n", "\r\n", "\r", "\u0085", "\u2028" or "\u2029").
This method recognizes the form feed char '\f' ('\u000C') as a Unicode whitespace character, but not as a newline character.
When this method skips at least one char, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipNewline: unit -> bool
Skips over a standard newline ("\n", "\r\n" or "\r"). Returns true if a newline is skipped, otherwise false.
When this method skips a newline, it also registers it.
When this method skips a newline, it increments the StateTag by 1, otherwise it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipUnicodeNewline: unit -> bool
Skips over a unicode newline ("\n", "\r\n", "\r", "\u0085", "\u2028", or "\u2029"). Returns true if a newline is skipped, otherwise false.
This method does not recognize the form feed char '\f' ('\u000C') as a newline character.
When this method skips a newline, it also registers it.
When this method skips a newline, it increments the StateTag by 1, otherwise it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipNewlineThenWhitespace: powerOf2TabStopDistance: int * allowFormFeed: bool -> int
Skips over a newline ("\n", "\r\n" or "\r") followed by any (possibly empty) sequence of whitespace chars (' ', '\t', '\r', '\n' and optionally '\f').
If this method skips no chars because the next stream char is no newline char, it returns ‒1. Otherwise it returns the indentation of the first line with non‐whitespace characters.
The indentation is calculated as follows:
- Any newline char ('\r' or '\n') or form feed char ('\f') resets the indentation to 0.
- Any space char (' ') increments the indentation by 1.
- Any tab char ('\t') increments the indentation by powerOf2TabStopDistance ‐ (indentation modulo powerOf2TabStopDistance).
The maximum indentation is 2^31 ‐ 1. If skipping a whitespace char would cause the indentation to overflow, the char is not skipped and the method returns the indentation up to that char.
An ArgumentOutOfRangeException is thrown if powerOf2TabStopDistance is not a positive power of 2.
The value of the allowFormFeed argument determines whether this method accepts the form feed char '\f' as a whitespace char.
This method registers all skipped standard newlines ("\n", "\r\n" or "\r").
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
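The indentation rules above can be tried out with a short sketch. It assumes an F# script with the FParsec NuGet package, a tab stop distance of 4 and form feeds disallowed; `indentationAfterNewline` is a hypothetical helper name.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: returns the indentation of the first line with
// non-whitespace chars after a leading newline, or -1 if `input`
// does not start with a newline.
let indentationAfterNewline (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    stream.SkipNewlineThenWhitespace(4, false) // tab stops every 4 columns, no '\f'
```

With "\n  \tx", the two spaces give an indentation of 2 and the tab then advances it by 4 ‐ (2 modulo 4) = 2 to the next tab stop, so the result is 4. With "\n    \n  x", the second newline resets the count, so the result is 2.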
member SkipRestOfLine: skipNewline: bool -> unit
Skips over any chars before the next newline ("\n", "\r\n" or "\r") or the end of the stream. If skipNewline is true and a newline is present, the newline is also skipped.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member ReadRestOfLine: skipNewline: bool -> string
ReadRestOfLine(skipNewline) behaves like SkipRestOfLine(skipNewline), except that it returns a string with the skipped chars (without a newline).
member ReadCharOrNewline: unit -> char
Skips over any single char or standard newline ("\n", "\r\n" or "\r").
This method returns '\n' when it skips a newline. Otherwise, it returns the skipped char, except at the end of the stream, where it returns the EndOfStreamChar ('\uFFFF').
When this method skips a newline, it also registers it.
When this method skips a char or newline, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method may throw any of the I/O related exceptions detailed above.
member SkipCharsOrNewlines: maxCount: int -> int
Skips over up to maxCount chars. Returns the number of skipped chars.
The number of actually skipped chars is less than maxCount if the end of the stream is reached after fewer than maxCount chars.
This method counts standard newlines ("\n", "\r\n" or "\r") as single chars. When this method skips a newline, it also registers it.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
An ArgumentOutOfRangeException is thrown if maxCount is negative. This method may also throw any of the I/O related exceptions detailed above.
member ReadCharsOrNewlines: maxCount: int * normalizeNewlines: bool -> string
Behaves like SkipCharsOrNewlines(maxCount), except that it returns a string with the skipped chars.
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form they are encountered in the input.
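The difference between the two normalizeNewlines settings, and the fact that a "\r\n" pair counts as one char towards maxCount, can be sketched as follows. The example assumes an F# script with the FParsec NuGet package; `readThree` is a hypothetical helper name.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: reads up to 3 chars or newlines from `input`.
let readThree (normalizeNewlines: bool) (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    stream.ReadCharsOrNewlines(3, normalizeNewlines)
```

For the input "a\r\nbc" both calls consume 'a', the newline and 'b'. With normalization the result is "a\nb"; without it the result is the four‐char string "a\r\nb".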
member SkipCharsOrNewlinesWhile: predicate: (char -> bool) -> int
Skips over a sequence of chars that satisfy the predicate function. Stops at the first char for which predicate returns false. Returns the number of skipped chars.
This method counts standard newlines ("\n", "\r\n" or "\r") as single chars and passes them to the predicate function as single '\n' chars. When this method skips a newline, it also registers it.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
The predicate function must not access the CharStream instance itself, because SkipCharsOrNewlinesWhile relies on predicate not having any side‐effect on the internal state of the stream.
This method may throw any of the I/O related exceptions detailed above.
member SkipCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) -> int
Behaves like SkipCharsOrNewlinesWhile(predicate), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.
member SkipCharsOrNewlinesWhile: predicate: (char -> bool) * minCount: int * maxCount: int -> int
Skips over a sequence of up to maxCount chars that satisfy the predicate function, but backtracks to the start if it can only skip fewer than minCount chars. Returns the number of skipped chars.
This method counts standard newlines ("\n", "\r\n" or "\r") as single chars and passes them to the predicate function as single '\n' chars. When this method skips a newline, it also registers it.
An ArgumentOutOfRangeException is thrown if maxCount is negative. This method may also throw any of the I/O related exceptions detailed above.
The predicate function must not access the CharStream instance itself, because SkipCharsOrNewlinesWhile relies on predicate not having any side‐effect on the internal state of the stream.
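The interplay of minCount, maxCount and backtracking can be illustrated with a small sketch. It assumes an F# script with the FParsec NuGet package and the arbitrary limits minCount = 2 and maxCount = 4; `skipDigits` is a hypothetical helper name.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: skips between 2 and 4 leading digits.
// Returns (numberOfSkippedChars, newIndex).
let skipDigits (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    let isDigit = fun (c: char) -> System.Char.IsDigit c // pure, never touches the stream
    let count = stream.SkipCharsOrNewlinesWhile(isDigit, 2, 4)
    (count, stream.Index)
```

With "12345" the method stops after 4 digits. With "1a" only one digit is available, which is below minCount, so the stream is left at index 0 and 0 is returned.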
member SkipCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * minCount: int * maxCount: int -> int
Behaves like SkipCharsOrNewlinesWhile(predicate, minCount, maxCount), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.
member ReadCharsOrNewlinesWhile: predicate: (char -> bool) * normalizeNewlines: bool -> string
Behaves like SkipCharsOrNewlinesWhile(predicate), except that it returns a string with the skipped chars.
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form they are encountered in the input.
member ReadCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * normalizeNewlines: bool -> string
Behaves like ReadCharsOrNewlinesWhile(predicate, normalizeNewlines), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.
member ReadCharsOrNewlinesWhile: predicate: (char -> bool) * minCount: int * maxCount: int * normalizeNewlines: bool -> string
Behaves like SkipCharsOrNewlinesWhile(predicate, minCount, maxCount), except that it returns a string with the skipped chars.
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form they are encountered in the input.
member ReadCharsOrNewlinesWhile: predicateForFirstChar: (char -> bool) * predicate: (char -> bool) * minCount: int * maxCount: int * normalizeNewlines: bool -> string
Behaves like ReadCharsOrNewlinesWhile(predicate, minCount, maxCount, normalizeNewlines), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.
member SkipCharsOrNewlinesUntilString: str: string * maxCount: int * foundString: out<bool> -> int
Skips over all stream chars before the first occurrence of the specified string or the end of the stream, but not over more than maxCount chars. Assigns true to the output parameter if the string is found, otherwise false.
This method registers skipped newlines ("\n", "\r\n" or "\r") and counts them as single chars. However, no newline normalization takes place when the argument string str is matched with the stream chars. Hence, str should either contain no newlines or only in the form they occur in the stream. If str starts with '\n', then SkipCharsOrNewlinesUntilString will not find occurrences of str in the stream that start in the middle of an "\r\n" newline.
When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.
This method throws
- an ArgumentException, if the string argument is empty, and
- an ArgumentOutOfRangeException, if maxCount is negative.
It may also throw any of the I/O related exceptions detailed above.
member SkipCharsOrNewlinesUntilString: str: string * maxCount: int * normalizeNewlines: bool * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int
Behaves like SkipCharsOrNewlinesUntilString(str, maxCount, outBool), except that its output parameter is a string instead of a boolean. If str is found, a string with the skipped chars is assigned to this output parameter; otherwise, null is assigned to the output parameter.
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the output string are normalized to '\n' or are preserved in the original form they are encountered in the input.
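A sketch of how the string output parameter might be consumed from F#. It assumes an F# script with the FParsec NuGet package; the helper name `readUntil` and the conversion of the null result into an option are choices made for this illustration only.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: skips up to (but not over) `terminator`.
// Returns (numberOfSkippedChars, Some skippedText) if the terminator
// was found and (numberOfSkippedChars, None) otherwise.
let readUntil (terminator: string) (input: string) =
    use stream = new CharStream<unit>(input, 0, input.Length)
    let mutable skipped : string = null
    let count =
        stream.SkipCharsOrNewlinesUntilString(
            terminator, System.Int32.MaxValue, true, &skipped)
    (count, Option.ofObj skipped) // null means "terminator not found"
```

Note that the stream is left positioned at the terminator, not after it, so a caller that wants to consume the terminator still has to skip it explicitly.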
member SkipCharsOrNewlinesUntilCaseFoldedString: caseFoldedString: string * maxCount: int * foundString: out<bool> -> int
Behaves like SkipCharsOrNewlinesUntilString(caseFoldedString, maxCount, foundString), except that the chars in the stream are case‐folded before they are compared with caseFoldedString.
While the chars in the CharStream are case‐folded before they are matched, the chars in the string argument caseFoldedString are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
member SkipCharsOrNewlinesUntilCaseFoldedString: caseFoldedString: string * maxCount: int * normalizeNewlines: bool * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int
Behaves like SkipCharsOrNewlinesUntilString(caseFoldedString, maxCount, normalizeNewlines, skippedCharsIfStringFoundOtherwiseNull), except that the chars in the stream are case‐folded before they are compared with caseFoldedString.
While the chars in the stream are case‐folded before they are matched, the chars in the string argument caseFoldedString are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
6.11.2 CharStream<TUserState>
Provides read‐access to a sequence of UTF‐16 chars.
6.11.2.1 Interface
    [<Sealed>]
    type CharStream<'TUserState> =
      inherit CharStream
      // has the same constructors as CharStream
      member UserState: 'TUserState with get, set
      member State: CharStreamState<'TUserState>
      member BacktrackTo: CharStreamState<'TUserState> -> unit
      member ReadFrom: stateWhereStringBegins: CharStreamState<'TUserState> * normalizeNewlines: bool -> string
      member CreateSubstream<'TSubStreamUserState>: stateWhereSubstreamBegins: CharStreamState<'TUserState> -> CharStream<'TSubStreamUserState>
6.11.2.2 Remarks
The CharStream<'TUserState> class adds a user definable state component to its base class CharStream.
The user state is accessible through the property UserState. It has the type 'TUserState.
You can retrieve a snapshot of the complete stream state, including the user state, from the State property. The value returned from the State property has the type CharStreamState<'TUserState>. You can pass a CharStreamState value to the BacktrackTo method in order to restore a previous state of the CharStream.
'TUserState must be an immutable type or at least be treated as an immutable type if you want BacktrackTo to completely restore old values of the user state. Hence, when you need to change the user state, you should set a new 'TUserState value to the UserState property of the CharStream instance, not mutate the existing 'TUserState value.
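The recommendation to treat the user state as immutable can be illustrated with a short sketch. It assumes an F# script with the FParsec NuGet package and uses an immutable `int list` as an example user state; `backtrackDemo` is a hypothetical helper name.

```fsharp
#r "nuget: FParsec" // assumption: executed as an F# script via `dotnet fsi`

open FParsec

// Hypothetical helper: changes position and user state, then backtracks.
// Returns (index, userState) after backtracking.
let backtrackDemo (input: string) =
    use stream = new CharStream<int list>(input, 0, input.Length)
    stream.UserState <- []                       // start from an explicit empty list
    let saved = stream.State                     // snapshot: position and user state
    stream.Skip()                                // advance by one char
    stream.UserState <- 1 :: stream.UserState    // assign a new value, no in-place mutation
    stream.BacktrackTo(saved)
    (stream.Index, stream.UserState)
```

Because the list stored in the snapshot is never modified in place, BacktrackTo restores both the position and the original empty list. Had the user state been, say, a mutable collection that was modified in place, the snapshot would refer to the same modified object.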
6.11.2.3 Members
member UserState: 'TUserState with get, set
The current user state value.
Setting the UserState value increments the StateTag by 1, independent of whether the new value is different from the previous one.
member State: CharStreamState<'TUserState>
member BacktrackTo: CharStreamState<'TUserState> -> unit
Restores the stream to the state represented by the given CharStreamState value.
For example:
    fun (stream: CharStream<'u>) ->
        let state = stream.State
        // ... (do something with stream that might change the state)
        stream.BacktrackTo(state) // restores stream to previous state
        // ...
This method throws an ArgumentException if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor). It may also throw any of the I/O related exceptions detailed above.
You may only pass CharStreamState values that were retrieved from the CharStream instance on which you’re calling BacktrackTo. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
member ReadFrom: stateWhereStringBegins: CharStreamState<'TUserState> * normalizeNewlines: bool -> string
Returns a string with the chars between the index of the stateWhereStringBegins (inclusive) and the current Index of the stream (exclusive).
The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form they are encountered in the input. (If stateWhereStringBegins.Line equals the current Line, this method will never normalize any newlines in the returned string.)
This method throws
- an ArgumentOutOfRangeException, if Index < GetIndex(stateWhereStringBegins), and
- an ArgumentException, if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).
It may also throw any of the I/O related exceptions detailed above.
You may only pass CharStreamState values that were retrieved from the CharStream instance on which you’re calling ReadFrom. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
member CreateSubstream<'TSubStreamUserState>: stateWhereSubstreamBegins: CharStreamState<'TUserState> -> CharStream<'TSubStreamUserState>
Creates a new CharStream<'TSubStreamUserState> instance with the stream chars between the index of the stateWhereSubstreamBegins (inclusive) and the current Index of the stream (exclusive).
The state of the substream is initialized to stateWhereSubstreamBegins, so that the stream and the substream will report the same position (Index, Line, LineBegin and Name) for corresponding chars. However, the beginning and end will normally differ between stream and substream, in particular the IndexOfFirstChar and IndexOfLastCharPlus1 values will normally differ between stream and substream.
An example:
    open FParsec
    open FParsec.Primitives
    open FParsec.CharParsers
    open FParsec.Error

    let embeddedBlock (beginDelim: string) (endDelim: string) : Parser<_,_> =
        let expectedEmbeddedBlock = expected "embedded block"
        fun stream ->
            if stream.Skip(beginDelim) then
                let stateAtBegin = stream.State
                let mutable foundString = false
                let maxChars = System.Int32.MaxValue
                stream.SkipCharsOrNewlinesUntilString(endDelim, maxChars, &foundString)
                |> ignore
                if foundString then
                    // create substream with content between beginDelim and endDelim
                    use substream = stream.CreateSubstream<unit>(stateAtBegin)
                    // here we would normally work with the substream,
                    // in this example we will just extract the string content
                    let str = substream.ReadCharsOrNewlines(System.Int32.MaxValue, true)
                    Reply(str)
                else
                    Reply(Error, expectedString endDelim)
            else
                Reply(Error, expectedEmbeddedBlock)

    > run (embeddedBlock "/*" "*/") "/*substream content*/";;
    val it : ParserResult<string,unit> = Success: "substream content"
This note does not apply to the Low‐Trust version of FParsec.
If you create a substream for a CharStream instance with more than one block, the content of the substream needs to be copied. Thus, you can minimize the overhead associated with creating a substream by ensuring that the CharStream has only one block, either by choosing a sufficiently large blockSize, or by creating the CharStream from a string or char buffer.
You may use a stream and its substreams concurrently. However, notice the following warning:
This note does not apply to the Low‐Trust version of FParsec.
You may not dispose a stream before all of its substreams are disposed. Disposing a stream before all its substreams are disposed triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
This method throws
- an ArgumentOutOfRangeException, if Index < GetIndex(stateWhereSubstreamBegins), and
- an ArgumentException, if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).
It may also throw any of the I/O related exceptions detailed above.
You may only pass CharStreamState values that were retrieved from the CharStream instance on which you’re calling CreateSubstream. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
6.11.3 CharStreamIndexToken
An opaque representation of a CharStream char index.
type CharStreamIndexToken = struct
    member GetIndex: CharStream -> int64
end
CharStream methods can handle CharStreamIndexToken values more efficiently than integer char indices.
You can retrieve CharStreamIndexToken values from the CharStream.IndexToken and CharStreamState<_>.IndexToken properties.
You can get the char index corresponding to a given CharStreamIndexToken value by calling its GetIndex method with the CharStream instance from which the token was retrieved.
Zero‐initialized CharStreamIndexToken values constructed with the default value type constructor are not valid, and trying to call a CharStream method with such an instance will trigger an exception.
A CharStreamIndexToken instance may only be used together with the CharStream instance it was created for.
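The following is a minimal sketch of how an index token might be retrieved, converted back to an integer index and used for seeking. It assumes the code is run as an F# script (for example with `dotnet fsi`) and that FParsec is available as the NuGet package `FParsec`; the sample text and all binding names are illustrative only.

```fsharp
#r "nuget: FParsec"   // assumption: FParsec is referenced via NuGet in a script
open FParsec

let text = "abcdef"
// A CharStream constructed from a string.
let stream = new CharStream<unit>(text, 0, text.Length)

stream.Skip(2)                               // advance to index 2
let token = stream.IndexToken                // opaque handle for the current index
let indexAtToken = token.GetIndex(stream)    // integer index the token stands for

stream.Skip(3)                               // advance further, to index 5
let indexAfterSkip = stream.Index

stream.Seek(token)                           // return to the remembered index
let charAfterSeek = stream.Peek()            // char at index 2

stream.Dispose()
```

Note that the token is only ever passed back to the stream it was retrieved from, as required above.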
member GetIndex: CharStream -> int64
Returns the stream index represented by the CharStreamIndexToken instance.
The CharStream instance passed as the argument must be the CharStream instance from which the CharStreamIndexToken was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
An InvalidOperationException is thrown if the CharStreamIndexToken value is zero‐initialized (i.e. constructed with the default value type constructor).
6.11.4 CharStreamState
An immutable value type representation of the state of a CharStream.
type CharStreamState<'TUserState> = struct
    member Tag: int64
    member IndexToken: CharStreamIndexToken
    member Line: int64
    member LineBegin: int64
    member Name: string
    member UserState: 'TUserState
    member GetIndex: CharStream<'TUserState> -> int64
    member GetPosition: CharStream<'TUserState> -> Position
end
You can retrieve CharStreamState values from the CharStream<_>.State property. By passing a CharStreamState value to the BacktrackTo method of a CharStream<_> instance, you can restore the stream to the state represented by the CharStreamState value.
Zero‐initialized CharStreamState values constructed with the default value type constructor are not valid, and trying to call a CharStream method with such an instance will trigger an exception.
A CharStreamState instance may only be used together with the CharStream instance it was created for.
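As an illustration of saving and restoring a stream state, the sketch below takes a snapshot with State, advances the stream and then calls BacktrackTo. It assumes an F# script that references FParsec through the NuGet package `FParsec`; the `int` user state, the sample text and the binding names are chosen purely for demonstration.

```fsharp
#r "nuget: FParsec"   // assumption: FParsec is referenced via NuGet in a script
open FParsec

let text = "line1\nline2"
let stream = new CharStream<int>(text, 0, text.Length)
stream.UserState <- 1

// Snapshot of the stream state at the beginning of the input.
let saved = stream.State

stream.SkipRestOfLine(true)          // skip "line1" and the following newline
stream.UserState <- 2
let lineAfterSkip = stream.Line

// Restore the stream to the snapshot taken above.
stream.BacktrackTo(saved)
let indexAfterBacktrack = stream.Index
let lineAfterBacktrack = stream.Line
let userStateAfterBacktrack = stream.UserState

stream.Dispose()
```

After the call to BacktrackTo the index, the line count and the user state are expected to match the values they had when the snapshot was taken.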
member GetIndex: CharStream<'TUserState> -> int64
state.GetIndex(stream) is an optimized implementation of state.IndexToken.GetIndex(stream).
The CharStream<'TUserState> instance passed as the argument must be the CharStream instance from which the CharStreamState was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
An InvalidOperationException is thrown if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).
member GetPosition: CharStream<'TUserState> -> Position
state.GetPosition(stream) is an optimized implementation of new Position(state.Name, state.GetIndex(stream), state.Line, state.Column).
The CharStream<'TUserState> instance passed as the argument must be the CharStream instance from which the CharStreamState was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
An InvalidOperationException is thrown if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).
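To make the relationship between GetIndex and GetPosition more concrete, here is a small sketch that queries both on a state captured in the middle of the second line of a short input. It assumes an F# script referencing the NuGet package `FParsec`; the sample text and binding names are hypothetical.

```fsharp
#r "nuget: FParsec"   // assumption: FParsec is referenced via NuGet in a script
open FParsec

let text = "ab\ncd"
let stream = new CharStream<unit>(text, 0, text.Length)

stream.SkipRestOfLine(true)   // skip "ab" and the newline: start of line 2
stream.Skip()                 // skip 'c'
let state = stream.State

// Both calls are expected to yield the same index.
let indexViaState = state.GetIndex(stream)
let indexViaToken = state.IndexToken.GetIndex(stream)

// Position bundles stream name, index, line and column.
let pos = state.GetPosition(stream)

stream.Dispose()
```

Here the column is counted from 1 at the beginning of each line, so the char 'd' at index 4 is reported in column 2 of line 2.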
6.11.5 TwoChars
An immutable value type representation of two chars:
type TwoChars = struct
    new: char0: char * char1: char -> TwoChars
    val Char0: char
    val Char1: char
end
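A short sketch of how a TwoChars value might be used together with the Peek2 method of CharStream to look ahead two chars at once. It assumes an F# script referencing the NuGet package `FParsec`; the `=>` token and the binding names are only illustrative.

```fsharp
#r "nuget: FParsec"   // assumption: FParsec is referenced via NuGet in a script
open FParsec

let text = "=>x"
let stream = new CharStream<unit>(text, 0, text.Length)

// Look at the next two chars without advancing the stream.
let next = stream.Peek2()
let indexAfterPeek = stream.Index

// A TwoChars value can also be constructed directly.
let arrow = TwoChars('=', '>')
let isArrow = (next.Char0 = arrow.Char0 && next.Char1 = arrow.Char1)

// Consume the two chars once we know what they are.
if isArrow then stream.Skip(2)
let charAfterArrow = stream.Peek()

stream.Dispose()
```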
[1] The detection of invalid byte sequences by the .NET decoders is not entirely reliable. For example, System.Text.UnicodeEncoding (UTF‐16) has an alignment related bug in .NET versions prior to 4.0 that sometimes leads to invalid surrogate pairs not being detected. The implementations of more complicated encodings, like GB18030, ISO‐2022 and ISCII, also have several issues with regard to the detection of invalid input data.