6.11 FParsec.CharStream

6.11.1 CharStream

Provides read‐access to a sequence of UTF‐16 chars.

6.11.1.1 Interface

// FParsecCS.dll

namespace FParsec

type CharStream =
  interface System.IDisposable

  new:    chars: string * index: int * length: int
       -> CharStream
  new:    chars: string * index: int * length: int * streamBeginIndex: int64
       -> CharStream

  new:    chars: char[] * index: int * length: int
       -> CharStream
  new:    chars: char[] * index: int * length: int * streamBeginIndex: int64
       -> CharStream

  new:    chars: NativePtr<char> * length: int
       -> CharStream
  new:    chars: NativePtr<char> * length: int * streamBeginIndex: int64
       -> CharStream

  new:    path: string * encoding: System.Text.Encoding
       -> CharStream
  new:    path: string
        * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool
       -> CharStream
  new:    path: string
        * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool
        * blockSize: int * blockOverlap: int * minRegexSpace: int
        * byteBufferLength: int
       -> CharStream

  new:    stream: System.IO.Stream * encoding: System.Text.Encoding
       -> CharStream
  new:    stream: System.IO.Stream * leaveOpen: bool
        * encoding: System.Text.Encoding
       -> CharStream
  new:    stream: System.IO.Stream * leaveOpen: bool
        * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool
       -> CharStream
  new:    stream: System.IO.Stream * leaveOpen: bool
        * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool
        * blockSize: int * blockOverlap: int * minRegexSpace: int
        * byteBufferLength: int
       -> CharStream

  member Dispose: unit -> unit

  member BlockOverlap: int

  member IndexOfFirstChar: int64
  member IndexOfLastCharPlus1: int64

  member IsBeginOfStream: bool
  member IsEndOfStream: bool

  member Index: int64
  member IndexToken: CharStreamIndexToken
  member Line: int64
  member LineBegin: int64
  member Column: int64
  member Name: string with get, set
  member Position: Position

  val mutable StateTag: uint64

  member Seek: index: int64 -> unit
  member Seek: indexToken: CharStreamIndexToken -> unit

  static val EndOfStreamChar: char

  member Peek:  unit -> char
  member Peek2: unit -> TwoChars
  member Peek:  utf16Offset: int    -> char
  member Peek:  utf16Offset: uint32 -> char

  member PeekString: length: int -> string
  member PeekString: buffer: char[] * bufferIndex: int * length: int -> int
  member PeekString: buffer: NativePtr<char> * length: int -> int

  member Match: char -> bool
  member Match: chars: string -> bool
  member Match: chars: char[] * charsIndex: int * length: int -> bool
  member Match: chars: NativePtr<char> * length: int -> bool

  member MatchCaseFolded: caseFoldedChar: char -> bool
  member MatchCaseFolded: caseFoldedChars: string -> bool
  member MatchCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool

  member Match: System.Text.RegularExpressions.Regex
                -> System.Text.RegularExpressions.Match
  member MinRegexSpace: int with get, set

  member RegisterNewline: unit -> bool
  member RegisterNewlines: lineOffset: int   * newColumnMinus1: int   -> bool
  member RegisterNewlines: lineOffset: int64 * newColumnMinus1: int64 -> bool

  // The following methods require manual registration of skipped newlines

  member Skip: unit -> unit
  member Skip: utf16Offset: int    -> unit
  member Skip: utf16Offset: uint32 -> unit
  member Skip: utf16Offset: int64  -> unit

  member SkipAndPeek: unit   -> char
  member SkipAndPeek: utf16Offset: int    -> char
  member SkipAndPeek: utf16Offset: uint32 -> char

  member Skip: char -> bool
  member Skip: TwoChars -> bool
  member Skip: chars: string -> bool
  member Skip: chars: char[] * charsIndex: int * length: int -> bool
  member Skip: chars: NativePtr<char> * length: int -> bool

  member SkipCaseFolded: caseFoldedChar: char -> bool
  member SkipCaseFolded: caseFoldedChars: string -> bool
  member SkipCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool

  member Read: unit -> char
  member Read: length: int -> string
  member Read: buffer: char[] * bufferIndex: int * length: int -> int
  member Read: buffer: NativePtr<char> * length: int -> int

  member ReadFrom: indexOfFirstChar: CharStreamIndexToken -> string

  // The following methods automatically register skipped newlines

  member SkipWhitespace: unit -> bool
  member SkipUnicodeWhitespace: unit -> bool

  member SkipNewline: unit -> bool
  member SkipUnicodeNewline: unit -> bool

  member SkipNewlineThenWhitespace:
      powerOf2TabStopDistance: int * allowFormFeed: bool -> int

  member SkipRestOfLine: skipNewline: bool -> unit
  member ReadRestOfLine: skipNewline: bool -> string

  member ReadCharOrNewline: unit -> char

  member SkipCharsOrNewlines: maxCount: int -> int
  member ReadCharsOrNewlines: maxCount: int * normalizeNewlines: bool -> string

  member SkipCharsOrNewlinesWhile:
      predicate: (char -> bool) -> int
  member SkipCharsOrNewlinesWhile:
      predicateForFirstChar: (char -> bool) * predicate: (char -> bool) -> int
  member SkipCharsOrNewlinesWhile:
      predicate: (char -> bool) * minCount: int * maxCount: int -> int
  member SkipCharsOrNewlinesWhile:
      predicateForFirstChar: (char -> bool) * predicate: (char -> bool)
    * minCount: int * maxCount: int -> int

  member ReadCharsOrNewlinesWhile:
      predicate: (char -> bool)
    * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile:
      predicateForFirstChar: (char -> bool) * predicate: (char -> bool)
    * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile:
      predicate: (char -> bool)
    * minCount: int * maxCount: int * normalizeNewlines: bool -> string
  member ReadCharsOrNewlinesWhile:
      predicateForFirstChar: (char -> bool) * predicate: (char -> bool)
    * minCount: int * maxCount: int * normalizeNewlines: bool -> string

  member SkipCharsOrNewlinesUntilString:
      str: string * maxCount: int
    * foundString: out<bool> -> int
  member SkipCharsOrNewlinesUntilString:
      str: string * maxCount: int * normalizeNewlines: bool
    * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int

  member SkipCharsOrNewlinesUntilCaseFoldedString:
      caseFoldedString: string * maxCount: int
    * foundString: out<bool> -> int
  member SkipCharsOrNewlinesUntilCaseFoldedString:
      caseFoldedString: string * maxCount: int * normalizeNewlines: bool
    * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int

6.11.1.2 Remarks

The CharStream class provides a unified interface for efficiently reading UTF‐16 chars from a binary stream or an in‐memory char buffer (e.g. a string). It is optimized for use in backtracking parser applications and supports arbitrary char‐based seeking, even for streams larger than the addressable memory (on 32‐bit platforms).

The CharStream class is the base class of CharStream<'TUserState>, which adds a user‐definable state component and some convenience methods for working with the state of a CharStream instance.

A CharStream constructed from a System.IO.Stream or a file path reads the stream block‐wise and only holds the most recently accessed block in memory. The blocks overlap in order to provide efficient access on the boundary between blocks.

If the char content is already available as a string or a char array, a CharStream can be directly constructed from the char buffer (without needing to copy the buffer). The overhead of accessing an in‐memory char buffer through a CharStream is minimal.

Position information

The position of the next char in the stream is described by the following 4 properties:

  • Index, the index of the UTF‐16 char in the stream,
  • Line, the line number for the next char,
  • LineBegin, the index of the first char of the line that also contains the next char,
  • Name, a description or identifier for the stream.

LineBegin can be combined with Index to calculate a column number: Column = Index - LineBegin + 1.

Among these properties the char index is the most important one, as the CharStream uses it to uniquely identify a UTF‐16 char in the stream.

The other 3 properties further describe the text location of the char identified by the index, but they are not necessary for the core functionality of the CharStream class. The CharStream class keeps track of this additional position information to provide a more convenient interface to higher‐level library functions, in particular for debugging and error reporting.

Newlines

For performance reasons the most basic stream operations do not automatically recognize newlines (end‐of‐line markers) in the stream content. If you skip a newline with one of these methods, you have to register it manually afterwards with one of the RegisterNewline methods (otherwise the line and column counts become incorrect).

In order to provide a convenient interface for parser routines, the CharStream class also provides some more advanced methods that automatically register any skipped standard newline ("\n", "\r\n" and "\r"). Additionally, it provides two methods that automatically register any Unicode newline (SkipUnicodeWhitespace and SkipUnicodeNewline).
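The following Python sketch models this bookkeeping. It is not FParsec code: the class PositionModel and its methods skip, register_newline and skip_newline are hypothetical stand‐ins for Skip, RegisterNewline and SkipNewline, and the model assumes the whole content is held in one Python string. It shows how Index, Line, LineBegin and Column evolve when a newline is skipped with a basic method (manual registration needed) versus an advanced one (automatic registration).

```python
# Hypothetical Python model of CharStream position bookkeeping (not FParsec code).
EOS = "\uffff"  # stands in for CharStream.EndOfStreamChar


class PositionModel:
    def __init__(self, text, first_index=0):
        self.text = text
        self.first = first_index        # IndexOfFirstChar
        self.index = first_index        # Index of the next char
        self.line = 1                   # Line (the line count starts with 1)
        self.line_begin = first_index   # LineBegin

    @property
    def column(self):
        # Column = Index - LineBegin + 1
        return self.index - self.line_begin + 1

    def peek(self, offset=0):
        i = self.index - self.first + offset
        return self.text[i] if 0 <= i < len(self.text) else EOS

    def skip(self, n=1):
        # Basic operation: advances Index but does NOT register newlines.
        self.index = min(self.index + n, self.first + len(self.text))

    def register_newline(self):
        # To be called right after a newline was skipped with a basic operation.
        self.line += 1
        self.line_begin = self.index

    def skip_newline(self):
        # Advanced operation: recognizes "\n", "\r\n" and "\r" and registers it.
        c = self.peek()
        if c == "\r":
            self.skip(2 if self.peek(1) == "\n" else 1)
        elif c == "\n":
            self.skip(1)
        else:
            return False
        self.register_newline()
        return True


s = PositionModel("ab\r\ncd")
s.skip(2)
s.skip_newline()
print(s.index, s.line, s.line_begin, s.column)  # 4 2 4 1

t = PositionModel("x\ny")
t.skip(2)                 # skips "x\n" without registering the newline
print(t.line, t.column)   # 1 3 (stale line and column)
t.register_newline()
print(t.line, t.column)   # 2 1
```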

It should be obvious from the method names which methods automatically register newlines and which don’t.

Case‐insensitive matching

The MatchCaseFolded and SkipCaseFolded members match the content of the stream “case‐insensitively” with a reference string. In this instance “case‐insensitive” means that before the chars are matched with the reference string they are mapped to a canonical form where case differences are erased. For performance reasons MatchCaseFolded only applies the (non‐Turkic) 1‐to‐1 case folding mappings (v. 8.0.0) for Unicode code points in the Basic Multilingual Plane, i.e. code points below 0x10000. These mappings are sufficient for many case‐insensitive parser grammars encountered in practice, but they are not appropriate for matching arbitrary natural language content. Please also note that the CharStream class performs no Unicode normalization.
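A minimal Python sketch of the matching rule, under a loudly stated assumption: str.lower() restricted to single‐char results stands in for the simple Unicode case folding table, which it only approximates (the two differ for some chars). Python strings are indexed by code point rather than by UTF‐16 char, so the BMP check is done per code point. The function names are hypothetical.

```python
# Hypothetical sketch of case-folded matching (not FParsec's implementation).
# ASSUMPTION: str.lower() limited to 1-to-1 results approximates the simple
# (1-to-1) Unicode case folding; they differ for some chars (e.g. final sigma).


def fold_char(c):
    if ord(c) >= 0x10000:          # only BMP code points are folded
        return c
    folded = c.lower()
    return folded if len(folded) == 1 else c   # ignore 1-to-many mappings


def match_case_folded(text, pos, case_folded_chars):
    # Only the stream chars are folded; case_folded_chars must already be folded.
    if pos + len(case_folded_chars) > len(text):
        return False
    return all(fold_char(text[pos + i]) == c
               for i, c in enumerate(case_folded_chars))


print(match_case_folded("SELECT * FROM t", 0, "select"))  # True
print(match_case_folded("SELECT * FROM t", 0, "SELECT"))  # False: reference not folded
print(match_case_folded("STRASSE", 0, "straße"))          # False: no 1-to-many mapping
```

The second call shows why the reference string has to be folded in advance: only the stream side is normalized.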

Non‐sequential access

This note does not apply to the Low‐Trust version of FParsec.
If you construct a CharStream from a System.IO.Stream or a file path and you backtrack over a distance long enough to require the CharStream to reread a previous block, then the underlying byte stream needs to support seeking, otherwise a NotSupportedException is thrown. Furthermore, the Decoder for the input Encoding must be serializable if you backtrack to a block other than the first in the stream. Note that file streams created for regular disk files are always seekable and all the .NET standard decoders are serializable. In order to support non‐seekable streams for applications which don’t require extensive backtracking, no exception will be thrown before an operation actually requires backtracking and the necessary capabilities of the stream or decoder are not available.

Decoder errors

A CharStream constructed from a binary input stream decodes the input data with the help of a Decoder instance obtained via the Encoding’s GetDecoder method. Depending on the configuration of the encoding, the decoder might throw an exception if it encounters invalid byte sequences, usually a System.Text.DecoderFallbackException or a System.ArgumentException. [1]

Disposable interface

This note does not apply to the Low‐Trust version of FParsec.
A CharStream holds managed and unmanaged resources that need to be explicitly released. Hence, it is very important that CharStream objects are promptly disposed after use. Where possible, CharStream objects should only be used within a “using” block (C#), a “use” expression (F#) or similar constructs in other languages.

Thread safety

CharStream members are not thread‐safe.

Low‐Trust version

If you compile FParsec with the LOW_TRUST conditional compiler symbol, the CharStream class differs from the normal version as follows:

  • No unverifiable code involving pointers is used. This allows FParsec to be executed in an environment with reduced trust, such as medium trust ASP.NET applications or Silverlight applications.
  • A CharStream that is constructed from a System.IO.Stream or a file path reads the complete file into a single string during construction. This severely limits the maximum practical stream size.
  • Although the CharStream class still supports the IDisposable interface, disposing CharStream instances is no longer necessary, since no resources are held that need to be explicitly released.

See also section 3.5.

6.11.1.3 I/O exceptions

If you construct a CharStream from a System.IO.Stream or a file path, the constructor and any CharStream operation that requires reading chars from the underlying byte stream may throw one of the following exceptions.

In the Low‐Trust version, the constructor decodes the complete byte stream and hence only the constructor may throw one of these exceptions.

Note

Doing actual work in a constructor and potentially throwing exceptions seems to be a somewhat controversial design. We think it’s the right choice for the CharStream class, because this way you can have a reasonable expectation that the CharStream actually works after you’ve successfully constructed it.

In general it is not safe to continue to use a CharStream instance after one of these exceptions was thrown, though calling Dispose() is always safe.

NotSupportedException

Seeking of the underlying byte stream is required, but the byte stream does not support seeking or the Encoding’s Decoder is not serializable. See also the remarks above on non‐sequential access.

IOException

An I/O error occurred while reading data from the underlying byte stream.

ArgumentException

The underlying byte stream contains invalid bytes and the Encoding was constructed with the throwOnInvalidBytes option.

DecoderFallbackException

The underlying byte stream contains invalid bytes for which the decoder fallback threw this exception.

The byte index of the invalid bytes in the stream is stored as a boxed System.Int64 in the "Stream.Position" entry of the Data member of the exception instance. The precision of the index depends on the precision of the DecoderFallbackException’s Index member. If the underlying System.IO.Stream is not seekable, the byte index only takes into account the bytes read by the CharStream, but not any bytes read before the CharStream was constructed.

6.11.1.4 Members

new:    chars: string * index: int * length: int
     -> CharStream

Is equivalent to new CharStream(chars, index, length, 0L).

new:    chars: string * index: int * length: int * streamBeginIndex: int64
     -> CharStream

Constructs a CharStream from the chars in the string argument between the indices index (inclusive) and index + length (exclusive). By directly referencing the chars in the string this constructor avoids any copy of the string content.

The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance that only contains a sub‐segment of another char stream but is accessible through the same char indices.

chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:

  • index ≥ 0, length ≥ 0, index + length ≤ chars.Length and
  • 0 ≤ streamBeginIndex < 2^60.

Important

This note does not apply to the Low‐Trust version of FParsec.
The given string is “pinned” until the CharStream is disposed. Pinning the string prevents the GC from moving it around in memory during garbage collection. On .NET (at least in versions up to and including 4.0) the pinning has no effect if the string is large enough to be allocated on the Large Object Heap, i.e. has a length of about 42500 chars or more. However, pinning smaller strings does constrain the normal operations of the GC. Thus, to minimize the negative impact on the GC, you should dispose CharStream instances constructed from small strings as soon as you’re done parsing them. If you keep a large number of CharStream instances constructed from small strings around for an extended period of time, you risk fragmenting the heap.
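To illustrate how streamBeginIndex shifts the index space without copying the buffer, here is a hypothetical Python model (SubstreamModel is not an FParsec type, and ValueError stands in for ArgumentOutOfRangeException). It mirrors the documented argument conditions and maps stream indices to buffer positions, so a substream returns the same char as the full stream for the same index.

```python
# Hypothetical model of new CharStream(chars, index, length, streamBeginIndex):
# the stream is a view of chars[index : index + length] (no copy is made) whose
# first char gets the stream index streamBeginIndex.
END_OF_STREAM_CHAR = "\uffff"


class SubstreamModel:
    def __init__(self, chars, index, length, stream_begin_index=0):
        # Mirrors the documented ArgumentOutOfRangeException conditions.
        if not (index >= 0 and length >= 0 and index + length <= len(chars)):
            raise ValueError("index/length out of range")
        if not (0 <= stream_begin_index < 2**60):
            raise ValueError("streamBeginIndex out of range")
        self.chars = chars
        self.offset = index
        self.index_of_first_char = stream_begin_index
        self.index_of_last_char_plus1 = stream_begin_index + length

    def char_at(self, stream_index):
        # Translate a stream index into a position in the underlying buffer.
        if self.index_of_first_char <= stream_index < self.index_of_last_char_plus1:
            return self.chars[self.offset + stream_index - self.index_of_first_char]
        return END_OF_STREAM_CHAR


text = "let x = 1 in x"
outer = SubstreamModel(text, 0, len(text))
inner = SubstreamModel(text, 4, 5, stream_begin_index=4)   # covers "x = 1"
print(inner.char_at(4) == outer.char_at(4))        # True: same index, same char
print(inner.char_at(9) == END_OF_STREAM_CHAR)      # True: outside the substream
```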

new:    chars: char[] * index: int * length: int
     -> CharStream

This constructor is not available in the Low‐Trust version of FParsec.

Is equivalent to new CharStream(chars, index, length, 0L).

new:    chars: char[] * index: int * length: int * streamBeginIndex: int64
     -> CharStream

This constructor is not available in the Low‐Trust version of FParsec.

Constructs a CharStream from the chars in the char array argument between the indices index (inclusive) and index + length (exclusive). By directly referencing the chars in the char array this constructor avoids any copy of the char array content.

The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance that only contains a sub‐segment of another char stream but is accessible through the same char indices.

chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:

  • index ≥ 0, length ≥ 0, index + length ≤ chars.Length and
  • 0 ≤ streamBeginIndex < 2^60.

Note

A CharStream constructed from a char array does not support .NET regex matching via the Match method.

Important

The given char array is “pinned” until the CharStream is disposed. Pinning the char array prevents the GC from moving it around in memory during garbage collection. On .NET (at least in versions up to and including 4.0) the pinning has no effect if the char array is large enough to be allocated on the Large Object Heap, i.e. has a length of about 42500 chars or more. However, pinning smaller char arrays does constrain the normal operations of the GC. Thus, to minimize the negative impact on the GC, you should dispose CharStream instances constructed from small char arrays as soon as you’re done parsing them. If you keep a large number of CharStream instances constructed from small char arrays around for an extended period of time, you risk fragmenting the heap.

new:    chars: NativePtr<char> * length: int
     -> CharStream

This constructor is not available in the Low‐Trust version of FParsec.

Is equivalent to new CharStream(chars, length, 0L).

new:    chars: NativePtr<char> * length: int * streamBeginIndex: int64
     -> CharStream

This constructor is not available in the Low‐Trust version of FParsec.

Constructs a CharStream from the length chars at the pointer address. By directly referencing the chars at the pointer address this constructor avoids any copy of the char buffer.

The first char in the stream is assigned the index streamBeginIndex. A positive streamBeginIndex allows you for example to create a substream of another CharStream, i.e. a CharStream instance that only contains a sub‐segment of another char stream but is accessible through the same char indices.

chars must not be null. An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions:

  • length ≥ 0, chars + length must not overflow and
  • 0 ≤ streamBeginIndex < 2^60.

Note

A CharStream constructed from a pointer does not support .NET regex matching via the Match method.

new:    path: string * encoding: System.Text.Encoding
     -> CharStream

Is equivalent to new CharStream(path, encoding, true).

new:    path: string
      * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool
     -> CharStream

Is equivalent to

new CharStream(
    path, encoding, detectEncodingFromByteOrderMarks,
    blockSize = DefaultBlockSize (* = 3*2^16 ≈ 200k *),
    blockOverlap = DefaultBlockSize/3,
    minRegexSpace = ((DefaultBlockSize/3)*2)/3,
    byteBufferLength = DefaultByteBufferLength
)

new:    path: string
      * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool
      * blockSize: int * blockOverlap: int * minRegexSpace: int
      * byteBufferLength: int
     -> CharStream

Constructs a CharStream from a FileStream as if by calling

new CharStream(
    new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, 4096,
                   FileOptions.SequentialScan),
    leaveOpen = false,
    encoding = encoding,
    detectEncodingFromByteOrderMarks = detectEncodingFromByteOrderMarks,
    blockSize = blockSize,
    blockOverlap = blockOverlap,
    minRegexSpace = minRegexSpace,
    byteBufferLength = byteBufferLength
)

If an exception occurs after the FileStream is constructed but before the CharStream constructor is finished, the FileStream is disposed.

Note

The FileStream constructor might throw an exception, too.

new:    stream: System.IO.Stream * encoding: System.Text.Encoding
     -> CharStream

Is equivalent to new CharStream(stream, false, encoding, true).

new:    stream: System.IO.Stream * leaveOpen: bool
      * encoding: System.Text.Encoding
     -> CharStream

Is equivalent to new CharStream(stream, leaveOpen, encoding, true).

new:    stream: System.IO.Stream * leaveOpen: bool
      * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool
     -> CharStream

Is equivalent to

new CharStream(
    stream, leaveOpen, encoding, detectEncodingFromByteOrderMarks,
    blockSize = DefaultBlockSize (* = 3*2^16 ≈ 200k *),
    blockOverlap = DefaultBlockSize/3,
    minRegexSpace = ((DefaultBlockSize/3)*2)/3,
    byteBufferLength = DefaultByteBufferLength
)

new:    stream: System.IO.Stream * leaveOpen: bool
      * encoding: System.Text.Encoding * detectEncodingFromByteOrderMarks: bool
      * blockSize: int * blockOverlap: int * minRegexSpace: int
      * byteBufferLength: int
     -> CharStream

Constructs a CharStream from a System.IO.Stream.

The normal version of the CharStream class supports stream sizes up to approximately (2^31/p)×(blockSize - blockOverlap) chars, where p is 4 on a 32‐bit CLR and 8 on a 64‐bit CLR.
The Low‐Trust version only supports streams small enough that the complete content can be read into a single string.
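As a worked example, the bound above can be evaluated for the default block parameters (blockSize = 3×2^16 and blockOverlap = blockSize/3). This Python snippet is only arithmetic on the documented formula; the function name max_stream_chars is hypothetical.

```python
# Evaluates the approximate bound (2^31 / p) * (blockSize - blockOverlap).
def max_stream_chars(block_size, block_overlap, p):
    return (2**31 // p) * (block_size - block_overlap)


block_size = 3 * 2**16           # 196608 chars (the default)
block_overlap = block_size // 3  # 65536 chars (the default)

print(max_stream_chars(block_size, block_overlap, 4))  # 70368744177664 (32-bit CLR)
print(max_stream_chars(block_size, block_overlap, 8))  # 35184372088832 (64-bit CLR)
```

With the defaults, blockSize - blockOverlap = 2^17, so the bound is 2^46 chars on a 32‐bit CLR and 2^45 chars on a 64‐bit CLR.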

Note

This constructor reads the first block of chars from the input stream and hence can throw any of the I/O related exceptions detailed in the exceptions section above.

Arguments:

stream

The byte stream providing the input. If stream.CanRead returns false, an ArgumentException is thrown.

leaveOpen

Indicates whether the stream should be left open when the CharStream has finished reading it.

encoding

The default Encoding used for decoding the byte stream into chars.

If the preamble returned by encoding.GetPreamble() is present at the beginning of the stream, the CharStream will skip over it.

detectEncodingFromByteOrderMarks

Indicates whether the constructor should detect the encoding from a Unicode byte‐order mark at the beginning of the stream. An encoding detected from a byte‐order mark overrides the default encoding. The standard byte‐order marks for the following encodings are supported: UTF‐8, UTF‐16 LE/BE and UTF‐32 LE/BE.
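The detection rule, together with the preamble skipping described under encoding, can be sketched in Python as follows. This is a hypothetical model, not FParsec’s implementation: the codec names are Python’s, and the PREAMBLES table is an assumption standing in for encoding.GetPreamble(). The longer UTF‐32 marks are tested before the UTF‐16 marks because FF FE is a prefix of FF FE 00 00.

```python
# Hypothetical model of byte-order-mark detection (not FParsec's code).
# Longer marks are tested first so that UTF-32 LE (FF FE 00 00) is not
# mistaken for UTF-16 LE (FF FE).
BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]
# ASSUMPTION: stands in for encoding.GetPreamble() of the default encoding.
PREAMBLES = {name: bom for bom, name in BOMS}


def detect_encoding(data, default_encoding, detect_from_bom=True):
    """Returns (encoding name, number of leading bytes to skip)."""
    if detect_from_bom:
        for bom, name in BOMS:
            if data.startswith(bom):
                return name, len(bom)   # a detected BOM overrides the default
    preamble = PREAMBLES.get(default_encoding, b"")
    if preamble and data.startswith(preamble):
        return default_encoding, len(preamble)  # skip the default's preamble
    return default_encoding, 0


data = b"\xff\xfeh\x00i\x00"
enc, skip = detect_encoding(data, "utf-8")
print(enc, data[skip:].decode(enc))  # utf-16-le hi
```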

blockSize

The number of chars per block. The value is rounded up to the first positive multiple of 1536. The default is 3×2^16 ≈ 200k.

blockOverlap

The number of chars at the end of a block that are preserved when reading the next block into its internal char buffer. If this value is less than encoding.GetMaxCharCount(1) or not less than blockSize/2, the default value is used instead. The default is blockSize/3.
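The following Python sketch shows one reading of the two normalization rules above. It is a hypothetical model: max_char_count_1 stands in for encoding.GetMaxCharCount(1), and both the treatment of non‐positive block sizes and the use of the already rounded blockSize for the fallback are assumptions.

```python
# Hypothetical model of the blockSize and blockOverlap normalization
# described above (not FParsec's code).
def effective_block_size(block_size):
    # Round up to the first positive multiple of 1536.
    # ASSUMPTION: non-positive values are treated like 1 (yielding 1536).
    multiples = max(1, -(-block_size // 1536))   # ceiling division
    return multiples * 1536


def effective_block_overlap(block_size, block_overlap, max_char_count_1):
    # max_char_count_1 stands in for encoding.GetMaxCharCount(1).
    if block_overlap < max_char_count_1 or block_overlap >= block_size // 2:
        return block_size // 3    # fall back to the default
    return block_overlap


bs = effective_block_size(3 * 2**16)
print(bs)                                      # 196608 (already a multiple of 1536)
print(effective_block_size(1000))              # 1536
print(effective_block_overlap(bs, 100000, 2))  # 65536: not less than bs/2
print(effective_block_overlap(bs, 4096, 2))    # 4096
```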

byteBufferLength

The size of the byte buffer used for decoding purposes. The default is 2^12 = 4 KB.

member Dispose: unit -> unit

Releases all resources used by the CharStream. If the CharStream was constructed from a System.IO.Stream or a file path and the constructor was not called with leaveOpen = true, the byte stream is closed.

member BlockOverlap: int

The number of chars at the end of a block that are preserved when the CharStream reads the next block into its internal char buffer.

This value is only relevant for optimization purposes and as the maximum value for MinRegexSpace.

This value can only be set at construction time with the respective constructor parameter.

If the CharStream is constructed from a string, char array or char pointer or only contains 1 block, then this value is 0. In the Low‐Trust version this value is always 0.

member IndexOfFirstChar: int64

The index of the first char in the stream. This value is determined by the streamBeginIndex argument of some of the CharStream constructors. By default this value is 0.

member IndexOfLastCharPlus1: int64

The index of the last char of the stream plus 1, or Int64.MaxValue if the end of the stream has not yet been detected.

member IsBeginOfStream: bool

Indicates whether the next char in the stream is the first char, i.e. whether Index equals IndexOfFirstChar.

If the stream is empty, this value is always true.

member IsEndOfStream: bool

Indicates whether there is no char remaining in the stream, i.e. whether Index equals IndexOfLastCharPlus1.

If the stream is empty, this value is always true.

member Index: int64

The stream index of the next char.

member IndexToken: CharStreamIndexToken

A CharStreamIndexToken value representing the current Index value.

member Line: int64

The line number for the next char. (The line count starts with 1.)

member LineBegin: int64

The stream index of the first char of the line that also contains the next char.

member Column: int64

The UTF‐16 column number of the next char, i.e. Index - LineBegin + 1.

member Name: string with get, set

This string is used in error messages to describe the input stream.

If the CharStream is constructed from a file path, the constructor initializes the Name value with the file path value. Otherwise, Name is initialized to null.

If the stream content is the concatenated content of multiple input files, you can improve error messages and help debugging by setting the name and resetting the line and column count at the transitions between the different content pieces.

Setting the Name value increments the StateTag by 1, independent of whether the new value is different from the previous one.

member Position: Position

Returns new Position(Name, Index, Line, Column).

val mutable StateTag: uint64

The StateTag’s purpose is to provide an efficient way to determine whether the publicly visible state of the CharStream has changed after a series of method calls. For the purpose of this property, the state is defined as the aggregate of the Index, Line, LineBegin and Name values. The UserState value of CharStream<'TUserState> instances is also part of the CharStream state. If a method or property setter changes one or more of these state values, it increments the StateTag by 1. Thus, to determine whether a series of method calls has changed the CharStream, it is often enough to compare the StateTag values from before and after the method calls.

The StateTag property is primarily meant for use in the implementation of parser combinators. If you directly call CharStream methods, you normally don’t need the StateTag to determine whether the state has changed, because that is usually obvious from either the method’s return value or the context in which it was called. Please see section 5.4.3 for more details on the design rationale behind the StateTag.
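A hypothetical Python model of the StateTag convention as a combinator might use it (TaggedStream and changed_state are illustrative names, not FParsec API). Since some operations may increment the tag without changing the state, a changed tag is a conservative signal of change; an unchanged tag, however, means no state‐changing operation ran.

```python
# Hypothetical model of the StateTag convention (not FParsec's code): every
# operation that changes Index or Name increments the tag.
class TaggedStream:
    def __init__(self, text):
        self.text = text
        self.index = 0
        self.name = None
        self.state_tag = 0

    def skip_while(self, predicate):
        start = self.index
        while self.index < len(self.text) and predicate(self.text[self.index]):
            self.index += 1
        if self.index != start:
            self.state_tag += 1
        return self.index - start

    def set_name(self, name):
        self.name = name
        self.state_tag += 1   # incremented even if the value did not change


def changed_state(stream, action):
    # Combinator-style check: compare the tag before and after the calls.
    tag_before = stream.state_tag
    action(stream)
    return stream.state_tag != tag_before


s = TaggedStream("123abc")
print(changed_state(s, lambda st: st.skip_while(str.isdigit)))  # True
print(changed_state(s, lambda st: st.skip_while(str.isdigit)))  # False
```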

member Seek: index: int64 -> unit

Seeks the CharStream to the char with the specified index in the stream.

If you pass an index larger than the index of the last char in the stream, this method seeks the stream to the end of the stream, i.e. to one char past the last char in the stream.

The index is zero‐based, except if the CharStream was constructed with a positive streamBeginIndex argument, in which case the index of the first char equals the value of the streamBeginIndex argument (and the IndexOfFirstChar value).

When this method changes the stream position, it increments the StateTag by 1. When it does not change the position, it may or may not increment the StateTag by 1.

An ArgumentOutOfRangeException is thrown if the index is less than the IndexOfFirstChar. This method may also throw any of the I/O related exceptions detailed above.
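The clamping and exception rules of Seek can be modelled in Python as follows. The sketch is hypothetical (ValueError stands in for ArgumentOutOfRangeException) and assumes the stream length is already known, which is not generally true for a CharStream reading a byte stream block‐wise, where IndexOfLastCharPlus1 is Int64.MaxValue until the end has been detected.

```python
# Hypothetical model of Seek(index: int64) (not FParsec's code).
# ASSUMPTION: the stream length is known, so IndexOfLastCharPlus1 is exact.
class SeekModel:
    def __init__(self, length, index_of_first_char=0):
        self.index_of_first_char = index_of_first_char
        self.index_of_last_char_plus1 = index_of_first_char + length
        self.index = index_of_first_char
        self.state_tag = 0

    def seek(self, index):
        if index < self.index_of_first_char:
            # An ArgumentOutOfRangeException in .NET.
            raise ValueError("index is less than IndexOfFirstChar")
        # Indices past the last char move the stream to the end of the stream.
        new_index = min(index, self.index_of_last_char_plus1)
        if new_index != self.index:
            self.index = new_index
            self.state_tag += 1   # this model never increments on a no-op seek


s = SeekModel(length=10, index_of_first_char=100)
s.seek(105)
s.seek(1000)
print(s.index, s.state_tag)  # 110 2
```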

member Seek: indexToken: CharStreamIndexToken -> unit

This method is an optimized implementation of Seek(GetIndex(indexToken)).

static val EndOfStreamChar: char

The char returned by Peek and Read at the end of the stream.

The value is '\uFFFF'.

member Peek: unit -> char

Returns the next char without changing the state of the CharStream.

At the end of the CharStream the EndOfStreamChar ('\uFFFF') is returned.

member Peek2: unit -> TwoChars

Peek2() is an optimized implementation of new TwoChars(Peek(), Peek(1)).

member Peek: utf16Offset: int -> char

Returns the char at the stream index Index + utf16Offset, without changing the state of the CharStream.

If Index + utf16Offset is smaller than the index of the first char in the stream or larger than the index of the last char in the stream, the EndOfStreamChar ('\uFFFF') is returned.

This method may throw any of the I/O related exceptions detailed above.
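A hypothetical Python model of the offset arithmetic of Peek(utf16Offset); chars holds the stream content and index_of_first_char plays the role of IndexOfFirstChar. The function name is illustrative only.

```python
# Hypothetical model of Peek(utf16Offset) (not FParsec's code).
END_OF_STREAM_CHAR = "\uffff"


def peek(chars, index_of_first_char, index, utf16_offset=0):
    i = index + utf16_offset - index_of_first_char   # position in the buffer
    if 0 <= i < len(chars):
        return chars[i]
    return END_OF_STREAM_CHAR   # before the first char or past the last char


text = "abc"
print(peek(text, 0, 1))                              # b
print(peek(text, 0, 1, -1))                          # a
print(peek(text, 0, 1, -2) == END_OF_STREAM_CHAR)    # True
print(peek(text, 0, 1, 5) == END_OF_STREAM_CHAR)     # True
```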

member Peek: utf16Offset: uint32 -> char

This method is an optimized implementation of Peek(int) for uint32 arguments.

member PeekString: length: int -> string

Returns a string with the next length stream chars, without changing the state of the CharStream.

If less than length chars are remaining in the stream, only the remaining chars are returned.

Note

This note does not apply to the Low‐Trust version of FParsec.
If length is greater than the number of remaining chars in the stream, a temporary string with length chars may be allocated. For very large length values this might lead to an OutOfMemoryException even though a string with only the remaining chars in the stream would comfortably fit into memory.

Please also note that the maximum length of a string on .NET is less than 2^30. Allocating a string larger than the maximum length will always yield an OutOfMemoryException, even on 64‐bit systems with enough physical memory.

If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.

member PeekString: buffer: char[] * bufferIndex: int * length: int -> int

Copies the next length stream chars into buffer, without changing the state of the CharStream. Returns the number of chars copied.

The chars are written into buffer beginning at the index bufferIndex. If less than length chars are remaining in the stream, only the remaining chars are copied.

An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: bufferIndex ≥ 0, length ≥ 0 and bufferIndex + length ≤ buffer.Length. This method may also throw any of the I/O related exceptions detailed above.
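A hypothetical Python model of this buffer overload, with a Python list standing in for the char array and ValueError standing in for ArgumentOutOfRangeException. The function name peek_string_into is illustrative only.

```python
# Hypothetical model of PeekString(buffer, bufferIndex, length) (not FParsec's code).
def peek_string_into(chars, pos, buffer, buffer_index, length):
    # Mirrors the documented ArgumentOutOfRangeException conditions.
    if buffer_index < 0 or length < 0 or buffer_index + length > len(buffer):
        raise ValueError("argument out of range")
    n = min(length, max(0, len(chars) - pos))   # only the remaining chars are copied
    buffer[buffer_index:buffer_index + n] = chars[pos:pos + n]
    return n                                    # the stream position is not changed


buf = ["_"] * 5
n = peek_string_into("hello", 3, buf, 1, 4)
print(n, "".join(buf))   # 2 _lo__
```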

member PeekString: buffer: NativePtr<char> * length: int -> int

This method is not available in the Low‐Trust version of FParsec.

Copies the next length stream chars into the buffer at the specified pointer address, without changing the state of the CharStream. Returns the number of chars copied.

If less than length chars are remaining in the stream, only the remaining chars are copied.

If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.

member Match: char -> bool

Returns true if the next char in the stream matches the specified char. At the end of the stream Match always returns false.

This method does not change the state of the CharStream.

This method may throw any of the I/O related exceptions detailed above.

member Match: chars: string -> bool

Returns true if the passed string chars matches the next chars.Length stream chars.

If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If chars is empty, true is returned. chars must not be null.

This method does not change the state of the CharStream.

This method may throw any of the I/O related exceptions detailed above.

member Match: chars: char[] * charsIndex: int * length: int -> bool

Returns true if the next length stream chars match the chars in the array chars at the indices charsIndex to charsIndex + length - 1.

If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If length is 0, true is returned. chars must not be null.

This method does not change the state of the CharStream.

An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: charsIndex ≥ 0, length ≥ 0 and charsIndex + length ≤ chars.Length. This method may also throw any of the I/O related exceptions detailed above.
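The following is a small sketch of the char, string and char‐array overloads of Match. The keyword table packed into one array is a hypothetical layout chosen to show the charsIndex/length arguments. It is run as an F# script that loads FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("let x = 1", 0, 9)
// Hypothetical keyword table in one array: "if" at 0, "let" at 3, "in" at 7.
let keywords = "if let in".ToCharArray()

let m1 = stream.Match('l')             // single char: true
let m2 = stream.Match("let")           // string: true
let m3 = stream.Match(keywords, 3, 3)  // keywords[3..5] = "let": true
let m4 = stream.Match(keywords, 0, 2)  // keywords[0..1] = "if": false
let m5 = stream.Match(keywords, 7, 0)  // length 0 always matches: true

// None of the Match calls moved the stream.
printfn "%b %b %b %b %b, Index = %d" m1 m2 m3 m4 m5 stream.Index
```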

member Match: chars: NativePtr<char> * length: int -> bool

This method is not available in the Low‐Trust version of FParsec.

Returns true if the next length stream chars match the chars at the specified pointer address.

If not all the chars match or if there are not enough chars remaining in the stream, false is returned. If length is 0, true is returned.

This method does not change the state of the CharStream.

If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.

member MatchCaseFolded: caseFoldedChar: char -> bool

Behaves like Match(caseFoldedChar), except that the next char in the stream is case‐folded before it is compared with caseFoldedChar.

Note

While the char in the stream is case‐folded before it is matched, the char caseFoldedChar is assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.

member MatchCaseFolded: caseFoldedChars: string -> bool

Behaves like Match(caseFoldedChars), except that the chars in the stream are case‐folded before they are compared with caseFoldedChars.

Note

While the chars in the CharStream are case‐folded before they are matched, the chars in the string argument caseFoldedChars are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.

member MatchCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool

This method is not available in the Low‐Trust version of FParsec.

Behaves like Match(caseFoldedChars, length), except that the chars in the stream are case‐folded before they are compared with the chars at the pointer address caseFoldedChars.

Note

While the chars in the CharStream are case‐folded before they are matched, the chars at the pointer address caseFoldedChars are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.
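Because MatchCaseFolded only folds the stream side, a natural pattern is to fold the argument once up front with FParsec.Text.FoldCase. The sketch below assumes an SQL‐like input made up for illustration and runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("SELECT * FROM t", 0, 15)
// Fold the keyword once; MatchCaseFolded assumes its argument is already folded.
let selectFolded = FParsec.Text.FoldCase("Select")

let exact  = stream.Match("select")               // case-sensitive comparison
let folded = stream.MatchCaseFolded(selectFolded) // stream chars folded before comparing
printfn "%s %b %b" selectFolded exact folded
```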

member Match: System.Text.RegularExpressions.Regex -> System.Text.RegularExpressions.Match

Applies the given regular expression to the stream chars beginning with the next char. Returns the resulting Match object.

For performance reasons you should specify the regular expression such that it can only match at the beginning of a string, for example by prepending "\\A".

For CharStream instances constructed from strings the regular expression is applied to a string containing all the remaining chars in the stream.

For CharStream instances constructed from large binary streams (with more than 1 block) the regular expression is not applied to a string containing all the remaining chars in the stream. Here the MinRegexSpace value determines the minimum number of chars that are guaranteed to be visible to the regular expression (assuming there are still enough chars remaining in the stream). The exact number of chars visible to the regular expression may be affected even by calls to CharStream methods like Peek or Match that otherwise guarantee to not change the (outwardly visible) state of the CharStream.

This method may throw any of the I/O related exceptions detailed above.

Important

This note does not apply to the Low‐Trust version of FParsec.
This method is not supported by CharStream instances constructed directly from char arrays or pointers. A NotSupportedException is thrown if this method is called on such a CharStream instance.

Important

This note does not apply to the Low‐Trust version of FParsec.
If the CharStream was constructed from a System.IO.Stream or a file path, the regular expression is applied to an internal mutable buffer. Since the Match object may work lazily, i.e. compute return values only when they are first needed, you need to retrieve all the required information from the Match object before you continue to access the CharStream; otherwise you might get back invalid match results. Note that all strings returned by the Match object are, of course, immutable.
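The sketch below follows both recommendations above: it anchors the pattern with "\\A", and it copies the match value out of the Match object before it advances the stream. It is an F# script that loads FParsec from NuGet. The digit pattern is only an illustration.

```fsharp
#r "nuget: FParsec"
open FParsec
open System.Text.RegularExpressions

let stream = new CharStream<unit>("12345abc", 0, 8)
// "\A" restricts the regex to matching at the current stream position.
let digits = Regex("\\A[0-9]+")

let m = stream.Match(digits)
// Extract everything needed from the Match object before touching the stream again.
let number = if m.Success then m.Value else ""
if m.Success then stream.Skip(m.Length)

printfn "%s, next char: %c" number (stream.Peek())
```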

member MinRegexSpace: int with get, set

The number of chars that are guaranteed to be visible to a regular expression when it is matched by Match (assuming there are enough chars remaining in the stream).

The value must be non‐negative and not greater than BlockOverlap. The default value is 2/3 of BlockOverlap.

If the CharStream is constructed from a string, char array or char pointer or has only 1 block, then this value has no relevance and calling the property setter has no effect. (No Low‐Trust version CharStream instance has more than 1 block.)

The MinRegexSpace value is not recorded in CharStreamState instances and setting its value does not affect the StateTag.

An ArgumentOutOfRangeException is thrown if you try to set the property on a multi‐block CharStream instance to a negative value or a value larger than the BlockOverlap.

member RegisterNewline: unit -> bool

Registers a newline (an end‐of‐line character) at the previous stream char, i.e. increments the Line value by 1 and sets the LineBegin to Index.

The previous LineBegin value must not equal Index. (For performance reasons this condition is only checked by an assert check in the debug build).

This method also increments the StateTag by 1.

member RegisterNewlines: lineOffset: int * newColumnMinus1: int -> bool

Increments the Line value by lineOffset and sets the LineBegin value to Index - newColumnMinus1 (so that the Column value becomes newColumnMinus1 + 1).

The lineOffset must not be 0, the new Line value must be greater than 0 and the new LineBegin value must be different from the previous one. (For performance reasons these conditions are only checked by assert checks in the debug build).

This method also increments the StateTag by 1.
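RegisterNewline and RegisterNewlines are useful when a parser skips line breaks itself, for example with Skip(offset), which does not register newlines. In the sketch below, the ';'‐as‐line‐separator format is hypothetical. The code runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical format in which ';' ends a line.
let s1 = new CharStream<unit>("a;b", 0, 3)
s1.Skip(2)                      // skip "a;" without registering anything
s1.RegisterNewline() |> ignore  // the ';' just skipped ends line 1
printfn "Line %d, Column %d" s1.Line s1.Column

// Skip two '\n' chars manually, then register both in one call.
let s2 = new CharStream<unit>("ab\n\ncd", 0, 6)
s2.Skip(4)                          // Index = 4, at 'c'
s2.RegisterNewlines(2, 0) |> ignore // Line += 2, LineBegin = Index - 0
printfn "Line %d, Column %d" s2.Line s2.Column
```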

member RegisterNewlines: lineOffset: int64 * newColumnMinus1: int64 -> bool

This method is a variant of RegisterNewlines for int64 arguments.

member Skip: unit -> unit

Advances the position within the stream by 1 char, except at the end of the stream, where it does nothing.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.

member Skip: utf16Offset: int -> unit

Advances the position within the stream by utf16Offset chars.

The new position within the stream will be min(Index + utf16Offset, IndexOfLastCharPlus1). This means you can’t move past the end of the stream, because any position beyond the last char in the stream is interpreted as precisely one char beyond the last char.

An ArgumentOutOfRangeException is thrown if the new position would lie before the beginning of the CharStream, i.e. if the new index would be less than IndexOfFirstChar. This method may also throw any of the I/O related exceptions detailed above.

When this method changes the stream position, it increments the StateTag by 1. When it does not change the position (because the given offset is 0 or because the stream has already reached the end and the offset is positive), it may or may not increment the StateTag by 1.
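The following sketch shows the clamping at the end of the stream, a backward move with a negative offset, and the exception for moving before the first char. It runs as an F# script that loads FParsec from NuGet. The input string is arbitrary.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("abcdef", 0, 6)

stream.Skip(4)
let afterFour = stream.Peek()   // 'e'

stream.Skip(100)                // clamped to IndexOfLastCharPlus1
let atEnd = stream.Index        // 6L
let eos = stream.IsEndOfStream  // true

stream.Skip(-2)                 // negative offsets move backwards
let afterBack = stream.Peek()   // 'e'

// Moving before IndexOfFirstChar throws.
let threw =
    try
        stream.Skip(-10)
        false
    with :? System.ArgumentOutOfRangeException -> true
```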

member Skip: utf16Offset: uint32 -> unit

This method is an optimized implementation of Skip for uint32 offsets.

member Skip: utf16Offset: int64 -> unit

This method is a variant of Skip for int64 offsets.

member SkipAndPeek: unit -> char

c <- SkipAndPeek() is an optimized implementation of Skip(); c <- Peek().

member SkipAndPeek: utf16Offset: int -> char

c <- SkipAndPeek(utf16Offset) is an optimized implementation of Skip(utf16Offset); c <- Peek(), with the following exception for negative offsets n:
If the new position would lie before the beginning of the CharStream, i.e. if the new index would be less than IndexOfFirstChar, then SkipAndPeek(n) does not throw an exception like stream.Skip(n) would do. Instead it sets the position of the stream to IndexOfFirstChar and returns the EndOfStreamChar ('\uFFFF').
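The sketch below contrasts SkipAndPeek(n) with Skip(n) for a negative offset that would move before the first char. It runs as an F# script with FParsec from NuGet. The input string is arbitrary.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("xyz", 0, 3)
let c1 = stream.SkipAndPeek()    // like Skip(); Peek(): 'y'
let c2 = stream.SkipAndPeek(1)   // 'z'
// Index would become 2 - 5 = -3. Instead of throwing, SkipAndPeek moves
// to IndexOfFirstChar and returns the EndOfStreamChar.
let c3 = stream.SkipAndPeek(-5)
printfn "%A %A %A, Index = %d" c1 c2 c3 stream.Index
```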

member SkipAndPeek: utf16Offset: uint32 -> char

c <- SkipAndPeek(utf16Offset) is an optimized implementation of Skip(utf16Offset); c <- Peek().

member Skip: char -> bool

Skips over the next char in the stream if this char matches the passed argument char. Returns true if the chars match; otherwise, false. At the end of the stream this method always returns false.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.

member Skip: TwoChars -> bool

Skips over the next two chars in the stream if these chars match the two chars in the passed TwoChars value. Returns true if the chars match.

If either char does not match or if fewer than 2 chars remain in the stream, no char is skipped and false is returned.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.
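The sketch below covers the char and TwoChars overloads of Skip. It assumes that TwoChars has a public (char, char) constructor. That constructor is not shown in this section. The code runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("->x", 0, 3)
let arrow = TwoChars('-', '>')  // assumed public constructor

let s1 = stream.Skip('=')    // false, nothing skipped
let s2 = stream.Skip(arrow)  // true, skips "->"
let s3 = stream.Skip(arrow)  // false: only "x" remains
printfn "%b %b %b, next char: %c" s1 s2 s3 (stream.Peek())
```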

member Skip: chars: string -> bool

Skips over the next chars.Length chars in the stream if these chars match the passed string chars. Returns true if the chars match.

If not all the chars match or if there are not enough chars remaining in the stream, no char is skipped and false is returned. If chars is empty, true is returned. chars must not be null.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if chars is empty, in which case it may or may not increment the StateTag by 1.

This method may throw any of the I/O related exceptions detailed above.

member Skip: chars: char[] * charsIndex: int * length: int -> bool

Skips over the next length chars in the stream if these chars match the chars in the passed array chars at the indices charsIndex to charsIndex + length - 1. Returns true if the chars match.

If not all the chars match or if there are not enough chars remaining in the stream, false is returned and the position within the CharStream is not changed. If length is 0, true is returned. chars must not be null.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.

An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: charsIndex ≥ 0, length ≥ 0 and charsIndex + length ≤ chars.Length. This method may also throw any of the I/O related exceptions detailed above.
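The following sketch illustrates the all‐or‐nothing behaviour of the string and char‐array Skip overloads. The packed keyword table is a hypothetical layout used only to exercise charsIndex and length. The code runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("return 0;", 0, 9)
// Hypothetical keyword table: "if" at 0, "return" at 2, "while" at 8.
let table = "ifreturnwhile".ToCharArray()

let isWhile  = stream.Skip(table, 8, 5)  // false, position unchanged
let isReturn = stream.Skip(table, 2, 6)  // true, Index is now 6
let space    = stream.Skip(" ")          // true, Index is now 7
let zero     = stream.Skip("0;x")        // false: only "0;" remains, nothing skipped
printfn "%b %b %b %b, Index = %d" isWhile isReturn space zero stream.Index
```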

member Skip: chars: NativePtr<char> * length: int -> bool

This method is not available in the Low‐Trust version of FParsec.

Skips over the next length chars in the stream if these chars match the chars at the pointer address chars. Returns true if the chars match.

If not all the chars match or if there are not enough chars remaining in the stream, false is returned and the position within the CharStream is not changed. If length is 0, true is returned.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.

If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.

member SkipCaseFolded: caseFoldedChar: char -> bool

Behaves like Skip(caseFoldedChar), except that the next char in the stream is case‐folded before it is compared with caseFoldedChar.

Note

While the char in the stream is case‐folded before it is matched, the char caseFoldedChar is assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.

member SkipCaseFolded: caseFoldedChars: string -> bool

Behaves like Skip(caseFoldedChars), except that the chars in the stream are case‐folded before they are compared with caseFoldedChars.

Note

While the chars in the CharStream are case‐folded before they are matched, the chars in the string argument caseFoldedChars are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.

member SkipCaseFolded: caseFoldedChars: NativePtr<char> * length:int -> bool

This method is not available in the Low‐Trust version of FParsec.

Behaves like Skip(caseFoldedChars, length), except that the chars in the stream are case‐folded before they are compared with the chars at the pointer address caseFoldedChars.

Note

While the chars in the CharStream are case‐folded before they are matched, the chars at the pointer address caseFoldedChars are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.

member Read: unit -> char

Skips over the next char in the stream. Returns the skipped char.

At the end of the stream Read() does not change the stream position and returns the EndOfStreamChar ('\uFFFF').

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.

member Read: length: int -> string

Skips over the next length chars in the stream. Returns the skipped chars as a string.

If less than length chars are remaining in the stream, only the remaining chars are skipped and returned.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.

If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.
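The sketch below exercises the Read overloads, including the short reads near the end of the stream. It runs as an F# script with FParsec from NuGet. The inputs are arbitrary.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("abcdef", 0, 6)
let first = stream.Read()    // 'a'
let three = stream.Read(3)   // "bcd"
let rest  = stream.Read(10)  // only "ef" remains
let eos   = stream.Read()    // EndOfStreamChar, position unchanged

// Buffer overload: asks for 4 chars, but only 3 remain.
let s2 = new CharStream<unit>("xyz", 0, 3)
let buffer = Array.zeroCreate<char> 4
let n = s2.Read(buffer, 0, 4)
```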

member Read: buffer: char[] * bufferIndex: int * length: int -> int

Skips over the next length stream chars and copies the skipped chars into buffer. Returns the number of copied and skipped chars.

The chars are written into buffer beginning at the index bufferIndex. If less than length chars are remaining in the stream, only the remaining chars are copied and skipped.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.

An ArgumentOutOfRangeException is thrown if the arguments do not satisfy the following conditions: bufferIndex ≥ 0, length ≥ 0 and bufferIndex + length ≤ buffer.Length. This method may also throw any of the I/O related exceptions detailed above.

member Read: buffer: NativePtr<char> * length: int -> int

This method is not available in the Low‐Trust version of FParsec.

Skips over the next length stream chars and copies the skipped chars into the buffer at the given pointer address. Returns the number of copied and skipped chars.

If less than length chars are remaining in the stream, only the remaining chars are copied and skipped.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag, except if length is 0, in which case it may or may not increment the StateTag by 1.

If length is negative, an ArgumentOutOfRangeException is thrown. This method may also throw any of the I/O related exceptions detailed above.

member ReadFrom: indexOfFirstChar: CharStreamIndexToken -> string

Returns a string with the chars between the stream index indexOfFirstChar (inclusive) and the current Index of the stream (exclusive).

This method throws

It may also throw any of the I/O related exceptions detailed above.

Note

You may only pass CharStreamIndexToken values that were retrieved from the CharStream instance on which you’re calling ReadFrom. Passing a CharStreamIndexToken value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.
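A typical use of IndexToken and ReadFrom is to remember where a token starts, skip over it, and then read it back in one call. The "name = value" input is made up for illustration. The code runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "name = value"
let stream = new CharStream<unit>(text, 0, text.Length)

let start = stream.IndexToken  // remember the start of the token
stream.SkipCharsOrNewlinesWhile(fun (c: char) -> System.Char.IsLetter c) |> ignore
let name = stream.ReadFrom(start)  // chars from start (inclusive) to Index (exclusive)
printfn "%s" name
```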

member SkipWhitespace: unit -> bool

Skips over any sequence of space (' '), tab ('\t') or newline ('\r', '\n') chars. Returns true if it skips at least one char, otherwise false.

This method registers any skipped standard newline ("\n", "\r\n" or "\r").

When this method skips at least one char, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.
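The sketch below shows that SkipWhitespace registers the "\r\n" it skips, so Line and Column stay accurate. It runs as an F# script with FParsec from NuGet. The input is arbitrary.

```fsharp
#r "nuget: FParsec"
open FParsec

// Two spaces, a "\r\n" newline, a tab, then 'x'.
let text = "  \r\n\tx"
let stream = new CharStream<unit>(text, 0, text.Length)
let skipped = stream.SkipWhitespace()
// The newline ends line 1; 'x' is the second char of line 2.
printfn "%b: Line %d, Column %d, next %c" skipped stream.Line stream.Column (stream.Peek())
```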

member SkipUnicodeWhitespace: unit -> bool

Skips over any sequence of Unicode whitespace chars (as identified by System.Char.IsWhiteSpace). Returns true if it skips at least one char, otherwise false.

This method registers any skipped Unicode newline ("\n", "\r\n", "\r", "\u0085", "\u2028" or "\u2029").

Note

This method recognizes the form feed char '\f' ('\u000C') as a Unicode whitespace character, but not as a newline character.

When this method skips at least one char, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.
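The following sketch contrasts a Unicode line separator, which is registered as a newline, with a form feed, which is skipped as whitespace but not registered. It runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

// U+2028 (line separator), U+000C (form feed), then 'z'.
let text = "\u2028\u000Cz"
let stream = new CharStream<unit>(text, 0, text.Length)
let skipped = stream.SkipUnicodeWhitespace()
// Only U+2028 starts a new line, so 'z' is in column 2 of line 2.
printfn "%b: Line %d, Column %d" skipped stream.Line stream.Column
```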

member SkipNewline: unit -> bool

Skips over a standard newline ("\n", "\r\n" or "\r"). Returns true if a newline is skipped, otherwise false.

When this method skips a newline, it also registers it.

When this method skips a newline, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.

member SkipUnicodeNewline: unit -> bool

Skips over a Unicode newline ("\n", "\r\n", "\r", "\u0085", "\u2028", or "\u2029"). Returns true if a newline is skipped, otherwise false.

Note

This method does not recognize the form feed char '\f' ('\u000C') as a newline character.

When this method skips a newline, it also registers it.

When this method skips a newline, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.
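The sketch below compares SkipNewline, which only accepts the standard newlines, with SkipUnicodeNewline. It runs as an F# script with FParsec from NuGet. The input is arbitrary.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "\r\n\u0085x"
let stream = new CharStream<unit>(text, 0, text.Length)

let n1 = stream.SkipNewline()         // true: "\r\n" is skipped as one newline
let n2 = stream.SkipNewline()         // false: U+0085 is not a standard newline
let n3 = stream.SkipUnicodeNewline()  // true: U+0085 is a Unicode newline
printfn "%b %b %b, Line %d" n1 n2 n3 stream.Line
```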

member SkipNewlineThenWhitespace:
    powerOf2TabStopDistance: int * allowFormFeed: bool -> int

Skips over a newline ("\n", "\r\n" or "\r") followed by any (possibly empty) sequence of whitespace chars (' ', '\t', '\r', '\n' and optionally '\f').

If this method skips no chars because the next stream char is not a newline char, it returns -1. Otherwise it returns the indentation of the first line with non‐whitespace characters.

The indentation is calculated as follows:

  • Any newline char ('\r' or '\n') or form feed char ('\f') resets the indentation to 0.
  • Any space char (' ') increments the indentation by 1.
  • Any tab char ('\t') increments the indentation by
    powerOf2TabStopDistance ‐ (indentation modulo powerOf2TabStopDistance).

The maximum indentation is 2^31 - 1. If skipping a whitespace char would cause the indentation to overflow, the char is not skipped and the method returns the indentation up to that char.

An ArgumentOutOfRangeException is thrown if powerOf2TabStopDistance is not a positive power of 2.

The value of the allowFormFeed argument determines whether this method accepts the form feed char '\f' as a whitespace char.

This method registers all skipped standard newlines ("\n", "\r\n" or "\r").

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.
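The sketch below works through the indentation rules above, using a tab stop distance of 4. The trailing comments show the arithmetic. It runs as an F# script with FParsec from NuGet. The input is made up for illustration.

```fsharp
#r "nuget: FParsec"
open FParsec

// '\n', two spaces, '\n', a tab, two spaces, then 'x'.
let text = "\n  \n\t  x"
let stream = new CharStream<unit>(text, 0, text.Length)
let indent = stream.SkipNewlineThenWhitespace(4, false)
// '\n' -> 0, ' ' ' ' -> 2, '\n' -> 0, '\t' -> 0 + (4 - 0 % 4) = 4, ' ' ' ' -> 6

// No leading newline: nothing is skipped and -1 is returned.
let other = new CharStream<unit>("x", 0, 1)
let noNewline = other.SkipNewlineThenWhitespace(4, false)

// 3 is not a power of 2.
let badTabStop =
    try
        other.SkipNewlineThenWhitespace(3, false) |> ignore
        false
    with :? System.ArgumentOutOfRangeException -> true
```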

member SkipRestOfLine: skipNewline: bool -> unit

Skips over any chars before the next newline ("\n", "\r\n" or "\r") or the end of the stream. If skipNewline is true and a newline is present, the newline is also skipped.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.

member ReadRestOfLine: skipNewline: bool -> string

ReadRestOfLine(skipNewline) behaves like SkipRestOfLine(skipNewline), except that it returns a string with the skipped chars (without a newline).
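The following is a line‐comment sketch: it skips the comment marker, reads the rest of the line, and consumes the "\r\n" as well. It runs as an F# script with FParsec from NuGet. The input is arbitrary.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "// comment\r\ncode"
let stream = new CharStream<unit>(text, 0, text.Length)
stream.Skip("//") |> ignore
// Returns the chars up to the newline; with true, the newline itself is also skipped.
let comment = stream.ReadRestOfLine(true)
printfn "[%s] next: %c" comment (stream.Peek())
```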

member ReadCharOrNewline: unit -> char

Skips over any single char or standard newline ("\n", "\r\n" or "\r").

This method returns '\n' when it skips a newline. Otherwise, it returns the skipped char, except at the end of the stream, where it returns the EndOfStreamChar ('\uffff').

When this method skips a newline, it also registers it.

When this method skips a char or newline, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method may throw any of the I/O related exceptions detailed above.
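The sketch below reads a short input one logical char at a time. "\r\n" comes back as a single '\n', and the end of the stream comes back as EndOfStreamChar. It runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "a\r\nb"
let stream = new CharStream<unit>(text, 0, text.Length)
// Four calls: 'a', the "\r\n" newline, 'b', then end of stream.
let chars = [ for _ in 1 .. 4 -> stream.ReadCharOrNewline() ]
printfn "%A, Line %d" chars stream.Line
```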

member SkipCharsOrNewlines: maxCount: int -> int

Skips over up to maxCount chars. Returns the number of skipped chars.

Fewer than maxCount chars are skipped only if the end of the stream is reached first.

This method counts standard newlines ("\n", "\r\n" or "\r") as single chars. When this method skips a newline, it also registers it.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

An ArgumentOutOfRangeException is thrown if maxCount is negative. This method may also throw any of the I/O related exceptions detailed above.

member ReadCharsOrNewlines: maxCount: int * normalizeNewlines: bool -> string

Behaves like SkipCharsOrNewlines(maxCount), except that it returns a string with the skipped chars.

The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form they are encountered in the input.
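The following sketch shows that maxCount counts "\r\n" as one char, so the stream Index can advance further than the returned count. It also shows the effect of normalizeNewlines. It runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "a\r\nbc"

let s1 = new CharStream<unit>(text, 0, text.Length)
let normalized = s1.ReadCharsOrNewlines(3, true)  // 'a', newline, 'b'

let s2 = new CharStream<unit>(text, 0, text.Length)
let preserved = s2.ReadCharsOrNewlines(3, false)

let s3 = new CharStream<unit>(text, 0, text.Length)
let count = s3.SkipCharsOrNewlines(10)  // 'a', newline, 'b', 'c'

printfn "%A %A %d (s1.Index = %d)" normalized preserved count s1.Index
```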

member SkipCharsOrNewlinesWhile:
    predicate: (char -> bool) -> int

Skips over a sequence of chars that satisfy the predicate function. Stops at the first char for which predicate returns false. Returns the number of skipped chars.

This method counts standard newlines ("\n", "\r\n" or "\r") as single chars and passes them to the predicate function as single '\n' chars. When this method skips a newline, it also registers it.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

Caution

The predicate function must not access the CharStream instance itself, because SkipCharsOrNewlinesWhile relies on predicate not having any side‐effect on the internal state of the stream.

This method may throw any of the I/O related exceptions detailed above.
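In the sketch below, the predicate receives "\r\n" as a single '\n', and the returned count treats it as one char. It runs as an F# script with FParsec from NuGet. The predicate is a side‐effect‐free lambda, as the caution above requires.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "12\r\n34;"
let stream = new CharStream<unit>(text, 0, text.Length)
// The predicate never touches the stream itself.
let n = stream.SkipCharsOrNewlinesWhile(fun (c: char) -> System.Char.IsDigit c || c = '\n')
// Skipped: '1', '2', "\r\n" (seen as '\n'), '3', '4'
printfn "%d skipped, Index %d, Line %d, next %c" n stream.Index stream.Line (stream.Peek())
```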

member SkipCharsOrNewlinesWhile:
    predicateForFirstChar: (char -> bool) * predicate: (char -> bool) -> int

Behaves like SkipCharsOrNewlinesWhile(predicate), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.

member SkipCharsOrNewlinesWhile:
    predicate: (char -> bool) * minCount: int * maxCount: int -> int

Skips over a sequence of up to maxCount chars that satisfy the predicate function, but backtracks to the start if it can only skip less than minCount chars. Returns the number of skipped chars.

This method counts standard newlines ("\n", "\r\n" or "\r") as single chars and passes them to the predicate function as single '\n' chars. When this method skips a newline, it also registers it.

An ArgumentOutOfRangeException is thrown if maxCount is negative. This method may also throw any of the I/O related exceptions detailed above.

Caution

The predicate function must not access the CharStream instance itself, because SkipCharsOrNewlinesWhile relies on predicate not having any side‐effect on the internal state of the stream.
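The following sketch illustrates the minCount backtracking and the maxCount limit. It runs as an F# script with FParsec from NuGet. The input is arbitrary.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<unit>("ab1", 0, 3)

// Only 2 letters are available, fewer than minCount = 3: back to the start.
let n1 = stream.SkipCharsOrNewlinesWhile((fun (c: char) -> System.Char.IsLetter c), 3, 10)
let indexAfterN1 = stream.Index

// maxCount = 1 stops after one letter.
let n2 = stream.SkipCharsOrNewlinesWhile((fun (c: char) -> System.Char.IsLetter c), 1, 1)
printfn "%d (Index %d), %d (Index %d)" n1 indexAfterN1 n2 stream.Index
```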

member SkipCharsOrNewlinesWhile:
    predicateForFirstChar: (char -> bool) * predicate: (char -> bool)
  * minCount: int * maxCount: int -> int

Behaves like SkipCharsOrNewlinesWhile(predicate, minCount, maxCount), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.

member ReadCharsOrNewlinesWhile:
    predicate: (char -> bool)
  * normalizeNewlines: bool -> string

Behaves like SkipCharsOrNewlinesWhile(predicate), except that it returns a string with the skipped chars.

The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form they are encountered in the input.

member ReadCharsOrNewlinesWhile:
    predicateForFirstChar: (char -> bool) * predicate: (char -> bool)
  * normalizeNewlines: bool -> string

Behaves like ReadCharsOrNewlinesWhile(predicate, normalizeNewlines), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.

member ReadCharsOrNewlinesWhile:
    predicate: (char -> bool)
  * minCount: int * maxCount: int * normalizeNewlines: bool -> string

Behaves like SkipCharsOrNewlinesWhile(predicate, minCount, maxCount), except that it returns a string with the skipped chars.

The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form they are encountered in the input.

member ReadCharsOrNewlinesWhile:
    predicateForFirstChar: (char -> bool) * predicate: (char -> bool)
  * minCount: int * maxCount: int * normalizeNewlines: bool -> string

Behaves like ReadCharsOrNewlinesWhile(predicate, minCount, maxCount, normalizeNewlines), except that the first char to be skipped must satisfy predicateForFirstChar instead of predicate.
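The two‐predicate overloads fit identifier‐like tokens, where the first char follows different rules from the rest. The identifier rules below are a hypothetical choice for illustration. The code runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

let text = "_x1 = 2"
let stream = new CharStream<unit>(text, 0, text.Length)
// Hypothetical rule: a letter or '_' first, then letters, digits or '_'.
let ident =
    stream.ReadCharsOrNewlinesWhile(
        (fun (c: char) -> System.Char.IsLetter c || c = '_'),
        (fun (c: char) -> System.Char.IsLetterOrDigit c || c = '_'),
        false)
printfn "%s, next: %c" ident (stream.Peek())
```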

member SkipCharsOrNewlinesUntilString:
    str: string * maxCount: int
  * foundString: out<bool> -> int

Skips over all stream chars before the first occurrence of the specified string or the end of the stream, but not over more than maxCount chars. Assigns true to the output parameter if the string is found, otherwise false.

This method registers skipped newlines ("\n", "\r\n" or "\r") and counts them as single chars. However, no newline normalization takes place when the argument string str is matched with the stream chars. Hence, str should either contain no newlines or contain them only in the form in which they occur in the stream. If str starts with '\n', then SkipCharsOrNewlinesUntilString will not find occurrences of str in the stream that start in the middle of an "\r\n" newline.

When this method changes the stream position, it increments the StateTag by 1; otherwise, it does not change the StateTag.

This method throws

  • an ArgumentException, if the string argument is empty, and
  • an ArgumentOutOfRangeException, if maxCount is negative.

It may also throw any of the I/O related exceptions detailed above.

member SkipCharsOrNewlinesUntilString:
    str: string * maxCount: int * normalizeNewlines: bool
  * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int

Behaves like SkipCharsOrNewlinesUntilString(str, maxCount, foundString), except that its output parameter is a string instead of a boolean. If str is found, a string with the skipped chars is assigned to this output parameter; otherwise, null is assigned to the output parameter.

The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the output string are normalized to '\n' or are preserved in the original form they are encountered in the input.
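The following sketch scans to the end of a block comment with both output‐parameter variants. The helpers findEnd and readUntil are hypothetical names introduced here. They wrap the calls so that the out parameters can be passed as `&` references to local mutables. The code runs as an F# script with FParsec from NuGet.

```fsharp
#r "nuget: FParsec"
open FParsec

// Hypothetical helper: skip to "*/" and report whether it was found and where we stopped.
let findEnd (text: string) =
    let stream = new CharStream<unit>(text, 0, text.Length)
    let mutable found = false
    stream.SkipCharsOrNewlinesUntilString("*/", 100, &found) |> ignore
    found, stream.Index

// Hypothetical helper: same scan, but return the skipped chars (null if "*/" is missing).
let readUntil (text: string) =
    let stream = new CharStream<unit>(text, 0, text.Length)
    let mutable skipped : string = null
    stream.SkipCharsOrNewlinesUntilString("*/", 100, true, &skipped) |> ignore
    skipped

let found1, index1 = findEnd "body */ rest"   // stops before "*/"
let found2, index2 = findEnd "no terminator"  // runs to the end of the stream
let body = readUntil "a\r\nb*/"               // newline normalized to '\n'
let missing = readUntil "a\r\nb"
```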

member SkipCharsOrNewlinesUntilCaseFoldedString:
    caseFoldedString: string * maxCount: int
  * foundString: out<bool> -> int

Behaves like SkipCharsOrNewlinesUntilString(caseFoldedString, maxCount, foundString), except that the chars in the stream are case‐folded before they are compared with caseFoldedString.

Note

While the chars in the CharStream are case‐folded before they are matched, the chars in the string argument caseFoldedString are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.

member SkipCharsOrNewlinesUntilCaseFoldedString:
    caseFoldedString: string * maxCount: int * normalizeNewlines: bool
  * skippedCharsIfStringFoundOtherwiseNull: out<string> -> int

Behaves like SkipCharsOrNewlinesUntilString(caseFoldedString, maxCount, normalizeNewlines, skippedCharsIfStringFoundOtherwiseNull), except that the chars in the stream are case‐folded before they are compared with caseFoldedString.

Note

While the chars in the stream are case‐folded before they are matched, the chars in the string argument caseFoldedString are assumed to already be case‐folded (e.g. with the help of FParsec.Text.FoldCase). Please also see the above remarks on case‐insensitive matching.

6.11.2 CharStream<TUserState>

Provides read‐access to a sequence of UTF‐16 chars.

6.11.2.1 Interface

[<Sealed>]
type CharStream<'TUserState> =
  inherit CharStream

  // has the same constructors as CharStream

  member UserState: 'TUserState with get, set

  member State: CharStreamState<'TUserState>

  member BacktrackTo: CharStreamState<'TUserState> -> unit

  member ReadFrom:
      stateWhereStringBegins: CharStreamState<'TUserState>
    * normalizeNewlines: bool
   -> string

  member CreateSubstream<'TSubStreamUserState>:
      stateWhereSubstreamBegins: CharStreamState<'TUserState>
   -> CharStream<'TSubStreamUserState>

6.11.2.2 Remarks

The CharStream<'TUserState> class adds a user definable state component to its base class CharStream.

The user state is accessible through the property UserState. It has the type 'TUserState.

You can retrieve a snapshot of the complete stream state, including the user state, from the State property. The value returned from the State property has the type CharStreamState<'TUserState>. You can pass a CharStreamState value to the BacktrackTo method in order to restore a previous state of the CharStream.

Important

'TUserState must be an immutable type or at least be treated as an immutable type if you want BacktrackTo to completely restore old values of the user state. Hence, when you need to change the user state, you should set a new 'TUserState value to the UserState property of the CharStream instance, not mutate the existing 'TUserState value.
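The sketch below uses an immutable int as the user state. It replaces the value instead of mutating it, so BacktrackTo restores the old value along with the position. It runs as an F# script with FParsec from NuGet. The counter semantics are only illustrative.

```fsharp
#r "nuget: FParsec"
open FParsec

let stream = new CharStream<int>("abc", 0, 3)
stream.UserState <- 10       // int values are immutable, so they are safe as user state
let saved = stream.State     // snapshot includes Index, Line, ... and UserState

stream.Skip("ab") |> ignore
stream.UserState <- stream.UserState + 1  // assign a new value; never mutate in place

stream.BacktrackTo(saved)
printfn "Index %d, UserState %d" stream.Index stream.UserState
```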

6.11.2.3 Members

member UserState: 'TUserState with get, set

The current user state value.

Setting the UserState value increments the StateTag by 1, independent of whether the new value is different from the previous one.

member State: CharStreamState<'TUserState>

Returns a snapshot of the current StateTag, Index, Line, LineBegin, Name, and UserState values in the form of an immutable CharStreamState value.

member BacktrackTo: CharStreamState<'TUserState> -> unit

Restores the stream to the state represented by the given CharStreamState value.

For example:

fun (stream: CharStream<'u>) ->
    let state = stream.State
    // ... (do something with stream that might change the state)
    stream.BacktrackTo(state) // restores stream to previous state
    // ...

This method throws an ArgumentException if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor). It may also throw any of the I/O related exceptions detailed above.

Note

You may only pass CharStreamState values that were retrieved from the CharStream instance on which you’re calling BacktrackTo. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.

member ReadFrom:
    stateWhereStringBegins: CharStreamState<'TUserState>
  * normalizeNewlines: bool
 -> string

Returns a string with the chars between the index of the stateWhereStringBegins (inclusive) and the current Index of the stream (exclusive).

The normalizeNewlines parameter determines whether all newlines ("\n", "\r\n" or "\r") in the returned string are normalized to '\n' or whether they are preserved in the original form they are encountered in the input. (If stateWhereStringBegins.Line equals the current Line, this method will never normalize any newlines in the returned string.)
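The effect of normalizeNewlines can be sketched as follows, assuming a string-based CharStream<unit> (the function name is hypothetical). SkipCharsOrNewlines registers the "\r\n" newline, so the current Line differs from start.Line and normalization can take place. ReadFrom does not move the stream, so it can be called twice.

```fsharp
open FParsec

let readFromDemo () =
    use stream = new CharStream<unit>("ab\r\ncd", 0, 6)
    let start = stream.State
    // skip the whole input, registering the "\r\n" newline on the way
    stream.SkipCharsOrNewlines(System.Int32.MaxValue) |> ignore
    let normalized = stream.ReadFrom(start, true) // "ab\ncd"
    let preserved = stream.ReadFrom(start, false) // "ab\r\ncd"
    (normalized, preserved)
```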

This method throws

It may also throw any of the I/O related exceptions detailed above.

Note

You may only pass CharStreamState values that were retrieved from the CharStream instance on which you’re calling ReadFrom. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.

member CreateSubstream<'TSubStreamUserState>:
    stateWhereSubstreamBegins: CharStreamState<'TUserState>
 -> CharStream<'TSubStreamUserState>

Creates a new CharStream<'TSubStreamUserState> instance with the stream chars between the index of the stateWhereSubstreamBegins (inclusive) and the current Index of the stream (exclusive).

The state of the substream is initialized to stateWhereSubstreamBegins, so that the stream and the substream will report the same position (Index, Line, LineBegin and Name) for corresponding chars. However, the beginning and end will normally differ between stream and substream; in particular, the IndexOfFirstChar and IndexOfLastCharPlus1 values will normally differ between stream and substream.
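A minimal sketch of the position correspondence, assuming a string-based CharStream<unit> (the function name is hypothetical). The substream starts at the index recorded in the state, and its last char is the char just before the current Index of the parent stream. Because substream is declared after stream, the use bindings dispose the substream first.

```fsharp
open FParsec

let substreamPositionDemo () =
    use stream = new CharStream<unit>("abc def", 0, 7)
    stream.Skip("abc ") |> ignore
    let start = stream.State // Index = 4
    stream.Skip("def") |> ignore // Index = 7
    // declared after `stream`, so it is disposed before `stream`
    use substream = stream.CreateSubstream<unit>(start)
    (substream.Index, substream.IndexOfFirstChar,
     substream.IndexOfLastCharPlus1, stream.IndexOfFirstChar) // (4L, 4L, 7L, 0L)
```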

An example:

open FParsec
open FParsec.Primitives
open FParsec.CharParsers
open FParsec.Error

let embeddedBlock (beginDelim: string) (endDelim: string) : Parser<_,_> =
  let expectedEmbeddedBlock = expected "embedded block"
  fun stream ->
    if stream.Skip(beginDelim) then
      let stateAtBegin = stream.State
      let mutable foundString = false
      let maxChars = System.Int32.MaxValue
      stream.SkipCharsOrNewlinesUntilString(endDelim, maxChars, &foundString)
      |> ignore
      if foundString then
        // create substream with content between beginDelim and endDelim
        let str =
          use substream = stream.CreateSubstream<unit>(stateAtBegin)
          // here we would normally work with the substream,
          // in this example we will just extract the string content
          substream.ReadCharsOrNewlines(System.Int32.MaxValue, true)
        // the substream has been disposed; now consume the end delimiter
        stream.Skip(endDelim) |> ignore
        Reply(str)
      else
        Reply(Error, expectedString endDelim)
    else
      Reply(Error, expectedEmbeddedBlock)

> run (embeddedBlock "/*" "*/") "/*substream content*/";;
val it : ParserResult<string,unit> = Success: "substream content"

Note

This note does not apply to the Low‐Trust version of FParsec.
If you create a substream for a CharStream instance with more than one block, the content of the substream needs to be copied. Thus, you can minimize the overhead associated with creating a substream by ensuring that the CharStream has only one block, either by choosing a sufficiently large blockSize, or by creating the CharStream from a string or char buffer.

You may use a stream and its substreams concurrently. However, notice the following warning:

Caution

This note does not apply to the Low‐Trust version of FParsec.
You may not dispose a stream before all of its substreams are disposed. Disposing a stream before all its substreams are disposed triggers an assert exception in debug builds and otherwise leads to undefined behaviour.

This method throws

It may also throw any of the I/O related exceptions detailed above.

Note

You may only pass CharStreamState values that were retrieved from the CharStream instance on which you’re calling CreateSubstream. Passing a CharStreamState value that was created for another CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.

6.11.3 CharStreamIndexToken

An opaque representation of a CharStream char index.

type CharStreamIndexToken = struct
    member GetIndex: CharStream -> int64
end

CharStream methods can handle CharStreamIndexToken values more efficiently than integer char indices.

You can retrieve CharStreamIndexToken values from the CharStream.IndexToken and CharStreamState<_>.IndexToken properties.

You can get the char index corresponding to a given CharStreamIndexToken value by calling its GetIndex method with the CharStream instance from which the token was retrieved.
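A minimal sketch, assuming a string-based CharStream<unit> (the function name is hypothetical). The token keeps referring to the index at which it was retrieved, even after the stream has moved on.

```fsharp
open FParsec

let indexTokenDemo () =
    use stream = new CharStream<unit>("abcdef", 0, 6)
    stream.Skip("abc") |> ignore
    let token = stream.IndexToken // opaque token for index 3
    stream.Skip("def") |> ignore
    (token.GetIndex(stream), stream.Index) // (3L, 6L)
```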

Zero‐initialized CharStreamIndexToken values constructed with the default value type constructor are not valid and trying to call a CharStream method with such an instance will trigger an exception.

Note

A CharStreamIndexToken instance may only be used together with the CharStream instance it was created for.

member GetIndex: CharStream -> int64

Returns the stream index represented by the CharStreamIndexToken instance.

The CharStream instance passed as the argument must be the CharStream instance from which the CharStreamIndexToken was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.

An InvalidOperationException is thrown if the CharStreamIndexToken value is zero‐initialized (i.e. constructed with the default value type constructor).

6.11.4 CharStreamState

An immutable value type representation of the state of a CharStream.

type CharStreamState<'TUserState> = struct
    member Tag: int64
    member IndexToken: CharStreamIndexToken
    member Line: int64
    member LineBegin: int64
    member Name: string
    member UserState: 'TUserState

    member GetIndex:    CharStream<'TUserState> -> int64
    member GetPosition: CharStream<'TUserState> -> Position
end

You can retrieve CharStreamState values from the CharStream<_>.State property. By passing a CharStreamState value to the BacktrackTo method of a CharStream<_> instance, you can restore the stream to the state represented by the CharStreamState value.

Zero‐initialized CharStreamState values constructed with the default value type constructor are not valid and trying to call a CharStream method with such an instance will trigger an exception.

Note

A CharStreamState instance may only be used together with the CharStream instance it was created for.

member GetIndex: CharStream<'TUserState> -> int64

state.GetIndex(stream) is an optimized implementation of state.IndexToken.GetIndex(stream).
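A short sketch showing that both expressions yield the same index, assuming a string-based CharStream<unit> (the function name is hypothetical). Advancing the stream afterwards does not affect the immutable snapshot.

```fsharp
open FParsec

let stateGetIndexDemo () =
    use stream = new CharStream<unit>("abcdef", 0, 6)
    stream.Skip("ab") |> ignore
    let state = stream.State
    stream.Skip("cd") |> ignore // does not affect the snapshot
    let viaState = state.GetIndex(stream)
    let viaToken = state.IndexToken.GetIndex(stream)
    (viaState, viaToken) // (2L, 2L)
```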

The CharStream<'TUserState> instance passed as the argument must be the CharStream instance from which the CharStreamState was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.

An InvalidOperationException is thrown if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).

member GetPosition: CharStream<'TUserState> -> Position

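state.GetPosition(stream) is an optimized implementation of new Position(state.Name, state.GetIndex(stream), state.Line, state.GetIndex(stream) - state.LineBegin + 1L). (CharStreamState has no Column member; the 1‐based column is derived from the index and LineBegin.)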
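A minimal sketch, assuming a string-based CharStream<unit> and 1‐based line and column numbers (the function name is hypothetical). SkipNewline registers the newline, so the snapshot is taken on line 2, whose first char is at index 3.

```fsharp
open FParsec

let positionDemo () =
    use stream = new CharStream<unit>("ab\ncd", 0, 5)
    stream.Skip("ab") |> ignore
    stream.SkipNewline() |> ignore // registers the newline: Line = 2, LineBegin = 3
    stream.Skip("c") |> ignore // Index = 4
    let state = stream.State
    let pos = state.GetPosition(stream)
    // assumed column formula: index - LineBegin + 1
    let manualColumn = state.GetIndex(stream) - state.LineBegin + 1L
    (pos.Index, pos.Line, pos.Column, manualColumn) // (4L, 2L, 2L, 2L)
```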

The CharStream<'TUserState> instance passed as the argument must be the CharStream instance from which the CharStreamState was retrieved. Passing a different CharStream instance triggers an assert exception in debug builds and will otherwise lead to undefined behaviour.

An InvalidOperationException is thrown if the CharStreamState instance is zero‐initialized (i.e. constructed with the default value type constructor).

6.11.5 TwoChars

An immutable value type representation of two chars:

type TwoChars = struct
    new: char0: char * char1: char -> TwoChars
    val Char0: char
    val Char1: char
end

Footnotes:
[1] The detection of invalid byte sequences by the .NET decoders is not entirely reliable. For example, System.Text.UnicodeEncoding (UTF‐16) has an alignment related bug in .NET versions prior to 4.0 that sometimes leads to invalid surrogate pairs not being detected. The implementations of more complicated encodings, like GB18030, ISO‐2022 and ISCII, also have several issues with regard to the detection of invalid input data.