package orsetto
Module Cf_regx
Regular expression parsing, search and matching.
Overview
This module implements simple regular expression parsing, search and matching in pure Objective Caml for 8-bit extended ASCII text.
Use any of the following constructions in regular expressions:
\n
Matches LF ("newline") character.
\t
Matches TAB character.
\r
Matches RETURN character.
\a
Matches an alphabetical character.
\d
Matches a decimal digit character.
\i
Matches an alphanumerical character.
\s
Matches a TAB, LF, VT, FF, CR or SPACE (whitespace) character.
\w
Matches a character other than a whitespace character.
\xNN
Matches the character with hexadecimal code NN.
\DDD
Matches the character with decimal code DDD, where DDD is a three digit number between 000 and 255.
\c_
Matches the control character corresponding to the subsequent printable character, e.g. \cA is CONTROL-A, and \c[ is ESCAPE.
.
Matches any character except newline.
*
(postfix) Matches the preceding expression, zero, one or several times in sequence.
+
(postfix) Matches the preceding expression, one or several times in sequence.
?
(postfix) Matches the preceding expression once or not at all.
[..]
Character set. Ranges are denoted with '-', as in [a-z]. An initial '^', as in [^0-9], complements the set. Special characters in the character set syntax may be included in the set by escaping them with a backtick, e.g. [`^```]] is a set containing three characters: the caret, the backtick and the right bracket characters.
(..|..)
Alternatives. Matches one of the expressions between the parentheses, which are separated by vertical bar characters.
\_
Escaped special character. The special characters are '\\', '.', '*', '+', '?', '(', '|', ')', '['.
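The character-class escapes above can be read as predicates on single characters. The following is a minimal sketch using only the standard library; the names in_class and control_of are hypothetical and not part of Cf_regx. It assumes that "alphabetical" means the ASCII letters only, and that \c_ follows the usual convention of flipping bit 0x40 of the code of the following character.

```ocaml
(* Hypothetical helpers, not part of Cf_regx. *)

(* Assumption: alphabetical means ASCII letters only. *)
let is_alpha c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')

let is_digit c = c >= '0' && c <= '9'

(* TAB, LF, VT, FF and CR are the contiguous codes 9 to 13; SPACE is 32. *)
let is_space c = (c >= '\t' && c <= '\r') || c = ' '

(* Does [c] belong to the class named by the escape letter [cls]? *)
let in_class cls c =
  match cls with
  | 'a' -> is_alpha c
  | 'd' -> is_digit c
  | 'i' -> is_alpha c || is_digit c
  | 's' -> is_space c
  | 'w' -> not (is_space c)
  | _ -> invalid_arg "in_class: unknown class letter"

(* Assumed convention for the control escape: flip bit 0x40, so that
   'A' maps to CONTROL-A (code 1) and '[' maps to ESCAPE (code 27). *)
let control_of c = Char.chr (Char.code c lxor 0x40)
```

Under these assumptions, in_class 'i' '7' holds, and control_of '[' is the ESCAPE character.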
Interface
module DFA : sig ... end
The deterministic finite automata on octet character symbols.
val of_string : string -> t
Use of_string s to make a regular expression denoted by s. Raises Invalid_argument if s does not denote a valid regular expression.
Use of_chars s to make a regular expression denoted by the characters in s. Raises Invalid_argument if the characters do not denote a valid regular expression.
Use of_dfa_term s to make a regular expression for recognizing the language term s.
val test : t -> string -> bool
Use test r s to test whether r recognizes s. Returns true if no character in s is rejected and the DFA reaches at least one final state; otherwise returns false.
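The acceptance rule can be illustrated with a small hand-written automaton. This is a sketch of the rule only, not the DFA that Cf_regx builds; the type state and the function test_digits are hypothetical, and the automaton is hard-wired to the pattern "one or more decimal digits".

```ocaml
(* States of a hand-written automaton for one or more decimal digits. *)
type state = Start | Digits | Rejected

(* One transition: a digit moves to the final state, anything else rejects.
   Once rejected, the automaton stays rejected. *)
let step state c =
  match state, c with
  | (Start | Digits), '0' .. '9' -> Digits
  | _ -> Rejected

(* Accept only if no character was rejected and the run ends in the
   final state Digits. *)
let test_digits s =
  Seq.fold_left step Start (String.to_seq s) = Digits
```

Here test_digits "123" is true, while the empty string (no final state reached) and "12a" (a rejected character) both give false.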
val contains : t -> string -> bool
Use contains r s to test whether r recognizes any substring of s.
Use search r s to search with r in a confluently persistent sequence s for the first accepted subsequence. Returns None if s does not contain a matching subsequence. Otherwise, returns Some (start, limit), where start is the index at which the first matching subsequence begins and limit is the index after the end of the longest matching subsequence.
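The shape of the result, a leftmost start paired with the limit of the longest match from that start, can be sketched as follows. The function search_digits is a hypothetical stand-in: it works on a plain string rather than the sequence type taken by search, and it is hard-wired to the pattern "one or more decimal digits".

```ocaml
(* Hypothetical stand-in for a search with a pattern of one or more digits.
   Returns Some (start, limit) for the first run of digits in [s]. *)
let search_digits s =
  let n = String.length s in
  let is_digit c = c >= '0' && c <= '9' in
  (* Advance until a digit is found or the string ends. *)
  let rec find_start i =
    if i >= n then None
    else if is_digit s.[i] then Some i
    else find_start (i + 1)
  in
  (* Extend the match for as long as digits continue. *)
  let rec find_limit j =
    if j < n && is_digit s.[j] then find_limit (j + 1) else j
  in
  Option.map (fun start -> (start, find_limit start)) (find_start 0)
```

With this sketch, searching "ab123cd45" gives Some (2, 5): the first match starts at index 2, and index 5 is one past its last digit.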
val split : t -> string Cf_slice.t -> string Cf_slice.t Seq.t
Use split r s to split s into a sequence of slices comprising the substrings in s that are separated by disjoint substrings matching r, which are found by searching from left to right. If r does not match any substring in s, then a sequence containing just s is returned, even if s is an empty slice.
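A rough model of this behaviour, using plain strings instead of slices, may help. The function split_on_commas is hypothetical: it assumes the separator pattern is "one or more commas" and treats each maximal run of commas as a single separator, which is an assumption about how a longest match would behave rather than a statement about Cf_regx itself.

```ocaml
(* Hypothetical stand-in for split with a separator of one or more commas.
   Pieces are produced lazily, as a Seq.t of plain strings. *)
let split_on_commas s : string Seq.t =
  let n = String.length s in
  (* [piece start i] scans from [i]; the current piece began at [start]. *)
  let rec piece start i () =
    if i >= n then
      (* End of input: emit the final piece, which may be empty. *)
      Seq.Cons (String.sub s start (n - start), Seq.empty)
    else if s.[i] = ',' then begin
      (* Skip the whole run of commas as one separator. *)
      let j = ref i in
      while !j < n && s.[!j] = ',' do incr j done;
      Seq.Cons (String.sub s start (i - start), piece !j !j)
    end
    else piece start (i + 1) ()
  in
  piece 0 0
```

In this model, an input with no comma yields a one-element sequence holding the input itself, including when the input is empty, which mirrors the no-match case described above.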
Use quote s to make a copy of s by converting all the special characters into escape sequences.
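A minimal sketch of what such quoting might look like, assuming each special character listed in the overview is escaped by prefixing it with a backslash. The names is_special and quote_sketch are hypothetical; the actual quote function may differ in detail.

```ocaml
(* The special characters listed in the overview. *)
let is_special = function
  | '\\' | '.' | '*' | '+' | '?' | '(' | '|' | ')' | '[' -> true
  | _ -> false

(* Hypothetical sketch: copy [s], prefixing each special character
   with a backslash. *)
let quote_sketch s =
  let b = Buffer.create (String.length s * 2) in
  String.iter
    (fun c ->
      if is_special c then Buffer.add_char b '\\';
      Buffer.add_char b c)
    s;
  Buffer.contents b
```

Under that assumption, a literal such as a.b becomes a\.b, so the dot matches only itself instead of any character.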