package orsetto

A library of assorted structured data interchange languages

Sources

r1.0.3.tar.gz
sha256=151ca6df499bd3de7aa89a4e1627411fbee24c4dea6e0e71ce21f06f181ee654
md5=00393728b481c2bf15919a8202732335

Module Cf_regx

Regular expression parsing, search and matching.

Overview

This module implements simple regular expression parsing, search and matching in pure Objective Caml for 8-bit extended ASCII text.

Use any of the following constructions in regular expressions:

  • \n Matches LF ("newline") character.
  • \t Matches TAB character.
  • \r Matches RETURN character.
  • \a Matches an alphabetical character.
  • \d Matches a decimal digit character.
  • \i Matches an alphanumerical character.
  • \s Matches a TAB, LF, VT, FF, CR or SPACE (whitespace) character.
  • \w Matches a character other than a whitespace character.
  • \xNN Matches the character with hexadecimal code NN.
  • \DDD Matches the character with decimal code DDD, where DDD is a three digit number between 000 and 255.
  • \c_ Matches the control character corresponding to the subsequent printable character, e.g. \cA is CONTROL-A, and \c[ is ESCAPE.
  • . Matches any character except newline.
  • * (postfix) Matches the preceding expression, zero, one or several times in sequence.
  • + (postfix) Matches the preceding expression, one or several times in sequence.
  • ? (postfix) Matches the preceding expression once or not at all.
  • [..] Character set. Ranges are denoted with '-', as in [a-z]. An initial '^', as in [^0-9], complements the set. Special characters in the character set syntax may be included in the set by escaping them with a backtick, e.g. [`^```]] is a set containing three characters: the caret, the backtick and the right bracket.
  • (..|..) Alternatives. Matches one of the expressions between the parentheses, which are separated by vertical bar characters.
  • \_ Escaped special character. The special characters are '\\', '.', '*', '+', '?', '(', '|', ')', '['.
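
The numeric and control escapes in the list above each denote a single octet. The following is a minimal sketch, using only the OCaml standard library, of which character each form stands for. The helper names control_of, of_hex and of_decimal are hypothetical and are not part of Cf_regx, and the \c_ mapping shown is the conventional ASCII one (keep the low five bits), which is an assumption consistent with the \cA and \c[ examples given above.

```ocaml
(* Hypothetical helpers, not part of Cf_regx. They only illustrate which
   octet each escape form denotes. *)

(* \cX: assumed conventional ASCII mapping, keeping the low five bits. *)
let control_of c = Char.chr (Char.code c land 0x1f)

(* \xNN: two hexadecimal digits. *)
let of_hex nn = Char.chr (int_of_string ("0x" ^ nn))

(* \DDD: exactly three decimal digits, between 000 and 255.
   Char.chr raises Invalid_argument for codes outside that range. *)
let of_decimal ddd =
  if String.length ddd <> 3 then invalid_arg "of_decimal";
  Char.chr (int_of_string ddd)

let () =
  Printf.printf "%d %d %c %c\n"
    (Char.code (control_of 'A'))   (* CONTROL-A *)
    (Char.code (control_of '['))   (* ESCAPE *)
    (of_hex "41")
    (of_decimal "065")
```

Under these assumptions, \cA denotes octet 1, \c[ denotes octet 27, and both \x41 and \065 denote the letter A.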

Interface

module DFA : sig ... end

The deterministic finite automata on octet character symbols.

type t

The type of a compiled regular expression.

val of_string : string -> t

Use of_string s to make a regular expression denoted by s. Raises Invalid_argument if s does not denote a valid regular expression.

val of_chars : char Seq.t -> t

Use of_chars s to make a regular expression denoted by the characters in s. Raises Invalid_argument if the characters do not denote a valid regular expression.

val of_dfa_term : DFA.term -> t

Use of_dfa_term s to make a regular expression for recognizing the language term s.

val test : t -> string -> bool

Use test r s to test whether r recognizes s. Returns true if no character in s is rejected and the DFA reaches at least one final state; otherwise returns false.

val contains : t -> string -> bool

Use contains r s to test whether r recognizes any substring of s.

Use search r s to search with r in a confluently persistent sequence s for the first accepted subsequence. Returns None if s does not contain a matching subsequence. Otherwise, returns Some (start, limit), where start is the index at which the first matching subsequence begins, and limit is the index just past the end of the longest match starting there.
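
The (start, limit) convention described above can be illustrated without the library. The following is a minimal sketch, using only the OCaml standard library, of a leftmost-longest search for a run of decimal digits, roughly what a search with the pattern \d+ would be expected to find. The names is_digit and search_digits are hypothetical and are not part of Cf_regx; the sketch works on a plain string rather than on the sequence type the library uses.

```ocaml
(* Hypothetical stand-in for searching with a pattern like \d+.
   It does not use Cf_regx. *)
let is_digit c = '0' <= c && c <= '9'

let search_digits s =
  let n = String.length s in
  (* Advance from index i for as long as predicate p holds. *)
  let rec skip p i = if i < n && p s.[i] then skip p (i + 1) else i in
  (* Leftmost: the first index holding a digit. *)
  let start = skip (fun c -> not (is_digit c)) 0 in
  if start = n then None
  (* Longest: extend the match while digits continue. *)
  else Some (start, skip is_digit start)

let () =
  match search_digits "ab123cd45" with
  | Some (start, limit) ->
    print_endline (String.sub "ab123cd45" start (limit - start))
  | None -> print_endline "no match"
```

With a half-open (start, limit) pair, the matched text is String.sub s start (limit - start), and the length of the match is simply limit - start.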

val split : t -> string Cf_slice.t -> string Cf_slice.t Seq.t

Use split r s to split s into a sequence of slices comprising the substrings in s that are separated by disjoint substrings matching r, which are found by searching from left to right. If r does not match any substring in s, then a sequence containing just s is returned, even if s is an empty slice.

val quote : string -> string

Use quote s to make a copy of s by converting all the special characters into escape sequences.

val unquote : string -> string

Use unquote s to make a copy of s by converting all the escape sequences into ordinary characters.
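
To make the intent of quote and unquote concrete, here is a minimal sketch using only the OCaml standard library. It assumes that quoting means prefixing each of the special characters listed in the overview with a backslash; that is an assumption about the behaviour, not the library's actual implementation. The names is_special, quote_sketch and unquote_sketch are hypothetical, and the inverse shown here handles only backslash-escaped characters, not escapes such as \n or \xNN.

```ocaml
(* Sketch only, not the Cf_regx implementation. Assumes quoting means
   prefixing each special character with a backslash. *)
let is_special = function
  | '\\' | '.' | '*' | '+' | '?' | '(' | '|' | ')' | '[' -> true
  | _ -> false

let quote_sketch s =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if is_special c then Buffer.add_char b '\\';
      Buffer.add_char b c)
    s;
  Buffer.contents b

(* Inverse of the sketch above: drops a backslash and keeps the character
   that follows it. Other escape forms are deliberately not handled. *)
let unquote_sketch s =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i < n then
      if s.[i] = '\\' && i + 1 < n then begin
        Buffer.add_char b s.[i + 1];
        go (i + 2)
      end else begin
        Buffer.add_char b s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents b

let () = print_endline (quote_sketch "1+1=(2)")
```

A typical use of quoting is to embed literal text, such as a file name containing dots, inside a larger pattern before passing it to of_string.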
