package orsetto

A library of assorted structured data interchange languages

Sources

r1.0.3.tar.gz
sha256=151ca6df499bd3de7aa89a4e1627411fbee24c4dea6e0e71ce21f06f181ee654
md5=00393728b481c2bf15919a8202732335


Module Ucs_regx

Regular expression parsing, search and matching with Unicode text.

Overview

This module implements Unicode regular expression parsing, search and matching in pure Objective Caml. The implementation claims support for Requirements Level 1 (Basic Unicode Support) of Unicode Technical Standard #18, with the following exceptions and one addition:

  1. No support for line boundaries.
  2. No support for word boundaries.
  3. No support for case insensitive matching.
  4. Additional support for the Block enumerated property.

At present, there is no support for the Script_Extensions property.

Modules
module DFA : sig ... end

Deterministic finite automata for Unicode code points.

type t

The type of a compiled regular expression.

val of_text : Ucs_text.t -> t

Use of_text s to make a regular expression denoted by s. Raises Invalid_argument if s does not denote a valid regular expression.

val of_uchars : Uchar.t Seq.t -> t

Use of_uchars s to make a regular expression denoted by the Unicode code points in s. Raises Invalid_argument if the code points do not denote a valid regular expression.

val of_dfa_term : DFA.term -> t

Use of_dfa_term s to make a regular expression for recognizing the language term s.

val test : t -> Ucs_text.t -> bool

Use test r t to test whether the text t matches the regular expression r.

val contains : t -> Ucs_text.t -> bool

Use contains r t to test whether r recognizes any substring of t.
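The difference between test and contains can be modelled with a small standard-library sketch that does not use Ucs_regx itself. As assumptions made purely for illustration, a compiled regular expression is replaced by a hypothetical predicate accepts : string -> bool, and plain strings stand in for Ucs_text.t.

```ocaml
(* Hypothetical stand-in for a compiled regular expression: a predicate
   that says whether a whole string is a non-empty run of ASCII digits. *)
let is_digits (t : string) : bool =
  t <> ""
  && List.for_all (fun c -> c >= '0' && c <= '9') (List.of_seq (String.to_seq t))

(* Model of [test]: the entire text must be accepted. *)
let test (accepts : string -> bool) (s : string) : bool = accepts s

(* Model of [contains]: some substring s[start, limit) must be accepted.
   This brute-force scan is for clarity only, not efficiency. *)
let contains (accepts : string -> bool) (s : string) : bool =
  let n = String.length s in
  let rec scan start limit =
    if start > n then false
    else if limit > n then scan (start + 1) (start + 1)
    else accepts (String.sub s start (limit - start)) || scan start (limit + 1)
  in
  scan 0 0
```

In this model, test is_digits "a123" is false because the leading letter is not accepted, while contains is_digits "a123" is true because the substring "123" is.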

Use search r s to search with r in a confluently persistent sequence s for the first accepted subsequence. Returns None if s does not contain a matching subsequence. Otherwise, returns Some (start, limit), where start is the index at which the first matching subsequence begins, and limit is the index just past the end of the longest matching subsequence.
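One reading of the (start, limit) result is "leftmost start, then longest match from that start". The following standard-library sketch models that reading only; it is not the library's implementation. The predicate accepts and the use of plain strings in place of the library's sequence type are assumptions for illustration.

```ocaml
(* Hypothetical stand-in for a compiled regular expression: accepts
   non-empty runs of ASCII digits. *)
let is_digits (t : string) : bool =
  t <> ""
  && List.for_all (fun c -> c >= '0' && c <= '9') (List.of_seq (String.to_seq t))

(* Model of [search]: find the smallest [start] at which some slice is
   accepted, then the largest [limit] such that s[start, limit) is accepted. *)
let search (accepts : string -> bool) (s : string) : (int * int) option =
  let n = String.length s in
  (* Try the longest candidate first and shrink until one is accepted. *)
  let rec longest start limit =
    if limit < start then None
    else if accepts (String.sub s start (limit - start)) then Some (start, limit)
    else longest start (limit - 1)
  in
  let rec from start =
    if start > n then None
    else
      match longest start n with
      | Some _ as hit -> hit
      | None -> from (start + 1)
  in
  from 0
```

Under this model, search is_digits "ab123cd45" yields Some (2, 5): the first digit run begins at index 2 and index 5 is one past its end. The later run "45" is not reported because only the first match is returned.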

val split : t -> Ucs_text.t -> Ucs_text.t Seq.t

Use split r s to split s into a sequence of slices comprising the substrings in s that are separated by disjoint substrings matching r, which are found by searching from left to right. If r does not match any substring in s, then a sequence containing just s is returned, even if s is an empty slice.
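The splitting behaviour described above, including the case where nothing matches, can be modelled with the standard library alone. This is a sketch under stated assumptions, not the library's code: the separator pattern is a hypothetical predicate accepts : string -> bool that never accepts the empty string, plain strings stand in for Ucs_text.t, and a list is returned instead of a Seq.t.

```ocaml
(* Hypothetical separator pattern: a non-empty run of commas and spaces. *)
let is_sep (t : string) : bool =
  t <> ""
  && List.for_all (fun c -> c = ',' || c = ' ') (List.of_seq (String.to_seq t))

(* Model of [split]: repeatedly find the leftmost-longest non-empty
   separator match at or after [from], and emit the text before it. *)
let split (accepts : string -> bool) (s : string) : string list =
  let n = String.length s in
  let rec find start limit =
    if start >= n then None
    else if limit <= start then find (start + 1) n
    else if accepts (String.sub s start (limit - start)) then Some (start, limit)
    else find start (limit - 1)
  in
  let rec go from acc =
    match find from n with
    | None -> List.rev (String.sub s from (n - from) :: acc)
    | Some (start, limit) -> go limit (String.sub s from (start - from) :: acc)
  in
  go 0 []
```

With this model, split is_sep "a, b,,c" gives ["a"; "b"; "c"], and an input with no separator, including the empty string, comes back as a single-element list, mirroring the rule that a sequence containing just s is returned.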

val quote : Ucs_text.t -> Ucs_text.t

Use quote s to make a copy of s by converting all the special characters into escape sequences.

val unquote : Ucs_text.t -> Ucs_text.t

Use unquote s to make a copy of s by converting all the escape sequences into ordinary characters.
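The relationship between quote and unquote can be illustrated as a round trip. The sketch below is a standard-library model only: the set of special characters and the use of a backslash as the escape prefix are assumptions chosen for illustration, not the set or syntax that Ucs_regx actually uses, and plain strings stand in for Ucs_text.t.

```ocaml
(* ASSUMPTION: an illustrative set of special characters, not the
   library's actual set. *)
let specials = "\\.*+?|()[]{}^$"

let is_special (c : char) : bool = String.contains specials c

(* Model of [quote]: prefix every special character with a backslash. *)
let quote (s : string) : string =
  let b = Buffer.create (String.length s) in
  String.iter
    (fun c ->
      if is_special c then Buffer.add_char b '\\';
      Buffer.add_char b c)
    s;
  Buffer.contents b

(* Model of [unquote]: drop each escaping backslash and keep the
   character that follows it. *)
let unquote (s : string) : string =
  let n = String.length s in
  let b = Buffer.create n in
  let rec go i =
    if i < n then
      if s.[i] = '\\' && i + 1 < n then begin
        Buffer.add_char b s.[i + 1];
        go (i + 2)
      end else begin
        Buffer.add_char b s.[i];
        go (i + 1)
      end
  in
  go 0;
  Buffer.contents b
```

In this model, unquote (quote s) returns s for any string s, which is the property that makes quote useful for embedding literal text inside a larger pattern.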
