package re2

OCaml bindings for RE2, Google's regular expression library

Sources

v0.17.0.tar.gz
sha256=cde2fdedcf38297afb77dafbad3ca2eacee8ac70f84e84e05e88cf32bb1fb0bd

Module Re2

Although OCaml strings and C++ strings may legally have internal null bytes, this library doesn't handle them correctly because it converts via C strings. The failure mode is that the search stops early, which isn't bad considering how rare internal null bytes are in practice.
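
Callers who cannot rule out internal null bytes may want to screen inputs before searching. The following is a minimal sketch using only the standard library; has_internal_nul and check_input are hypothetical helper names, not part of Re2.

```ocaml
(* Hypothetical helpers, not part of Re2: detect strings that a
   conversion via C strings would cut short. *)
let has_internal_nul (s : string) : bool = String.contains s '\000'

let check_input (s : string) : (string, string) result =
  if has_internal_nul s
  then Error "input contains a null byte; the search would stop early"
  else Ok s
```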

The strings are considered according to Options.encoding which is UTF-8 by default (the alternative is ISO 8859-1).

Basic Types
type t
val sexp_of_t : t -> Sexplib0.Sexp.t
include Core.Comparable.S_plain with type t := t
include Base.Comparable.S with type t := t
include Base.Comparisons.S with type t := t
include Base.Comparisons.Infix with type t := t
val (>=) : t -> t -> bool
val (<=) : t -> t -> bool
val (=) : t -> t -> bool
val (>) : t -> t -> bool
val (<) : t -> t -> bool
val (<>) : t -> t -> bool
val equal : t -> t -> bool
val min : t -> t -> t
val max : t -> t -> t
val ascending : t -> t -> int

ascending is identical to compare. descending x y = ascending y x. These are intended to be mnemonic when used like List.sort ~compare:ascending and List.sort ~compare:descending, since they cause the list to be sorted in ascending or descending order, respectively.
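
To illustrate the mnemonic, here is a small sketch on plain ints using the standard library's List.sort, which takes the comparison positionally rather than as ~compare. The local ascending and descending definitions are stand-ins written for this example; they mirror the relationship stated above and are not Re2's own functions.

```ocaml
(* Stand-ins on int that mirror the documented relationship. *)
let ascending (x : int) (y : int) : int = compare x y
let descending (x : int) (y : int) : int = ascending y x

let sorted_up = List.sort ascending [ 2; 3; 1 ]
let sorted_down = List.sort descending [ 2; 3; 1 ]
```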

val descending : t -> t -> int
val between : t -> low:t -> high:t -> bool

between t ~low ~high means low <= t <= high

val clamp_exn : t -> min:t -> max:t -> t

clamp_exn t ~min ~max returns t', the closest value to t such that between t' ~low:min ~high:max is true.

Raises if not (min <= max).
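
The semantics of between and clamp_exn can be sketched on plain ints. This is an illustrative reimplementation under the description above, not the library's code; the exception raised here (Invalid_argument) is an assumption, since the text only says that the function raises.

```ocaml
(* Illustrative int-only versions of the documented behaviour. *)
let between (t : int) ~(low : int) ~(high : int) : bool = low <= t && t <= high

let clamp_exn (t : int) ~(min : int) ~(max : int) : int =
  if min > max
  then invalid_arg "clamp_exn: min > max" (* assumed exception type *)
  else if t < min
  then min
  else if t > max
  then max
  else t
```

For any valid bounds, between (clamp_exn t ~min ~max) ~low:min ~high:max holds.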

val clamp : t -> min:t -> max:t -> t Base.Or_error.t
include Base.Comparator.S with type t := t
type comparator_witness
val validate_lbound : min:t Core.Maybe_bound.t -> t Validate.check
val validate_ubound : max:t Core.Maybe_bound.t -> t Validate.check
val validate_bound : min:t Core.Maybe_bound.t -> max:t Core.Maybe_bound.t -> t Validate.check
module Map : sig ... end
module Set : sig ... end
include Core.Hashable.S_plain with type t := t
include Ppx_compare_lib.Comparable.S with type t := t
val compare : t -> t -> int
include Ppx_hash_lib.Hashable.S with type t := t
val hash_fold_t : Base.Hash.state -> t -> Base.Hash.state
val hashable : t Base.Hashable.t
module Table : sig ... end
module Hash_set : sig ... end
module Hash_queue : sig ... end
type regex = t
type id_t = [
  | `Index of int
  | `Name of string
]

Subpatterns are referenced by name if labelled with the /(?P<...>...)/ syntax, or else by counting open-parens, with subpattern zero referring to the whole regex.
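
As a sketch of both addressing styles, the example below extracts the same capture groups by name and by index using find_first_exn from this module. The pattern and input are made up for illustration.

```ocaml
(* Two named groups; "year" is subpattern 1 and "month" is subpattern 2. *)
let date = Re2.create_exn "(?P<year>\\d{4})-(?P<month>\\d{2})"
let input = "released 2024-03"

let year = Re2.find_first_exn ~sub:(`Name "year") date input
let month = Re2.find_first_exn ~sub:(`Index 2) date input

(* Subpattern zero is the whole match. *)
let whole = Re2.find_first_exn ~sub:(`Index 0) date input
```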

val index_of_id_exn : t -> id_t -> int

index_of_id_exn t id resolves subpattern names and indices into indices.

The sub keyword argument means: omit location information for subpatterns with index greater than sub.

Subpatterns are indexed by the number of opening parentheses preceding them:

  • ~sub:(`Index 0) : only the whole match
  • ~sub:(`Index 1) : the whole match and the first submatch, etc.

If you only care whether the pattern does match, you can request no location information at all by passing ~sub:(`Index (-1)).

With one exception, I quote from re2.h:443,

Don't ask for more match information than you will use:
runs much faster with nmatch == 1 than nmatch > 1, and
runs even faster if nmatch == 0.

For sub > 1, re2 executes in three steps:

  1. run a DFA over the entire input to get the end of the whole match
  2. run a DFA backward from the end position to get the start position
  3. run an NFA from the match start to match end to extract submatches

sub == 1 lets it stop after (2) and sub == 0 lets it stop after (1). (See re2.cc:692 or so.)

The one exception is for the functions get_matches, replace, and Iterator.next: Since they must iterate correctly through the whole string, they need at least the whole match (subpattern 0). These functions will silently rewrite ~sub to be non-negative.
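
Following the advice above, a caller that needs only whole matches can pass ~sub:(`Index 0) so that no submatch locations are requested. This is a minimal sketch with a made-up pattern and input; it makes no claim about the size of the speedup.

```ocaml
(* The pattern has two groups, but only the whole match is needed here. *)
let word = Re2.create_exn "(\\w)(\\w*)"
let input = "one two three"

let words = Re2.find_all_exn ~sub:(`Index 0) word input
let count = List.length (Re2.get_matches_exn ~sub:(`Index 0) word input)
```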

module Options : sig ... end

See re2_c/libre2/re2/re2.h for documentation of these options.

val create : ?options:Options.t -> string -> t Core.Or_error.t
val create_exn : ?options:Options.t -> string -> t
include Core.Stringable with type t := t
val of_string : string -> t
val to_string : t -> string
val num_submatches : t -> int

num_submatches t returns 1 + the number of open-parens in the pattern.

N.B. num_submatches t == 1 + RE2::NumberOfCapturingGroups() because RE2::NumberOfCapturingGroups() ignores the whole match ("subpattern zero").
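
A short sketch of the counting rule, using a made-up pattern with three capturing groups:

```ocaml
(* Three open-parens, plus one for the whole match ("subpattern zero"). *)
let nested = Re2.create_exn "(a)(b(c))"
let n = Re2.num_submatches nested

(* A pattern without groups still has subpattern zero. *)
let plain = Re2.num_submatches (Re2.create_exn "abc")
```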

val get_named_capturing_groups : t -> Core.Int.t Core.String.Map.t

get_named_capturing_groups t returns a map from names of capturing groups in t to their indices.
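
The sketch below looks up group indices in the returned map and cross-checks one of them with index_of_id_exn. It assumes the core library is available (the signature above already refers to Core.String.Map) and uses Core.Map.find for the lookup; the pattern is made up for illustration.

```ocaml
let date = Re2.create_exn "(?P<year>\\d{4})-(?P<month>\\d{2})"

(* Map from group name to group index. *)
let groups = Re2.get_named_capturing_groups date
let year_index = Core.Map.find groups "year"
let missing = Core.Map.find groups "day"

(* index_of_id_exn resolves a name to the same index. *)
let month_index = Re2.index_of_id_exn date (`Name "month")
```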

val pattern : t -> string

pattern t returns the pattern from which the regex was constructed.

val options : t -> Options.t
val find_all : ?sub:id_t -> t -> string -> string list Core.Or_error.t

find_all t input is a convenience function that returns all non-overlapping matches of t against input, in left-to-right order.

If sub is given, and the requested subpattern did not capture, then no match is returned at that position even if other parts of the regex did match.
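
A minimal sketch of both behaviours, with made-up patterns and inputs. The second result follows from the rule stated above: the middle match has no capture for subpattern 1, so it contributes nothing.

```ocaml
(* All non-overlapping matches, left to right. *)
let num = Re2.create_exn "\\d+"
let all = Re2.find_all_exn num "a1 b22 c333"

(* The optional group does not capture for the bare "k". *)
let k = Re2.create_exn "k(\\d)?"
let digits = Re2.find_all_exn ~sub:(`Index 1) k "k1 k k2"
```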

val find_all_exn : ?sub:id_t -> t -> string -> string list
val find_first : ?sub:id_t -> t -> string -> string Core.Or_error.t

find_first ?sub pattern input finds the first match of pattern in input, and returns the subpattern specified by sub, or an error if the subpattern didn't capture.

val find_first_exn : ?sub:id_t -> t -> string -> string
val find_submatches : t -> string -> string option array Core.Or_error.t

find_submatches t input finds the first match and returns all submatches. Element 0 is the whole match and element 1 is the first parenthesized submatch, etc.
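
As an illustration, the sketch below uses a made-up pattern whose second group is optional, so the corresponding array element is None when that group does not participate in the match.

```ocaml
let addr = Re2.create_exn "(\\w+)(-\\d+)?@(\\w+)"

(* Element 0 is the whole match; elements 1..3 are the groups in order. *)
let subs = Re2.find_submatches_exn addr "bob@example"
```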

val find_submatches_exn : t -> string -> string option array
val matches : t -> string -> bool

matches pattern input

  • returns

    true iff pattern matches input

val matches_substring_no_context_exn : t -> string -> pos:int -> len:int -> bool

Same as matches, except it only matches a substring, completely ignoring the surrounding context (i.e. treating the substring as if it's the full string). Raises if pos and len specify an invalid range (negative values, or the range is outside the string).
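
The effect of ignoring context is easiest to see with anchors. In this sketch (pattern and input invented for the example), the anchored pattern fails on the full string but succeeds on the three-character substring starting at position 3.

```ocaml
let bar = Re2.create_exn "^bar$"
let input = "foobarbaz"

(* Anchors refer to the whole string here. *)
let on_whole = Re2.matches bar input

(* Anchors refer to the substring "bar" here. *)
let on_sub = Re2.matches_substring_no_context_exn bar input ~pos:3 ~len:3
```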

val split : ?max:int -> ?include_matches:bool -> t -> string -> string list

split pattern input

  • returns

    input broken into pieces where pattern matches. Subpatterns are ignored.

  • parameter max

    (default: unlimited) split only at the leftmost max matches

  • parameter include_matches

    (default: false) include the matched substrings in the returned list (e.g., the regex /[,()]/ on "foo(bar,baz)" gives ["foo"; "("; "bar"; ","; "baz"; ")"] instead of ["foo"; "bar"; "baz"])

If t never matches, the returned list has input as its one element.
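
The parameters can be sketched on a small made-up input:

```ocaml
let sep = Re2.create_exn "[,;]"
let input = "a,b;c"

let pieces = Re2.split sep input
let with_seps = Re2.split ~include_matches:true sep input

(* Split only at the leftmost match. *)
let first_only = Re2.split ~max:1 sep input

(* No match: the input comes back as the only element. *)
let unsplit = Re2.split sep "abc"
```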

val rewrite : t -> template:string -> string -> string Core.Or_error.t

rewrite pattern ~template input is a convenience function for replace: Instead of requiring an arbitrary transformation as a function, it accepts a template string with zero or more substrings of the form "\\n", each of which will be replaced by submatch n. For every match of pattern against input, the template will be specialized and then substituted for the matched substring.

val rewrite_exn : t -> template:string -> string -> string
val valid_rewrite_template : t -> template:string -> bool

valid_rewrite_template pattern ~template returns true iff template is a valid rewrite template for pattern
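
A minimal sketch of template rewriting, with a made-up pattern that has two groups. Note that "\\2" in OCaml source is the two-character template reference \2. The last check relies on RE2 rejecting a template that refers to a group the pattern does not have.

```ocaml
let pair = Re2.create_exn "(\\w+)@(\\w+)"

(* Every match is rewritten, swapping the two submatches. *)
let swapped = Re2.rewrite_exn pair ~template:"\\2:\\1" "a@b c@d"

let ok = Re2.valid_rewrite_template pair ~template:"\\2:\\1"

(* The pattern has no third group. *)
let bad = Re2.valid_rewrite_template pair ~template:"\\3"
```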

val escape : string -> string

escape nonregex returns a copy of nonregex with everything escaped (i.e., if the return value were compiled to a regex, it would match exactly the original input)
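
As a sketch of the round trip described above, the literal below contains a regex metacharacter; once escaped and compiled, it matches the original text but not a string that the unescaped pattern would have accepted. The literal is made up for illustration.

```ocaml
let literal = "1+1=2"

(* Compile the escaped text, so '+' is matched literally. *)
let exact = Re2.create_exn (Re2.escape literal)
let hits_original = Re2.matches exact literal

(* Unescaped, "1+1=2" would match "11=2"; escaped, it does not. *)
let hits_other = Re2.matches exact "11=2"
```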

Infix Operators
module Infix : sig ... end
Complicated Interface
type 'a without_trailing_none
val sexp_of_without_trailing_none : ('a -> Sexplib0.Sexp.t) -> 'a without_trailing_none -> Sexplib0.Sexp.t
val without_trailing_none : 'a -> 'a without_trailing_none

This type marks call sites affected by a bugfix that eliminated a trailing None. When you add this wrapper, check that your call site does not still work around the bug by dropping the last element.

module Match : sig ... end
val get_matches : ?sub:id_t -> ?max:int -> t -> string -> Match.t list Core.Or_error.t

get_matches pattern input returns all non-overlapping matches of pattern against input

  • parameter max

    (default: unlimited) return only the leftmost max matches

  • parameter sub

    (default: all) returned Match.t's will contain only the first sub matches.
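
The sketch below collects key/value pairs from each match. It assumes that the Match module, whose signature is elided above, provides Match.get_exn : sub:id_t -> Match.t -> string; the pattern and input are made up for illustration.

```ocaml
let kv = Re2.create_exn "(\\w+)=(\\d+)"
let input = "a=1, b=22, c=333"

(* Assumed accessor: Re2.Match.get_exn ~sub m returns the text of a submatch. *)
let pair_of_match m =
  Re2.Match.get_exn ~sub:(`Index 1) m, Re2.Match.get_exn ~sub:(`Index 2) m

let pairs = List.map pair_of_match (Re2.get_matches_exn kv input)

(* Only the leftmost two matches. *)
let first_two = List.map pair_of_match (Re2.get_matches_exn ~max:2 kv input)
```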

val get_matches_exn : ?sub:id_t -> ?max:int -> t -> string -> Match.t list
val to_sequence_exn : ?sub:id_t -> t -> string -> Match.t Core.Sequence.t
val first_match : t -> string -> Match.t Core.Or_error.t

first_match pattern input

  • returns

    the first match iff pattern matches input

val first_match_exn : t -> string -> Match.t
val replace : ?sub:id_t -> ?only:int -> f:(Match.t -> string) -> t -> string -> string Core.Or_error.t

replace ?sub ?only ~f pattern input

  • returns

    an edited copy of input with every substring matched by pattern transformed by f.

  • parameter only

    (default: all) replace only the nth match
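
A minimal sketch of a function-driven replacement that doubles every number in a made-up input. It assumes Match.get_exn : sub:id_t -> Match.t -> string from the Match module, whose signature is elided above.

```ocaml
let num = Re2.create_exn "\\d+"

(* Assumed accessor: subpattern zero of a match is the matched text. *)
let double m =
  let n = int_of_string (Re2.Match.get_exn ~sub:(`Index 0) m) in
  string_of_int (2 * n)

let doubled = Re2.replace_exn ~f:double num "3 apples, 10 pears"
```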

val replace_exn : ?sub:id_t -> ?only:int -> f:(Match.t -> string) -> t -> string -> string
module Exceptions : sig ... end
module Multiple : sig ... end
module Stable : sig ... end
module Parser : sig ... end