package containers
Module CCStringLabels
Basic String Utils (Labeled version of CCString)
Fast internal iterator.
include module type of struct include StringLabels end
Strings
The type for strings.
make n c is a string of length n with each index holding the character c.
init n ~f is a string of length n with index i holding the character f i (called in increasing index order).
The empty string.
get s i is the character at index i in s. This is the same as writing s.[i].
Return a new string that contains the same bytes as the given byte sequence.
Return a new byte sequence that contains the same bytes as the given string.
Concatenating
Note. The Stdlib.(^) binary operator concatenates two strings.
concat ~sep ss concatenates the list of strings ss, inserting the separator string sep between each.
cat s1 s2 concatenates s1 and s2 (s1 ^ s2).
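A minimal sketch of the two concatenation functions. It calls them through the standard library's StringLabels module (which this module includes) so that it runs without installing containers; a recent OCaml (4.14 or later) is assumed, and the value names are illustrative only.

```ocaml
(* Stdlib-only sketch: StringLabels is the module included above. *)

(* Join a list of strings with a separator. *)
let joined = StringLabels.concat ~sep:", " [ "a"; "b"; "c" ]

(* cat is the function form of the (^) operator. *)
let both = StringLabels.cat "foo" "bar"

let () = Printf.printf "%s | %s\n" joined both
```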
Predicates and comparisons
starts_with ~prefix s is true if and only if s starts with prefix.
ends_with ~suffix s is true if and only if s ends with suffix.
contains_from s start c is true if and only if c appears in s after position start.
rcontains_from s stop c is true if and only if c appears in s before position stop+1.
contains s c is String.contains_from s 0 c.
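The predicates above can be sketched as follows. The example uses the standard library's StringLabels (assumed OCaml 4.14 or later) and an illustrative file name. It also shows that the search in contains_from includes the start position itself.

```ocaml
(* Indices of "hello.ml": h=0 e=1 l=2 l=3 o=4 .=5 m=6 l=7 *)
let s = "hello.ml"

let has_prefix = StringLabels.starts_with ~prefix:"hello" s
let has_suffix = StringLabels.ends_with ~suffix:".ml" s

(* The search starts at the given position, inclusive. *)
let dot_from_5 = StringLabels.contains_from s 5 '.'
let dot_from_6 = StringLabels.contains_from s 6 '.'

(* rcontains_from looks at positions 0..stop, inclusive. *)
let dot_up_to_4 = StringLabels.rcontains_from s 4 '.'
```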
Extracting substrings
sub s ~pos ~len is a string of length len, containing the substring of s that starts at position pos and has length len.
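As an illustration, the sketch below extracts a substring with the standard library's StringLabels.sub. The second binding relies on the standard library behaviour of raising Invalid_argument when pos and len do not designate a valid range of s; that detail is not stated in the text above.

```ocaml
let s = "hello world"

(* Five characters starting at index 6. *)
let word = StringLabels.sub s ~pos:6 ~len:5

(* An out-of-range request raises Invalid_argument in the standard library. *)
let out_of_range =
  match StringLabels.sub s ~pos:8 ~len:10 with
  | _ -> false
  | exception Invalid_argument _ -> true
```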
Transforming
map ~f s is the string resulting from applying f to all the characters of s in increasing order.
mapi ~f s is like map but the index of the character is also passed to f.
fold_left ~f ~init s computes f (... (f (f init s.[0]) s.[1]) ...) s.[n-1], where n is the length of the string s.
fold_right ~f s ~init computes f s.[0] (f s.[1] ( ... (f s.[n-1] init) ...)), where n is the length of the string s.
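A short sketch of the transformations above, written against the standard library's StringLabels (assumed OCaml 4.14 or later). The input strings and value names are illustrative.

```ocaml
(* map applies f to every character, in increasing index order. *)
let upper = StringLabels.map ~f:Char.uppercase_ascii "abc"

(* mapi also receives the index: uppercase only the even positions. *)
let alternating =
  StringLabels.mapi "abcd" ~f:(fun i c ->
      if i mod 2 = 0 then Char.uppercase_ascii c else c)

(* fold_left threads an accumulator from left to right. *)
let count_a =
  StringLabels.fold_left "banana" ~init:0 ~f:(fun acc c ->
      if c = 'a' then acc + 1 else acc)

(* fold_right starts from the last character, so consing keeps the order. *)
let chars = StringLabels.fold_right "abc" ~init:[] ~f:(fun c acc -> c :: acc)
```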
trim s is s without leading and trailing whitespace. Whitespace characters are: ' ', '\x0C' (form feed), '\n', '\r', and '\t'.
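For instance, using the standard library's String.trim on two illustrative inputs, only the leading and trailing whitespace is removed:

```ocaml
(* Leading and trailing whitespace of any listed kind is dropped. *)
let trimmed = String.trim " \t hello world \r\n"

(* Whitespace inside the string is left untouched. *)
let inner_kept = String.trim "  a  b  "
```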
escaped s is s with special characters represented by escape sequences, following the lexical conventions of OCaml.
All characters outside the US-ASCII printable range [0x20;0x7E] are escaped, as well as backslash (0x5C) and double-quote (0x22).
The function Scanf.unescaped is a left inverse of escaped, i.e. Scanf.unescaped (escaped s) = s for any string s (unless escaped s fails).
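The round trip through escaped and Scanf.unescaped can be sketched as below, using the standard library's String.escaped. The sample string is an arbitrary illustration containing a tab, double quotes and a backslash.

```ocaml
(* A string with a tab, double quotes and a backslash. *)
let raw = "tab\there \"quoted\" back\\slash"

let esc = String.escaped raw

(* Scanf.unescaped undoes the escaping. *)
let round_trip = Scanf.unescaped esc = raw

(* A newline becomes the two characters backslash and 'n'. *)
let newline_escaped = String.escaped "a\nb"

(* Character code of the backslash. *)
let backslash_code = Char.code '\\'
```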
Traversing
iteri is like iter, but the function is also given the corresponding character index.
Searching
index_from s i c is the index of the first occurrence of c in s after position i.
index_from_opt s i c is the index of the first occurrence of c in s after position i (if any).
rindex_from s i c is the index of the last occurrence of c in s before position i+1.
rindex_from_opt s i c is the index of the last occurrence of c in s before position i+1 (if any).
index s c is String.index_from s 0 c.
index_opt s c is String.index_from_opt s 0 c.
rindex s c is String.rindex_from s (length s - 1) c.
rindex_opt s c is String.rindex_from_opt s (length s - 1) c.
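A small sketch of the option-returning search functions, using the standard library's StringLabels. The path-like input is illustrative. The non-_opt variants raise Not_found in the standard library when the character is absent, which is why the _opt forms are used here.

```ocaml
(* Indices of "a/b/c": a=0 /=1 b=2 /=3 c=4 *)
let s = "a/b/c"

let first = StringLabels.index_opt s '/'
let second = StringLabels.index_from_opt s 2 '/'
let last = StringLabels.rindex_opt s '/'
let missing = StringLabels.index_opt s '?'
```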
Strings and Sequences
to_seqi s is like to_seq but also tuples the corresponding index.
UTF decoding and validations
UTF-8
get_utf_8_uchar b i decodes a UTF-8 character at index i in b.
is_valid_utf_8 b is true if and only if b contains valid UTF-8 data.
UTF-16BE
get_utf_16be_uchar b i decodes a UTF-16BE character at index i in b.
is_valid_utf_16be b is true if and only if b contains valid UTF-16BE data.
UTF-16LE
get_utf_16le_uchar b i decodes a UTF-16LE character at index i in b.
is_valid_utf_16le b is true if and only if b contains valid UTF-16LE data.
Binary decoding of integers
The functions in this section binary decode integers from strings.
All following functions raise Invalid_argument if the characters needed at index i to decode the integer are not available.
Little-endian (resp. big-endian) encoding means that least (resp. most) significant bytes are stored first. Big-endian is also known as network byte order. Native-endian encoding is either little-endian or big-endian depending on Sys.big_endian.
32-bit and 64-bit integers are represented by the int32 and int64 types, which can be interpreted either as signed or unsigned numbers.
8-bit and 16-bit integers are represented by the int type, which has more bits than the binary encoding. These extra bits are sign-extended (or zero-extended) for functions which decode 8-bit or 16-bit integers and represent them with int values.
get_uint8 b i is b's unsigned 8-bit integer starting at character index i.
get_int8 b i is b's signed 8-bit integer starting at character index i.
get_uint16_ne b i is b's native-endian unsigned 16-bit integer starting at character index i.
get_uint16_be b i is b's big-endian unsigned 16-bit integer starting at character index i.
get_uint16_le b i is b's little-endian unsigned 16-bit integer starting at character index i.
get_int16_ne b i is b's native-endian signed 16-bit integer starting at character index i.
get_int16_be b i is b's big-endian signed 16-bit integer starting at character index i.
get_int16_le b i is b's little-endian signed 16-bit integer starting at character index i.
get_int32_ne b i is b's native-endian 32-bit integer starting at character index i.
A seeded hash function for strings, with the same output value as Hashtbl.seeded_hash. This function allows this module to be passed as argument to the functor Hashtbl.MakeSeeded.
get_int32_be b i is b's big-endian 32-bit integer starting at character index i.
get_int32_le b i is b's little-endian 32-bit integer starting at character index i.
get_int64_ne b i is b's native-endian 64-bit integer starting at character index i.
get_int64_be b i is b's big-endian 64-bit integer starting at character index i.
get_int64_le b i is b's little-endian 64-bit integer starting at character index i.
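To make the endianness and sign-extension rules concrete, here is a small sketch using the standard library's String decoders (assumed OCaml 4.14 or later). The four-byte input is an arbitrary illustration.

```ocaml
(* Four bytes: 0x01 0x02 0xFF 0xFE *)
let b = "\x01\x02\xFF\xFE"

(* The same byte read as unsigned (zero-extended) and signed (sign-extended). *)
let u8 = String.get_uint8 b 2
let i8 = String.get_int8 b 2

(* The same two bytes read in both byte orders. *)
let be16 = String.get_uint16_be b 0
let le16 = String.get_uint16_le b 0

(* All four bytes as a big-endian 32-bit integer. *)
let be32 = String.get_int32_be b 0

(* Reading 4 bytes from index 1 needs index 4, which does not exist. *)
let too_short =
  match String.get_int32_be b 1 with
  | _ -> false
  | exception Invalid_argument _ -> true
```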
length s returns the length (number of characters) of the given string s.
blit ~src ~src_pos ~dst ~dst_pos ~len copies len characters from string src starting at character index src_pos, to the Bytes sequence dst starting at character index dst_pos. Like String.blit. Compatible with the -safe-string option.
fold ~f ~init s folds on chars by increasing index. Computes f(… (f (f init s.[0]) s.[1]) …) s.[n-1].
foldi ~f init s is just like fold, but it also passes in the index of each char as second argument to the folded function f.
Conversions
to_iter s returns the iter of characters contained in the string s.
to_seq s returns the Seq.t of characters contained in the string s. Renamed from to_std_seq since 3.0.
to_list s returns the list of characters contained in the string s.
pp_buf buf s prints s to the buffer buf. Renamed from pp since 2.0.
pp f s prints the string s within quotes to the formatter f. Renamed from print since 2.0.
Strings
equal s1 s2 returns true iff the strings s1 and s2 are equal.
compare s1 s2 compares the strings s1 and s2 and returns an integer that indicates their relative position in the sort order.
is_empty s returns true iff s is empty (i.e. its length is 0).
hash s returns the hash value of s.
rev s returns the reverse of s.
pad ?side ?c n s ensures that the string s is at least n bytes long, and pads it on the side with c if it's not the case.
of_char 'a' is "a".
of_seq seq converts a seq of characters to a string. Renamed from of_std_seq since 3.0.
of_list lc converts a list of characters lc to a string.
of_array ac converts an array of characters ac to a string.
to_array s returns the array of characters contained in the string s.
find ?start ~sub s returns the starting index of the first occurrence of sub within s or -1.
find_all ?start ~sub s finds all occurrences of sub in s, even overlapping instances and returns them in a generator gen.
find_all_l ?start ~sub s finds all occurrences of sub in s and returns them in a list.
mem ?start ~sub s is true iff sub is a substring of s.
rfind ~sub s finds sub in string s from the right, returns its first index or -1. Should only be used with very small sub.
replace ?which ~sub ~by s replaces some occurrences of sub by by in s.
is_sub ~sub ~sub_pos s ~pos ~sub_len returns true iff the substring of sub starting at position sub_pos and of length sub_len is a substring of s starting at position pos.
repeat s n creates a string by repeating the string s n times.
prefix ~pre s returns true iff pre is a prefix of s.
suffix ~suf s returns true iff suf is a suffix of s.
chop_prefix ~pre s removes pre from s if pre really is a prefix of s, returns None otherwise.
chop_suffix ~suf s removes suf from s if suf really is a suffix of s, returns None otherwise.
take n s keeps only the n first chars of s.
drop n s removes the n first chars of s.
take_drop n s is take n s, drop n s.
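To illustrate the behaviour described for chop_prefix, take, drop and take_drop, here is a standard-library-only sketch. These are hypothetical re-implementations written for this example, not the library's own code. They assume 0 <= n and that asking for more characters than s contains simply returns what is available.

```ocaml
(* Hypothetical re-implementations, for illustration only. *)

(* Some (s without pre) when pre is a prefix of s, None otherwise. *)
let chop_prefix ~pre s =
  if String.starts_with ~prefix:pre s then
    let n = String.length pre in
    Some (String.sub s n (String.length s - n))
  else None

(* First n characters of s (assumes n >= 0). *)
let take n s = if n < String.length s then String.sub s 0 n else s

(* s without its first n characters (assumes n >= 0). *)
let drop n s =
  if n < String.length s then String.sub s n (String.length s - n) else ""

(* Both halves at once. *)
let take_drop n s = (take n s, drop n s)
```

With these definitions, take_drop 2 "hello" evaluates to ("he", "llo"), and concatenating the two halves always gives back the original string.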
lines s returns a list of the lines of s (splits along '\n').
lines_gen s returns a generator gen of the lines of s (splits along '\n').
lines_iter s returns the iter of the lines of s (splits along '\n').
lines_seq s returns the Seq.t of the lines of s (splits along '\n').
concat_iter ~sep iter concatenates all strings of iter, separated with sep.
concat_gen ~sep gen concatenates all strings of gen, separated with sep.
concat_seq ~sep seq concatenates all strings of seq, separated with sep.
unlines ls concatenates all strings of ls, separated with '\n'.
unlines_gen gen concatenates all strings of gen, separated with '\n'.
unlines_iter iter concatenates all strings of iter, separated with '\n'.
unlines_seq seq concatenates all strings of seq, separated with '\n'.
set s i c creates a new string which is a copy of s, except for index i, which becomes c.
iter ~f s applies function f on each character of s. Alias to String.iter.
filter_map ~f s calls (f a0) (f a1) … (f an) where a0 … an are the characters of s. It returns the string of characters ci such that f ai = Some ci (when f returns None, the corresponding element of s is discarded).
filter ~f s discards characters of s not satisfying f.
uniq ~eq s removes consecutive duplicate characters in s.
flat_map ?sep ~f s maps each char of s to a string, then concatenates them all.
for_all ~f s is true iff all characters of s satisfy the predicate f.
exists ~f s is true iff some character of s satisfies the predicate f.
drop_while ~f s discards any characters of s starting from the left, up to the first character c not satisfying f c.
rdrop_while ~f s discards any characters of s starting from the right, up to the first character c not satisfying f c.
ltrim s trims space on the left (see String.trim for more details).
rtrim s trims space on the right (see String.trim for more details).
Operations on 2 strings
map2 ~f s1 s2 maps pairs of chars.
iter2 ~f s1 s2 iterates on pairs of chars.
iteri2 ~f s1 s2 iterates on pairs of chars with their index.
fold2 ~f ~init s1 s2 folds on pairs of chars.
for_all2 ~f s1 s2 returns true iff all pairs of chars satisfy the predicate f.
exists2 ~f s1 s2 returns true iff a pair of chars satisfies the predicate f.
Ascii functions
Those functions are deprecated in String since 4.03, so we provide a stable alias for them even in older versions.
capitalize_ascii s returns a copy of s with the first character set to uppercase using the US-ASCII character set. See String.
uncapitalize_ascii s returns a copy of s with the first character set to lowercase using the US-ASCII character set. See String.
uppercase_ascii s returns a copy of s with all lowercase letters translated to uppercase using the US-ASCII character set. See String.
lowercase_ascii s returns a copy of s with all uppercase letters translated to lowercase using the US-ASCII character set. See String.
equal_caseless s1 s2 compares s1 and s2 for equality, ignoring ASCII case.
Convert a string with arbitrary content into a hexadecimal string.
Convert a string in hex into a string with arbitrary content.
Finding
A relatively efficient algorithm for finding sub-strings.
Splitting
split_on_char ~by s splits the string s along the given char by.
split ~by s splits the string s along the given string by. Alias to Split.list_cpy.
Utils
compare_versions s1 s2 compares version strings s1 and s2, considering that numbers are above text.
compare_natural s1 s2 is the Natural Sort Order, comparing chunks of digits as natural numbers. https://en.wikipedia.org/wiki/Natural_sort_order
edit_distance ?cutoff s1 s2 is the edit distance between the two strings s1 and s2. This satisfies the classical distance axioms: it is always non-negative, symmetric, and satisfies the formula distance s1 s2 + distance s2 s3 >= distance s1 s3.
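The distance axioms above can be checked on a small example. The sketch below is a textbook two-row dynamic-programming Levenshtein distance, written under the assumption that edit_distance counts single-character insertions, deletions and substitutions. The name levenshtein is hypothetical, the ?cutoff parameter is not modelled, and this is not the library's implementation.

```ocaml
(* Hypothetical helper: classical Levenshtein distance, two-row DP. *)
let levenshtein s1 s2 =
  let n1 = String.length s1 and n2 = String.length s2 in
  (* prev.(j): distance between the first i-1 chars of s1 and the first j of s2 *)
  let prev = Array.init (n2 + 1) (fun j -> j) in
  let cur = Array.make (n2 + 1) 0 in
  for i = 1 to n1 do
    cur.(0) <- i;
    for j = 1 to n2 do
      let cost = if s1.[i - 1] = s2.[j - 1] then 0 else 1 in
      let deletion = prev.(j) + 1 in
      let insertion = cur.(j - 1) + 1 in
      let substitution = prev.(j - 1) + cost in
      cur.(j) <- min (min deletion insertion) substitution
    done;
    (* The row just computed becomes the previous row. *)
    Array.blit cur 0 prev 0 (n2 + 1)
  done;
  prev.(n2)

let d_kitten_sitting = levenshtein "kitten" "sitting"
```

The two-row layout keeps memory proportional to the length of s2 rather than to the product of both lengths.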