package textutils_kernel
Module Utf8_text
Text is text encoded in UTF-8.
Under the hood, this is just a String.t, but the type is abstract so that the compiler will remind us not to use String.length when we mean Text.width.
include Ppx_quickcheck_runtime.Quickcheckable.S with type t := t
The invariant is that t is a sequence of well-formed UTF-8 code points.
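The invariant above can be illustrated on a raw string with the standard library alone. This is a minimal sketch, not the module's own constructor: `validate` is a hypothetical helper, and it assumes OCaml 4.14 or later for `String.is_valid_utf_8`. How the module itself enforces the invariant is not stated here.

```ocaml
(* Hypothetical helper (not part of Utf8_text): accept a raw string only
   if every byte sequence in it is well-formed UTF-8.
   Requires OCaml >= 4.14 for String.is_valid_utf_8. *)
let validate (s : string) : string option =
  if String.is_valid_utf_8 s then Some s else None

let () =
  let show s =
    match validate s with
    | Some _ -> print_endline "well-formed"
    | None -> print_endline "malformed"
  in
  show "h\xC3\xA9llo"; (* "héllo": the two bytes C3 A9 encode U+00E9 *)
  show "\xC3"          (* a lead byte with no continuation byte *)
```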
include Core.Container.S0 with type t := t with type elt := Core.Uchar.t
val mem : t -> Core.Uchar.t -> bool
val length : t -> int
val iter : t -> f:(Core.Uchar.t -> unit) -> unit
val fold : t -> init:'accum -> f:('accum -> Core.Uchar.t -> 'accum) -> 'accum
val fold_result :
  t ->
  init:'accum ->
  f:('accum -> Core.Uchar.t -> ('accum, 'e) Base__.Result.t) ->
  ('accum, 'e) Base__.Result.t
val fold_until :
  t ->
  init:'accum ->
  f:
    ('accum ->
    Core.Uchar.t ->
    ('accum, 'final) Base__Container_intf.Continue_or_stop.t) ->
  finish:('accum -> 'final) ->
  'final
val exists : t -> f:(Core.Uchar.t -> bool) -> bool
val for_all : t -> f:(Core.Uchar.t -> bool) -> bool
val count : t -> f:(Core.Uchar.t -> bool) -> int
val sum :
  (module Base__Container_intf.Summable with type t = 'sum) ->
  t ->
  f:(Core.Uchar.t -> 'sum) ->
  'sum
val find : t -> f:(Core.Uchar.t -> bool) -> Core.Uchar.t option
val find_map : t -> f:(Core.Uchar.t -> 'a option) -> 'a option
val to_list : t -> Core.Uchar.t list
val to_array : t -> Core.Uchar.t array
val min_elt :
  t ->
  compare:(Core.Uchar.t -> Core.Uchar.t -> int) ->
  Core.Uchar.t option
val max_elt :
  t ->
  compare:(Core.Uchar.t -> Core.Uchar.t -> int) ->
  Core.Uchar.t option
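The container functions above all treat the element type as a code point rather than a byte. The sketch below shows that idea on a plain string with the standard library only. `fold_uchars`, `length_sketch`, `to_list_sketch` and `exists_sketch` are hypothetical stand-ins, not the module's implementation, and `String.get_utf_8_uchar` assumes OCaml 4.14 or later.

```ocaml
(* Stand-in for [fold]: visit each code point of a UTF-8 string in order.
   Malformed bytes decode to U+FFFD and still advance by at least one byte. *)
let fold_uchars (s : string) ~(init : 'a) ~(f : 'a -> Uchar.t -> 'a) : 'a =
  let rec go pos acc =
    if pos >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s pos in
      go (pos + Uchar.utf_decode_length d) (f acc (Uchar.utf_decode_uchar d))
  in
  go 0 init

(* Other container operations can be derived from the fold. *)
let length_sketch s = fold_uchars s ~init:0 ~f:(fun n _ -> n + 1)

let to_list_sketch s =
  List.rev (fold_uchars s ~init:[] ~f:(fun acc u -> u :: acc))

let exists_sketch s ~f =
  fold_uchars s ~init:false ~f:(fun found u -> found || f u)
```

For example, `length_sketch "h\xC3\xA9llo"` counts five code points even though the string holds six bytes.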
width t approximates the displayed width of t.
We incorrectly assume that every code point has the same width. This is better than String.length for many code points, but doesn't work for double-width characters or combining diacritics.
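To make the limitation concrete, the sketch below counts code points with the standard library, which is the approximation described above, and compares the result with String.length on three inputs. `count_code_points` is a hypothetical stand-in for width, not the module's implementation, and it assumes OCaml 4.14 or later.

```ocaml
(* Stand-in for [width]: one column per code point. *)
let count_code_points (s : string) : int =
  let rec go pos acc =
    if pos >= String.length s then acc
    else
      let d = String.get_utf_8_uchar s pos in
      go (pos + Uchar.utf_decode_length d) (acc + 1)
  in
  go 0 0

let () =
  let report label s =
    Printf.printf "%s: bytes=%d code_points=%d\n"
      label (String.length s) (count_code_points s)
  in
  (* Precomposed U+00E9: 6 bytes, 5 code points, 5 columns. Correct. *)
  report "precomposed" "h\xC3\xA9llo";
  (* "e" followed by combining U+0301: 2 code points, but 1 column. *)
  report "combining" "e\xCC\x81";
  (* U+65E5 U+672C: 2 code points, but usually 4 terminal columns. *)
  report "double-width" "\xE6\x97\xA5\xE6\x9C\xAC"
```

The first case is where counting code points beats String.length. The other two are the cases the approximation gets wrong.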
chunks_of t ~width splits t into chunks no wider than width characters such that t = t |> chunks_of ~width |> concat. chunks_of always returns at least one chunk, which may be empty.
If prefer_split_on_spaces = true and such a space exists, t will be split on the last U+0020 SPACE before the chunk becomes too wide. Otherwise, the split happens exactly at width characters.
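The splitting rule can be sketched on plain strings with the standard library. This is an illustration under stated assumptions, not the module's implementation: `chunks_of_sketch` and `code_points` are hypothetical names, `prefer_split_on_spaces` is taken as a required labelled argument, the space is assumed to stay at the end of the chunk it terminates (the text only requires that concatenating the chunks restores the input), and a non-positive width is rejected. It assumes OCaml 4.14 or later.

```ocaml
(* Split a UTF-8 string into substrings of one code point each. *)
let code_points (s : string) : string list =
  let rec go pos acc =
    if pos >= String.length s then List.rev acc
    else
      let len = Uchar.utf_decode_length (String.get_utf_8_uchar s pos) in
      go (pos + len) (String.sub s pos len :: acc)
  in
  go 0 []

let chunks_of_sketch (s : string) ~(width : int) ~(prefer_split_on_spaces : bool)
    : string list =
  if width <= 0 then invalid_arg "chunks_of_sketch: width must be positive";
  let cps = Array.of_list (code_points s) in
  let n = Array.length cps in
  (* Concatenate the code points in the half-open range [lo, hi). *)
  let join lo hi =
    String.concat "" (Array.to_list (Array.sub cps lo (hi - lo)))
  in
  (* Index of the last space in [lo, i], if any. *)
  let rec last_space lo i =
    if i < lo then None
    else if cps.(i) = " " then Some i
    else last_space lo (i - 1)
  in
  let rec go lo acc =
    if n - lo <= width then List.rev (join lo n :: acc)
    else
      let hard = lo + width in
      let cut =
        if prefer_split_on_spaces then
          match last_space lo (hard - 1) with
          | Some i -> i + 1 (* assumption: the space ends this chunk *)
          | None -> hard
        else hard
      in
      go cut (join lo cut :: acc)
  in
  go 0 []
```

With `~width:7`, the input `"hello world foo"` splits into `"hello "`, `"world "` and `"foo"` when spaces are preferred, and into `"hello w"`, `"orld fo"` and `"o"` otherwise. In both cases `String.concat ""` over the chunks gives back the input, and the empty string yields a single empty chunk.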
iteri t ~f calls f index uchar for every uchar in t. index counts characters, not bytes.
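The difference between a character index and a byte offset shows up as soon as the text contains a multi-byte code point. The sketch below is a standard-library stand-in, not the module's implementation: `iteri_sketch` is a hypothetical name and it assumes OCaml 4.14 or later.

```ocaml
(* Stand-in for [iteri]: [index] advances by one per code point,
   while [pos] advances by the encoded length in bytes. *)
let iteri_sketch (s : string) ~(f : int -> Uchar.t -> unit) : unit =
  let rec go pos index =
    if pos < String.length s then begin
      let d = String.get_utf_8_uchar s pos in
      f index (Uchar.utf_decode_uchar d);
      go (pos + Uchar.utf_decode_length d) (index + 1)
    end
  in
  go 0 0

let () =
  (* "héy": the final "y" sits at byte offset 3 but character index 2. *)
  iteri_sketch "h\xC3\xA9y" ~f:(fun index u ->
    Printf.printf "%d: U+%04X\n" index (Uchar.to_int u))
```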