package gapi-ocaml
Module Neturl
Uniform Resource Locators (URLs)

Contents

- Interface

The tutorial has been moved to Neturl_tut.
Interface
This module provides functions to parse URLs, to print URLs, to store URLs, to modify URLs, and to apply relative URLs.

URLs are strings formed according to pattern (1) or (2):

1. scheme://user;userparams:password@host:port/path;params?query#fragment
2. scheme:other;params?query#fragment

The word at the beginning of the URL identifies the URL scheme (such as "http" or "file"). Depending on the scheme, not all of the parts are allowed, or parts may be omitted. This module defines the type url_syntax whose values describe which parts are allowed, required, or not allowed for a concrete URL scheme (see below).

Not all characters are allowed in a URL. Some characters are allowed, but have the special task of separating the various parts of the URL (reserved characters). However, it is possible to include even invalid or reserved characters as normal content by applying the %-encoding to these characters: a '%' indicates that an encoded character follows, and the character is denoted by a two-digit hexadecimal number (e.g. %2f for '/'). In the following descriptions, the term "encoded string" means a string containing such %-encoded characters, and "decoded string" means a string not containing such characters. See the module Netencoding.Url for functions encoding or decoding strings.

The type url describes values storing the components of a URL, and the url_syntax for the URL. In general, the components are stored as encoded strings; however, the %-encoding is not applicable to all components.

For convenience, the functions creating, modifying, and accessing URLs can handle both encoded and decoded strings. To avoid errors, the functions pass strings in their decoded form by default.

Note that there is currently no function to compare URLs. The canonical comparison ( = ) is not applicable because the same URL may be written in different ways.

Note that nothing is said about the character set/encoding of URLs. Some protocols and standards prefer UTF-8 as the fundamental encoding and apply the %-encoding on top of it; i.e. the byte sequence representing a character in UTF-8 is %-encoded.

Standards Compliance

This module implements RFC 1738 and RFC 1808. There is also a newer RFC, 2396, updating the former RFCs, but this module is not fully compatible with RFC 2396. The following (minor) problems may occur:

- The module escapes more characters than needed. All characters that are "unsafe" or "reserved" in either RFC document are escaped.
- URL parameters (appended with a ";") are handled as in RFCs 1738/1808. In RFC 2396, every path component may have parameters, and the algorithm to resolve relative URLs is different in this point. If it is required to apply RFC 2396, one can disable URL parameters in the syntax, and extract them from the path by a self-written postprocessor. Usually, this is only required for imap URLs.

In one point, RFC 2396 is preferred:

- Authorities may be terminated by a question mark, as in "http://host?query". This is illegal in RFC 1738. The consequence is, however, that question marks in user strings must be escaped.

RFC 3986 introduces IPv6 addresses. These are now supported (but see the comments below).
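The %-encoding described above can be illustrated with a small standalone sketch that uses only the OCaml standard library. It is not the implementation of Netencoding.Url; the function names (is_unreserved, percent_encode, percent_decode) are hypothetical, and the set of characters left unencoded is an assumption that is narrower than what Neturl actually accepts.

```ocaml
(* Standalone sketch of %-encoding; NOT Netencoding.Url's implementation.
   Assumption: only ASCII letters, digits and - _ . ~ are left unencoded. *)

let is_unreserved (c : char) : bool =
  match c with
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '-' | '_' | '.' | '~' -> true
  | _ -> false

(* Encode every byte outside the unreserved set as '%' + two hex digits. *)
let percent_encode (s : string) : string =
  let buf = Buffer.create (String.length s * 3) in
  String.iter
    (fun c ->
      if is_unreserved c then Buffer.add_char buf c
      else Buffer.add_string buf (Printf.sprintf "%%%02X" (Char.code c)))
    s;
  Buffer.contents buf

(* Value of one hexadecimal digit; raises Invalid_argument otherwise. *)
let hex_value (c : char) : int =
  match c with
  | '0' .. '9' -> Char.code c - Char.code '0'
  | 'a' .. 'f' -> Char.code c - Char.code 'a' + 10
  | 'A' .. 'F' -> Char.code c - Char.code 'A' + 10
  | _ -> invalid_arg "hex_value"

(* Decode "%XY" sequences; a '%' not followed by two characters is an error. *)
let percent_decode (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec loop i =
    if i < n then
      if s.[i] = '%' then begin
        if i + 2 >= n then invalid_arg "percent_decode: truncated escape";
        Buffer.add_char buf
          (Char.chr ((16 * hex_value s.[i + 1]) + hex_value s.[i + 2]));
        loop (i + 3)
      end else begin
        Buffer.add_char buf s.[i];
        loop (i + 1)
      end
  in
  loop 0;
  Buffer.contents buf
```

In this sketch "a/b" is the decoded string and "a%2Fb" the encoded one; decoding accepts both upper-case and lower-case hexadecimal digits.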
exception Malformed_URL
Raised by a number of functions when encountering a badly formed URL.
val extract_url_scheme : string -> string
Returns the URL scheme from the string representation of a URL. E.g. extract_url_scheme "http://host/path" = "http". The scheme name is always converted to lowercase characters. Raises Malformed_URL if the scheme name is not found.
type url_syntax = {
url_enable_scheme : url_syntax_option;
url_enable_user : url_syntax_option;
url_enable_user_param : url_syntax_option;
url_enable_password : url_syntax_option;
url_enable_host : url_syntax_option;
url_enable_port : url_syntax_option;
url_enable_path : url_syntax_option;
url_enable_param : url_syntax_option;
url_enable_query : url_syntax_option;
url_enable_fragment : url_syntax_option;
url_enable_other : url_syntax_option;
url_accepts_8bits : bool;
url_is_valid : url -> bool;
url_enable_relative : bool;
}
Values of type url_syntax describe which components of a URL are recognized, which are allowed (and optional), and which are required. Not all combinations are valid; the predicate expressed by the function url_syntax_is_valid must hold.

The function url_is_valid is applied when a fresh URL is created and must return true. This function lets you add an arbitrary validity criterion to url_syntax. (Note that the URL passed to this function is not fully working; you can safely assume that the accessor functions url_scheme etc. can be applied to it.)

Switch url_accepts_8bits: If true, the bytes with code 128 to 255 are treated like alphanumeric characters; if false, these bytes are illegal (but it is still possible to include such bytes in their encoded form: %80 to %FF).

Switch url_enable_relative: If true, the syntax allows relative URLs in principle. Actually, parsing of relative URLs is possible when the optional parts are flagged as Url_part_allowed and not as Url_part_required. However, it is useful to specify URL syntaxes always as absolute URLs, and to weaken them on demand when a relative URL is found by the parser. This switch enables that. In particular, the function partial_url_syntax checks this flag.
Values of type url describe concrete URLs. Every URL must have a fundamental url_syntax, and it is only possible to create URLs conforming to the syntax. See make_url for further information.
val url_syntax_is_valid : url_syntax -> bool
Checks whether the passed url_syntax is valid. This means:

- If passwords are recognized, users (and hosts) must be recognized, too
- If ports are recognized, hosts must be recognized, too
- If users are recognized, hosts must be recognized, too
- Either the syntax recognizes one of the phrases { user, password, host, port, path }, or the syntax recognizes the phrase 'other'.
val partial_url_syntax : url_syntax -> url_syntax
Transforms the syntax into another syntax where all required parts are changed into optional parts.
val null_url_syntax : url_syntax
A URL syntax that recognizes nothing. Use this as a base for your own definitions, e.g.

  let my_syntax = { null_url_syntax with
                      url_enable_host = Url_part_required; ... }
Syntax for IP based protocols. This syntax allows scheme, user, password, host, port, path, param, query, fragment, but not "other". It does not accept 8 bit bytes.
Syntax descriptions for common URL schemes. The key of the hashtable is the scheme name, and the value is the corresponding syntax.

- "file": scheme, host?, path
- "ftp": scheme, user?, password?, host, port?, path?, param? Note: param is not checked.
- "http", "https": scheme, user?, password?, host, port?, path?, query?
- "mailto": scheme, other, query? (RFC 2368)
- "pop", "pops": scheme, user?, user_param?, password?, host, port? (RFC 2384) Note: user_param is not checked.
- "imap", "imaps": scheme, user?, user_param?, password?, host, port?, path?, query? (RFC 2192) Note: "param" is intentionally not recognized to get the resolution of relative URLs as described in the RFC. When analysing this kind of URL, it is recommended to re-parse it with "param" enabled.
- "news": scheme, other (RFC 1738)
- "nntp", "nntps": scheme, host, port?, path (with two components) (RFC 1738)
- "data": scheme, other (RFC 2397). "other" is not further decomposed.
- "ipp", "ipps": scheme, host, port?, path?, query? (RFC 3510)
- "cid", "mid": Content/message identifiers: scheme, other
- "ldap": scheme, host?, port?, path?, query? (RFC 4516)

Notes:

- These syntax descriptions can be weakened for partial/relative URLs by changing the required parts to optional parts: See the function partial_url_syntax.
- None of the descriptions allows fragments. These can be enabled by setting url_enable_fragment to Url_part_allowed. E.g.
    { file_url_syntax with url_enable_fragment = Url_part_allowed }
- 8 bit bytes are not accepted
- A large number of standardised scheme syntaxes are not available, e.g. gopher, prospero, wais. The selection is a bit subjective, but I have tried to omit protocols that are no longer in common use, or that are very special.
- The LDAP URL syntax (RFC 1959) does not fit into our scheme, it is omitted for now because of this.
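As a hedged illustration of the note on fragments, the sketch below takes the predefined "http" syntax from the hashtable and allows fragments in a copy of it. It assumes that Ocamlnet's netstring library (or the copy bundled with gapi-ocaml) is linked and that the hashtable is exposed as Neturl.common_url_syntax, the name it has in Ocamlnet; the host, path, and fragment values are placeholders.

```ocaml
(* Requires the netstring library that provides Neturl. *)

(* Copy the predefined "http" syntax and additionally allow fragments. *)
let http_with_fragment : Neturl.url_syntax =
  let http = Hashtbl.find Neturl.common_url_syntax "http" in
  { http with Neturl.url_enable_fragment = Neturl.Url_part_allowed }

(* example.org, the path and the fragment are placeholder values. *)
let doc_url : Neturl.url =
  Neturl.make_url
    ~scheme:"http"
    ~host:"example.org"
    ~path:[ ""; "doc" ]
    ~fragment:"intro"
    http_with_fragment

let () =
  (* The fragment is accepted because the modified syntax allows it. *)
  assert (Neturl.url_fragment doc_url = "intro")
```

The record copy leaves the entry stored in the hashtable untouched, so other users of the "http" syntax are not affected.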
val make_url :
?encoded:bool ->
?scheme:string ->
?user:string ->
?user_param:string list ->
?password:string ->
?host:string ->
?addr:Unix.inet_addr ->
?port:int ->
?socksymbol:Netsockaddr.socksymbol ->
?path:string list ->
?param:string list ->
?query:string ->
?fragment:string ->
?other:string ->
url_syntax ->
url
Creates a URL from components:

- The components scheme and host are simple strings to which the %-encoding is not applicable. host may be a (DNS) name, an IPv4 address as "dotted quad", or an IPv6 address enclosed in brackets.
- addr also sets host, but directly from an inet_addr.
- The component port is a simple number. Of course, the %-encoding is not applicable, either.
- socksymbol sets both host and port from the socksymbol of type `Inet or `Inet_byname.
- The components user, password, query, fragment, and other are strings which may contain %-encoded characters. By default, you can pass any string for these components, and problematic characters are automatically encoded. If you set encoded:true, the passed strings must already be encoded, but the function checks whether the encoding is syntactically correct. Note that for query even the characters '?' and '=' are encoded by default, so you need to set encoded:true to pass a reasonable query string.
- The components user_param, path and param are lists of strings which may contain %-encoded characters. Again, the default is to pass decoded strings to the function, and the function encodes them automatically, and by setting encoded:true the caller is responsible for encoding the strings. Passing empty lists for these components means that they are not part of the constructed URL. See below for the representation of these components.

socksymbol has precedence over addr, which has precedence over host. socksymbol also has precedence over port.

The strings representing the components do not contain the characters separating the components from each other.

The created URL must conform to the url_syntax, i.e.:

- The URL must only contain components which are recognized by the syntax
- The URL must contain components which are required by the syntax
- The URL must fulfill the predicate expressed by the url_is_valid function of the syntax.

The path of a URL is represented as a list of '/'-separated path components, i.e.

  [ s1; s2; ...; sN ] represents the path s1 ^ "/" ^ s2 ^ "/" ^ ... ^ "/" ^ sN

As special cases:

- [] is the non-existing path
- [ "" ] is "/"
- [ "";"" ] is illegal

Except for s1 and sN, the path components must not be empty strings.

To avoid ambiguities, it is illegal to create URLs with both relative paths (s1 <> "") and host components.

Parameters of URLs (param and user_param) are components beginning with ';'. The list of parameters is represented as a list of strings where the strings contain the value following ';'.
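A minimal usage sketch of make_url, under the assumption that Ocamlnet's netstring library (or gapi-ocaml's bundled copy) is linked. The lookup through Neturl.common_url_syntax and the printing function Neturl.string_of_url use names from Ocamlnet that are not introduced in the text above; the host, port, and path values are placeholders.

```ocaml
(* Requires the netstring library that provides Neturl. *)

(* The predefined "http" syntax, looked up by scheme name. *)
let http_syntax : Neturl.url_syntax =
  Hashtbl.find Neturl.common_url_syntax "http"

(* Components are passed decoded (the default), so the '%' in "100%" is
   encoded by make_url. The leading "" makes the path absolute, which is
   needed here because a host is present. *)
let u : Neturl.url =
  Neturl.make_url
    ~scheme:"http"
    ~host:"example.org"
    ~port:8080
    ~path:[ ""; "rate"; "100%" ]
    http_syntax

(* The query is passed already encoded so that '=' keeps its meaning. *)
let u_with_query : Neturl.url =
  Neturl.make_url
    ~encoded:true
    ~scheme:"http"
    ~host:"example.org"
    ~path:[ ""; "search" ]
    ~query:"q=ocaml"
    http_syntax

let () =
  (* Accessors return decoded strings unless ~encoded:true is given. *)
  assert (Neturl.url_host u = "example.org");
  assert (Neturl.url_port u = 8080);
  assert (Neturl.url_path u = [ ""; "rate"; "100%" ]);
  assert (Neturl.url_path ~encoded:true u = [ ""; "rate"; "100%25" ]);
  assert (Neturl.url_query ~encoded:true u_with_query = "q=ocaml");
  print_endline (Neturl.string_of_url u)
```

Passing decoded components and letting make_url do the encoding is the less error-prone choice for user, password, and path; the query is the usual exception, for the reason given in the description.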
val modify_url :
?syntax:url_syntax ->
?encoded:bool ->
?scheme:string ->
?user:string ->
?user_param:string list ->
?password:string ->
?host:string ->
?addr:Unix.inet_addr ->
?port:int ->
?socksymbol:Netsockaddr.socksymbol ->
?path:string list ->
?param:string list ->
?query:string ->
?fragment:string ->
?other:string ->
url ->
url
Modifies the passed components and returns the modified URL. The modified URL shares unmodified components with the original URL.
val remove_from_url :
?scheme:bool ->
?user:bool ->
?user_param:bool ->
?password:bool ->
?host:bool ->
?port:bool ->
?path:bool ->
?param:bool ->
?query:bool ->
?fragment:bool ->
?other:bool ->
url ->
url
Removes the components passed with a true value from the URL, and returns the modified URL. The modified URL shares unmodified components with the original URL.
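The following sketch shows modify_url and remove_from_url side by side. It assumes the netstring library is linked and uses parse_url (with its default scheme table) only to obtain a starting URL; the URL string itself is a placeholder.

```ocaml
(* Requires the netstring library that provides Neturl. *)

let original : Neturl.url =
  Neturl.parse_url "http://example.org/a/b?x=1"

(* Set a component that was absent; everything else is carried over. *)
let on_other_port : Neturl.url = Neturl.modify_url ~port:8080 original

(* Drop the query; the flag names the component to remove. *)
let without_query : Neturl.url = Neturl.remove_from_url ~query:true original

let () =
  assert (Neturl.url_port on_other_port = 8080);
  assert (Neturl.url_host on_other_port = "example.org");
  assert (not (Neturl.url_provides ~query:true without_query));
  (* Both functions return new values; the original URL is unchanged. *)
  assert (Neturl.url_provides ~query:true original);
  assert (not (Neturl.url_provides ~port:true original))
```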
val default_url :
?encoded:bool ->
?scheme:string ->
?user:string ->
?user_param:string list ->
?password:string ->
?host:string ->
?port:int ->
?path:string list ->
?param:string list ->
?query:string ->
?fragment:string ->
?other:string ->
url ->
url
Adds missing components and returns the modified URL. The modified URL shares unmodified components with the original URL.
val undefault_url :
?scheme:string ->
?user:string ->
?user_param:string list ->
?password:string ->
?host:string ->
?port:int ->
?path:string list ->
?param:string list ->
?query:string ->
?fragment:string ->
?other:string ->
url ->
url
Removes components from the URL if they have the passed value, and returns the modified URL. Note: The values must always be passed in encoded form! The modified URL shares unmodified components with the original URL.
Returns the url_syntax record of a URL.
Parses the passed string according to the passed url_syntax.
val parse_url :
?schemes:(string, url_syntax) Hashtbl.t ->
?base_syntax:url_syntax ->
?accept_8bits:bool ->
?enable_fragment:bool ->
string ->
url
Parses the string and returns the URL the string represents. If the URL is absolute (i.e. begins with a scheme like "http:..."), the syntax will be looked up in schemes. If the URL is relative, the base_syntax will be taken if passed. Without base_syntax, relative URLs cannot be parsed.
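A short sketch of both parsing modes, assuming the netstring library is linked. The absolute URL relies on the default value of schemes; the relative one is given the syntax for IP based protocols as base_syntax, under its Ocamlnet name Neturl.ip_url_syntax. The URL strings are placeholders.

```ocaml
(* Requires the netstring library that provides Neturl. *)

(* Absolute URL: the syntax is looked up by the scheme "http". *)
let absolute : Neturl.url =
  Neturl.parse_url "http://www.example.org:8080/a/b?x=1"

(* Relative URL: no scheme to look up, so a base syntax must be passed. *)
let relative : Neturl.url =
  Neturl.parse_url ~base_syntax:Neturl.ip_url_syntax "c/d"

let () =
  assert (Neturl.url_scheme absolute = "http");
  assert (Neturl.url_host absolute = "www.example.org");
  assert (Neturl.url_port absolute = 8080);
  assert (Neturl.url_path absolute = [ ""; "a"; "b" ]);
  (* A relative path has no leading "" component and no scheme. *)
  assert (Neturl.url_path relative = [ "c"; "d" ]);
  assert (not (Neturl.url_provides ~scheme:true relative))
```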
Escapes some unsafe or "unwise" characters that are commonly used in URL strings: space, < > { } ^ \\ | and double quotes. Call this function before parsing the URL to support these characters.

If escape_hash is set, '#' is also escaped.

Change: Since Ocamlnet-3.4, square brackets are no longer fixed up, because they now have a legal use to denote IPv6 addresses.
val url_provides :
?scheme:bool ->
?user:bool ->
?user_param:bool ->
?password:bool ->
?host:bool ->
?port:bool ->
?path:bool ->
?param:bool ->
?query:bool ->
?fragment:bool ->
?other:bool ->
url ->
bool
Returns true iff the URL has all of the components passed with a true value.
Return components of the URL. The functions return decoded strings unless encoded:true is set. If the component does not exist, the exception Not_found is raised.

Note that IPv6 addresses, when returned by url_host, are enclosed in square brackets. Modules calling url_host may require porting to support this syntax variant.
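Because the accessors raise Not_found for absent components, callers usually either test with url_provides first or catch the exception. The sketch below shows both styles; port_opt and port_or are hypothetical helpers, not part of Neturl, and the URL string is a placeholder. It assumes the netstring library is linked.

```ocaml
(* Requires the netstring library that provides Neturl. *)

let u : Neturl.url = Neturl.parse_url "http://example.org/index.html"

(* Hypothetical helper: the port as an option, tested with url_provides. *)
let port_opt (url : Neturl.url) : int option =
  if Neturl.url_provides ~port:true url then Some (Neturl.url_port url)
  else None

(* Hypothetical helper: fall back to a default when Not_found is raised. *)
let port_or (default : int) (url : Neturl.url) : int =
  try Neturl.url_port url with Not_found -> default

let () =
  assert (Neturl.url_host u = "example.org");
  (* The URL above carries no explicit port. *)
  assert (port_opt u = None);
  assert (port_or 80 u = 80)
```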
If the host part of the URL is an IP address, the address is returned. Works for IPv4 and IPv6 addresses. Otherwise Not_found is raised.
url_socksymbol url default_port: Returns the host and port parts of the URL as socksymbol. If the port is missing in the URL, default_port is substituted. If the host is missing in the URL, the exception Not_found is raised.
val split_path : string -> string list
Splits a '/'-separated path into components (e.g. to set up the path argument of make_url). E.g.

  split_path "a/b/c" = [ "a"; "b"; "c" ],
  split_path "/a/b" = [ ""; "a"; "b" ],
  split_path "a/b/" = [ "a"; "b"; "" ]

Beware that split_path ".." returns [".."] while split_path "../" returns [".."; ""]. The two will behave differently, for example when used with Neturl.apply_relative_url.
Concatenates the path components (reverse function of split_path).
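The two path helpers can be exercised as follows, assuming the netstring library is linked. The concatenating function is called Neturl.join_path in Ocamlnet, and the "http" syntax is looked up through Neturl.common_url_syntax; both names are taken from Ocamlnet rather than from the text above. url_of_path_string is a hypothetical helper and example.org a placeholder host.

```ocaml
(* Requires the netstring library that provides Neturl. *)

let () =
  (* The documented examples for split_path. *)
  assert (Neturl.split_path "a/b/c" = [ "a"; "b"; "c" ]);
  assert (Neturl.split_path "/a/b" = [ ""; "a"; "b" ]);
  assert (Neturl.split_path "a/b/" = [ "a"; "b"; "" ]);
  (* join_path reverses the split. *)
  assert (Neturl.join_path [ ""; "a"; "b" ] = "/a/b")

(* Hypothetical helper: build an http URL from a path string. *)
let url_of_path_string (p : string) : Neturl.url =
  Neturl.make_url
    ~scheme:"http"
    ~host:"example.org"
    ~path:(Neturl.split_path p)
    (Hashtbl.find Neturl.common_url_syntax "http")

let () =
  assert (Neturl.url_path (url_of_path_string "/a/b") = [ ""; "a"; "b" ])
```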
val norm_path : string list -> string list
Removes "." and ".." from the path if possible. Deletes double slashes.

Examples

- norm_path ["."] = [] means: "." = ""
- norm_path ["."; ""] = [] means: "./" = ""
- norm_path ["a"; "."] = ["a"; ""] means: "a/." = "a/"
- norm_path ["a"; "b"; "."] = ["a"; "b"; ""] means: "a/b/." = "a/b/"
- norm_path ["a"; "."; "b"; "."] = ["a"; "b"; ""] means: "a/./b/." = "a/b/"
- norm_path [".."] = [".."; ""] means: ".." = "../"
- norm_path [".."; ""] = [".."; ""] means: "../" = "../"
- norm_path ["a"; "b"; ".."; "c" ] = ["a"; "c"] means: "a/b/../c" = "a/c"
- norm_path ["a"; "b"; ".."; "c"; ""] = ["a"; "c"; ""] means: "a/b/../c/" = "a/c/"
- norm_path ["";"";"a";"";"b"] = [""; "a"; "b"] means: "//a//b" = "/a/b"
- norm_path ["a"; "b"; ""; ".."; "c"; ""] = ["a"; "c"; ""] means: "a/b//../c/" = "a/c/"
- norm_path ["a"; ".."] = [] means: "a/.." = ""
apply_relative_url base rel: Interprets rel relative to base and returns the new URL. This function implements RFC 1808.

It is not necessary that rel has the same syntax as base. Note, however, that it is checked whether the resulting URL is syntactically correct with the syntax of base. If not, the exception Malformed_URL will be raised.

Examples (the URLs are represented as strings, see Neturl.split_path to split them for Neturl.make_url):

  base="x/y", url="a/b" => result="x/a/b"
  base="x/y/", url="a/b" => result="x/y/a/b"
  base="x/y/..", url="a/b" => result="x/y/a/b" (beware!)
  base="x/y/../", url="a/b" => result="x/a/b"
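The first documented case (base "x/y", relative "a/b") can be reproduced with an absolute http base as sketched below. This assumes the netstring library is linked; the relative URL is parsed with the syntax for IP based protocols, referred to by its Ocamlnet name Neturl.ip_url_syntax, and example.org is a placeholder host.

```ocaml
(* Requires the netstring library that provides Neturl. *)

(* Base URL whose path ends in the "file-like" component y. *)
let base : Neturl.url = Neturl.parse_url "http://example.org/x/y"

(* The relative URL may use a different syntax than the base. *)
let rel : Neturl.url =
  Neturl.parse_url ~base_syntax:Neturl.ip_url_syntax "a/b"

let resolved : Neturl.url = Neturl.apply_relative_url base rel

let () =
  (* The last base component is replaced by the relative path. *)
  assert (Neturl.url_path resolved = [ ""; "x"; "a"; "b" ]);
  (* Scheme and host are inherited from the base. *)
  assert (Neturl.url_scheme resolved = "http");
  assert (Neturl.url_host resolved = "example.org")
```

If the base path should be treated as a directory, it needs a trailing slash ("/x/y/"), as the second documented example shows.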
If the anonymous URL is absolute, it is just returned as the result of this function. If the URL is relative, the function tries to make it absolute by resolving it relative to base. If there is no base, or if the base URL does not allow the parts that would be added (e.g. if the anonymous URL possesses a fragment and base does not allow that), this will fail, and the function raises Malformed_URL.
Generates a URL with "file" scheme from the passed path name. The URL is always absolute, i.e. the current directory is prepended if the path is not absolute.

Note that no character set conversions are performed.

Win32: The input path name may use forward or backward slashes. Absolute paths with drive letters and UNC paths are recognised. Relative paths with drive letters, however, are not recognised (e.g. "c:file"), as it is not possible to access the drive-specific working directory from the O'Caml runtime.

Cygwin: The input path name may use forward or backward slashes. Absolute paths with drive letters and UNC paths are recognised. The former are translated to "/cygdrive" names.
Extracts the path from an absolute file URL, and returns a correct path name.

If the URL is not a file URL, or is not absolute, the function will fail.

Win32: The URL must either contain a drive letter, or must refer to another host.

Cygwin: Drive letters and remote URLs are recognised.