package mechaml


Module Mechaml.Page

Page

This module contains all the functions used to analyze a page, select specific elements, and manage forms.

type t

The type of an html page

val from_soup : ?location:Uri.t -> Soup.soup Soup.node -> t

Make a new page from a Lambdasoup document and an optional base URI (?location)

val from_string : ?location:Uri.t -> string -> t

Make a new page from an HTML string and an optional base URI (?location)

val base_uri : t -> Uri.t

Return the location of a page (or Uri.empty if not specified)

val resolver : t -> Uri.t -> Uri.t

Return the resolver of a page: a function that converts relative URIs found in the page to absolute ones, using the page's base URI
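
The constructors and URI helpers above can be combined as in the following minimal sketch. It assumes the mechaml, lambdasoup and uri libraries are linked; the location and the HTML snippet are made up for illustration and do not come from the library documentation.

```ocaml
open Mechaml

(* Hypothetical location, used only for illustration. *)
let location = Uri.of_string "http://example.com/docs/index.html"

(* Hypothetical HTML document. *)
let html = "<html><body><img src=\"img/logo.png\"></body></html>"

(* Build a page directly from an HTML string... *)
let page = Page.from_string ~location html

(* ...or from an already parsed Lambdasoup document. *)
let page_from_soup = Page.from_soup ~location (Soup.parse html)

(* Without ?location, base_uri is documented to return Uri.empty. *)
let anonymous = Page.from_string html

(* Turn a relative URI into an absolute one using the page's base URI. *)
let absolute = Page.resolver page (Uri.of_string "img/logo.png")

let () =
  Printf.printf "base: %s\n" (Uri.to_string (Page.base_uri page));
  Printf.printf "resolved: %s\n" (Uri.to_string absolute)
```

Passing ?location matters mostly for the resolver: a page built without it has an empty base URI to resolve relative references against.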

Convert to a Lambdasoup HTML node

Lazy sequences

Lambdasoup provides lazy sequences which, when combined with with_stop, make it possible to traverse only the needed part of an HTML document. We provide a wrapper that is compatible with Mechaml types such as forms, images, inputs, etc.

type +'a seq

Lazy sequences of HTML elements. See Lambdasoup's Soup.nodes type

type 'a stop = 'a Soup.stop = {
  throw : 'b. 'a -> 'b;
}

See Lambdasoup's Soup.stop type

Operations on lazy sequences

val iter : ('a -> unit) -> 'a seq -> unit
val fold : ('a -> 'b -> 'a) -> 'a -> 'b seq -> 'a
val filter : ('a -> bool) -> 'a seq -> 'a seq
val first : 'a seq -> 'a option
val nth : int -> 'a seq -> 'a option
val find_first : ('a -> bool) -> 'a seq -> 'a option
val to_list : 'a seq -> 'a list
val with_stop : ('a stop -> 'a) -> 'a

See Lambdasoup's Soup.with_stop
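
As a rough illustration of these operations, the sketch below folds over the images of a page and uses with_stop to abandon the traversal early. The HTML document is hypothetical, and the early-exit pattern mirrors how Soup.with_stop is typically used in Lambdasoup; it is one possible usage, not the only one.

```ocaml
open Mechaml

(* Hypothetical document containing three images. *)
let page =
  Page.from_string
    {|<html><body>
        <img src="a.png" alt="a">
        <img src="b.jpg" alt="b">
        <img src="c.png" alt="c">
      </body></html>|}

(* Count the images by folding over the lazy sequence. *)
let image_count = Page.fold (fun n _ -> n + 1) 0 (Page.images page)

(* first and nth return options; an out-of-range index gives None. *)
let first_image = Page.first (Page.images page)
let out_of_range = Page.nth 100 (Page.images page)

(* Stop traversing as soon as two images have been seen:
   stop.throw aborts the fold and makes with_stop return its argument. *)
let has_two_images =
  Page.with_stop (fun stop ->
    Page.images page
    |> Page.fold
         (fun n _ -> if n + 1 >= 2 then stop.Page.throw true else n + 1)
         0
    |> ignore;
    false)
```

Because the sequence is lazy, the with_stop version does not need to visit the third image, whereas to_list or a plain fold walks the whole sequence.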

Form

module Form : sig ... end

Operations on forms and inputs

Operations on hypertext links

module Image : sig ... end

Operations on images

Nodes selection

All the following functions are built using the same pattern.

  • xxxs (e.g. forms) returns all the elements of a certain type as a lazy sequence. For example, forms mypage returns all the forms in the page.
  • xxx_with takes a CSS selector as a parameter and returns the first element that matches it, or None if there is none. For example, link_with "[href$=.jpg]" mypage tries to find a link that points to a JPEG image.
  • xxxs_with works like xxx_with, but returns a lazy sequence of all the elements matching the selector.
val form_with : string -> t -> Form.t option
val forms_with : string -> t -> Form.t seq
val forms : t -> Form.t seq
val image_with : string -> t -> Image.t option
val images_with : string -> t -> Image.t seq
val images : t -> Image.t seq
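
The naming pattern can be seen at work in the following sketch, which uses only the selection functions listed above. The HTML document and its ids are hypothetical, and the attribute selectors follow the "[href$=.jpg]" style of the example given earlier.

```ocaml
open Mechaml

(* Hypothetical document with two forms and two images. *)
let page =
  Page.from_string
    {|<html><body>
        <form id="login" action="/login" method="post">
          <input name="user">
        </form>
        <form id="search" action="/search"><input name="q"></form>
        <img src="logo.png" alt="logo">
        <img src="photo.jpg" alt="photo">
      </body></html>|}

(* xxxs: every element of the given kind, as a lazy sequence. *)
let all_forms = Page.forms page |> Page.to_list

(* xxx_with: the first match for a CSS selector, if any. *)
let login = Page.form_with "[id=login]" page
let missing = Page.form_with "[id=nope]" page

(* xxxs_with: every match for a CSS selector, as a lazy sequence. *)
let jpegs = Page.images_with "[src$=.jpg]" page |> Page.to_list

let () =
  Printf.printf "%d forms, %d JPEG images\n"
    (List.length all_forms) (List.length jpegs)
```

Since the xxx_with variants return an option, a missing element is handled by pattern matching rather than by catching an exception.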