package pretty_expressive
Install
Dune Dependency
Authors
Maintainers
Sources
sha256=6d3adf12289b5ca838c782b6758982adaf1afa3d70cf8fd2cee0e6753f456b86
sha512=1cf8fb91761ce3adec91f7a14ece8ef4e25661e26b0be199759fe53cd11caf5609639cda9dc89f2dcf76c32cc35302fcb2022184e0cae1c020b0cd940892182e
doc/index.html
The pretty_expressive
library
This library implements a pretty expressive printer, following the algorithm presented in A Pretty Expressive Printer (OOPSLA'23). The pretty printer is expressive, provably optimal, and practically efficient.
Getting Started
open Pretty_expressive
(* Sets the page width limit to 80 *)
let cf = Printer.default_cost_factory ~page_width:80 ()
module P = Printer.Make (val cf)
open P
# pretty_format (text "Hello" ^^ text " World!");;
- : string = "Hello World!"
API Reference
See the documentation for the module Pretty_expressive
.
Pretty Expressive Guide
General-purpose pretty printing is a process that produces human readable text from structured data. Users encode the structured data together with styling choices in an abstract document doc
. This document contains printing instructions: things like text, newlines, and indentation. It can also contain choices (<|>
) between two or more alternatives, resulting in many possible layouts for a document. Pretty_expressive
’s job is to pick a prettiest layout (according to a specified optimality objective) from among all of the choices. E.g., the one that minimizes the number of lines while not exceeding the page width limit.
Here’s a simple example that pretty prints a document encoding a fragment of code.
open Pretty_expressive
(** Prints the example document [d] with the page width limit [w] *)
let print_doc (w : int) =
let cf = Printer.default_cost_factory ~page_width:w () in
let module P = Printer.Make (val cf) in
let open P in
let d = text "while (true) {" ^^
nest 4
(nl ^^ text "f();" ^^ nl ^^ text "if (done())" ^^
(let exit_d = text "exit();" in
(space ^^ exit_d) <|> nest 4 (nl ^^ exit_d))) ^^
nl ^^ text "}"
in
pretty_print print_string d
There is a lot of code above, so let's unpack it.
Document Construction
In the above code, let d = text "while (true) {" ^^ ....
defines a document d
.
text
prints a string.^^
prints a concatenation of two documents.nl
prints a newline.nest
adds indentation level so thatnl
adds indentation spaces.<|>
creates a choice.
As a result, the above document d
encodes two possible layouts:
while (true) { f(); if (done()) exit(); }
and
while (true) { f(); if (done()) exit(); }
Printer Construction
Most pretty printers are parameterized by a page width limit, which indicates the number of characters that each line should not exceed. Pretty_expressive
is instead parameterized by a cost factory, which can control not only the page width limit, but also other aspects of prettiness. For the sake of simplicity, we will for now use a pre-defined cost factory Printer.default_cost_factory
, which only allows the page width limit adjustment through the labeled argument ~page_width
. Thus, let cf = Printer.default_cost_factory ~page_width:w ()
creates a (first-class module) cost factory cf
that sets the page width limit to w
.
The pretty printer can then be instantiated by using Printer.Make
. It is a functor that consumes a CostFactory
module. Here, let module P = Printer.Make (val cf)
creates a pretty printer P
with the above cost factory.
We then let open P
so that combinators like text
, ^^
, and the pretty printing function pretty_print
are in scope.
Putting them all together
With the above setup, we can pretty-print d
with the cost factory cf
by calling pretty_print print_string d
, which uses print_string
to output content.
Let's now actually use the pretty printer with page width limit of 80.
# print_doc 80;;
while (true) {
f();
if (done()) exit();
}
- : unit = ()
It outputs the first layout because the layout fits the page width limit, while having fewer lines compared to the second layout. By contrast, with the page width limit of 20, the second layout is now chosen.
# print_doc 20;;
while (true) {
f();
if (done())
exit();
}
- : unit = ()
This is because the first layout does not fit the page width limit, leaving the second layout as the only option.
Alternative Document Construction
There are many ways to construct a document that encodes the same set of layouts. Some may be easier than the other.
For example, another way to construct a document for the above example could be:
let d = text "while (true) {" ^^
nest 4
(nl ^^ text "f();" ^^ nl ^^ text "if (done())" ^^
group (nest 4 (nl ^^ text "exit();"))) ^^
nl ^^ text "}"
Here, the group
combinator is used. It creates a choice: whether to collapse nl
to a single space or not.
See Printer.Make
for the full list of combinators that we provide. Since combinators are simply regular functions, users may also compose the existing combinators together to create new user-defined combinators.
Explainers
In this section, we explain some important concepts. The full design consideration of Pretty_expressive
can be found in the paper.
Best Practice for Document Construction
While we can put arbitrary sub-documents as the operands of the choice combinator <|>
, we should share sub-documents across choices. This matters because the performance of Pretty_expressive
depends on the DAG size of the input document. Without sharing, the input document would get unfolded into a tree, whose size could be exponentially large compared to the possible DAG size.
As an example, say we want to pretty print an S-expression with three possible styles for each list node: the horizontal style, the vertical style, and the argument list style. That is:
(a b c d)
could be printed as itself or
(a b c d)
or
(a b c d)
We can construct a function to convert an S-expression to a document as follows:
type sexp = Atom of string | List of sexp list
let print_sexp (s : sexp) (w : int) =
let cf = Printer.default_cost_factory ~page_width:w () in
let module P = Printer.Make (val cf) in
let open P in
let acat = fold_doc (fun x y -> x <+> space <+> y) in
let rec pretty (s : sexp) =
match s with
| Atom s -> text s
| List [] -> lparen <+> rparen
| List [x] -> lparen <+> pretty x <+> rparen
| List (x :: xs) ->
let x_d = pretty x in
let xs_d = List.map pretty xs in
lparen <+>
(* Share x_d and xs_d across choices *)
(acat (x_d :: xs_d) <|> (* the horizontal style *)
vcat (x_d :: xs_d) <|> (* the vertical style *)
(x_d <+> space <+> vcat xs_d)) <+> (* the argument list style *)
rparen
in
pretty_print print_string (pretty s)
The important point is that we reuse x_d
and xs_d
across <|>
. Had we written the following code instead, the document construction could take exponential time, and the resulting document whose DAG size is very large would also cause pretty-printing to be inefficient.
(* Don't do this! *)
lparen <+>
(acat (pretty x :: List.map pretty xs) <|> (* the horizontal style *)
vcat (pretty x :: List.map pretty xs) <|> (* the vertical style *)
(pretty x <+> space <+> vcat List.map pretty xs)) <+> (* the argument list style *)
rparen
The Computation Width Limit
Unlike other pretty printers, Pretty_expressive
needs an additional parameter: the computation width limit. Regular users should not need to concern much with this parameter (default_cost_factory
will already provide a sensible value for the parameter), but advanced users who need a fine-grained control may want to adjust this parameter.
The parameter is used by the pretty printer to bound the computation. On the flip side, the pretty printer is only guaranteed to return an optimal layout among layout printings that do not exceed the parameter. As a result, if the parameter is too high, the performance could be negatively impacted, but if the parameter is too low, the output layout quality could be negatively impacted.
Pretty_expressive
employs various heuristics to make the output layout pretty even when the computation width limit is exceeded, however. In most applications, the value of 1.2 \times w
should suffice, where w
is the page width limit.
Technical Notes
A layout printing exceeds the computation width limit when either the column position or indentation level exceeds the limit. For example:
text "Racket"
exceeds the computation width limit of 5, since there are 6 characters in a line. Similarly:
nest 6 (text "Rack")
exceeds the computation width limit of 5, since the indentation level exceeds 5 (even though this indentation level is completely unused).
When all possible layout printing due to a document exceeds the computation width limit, Pretty_expressive
will still output a layout, with no guarantee that the layout is optimal. In such case, we say that the output layout is tainted. The info
record and functions such as pretty_print_info
can be used to find if the output layout is tainted or not.
Cost Factory
Pretty printers choose an optimal layout from a document by minimizing an optimality objective. Unlike other pretty printers, which have built-in optimality objectives, Pretty_expressive
allows users to customize an optimality objective via the cost factory.
The cost factory interface is given in CostFactory
. A valid cost factory must also satisfy the interface, as well as various contracts specified in the documentation of CostFactory
. These contracts ensure that the concept of a cost for a layout is well-defined, and make it possible to efficiently find a layout with minimal cost.
See default_cost_factory
and the paper for examples of cost factories.
Custom Cost Factory
Consider the example in Best Practice for Document Construction. Each list node can be rendered with three possible styles: the horizontal style, the vertical style, and the argument list style.
let example_sexp =
List [Atom "abc"; Atom "def"; List [Atom "ghi"; Atom "jkl"; Atom "mno"]]
# print_sexp example_sexp 15;;
(abc
def
(ghi jkl mno))
- : unit = ()
Indeed, this is an optimal layout according to default_cost_factory
(though not the only optimal layout), because it does not have any badness, and two newlines are minimal.
However, let’s say that we consider the vertical style to be not as pretty. The vertical style should still be a possibility however, since it can help us avoid going over the page width limit and minimize the number of newlines in many situations. We simply would prefer other styles when all else is equal. In this case, we would prefer the output:
(abc def (ghi jkl mno))
To address this issue, we construct a new cost factory that is similar to default_cost_factory
, but with an extra component: the style cost.
let my_cost_factory ~page_width ?computation_width () =
(module struct
let cf = Printer.default_cost_factory ~page_width:page_width ?computation_width:computation_width ()
module F = (val cf)
type t = F.t * int
let limit = F.limit
let text pos len = F.text pos len, 0
let newline _ = F.newline 0, 0
let combine (c1, s1) (c2, s2) =
F.combine c1 c2, s1 + s2
let le (c1, s1) (c2, s2) =
if c1 = c2 then
s1 <= s2
else
F.le c1 c2
let two_columns_overflow w = F.two_columns_overflow w, 0
let two_columns_bias w = F.two_columns_bias w, 0
let string_of_cost (c, s) = Printf.sprintf "(%s %d)" (F.string_of_cost c) s
let debug_format = F.debug_format
end: Signature.CostFactory with type t = (int * int * int) * int)
We now construct a function to convert an S-expression into a document, and utilize the cost
construct to add a style cost to the vertical style document, thus discouraging it.
let revised_print_sexp (s : sexp) (w : int) =
let cf = my_cost_factory ~page_width:w () in
let module P = Printer.Make (val cf) in
let open P in
let acat = fold_doc (fun x y -> x <+> space <+> y) in
let rec pretty (s : sexp) =
match s with
| Atom s -> text s
| List [] -> lparen <+> rparen
| List [x] -> lparen <+> pretty x <+> rparen
| List (x :: xs) ->
let x_d = pretty x in
let xs_d = List.map pretty xs in
lparen <+>
(acat (x_d :: xs_d) <|> (* the horizontal style *)
(cost ((0, 0, 0), 1) (vcat (x_d :: xs_d))) <|> (* the vertical style -- penalized *)
(x_d <+> space <+> vcat xs_d)) <+> (* the argument list style *)
rparen
in
pretty_print print_string (pretty s)
Now we can pretty print as we desired:
# revised_print_sexp example_sexp 15;;
(abc def
(ghi jkl
mno))
- : unit = ()
This does not mean that the vertical style won't ever be used, however. It is simply discouraged. With an even lower page width limit, the vertical style is the only way to avoid overflow, so it is employed.
# revised_print_sexp example_sexp 10;;
(abc
def
(ghi
jkl
mno))
- : unit = ()