package cmarkit
Module Cmarkit
CommonMark parser and abstract syntax tree.
See examples.
References.
- John MacFarlane. CommonMark Spec. Version 0.30, 2021
Abstract syntax tree
module Textloc : sig ... end
Text locations.
module Meta : sig ... end
Node metadata.
type 'a node = 'a * Meta.t
The type for abstract syntax tree nodes: the data of type 'a and its metadata.
module Layout : sig ... end
Types for layout information.
module Block_line : sig ... end
Block lines.
module Label : sig ... end
Labels.
module Link_definition : sig ... end
Link definitions.
module Inline : sig ... end
Inlines.
module Block : sig ... end
Blocks.
module Doc : sig ... end
Documents (and parser).
Maps and folds
module Mapper : sig ... end
Abstract syntax tree mappers.
module Folder : sig ... end
Abstract syntax tree folders.
Extensions
For some documents, bare CommonMark just misses it. The extensions are here to make it hit the mark. To enable them use Doc.of_string with strict:false.
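As a minimal sketch of the difference, the following parses the same input strictly and non-strictly and renders both with Cmarkit_html. It assumes the cmarkit library is installed and linked; the names src, html_strict and html_ext are illustrative only.

```ocaml
(* Sketch only: assumes the cmarkit library is available to the build. *)

let src = "Strikethrough your ~~perfect~~ imperfect thoughts."

(* Strict parsing: bare CommonMark, extensions are off. *)
let html_strict =
  Cmarkit_html.of_doc ~safe:true (Cmarkit.Doc.of_string ~strict:true src)

(* Non-strict parsing: all the extensions are enabled at once. *)
let html_ext =
  Cmarkit_html.of_doc ~safe:true (Cmarkit.Doc.of_string ~strict:false src)

let () =
  print_string html_strict;
  print_string html_ext
```

The two renderings should differ, since only the non-strict parse treats the ~~ delimiters as markup.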
Please note the following:
- There is no plan to provide an extension mechanism at the parsing level. A lot can already be achieved by using reference resolvers, abusing code fences, post-processing the abstract syntax tree, or extending the renderers.
- In order to minimize dialects and extension interaction oddities, there is no plan to allow enabling extensions selectively.
- If one day the CommonMark specification standardizes a set of extensions, Cmarkit will support those.
- In the short term, there is no plan to support more extensions than those that are listed here.
Strikethrough
According to pandoc.
Strikethrough your ~~perfect~~ imperfect thoughts.
Inline text delimited between two ~~ gets into an Inline.Ext_strikethrough node.
The text delimited by ~~ cannot start or end with Unicode whitespace. When a closer can close multiple openers, the nearest opener is closed. Strikethrough inlines can be nested.
Math
According to a mix of pandoc, GLFM, GFM.
Inline math
This is an inline $\sqrt(x - 1)$ math expression.
Inline text delimited between $ gets into an Inline.Ext_math_span node.
The text delimited by $ cannot start and end with Unicode whitespace. Inline math cannot be nested: after an opener, the nearest (non-escaped) closing delimiter matches. Otherwise it is parsed in essence like a code span.
Display math
It's better to get that $$ \left( \sum_{k=1}^n a_k b_k \right)^2 $$ on its own line. A math block may also be more convenient:

```math
\left( \sum_{k=1}^n a_k b_k \right)^2 < \Phi
```
Inline text delimited by $$ gets into an Inline.Ext_math_span with the Inline.Math_span.display property set to true. Alternatively, code blocks whose language is math get into Block.Ext_math_block blocks.
In contrast to $, the text delimited by $$ can start and end with whitespace. However, it can't contain a blank line. Display math cannot be nested: after an opener, the nearest (non-escaped) closing delimiter matches. Otherwise it's parsed in essence like a code span.
List task items
According to a mix of md4c, GLFM, GFM and personal ad-hoc brewery.
* [ ] That's unchecked.
* [x] That's checked.
* [~] That's cancelled.
A list item is a task item if it starts with up to three spaces, followed by [, a single Unicode character, ] and a space (the space can be omitted if the line is empty, but subsequent indentation considers there was one). The Unicode character gets stored in Block.List_item.ext_task_marker and counts as one column regardless of the character's render width. The task marker including the final space is considered part of the list marker as far as subsequent indentation is concerned.
The Unicode character indicates the status of the task. Its interpretation is up to the client, but the function Block.List_item.task_status_of_task_marker, which is used by the built-in renderers, makes the following choices:
- Unchecked: ' ' (U+0020).
- Checked: 'x' (U+0078), 'X' (U+0058), '✓' (U+2713, CHECK MARK), '✔' (U+2714, HEAVY CHECK MARK), '𐄂' (U+10102, AEGEAN CHECK MARK), '🗸' (U+1F5F8, LIGHT CHECK MARK).
- Cancelled: '~' (U+007E).
- Other: any other character, interpretation left to clients or renderers (built-in ones equate it with done).
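The choices listed above can be sketched as a standalone function using only the OCaml standard library. This is an illustration of the mapping, not the library's implementation: the type status and the function status_of_marker are hypothetical names, and clients would normally call Block.List_item.task_status_of_task_marker instead.

```ocaml
(* Hypothetical stand-in for the task statuses described above. *)
type status = Unchecked | Checked | Cancelled | Other of Uchar.t

(* Maps a task marker character to a status, following the listed choices. *)
let status_of_marker (u : Uchar.t) : status =
  match Uchar.to_int u with
  | 0x0020 -> Unchecked                       (* ' ' *)
  | 0x0078 | 0x0058                           (* 'x', 'X' *)
  | 0x2713 | 0x2714                           (* CHECK MARK, HEAVY CHECK MARK *)
  | 0x10102 | 0x1F5F8 -> Checked              (* AEGEAN and LIGHT CHECK MARK *)
  | 0x007E -> Cancelled                       (* '~' *)
  | _ -> Other u                              (* left to clients or renderers *)

let () =
  assert (status_of_marker (Uchar.of_char ' ') = Unchecked);
  assert (status_of_marker (Uchar.of_char 'X') = Checked);
  assert (status_of_marker (Uchar.of_int 0x2714) = Checked);
  assert (status_of_marker (Uchar.of_char '~') = Cancelled);
  assert (status_of_marker (Uchar.of_char '?') = Other (Uchar.of_char '?'))
```

A client that wants a different convention, for example treating '-' as "in progress", can match on the stored marker itself rather than on the derived status.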
Tables
According to djot.
| # | Name      | Description           |                    Link |
|:-:|----------:|:----------------------|------------------------:|
| 1 | OCaml     | The OCaml website     |     <https://ocaml.org> |
| 2 | Haskell   | The Haskell website   |   <https://haskell.org> |
| 3 | MDN       | Web dev docs          | <https://developer.mozilla.org/> |
| 4 | Wikipedia | The Free Encyclopedia | <https://wikipedia.org> |
A table is a sequence of rows. Each row starts and ends with a (non-escaped) pipe | character. The first row can't be indented by more than three spaces; subsequent rows can be arbitrarily indented. Blanks after the final pipe are allowed.
Each row of the table contains cells separated by (non-escaped) pipe | characters. Pipes embedded in inline constructs do not count as separators (the parsing strategy is to parse the row as an inline, split the result on the | present in toplevel text nodes and strip initial and trailing blanks in cells). The number of | separators plus 1 determines the number of columns of a row. The number of columns of a table is the greatest number of columns of its rows.
A separator line is a row in which every cell content is made only of one or more - optionally prefixed and suffixed by :. These rows are not data: they indicate the alignment of data in their cell for subsequent rows (multiple separator lines in a single table are allowed) and that the previous line (if any) was a row of column headers. :- is left aligned, -: is right aligned, :-: is centered. If there's no alignment specified it's left aligned.
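The separator cell rule can be sketched with the standard library alone. The type align and the function separator_cell_align below are hypothetical names used for illustration and are not part of Cmarkit's API. The function takes a cell's content with blanks already stripped and returns None when the cell does not qualify as a separator cell.

```ocaml
(* Hypothetical alignment type: Default stands for "no alignment
   specified", which is rendered left aligned. *)
type align = Default | Left | Right | Center

(* Classifies stripped cell content: one or more '-' optionally
   prefixed and suffixed by ':'. *)
let separator_cell_align (s : string) : align option =
  let len = String.length s in
  let left = len > 0 && s.[0] = ':' in
  let right = len > 1 && s.[len - 1] = ':' in
  let start = if left then 1 else 0 in
  let stop = if right then len - 1 else len in
  (* True if every character in [i, stop) is a dash. *)
  let rec only_dashes i = i >= stop || (s.[i] = '-' && only_dashes (i + 1)) in
  if stop - start < 1 || not (only_dashes start) then None
  else
    Some
      (match left, right with
       | true, true -> Center
       | true, false -> Left
       | false, true -> Right
       | false, false -> Default)

let () =
  assert (separator_cell_align ":-:" = Some Center);
  assert (separator_cell_align "----------:" = Some Right);
  assert (separator_cell_align ":---" = Some Left);
  assert (separator_cell_align "---" = Some Default);
  assert (separator_cell_align "OCaml" = None)
```

A row is then a separator line exactly when this function returns Some for every one of its cells.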
Tables are stored in Block.Ext_table nodes.
Footnotes
According to djot for the footnote contents.
This is a footnote in history[^1] with multiple references[^1].
Footnotes are not [very special][^1] references.

[^1]: Footnotes can have
lazy continuation lines and multiple paragraphs.

 If you start one column after the left bracket, blocks still get
 into the footnote.

But this is no longer the footnote.
Footnotes go through the label resolution mechanism and share the same namespace as link references (including the ^). They end up being defined in the Doc.defs as Block.Footnote.Def definitions. Footnote references are simply made by using Inline.Link with the corresponding labels.
Definition
A footnote definition starts with a (single line) link label followed by :. The label must start with a ^. Footnote labels go through the label resolution mechanism.
All subsequent lines indented one column further than the start of the label (i.e. starting on the ^) get into the footnote. Lazy continuation lines are supported.
The result is stored in the document's Doc.defs in Block.Footnote.Def cases and its position in the document is witnessed by a Block.Ext_footnote_definition node, which is kept for layout.
References
Footnote references are simply reference links with the footnote label. Linking text on footnotes is allowed. Shortcut and collapsed references to footnotes are rendered specially by Cmarkit_html.