It is traditional to parse in two phases (scanning, then parsing). Combinators in general do not require this (they can be scannerless), and the same holds for Pacomb. However, a separate lexing phase makes the grammar more readable.
Moreover, lexing is often done with a longest-match rule, which is not equivalent to the semantics of context-free grammars.
This module provides combinators to create terminals that the parser will call.
give_up () rejects parsing from the corresponding semantic action. An error message can be provided. It can be used in the semantics of both terminals and parsing rules.
Terminal accepting only the end of a buffer. Remark: eof is automatically added at the end of a grammar by Combinator.parse_buffer. name defaults to "EOF".
Accepts a character in the given charset. name defaults as in test.
val not_test : ?name:string -> (char -> bool) -> unit t
Rejects the input (raises Noparse) if the first character of the input passes the test. Does not read the character if the test fails. name defaults to "^" prepended to the result of Charset.show.
Rejects the input (raises Noparse) if the first character of the input is in the charset. Does not read the character if it is not in the charset. name defaults as in not_test.
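The negative lookahead described for not_test and not_charset can be sketched with a toy model. This is not Pacomb's implementation: the lexer type, the local Noparse exception and the behaviour at end of input are assumptions made for illustration only.

```ocaml
(* Toy model, not Pacomb's implementation. [Noparse] here is a local
   stand-in for the library's exception. *)
exception Noparse

(* A toy terminal maps an input and a position to a value and the next
   position. *)
type 'a lexer = string -> int -> 'a * int

(* Reject when the character at [i] passes [p]; otherwise succeed without
   moving. Assumption: at end of input there is no character to reject. *)
let not_test (p : char -> bool) : unit lexer = fun s i ->
  if i < String.length s && p s.[i] then raise Noparse else ((), i)

(* Hypothetical test used in the checks below. *)
let is_digit c = '0' <= c && c <= '9'
```

A typical use of such a terminal is to close a keyword: after reading "let", requiring that the next character is not an identifier character prevents "letter" from being split.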
val sub : ?name:string -> ?charset:Charset.t -> 'a t -> ('a -> bool) -> 'a t
Tests the result of a given lexer and rejects the input if the test returns false. You may provide a restricted charset for the set of characters accepted in the initial position.
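A minimal sketch of this filtering behaviour, using a toy lexer type rather than Pacomb's. The names word and ident are hypothetical, and the optional ?name and ?charset arguments are left out.

```ocaml
(* Toy model, not Pacomb's implementation. *)
type 'a lexer = string -> int -> ('a * int) option

(* Read a non-empty run of lowercase letters. *)
let word : string lexer = fun s i ->
  let n = String.length s in
  let j = ref i in
  while !j < n && 'a' <= s.[!j] && s.[!j] <= 'z' do incr j done;
  if !j > i then Some (String.sub s i (!j - i), !j) else None

(* Keep the result of [t] only when it satisfies [p]. *)
let sub (t : 'a lexer) (p : 'a -> bool) : 'a lexer = fun s i ->
  match t s i with
  | Some (x, _) as r when p x -> r
  | _ -> None

(* Hypothetical use: identifiers are words that are not keywords. *)
let ident : string lexer =
  sub word (fun w -> not (List.mem w ["let"; "in"]))
```

In this sketch the test runs after the underlying lexer has read its longest run, so "letter" is accepted as an identifier while "let" is rejected.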
alt t1 t2 parses the input with t1 or t2. Contrary to grammars, terminals do not use continuations: if t1 succeeds, no backtracking will be performed to try t2. For instance,
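A minimal sketch of this committed choice, using a toy lexer type rather than Pacomb's. The helpers str and seq and the two example lexers are hypothetical and exist only to make the effect observable.

```ocaml
(* Toy model, not Pacomb's implementation: a lexer takes the input and a
   position, and returns a value with the next position, or None. *)
type 'a lexer = string -> int -> ('a * int) option

(* Accept the literal [lit] at the current position. *)
let str (lit : string) : unit lexer = fun s i ->
  let n = String.length lit in
  if i + n <= String.length s && String.sub s i n = lit
  then Some ((), i + n) else None

(* Committed choice: t2 is tried only if t1 fails outright. *)
let alt (t1 : 'a lexer) (t2 : 'a lexer) : 'a lexer = fun s i ->
  match t1 s i with
  | Some _ as r -> r
  | None -> t2 s i

(* Sequence two lexers, keeping the second result. *)
let seq (t1 : 'a lexer) (t2 : 'b lexer) : 'b lexer = fun s i ->
  match t1 s i with
  | Some (_, j) -> t2 s j
  | None -> None

(* On "abc", the first commits to "a" and then fails on "c";
   the second reads "ab" first and succeeds. *)
let short_first = seq (alt (str "a") (str "ab")) (str "c")
let long_first = seq (alt (str "ab") (str "a")) (str "c")
```

With this behaviour the order of alternatives matters: when one alternative is a prefix of another, the longer one generally has to come first.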
Applies a function to the result of the given terminal. name defaults to the terminal name.
val star : ?name:string -> 'a t -> (unit -> 'b) -> ('b -> 'a -> 'b) -> 'b t
star t a f repeats the terminal t zero or more times. The types of the functions that compose the action allow 'b = Buffer.t for efficiency. The returned value is f ( ... (f(f (a ()) x_1) x_2) ...) x_n if t returns x_1 ... x_n. name defaults to sprintf "(%s)*" t.n
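The returned value is a left fold over the results of t. The sketch below models only that action, not the lexing itself: star_action and collect are hypothetical names, and the results x_1 ... x_n are passed in as a list.

```ocaml
(* Toy model, not Pacomb's implementation: the results x_1 ... x_n of the
   repeated terminal are given as a list. *)
let star_action (a : unit -> 'b) (f : 'b -> 'a -> 'b) (xs : 'a list) : 'b =
  List.fold_left f (a ()) xs

(* Accumulate characters in a Buffer.t, as the 'b = Buffer.t remark
   suggests. *)
let collect (chars : char list) : string =
  let buf =
    star_action
      (fun () -> Buffer.create 16)
      (fun b c -> Buffer.add_char b c; b)
      chars
  in
  Buffer.contents buf
```

Taking the initial value as a function unit -> 'b rather than a plain 'b plausibly matters for mutable accumulators: each run gets a fresh buffer instead of sharing one across parses.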
val plus : ?name:string -> 'a t -> (unit -> 'b) -> ('b -> 'a -> 'b) -> 'b t
utf8 c parses a specific Unicode character and returns (). name defaults to the string representing the character.
val any_grapheme : ?name:string -> unit -> string t
Parses any utf8 grapheme. name defaults to "GRAPHEME"
val grapheme : ?name:string -> string -> unit t
grapheme s parses the given utf8 grapheme and returns (). The difference with string s x is that if the input starts with a grapheme s' such that s is a strict prefix of s', parsing will fail. name defaults to "GRAPHEME("^s^")"
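The strict-prefix case can be illustrated with "e" followed by a combining accent. The sketch below is a rough approximation and not Pacomb's implementation: it assumes that only the combining marks U+0300..U+036F extend a grapheme, whereas real segmentation follows the full Unicode rules. All function names are hypothetical.

```ocaml
(* Toy model, not Pacomb's implementation. Grapheme segmentation is
   approximated: only combining marks U+0300..U+036F extend a grapheme. *)

(* True if a combining mark U+0300..U+036F is UTF-8 encoded at [i]. *)
let combining_at (s : string) (i : int) : bool =
  i + 1 < String.length s
  && (match s.[i], s.[i + 1] with
      | '\xCC', '\x80' .. '\xBF' -> true   (* U+0300..U+033F *)
      | '\xCD', '\x80' .. '\xAF' -> true   (* U+0340..U+036F *)
      | _ -> false)

(* Plain string match: accepts any input that starts with [lit]. *)
let string_match (lit : string) (s : string) : bool =
  String.length s >= String.length lit
  && String.sub s 0 (String.length lit) = lit

(* Grapheme match: also require that the grapheme ends right after [lit]. *)
let grapheme_match (lit : string) (s : string) : bool =
  string_match lit s && not (combining_at s (String.length lit))

(* "e" followed by U+0301 (combining acute accent). *)
let e_acute = "e\xCC\x81"
```

Here the plain string match accepts "e" at the start of e_acute, while the grapheme match rejects it because the input's first grapheme is the longer sequence.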