.\" Man page generated from reStructuredText.
.
.TH "LARK" "7" "Oct 25, 2020" "" "Lark"
.SH NAME
lark \- Lark Documentation
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.SH PHILOSOPHY
.sp
Parsers are innately complicated and confusing. They\(aqre difficult to understand, difficult to write, and difficult to use. Even experts on the subject can become baffled by the nuances of these complicated state\-machines.
.sp
Lark\(aqs mission is to make the process of writing them as simple and abstract as possible, by following these design principles:
.SS Design Principles
.INDENT 0.0
.IP \(bu 2
Readability matters
.IP \(bu 2
Keep the grammar clean and simple
.IP \(bu 2
Don\(aqt force the user to decide on things that the parser can figure out on its own
.IP \(bu 2
Usability is more important than performance
.IP \(bu 2
Performance is still very important
.IP \(bu 2
Follow the Zen Of Python, whenever possible and applicable
.UNINDENT
.sp
In accordance with these principles, I arrived at the following design choices:

.sp
.ce
----

.ce 0
.sp
.SS Design Choices
.SS 1. Separation of code and grammar
.sp
Grammars are the de\-facto reference for your language, and for the structure of your parse\-tree. For any non\-trivial language, the conflation of code and grammar always turns out convoluted and difficult to read.
.sp
The grammars in Lark are EBNF\-inspired, so they are especially easy to read & work with.
.SS 2. Always build a parse\-tree (unless told not to)
.sp
Trees are always simpler to work with than state\-machines.
.INDENT 0.0
.IP \(bu 2
Trees allow you to see the "state\-machine" visually
.IP \(bu 2
Trees allow your computation to be aware of previous and future states
.IP \(bu 2
Trees allow you to process the parse in steps, instead of forcing you to do it all at once.
.UNINDENT
.sp
And anyway, every parse\-tree can be replayed as a state\-machine, so there is no loss of information.
.sp
For a more detailed explanation, see the answer \fI\%here\fP\&.
.sp
To improve performance, you can skip building the tree for LALR(1), by providing Lark with a transformer (see the \fI\%JSON example\fP).
.SS 3. Earley is the default
.sp
The Earley algorithm can accept \fIany\fP context\-free grammar you throw at it (i.e. any grammar you can write in EBNF, it can parse). That makes it extremely friendly to beginners, who are not aware of the strange and arbitrary restrictions that LALR(1) places on its grammars.
.sp
As the users grow to understand the structure of their grammar, the scope of their target language, and their performance requirements, they may choose to switch over to LALR(1) to gain a huge performance boost, possibly at the cost of some language features.
.sp
In short, "Premature optimization is the root of all evil."
.SS Other design features
.INDENT 0.0
.IP \(bu 2
Automatically resolve terminal collisions whenever possible
.IP \(bu 2
Automatically keep track of line & column numbers
.UNINDENT
.SH FEATURES
.SS Main Features
.INDENT 0.0
.IP \(bu 2
Earley parser, capable of parsing any context\-free grammar
.INDENT 2.0
.IP \(bu 2
Implements SPPF, for efficient parsing and storing of ambiguous grammars.
.UNINDENT
.IP \(bu 2
LALR(1) parser, limited in power of expression, but very efficient in space and performance (O(n)).
.INDENT 2.0
.IP \(bu 2
Implements a parse\-aware lexer that provides a better power of expression than traditional LALR implementations (such as ply).
.UNINDENT
.IP \(bu 2
EBNF\-inspired grammar, with extra features (See: \fI\%Grammar Reference\fP)
.IP \(bu 2
Builds a parse\-tree (AST) automagically based on the grammar
.IP \(bu 2
Stand\-alone parser generator \- create a small independent parser to embed in your project.
.IP \(bu 2
Flexible error handling by using a "puppet parser" mechanism (LALR only)
.IP \(bu 2
Automatic line & column tracking (for both tokens and matched rules)
.IP \(bu 2
Automatic terminal collision resolution
.IP \(bu 2
Standard library of terminals (strings, numbers, names, etc.)
.IP \(bu 2
Unicode fully supported
.IP \(bu 2
Extensive test suite
.IP \(bu 2
MyPy support using type stubs
.IP \(bu 2
Python 2 & Python 3 compatible
.IP \(bu 2
Pure\-Python implementation
.UNINDENT
.sp
\fI\%Read more about the parsers\fP
.SS Extra features
.INDENT 0.0
.IP \(bu 2
Import rules and tokens from other Lark grammars, for code reuse and modularity.
.IP \(bu 2
Support for external regex module (\fI\%see here\fP)
.IP \(bu 2
Import grammars from Nearley.js (\fI\%read more\fP)
.IP \(bu 2
CYK parser
.IP \(bu 2
Visualize your parse trees as dot or png files (\fI\%see_example\fP)
.UNINDENT
.SS Experimental features
.INDENT 0.0
.IP \(bu 2
Automatic reconstruction of input from parse\-tree (see examples)
.UNINDENT
.SS Planned features (not implemented yet)
.INDENT 0.0
.IP \(bu 2
Generate code in other languages than Python
.IP \(bu 2
Grammar composition
.IP \(bu 2
LALR(k) parser
.IP \(bu 2
Full regexp\-collision support using NFAs
.UNINDENT
.SH PARSERS
.sp
Lark implements the following parsing algorithms: Earley, LALR(1), and CYK
.SS Earley
.sp
An \fI\%Earley Parser\fP is a chart parser capable of parsing any context\-free grammar in O(n^3) time, and in O(n^2) time when the grammar is unambiguous. It can parse most LR grammars in O(n) time. Most programming languages are LR, and can be parsed in linear time.
.sp
Lark\(aqs Earley implementation runs on top of a skipping chart parser, which allows it to use regular expressions, instead of matching characters one\-by\-one. This is a huge improvement to Earley that is unique to Lark. This feature is used by default, but can also be requested explicitly using \fBlexer=\(aqdynamic\(aq\fP\&.
.sp
It\(aqs possible to bypass the dynamic lexing, and use the regular Earley parser with a traditional lexer, that tokenizes as an independent first step. Doing so will provide a speed benefit, but will tokenize without using Earley\(aqs ambiguity\-resolution ability. So choose this only if you know why! Activate with \fBlexer=\(aqstandard\(aq\fP
.sp
\fBSPPF & Ambiguity resolution\fP
.sp
Lark implements the Shared Packed Parse Forest data\-structure for the Earley parser, in order to reduce the space and computation required to handle ambiguous grammars.
.sp
You can read more about SPPF \fI\%here\fP
.sp
As a result, Lark can efficiently parse and store every ambiguity in the grammar, when using Earley.
.sp
Lark provides the following options to combat ambiguity:
.INDENT 0.0
.IP \(bu 2
Lark will choose the best derivation for you (default). Users can choose between different disambiguation strategies, and can prioritize (or demote) individual rules over others, using the rule\-priority syntax.
.IP \(bu 2
Users may choose to receive the set of all possible parse\-trees (using ambiguity=\(aqexplicit\(aq), and choose the best derivation themselves. While simple and flexible, it comes at the cost of space and performance, and so it isn\(aqt recommended for highly ambiguous grammars, or very long inputs.
.IP \(bu 2
As an advanced feature, users may use specialized visitors to iterate the SPPF themselves.
.UNINDENT
.sp
\fBdynamic_complete\fP
.sp
\fBTODO: Add documentation on dynamic_complete\fP
.SS LALR(1)
.sp
\fI\%LALR(1)\fP is a very efficient, tried\-and\-tested parsing algorithm. It\(aqs incredibly fast and requires very little memory. It can parse most programming languages (for example: Python and Java).
.sp
Lark comes with an efficient implementation that outperforms every other parsing library for Python (including PLY).
.sp
Lark extends the traditional YACC\-based architecture with a \fIcontextual lexer\fP, which automatically provides feedback from the parser to the lexer, making the LALR(1) algorithm stronger than ever.
.sp
The contextual lexer communicates with the parser, and uses the parser\(aqs lookahead prediction to narrow its choice of tokens. So at each point, the lexer only matches the subgroup of terminals that are legal at that parser state, instead of all of the terminals. It\(aqs surprisingly effective at resolving common terminal collisions, and allows one to parse languages that LALR(1) was previously incapable of parsing.
.sp
This is an improvement to LALR(1) that is unique to Lark.
.SS CYK Parser
.sp
A \fI\%CYK parser\fP can parse any context\-free grammar at O(n^3*|G|).
.sp
It\(aqs too slow to be practical for simple grammars, but it offers good performance for highly ambiguous grammars.
.SH JSON PARSER - TUTORIAL
.sp
Lark is a parser \- a program that accepts a grammar and text, and produces a structured tree that represents that text.
In this tutorial we will write a JSON parser in Lark, and explore Lark\(aqs various features in the process.
.sp
It has 5 parts.
.INDENT 0.0
.IP \(bu 2
Writing the grammar
.IP \(bu 2
Creating the parser
.IP \(bu 2
Shaping the tree
.IP \(bu 2
Evaluating the tree
.IP \(bu 2
Optimizing
.UNINDENT
.sp
Knowledge assumed:
.INDENT 0.0
.IP \(bu 2
Using Python
.IP \(bu 2
A basic understanding of how to use regular expressions
.UNINDENT
.SS Part 1 \- The Grammar
.sp
Lark accepts its grammars in a format called \fI\%EBNF\fP\&. It basically looks like this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
rule_name : list of rules and TERMINALS to match
          | another possible list of items
          | etc.

TERMINAL: "some text to match"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
(\fIa terminal is a string or a regular expression\fP)
.sp
The parser will try to match each rule (left\-part) by matching its items (right\-part) sequentially, trying each alternative (In practice, the parser is predictive so we don\(aqt have to try every alternative).
.sp
How to structure those rules is beyond the scope of this tutorial, but often it\(aqs enough to follow one\(aqs intuition.
.sp
In the case of JSON, the structure is simple: A json document is either a list, or a dictionary, or a string/number/etc.
.sp
The dictionaries and lists are recursive, and contain other json documents (or "values").
.sp
Let\(aqs write this structure in EBNF form:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
value: dict
     | list
     | STRING
     | NUMBER
     | "true" | "false" | "null"

list : "[" [value ("," value)*] "]"

dict : "{" [pair ("," pair)*] "}"
pair : STRING ":" value
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A quick explanation of the syntax:
.INDENT 0.0
.IP \(bu 2
Parentheses let us group rules together.
.IP \(bu 2
rule* means \fIany amount\fP\&. That means, zero or more instances of that rule.
.IP \(bu 2
[rule] means \fIoptional\fP\&. That means zero or one instance of that rule.
.UNINDENT
.sp
Lark also supports the rule+ operator, meaning one or more instances. It also supports the rule? operator which is another way to say \fIoptional\fP\&.
.sp
Of course, we still haven\(aqt defined "STRING" and "NUMBER"\&. Luckily, both these literals are already defined in Lark\(aqs common library:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
%import common.ESCAPED_STRING   \-> STRING
%import common.SIGNED_NUMBER    \-> NUMBER
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The arrow (\->) renames the terminals. But that only adds obscurity in this case, so going forward we\(aqll just use their original names.
.sp
We\(aqll also take care of the white\-space, which is part of the text.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
%import common.WS
%ignore WS
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
We tell our parser to ignore whitespace. Otherwise, we\(aqd have to fill our grammar with WS terminals.
.sp
By the way, if you\(aqre curious what these terminals signify, they are roughly equivalent to this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
NUMBER : /\-?\ed+(\e.\ed+)?([eE][+\-]?\ed+)?/
STRING : /".*?(?<!\e\e)"/
%ignore /[ \et\en\ef\er]+/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Lark will accept this, if you really want to complicate your life :)
.sp
You can find the original definitions in \fI\%common.lark\fP\&.
They don\(aqt strictly adhere to \fI\%json.org\fP \- but our purpose here is to accept json, not validate it.
.sp
Notice that terminals are written in UPPER\-CASE, while rules are written in lower\-case.
I\(aqll touch more on the differences between rules and terminals later.
.SS Part 2 \- Creating the Parser
.sp
Once we have our grammar, creating the parser is very simple.
.sp
We simply instantiate Lark, and tell it to accept a "value":
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark
json_parser = Lark(r"""
    value: dict
         | list
         | ESCAPED_STRING
         | SIGNED_NUMBER
         | "true" | "false" | "null"

    list : "[" [value ("," value)*] "]"

    dict : "{" [pair ("," pair)*] "}"
    pair : ESCAPED_STRING ":" value

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS

    """, start=\(aqvalue\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
It\(aqs that simple! Let\(aqs test it out:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
>>> text = \(aq{"key": ["item0", "item1", 3.14]}\(aq
>>> json_parser.parse(text)
Tree(value, [Tree(dict, [Tree(pair, [Token(STRING, "key"), Tree(value, [Tree(list, [Tree(value, [Token(STRING, "item0")]), Tree(value, [Token(STRING, "item1")]), Tree(value, [Token(NUMBER, 3.14)])])])])])])
>>> print( _.pretty() )
value
  dict
    pair
      "key"
      value
        list
          value	"item0"
          value	"item1"
          value	3.14
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
As promised, Lark automagically creates a tree that represents the parsed text.
.sp
But something is suspiciously missing from the tree. Where are the curly braces, the commas and all the other punctuation literals?
.sp
Lark automatically filters out literals from the tree, based on the following criteria:
.INDENT 0.0
.IP \(bu 2
Filter out string literals without a name, or with a name that starts with an underscore.
.IP \(bu 2
Keep regexps, even unnamed ones, unless their name starts with an underscore.
.UNINDENT
.sp
Unfortunately, this means that it will also filter out literals like "true" and "false", and we will lose that information. The next section, "Shaping the tree" deals with this issue, and others.
.SS Part 3 \- Shaping the Tree
.sp
We now have a parser that can create a parse tree (or: AST), but the tree has some issues:
.INDENT 0.0
.IP \(bu 2
"true", "false" and "null" are filtered out (test it out yourself!)
.IP \(bu 2
It has useless branches, like \fIvalue\fP, that clutter up our view.
.UNINDENT
.sp
I\(aqll present the solution, and then explain it:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
?value: dict
      | list
      | string
      | SIGNED_NUMBER      \-> number
      | "true"             \-> true
      | "false"            \-> false
      | "null"             \-> null

\&...

string : ESCAPED_STRING
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP \(bu 2
Those little arrows signify \fIaliases\fP\&. An alias is a name for a specific part of the rule. In this case, we will name the \fItrue/false/null\fP matches, and this way we won\(aqt lose the information. We also alias \fISIGNED_NUMBER\fP to mark it for later processing.
.IP \(bu 2
The question\-mark prefixing \fIvalue\fP ("?value") tells the tree\-builder to inline this branch if it has only one member. In this case, \fIvalue\fP will always have only one member, and will always be inlined.
.IP \(bu 2
We turned the \fIESCAPED_STRING\fP terminal into a rule. This way it will appear in the tree as a branch. This is equivalent to aliasing (like we did for the number), but now \fIstring\fP can also be used elsewhere in the grammar (namely, in the \fIpair\fP rule).
.UNINDENT
.sp
Here is the new grammar:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark
json_parser = Lark(r"""
    ?value: dict
          | list
          | string
          | SIGNED_NUMBER      \-> number
          | "true"             \-> true
          | "false"            \-> false
          | "null"             \-> null

    list : "[" [value ("," value)*] "]"

    dict : "{" [pair ("," pair)*] "}"
    pair : string ":" value

    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS

    """, start=\(aqvalue\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
And let\(aqs test it out:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
>>> text = \(aq{"key": ["item0", "item1", 3.14, true]}\(aq
>>> print( json_parser.parse(text).pretty() )
dict
  pair
    string	"key"
    list
      string	"item0"
      string	"item1"
      number	3.14
      true
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Ah! That is much much nicer.
.SS Part 4 \- Evaluating the tree
.sp
It\(aqs nice to have a tree, but what we really want is a JSON object.
.sp
The way to do it is to evaluate the tree, using a Transformer.
.sp
A transformer is a class with methods corresponding to branch names. For each branch, the appropriate method will be called with the children of the branch as its argument, and its return value will replace the branch in the tree.
.sp
So let\(aqs write a partial transformer, that handles lists and dictionaries:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Transformer

class MyTransformer(Transformer):
    def list(self, items):
        return list(items)
    def pair(self, key_value):
        k, v = key_value
        return k, v
    def dict(self, items):
        return dict(items)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
And when we run it, we get this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
>>> tree = json_parser.parse(text)
>>> MyTransformer().transform(tree)
{Tree(string, [Token(ANONRE_1, "key")]): [Tree(string, [Token(ANONRE_1, "item0")]), Tree(string, [Token(ANONRE_1, "item1")]), Tree(number, [Token(ANONRE_0, 3.14)]), Tree(true, [])]}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This is pretty close. Let\(aqs write a full transformer that can handle the terminals too.
.sp
Also, our definitions of list and dict are a bit verbose. We can do better:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Transformer

class TreeToJson(Transformer):
    def string(self, s):
        (s,) = s
        return s[1:\-1]
    def number(self, n):
        (n,) = n
        return float(n)

    list = list
    pair = tuple
    dict = dict

    null = lambda self, _: None
    true = lambda self, _: True
    false = lambda self, _: False
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
And when we run it:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
>>> tree = json_parser.parse(text)
>>> TreeToJson().transform(tree)
{u\(aqkey\(aq: [u\(aqitem0\(aq, u\(aqitem1\(aq, 3.14, True]}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Magic!
.SS Part 5 \- Optimizing
.SS Step 1 \- Benchmark
.sp
By now, we have a fully working JSON parser, that can accept a string of JSON, and return its Pythonic representation.
.sp
But how fast is it?
.sp
Now, of course there are JSON libraries for Python written in C, and we can never compete with them. But since this is applicable to any parser you would write in Lark, let\(aqs see how far we can take this.
.sp
The first step for optimizing is to have a benchmark. For this benchmark I\(aqm going to take data from \fI\%json\-generator.com/\fP\&. I took their default suggestion and changed it to 5000 objects. The result is a 6.6MB sparse JSON file.
.sp
Our first program is going to be just a concatenation of everything we\(aqve done so far:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import sys
from lark import Lark, Transformer

json_grammar = r"""
    ?value: dict
          | list
          | string
          | SIGNED_NUMBER      \-> number
          | "true"             \-> true
          | "false"            \-> false
          | "null"             \-> null

    list : "[" [value ("," value)*] "]"

    dict : "{" [pair ("," pair)*] "}"
    pair : string ":" value

    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS
    """

class TreeToJson(Transformer):
    def string(self, s):
        (s,) = s
        return s[1:\-1]
    def number(self, n):
        (n,) = n
        return float(n)

    list = list
    pair = tuple
    dict = dict

    null = lambda self, _: None
    true = lambda self, _: True
    false = lambda self, _: False

json_parser = Lark(json_grammar, start=\(aqvalue\(aq, lexer=\(aqstandard\(aq)

if __name__ == \(aq__main__\(aq:
    with open(sys.argv[1]) as f:
        tree = json_parser.parse(f.read())
        print(TreeToJson().transform(tree))
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
We run it and get this:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ time python tutorial_json.py json_data > /dev/null

real	0m36.257s
user	0m34.735s
sys	0m1.361s
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
That\(aqs an unsatisfactory time for a 6MB file. It might be acceptable if we were parsing a configuration file or a small DSL, but we\(aqre trying to handle a large amount of data here.
.sp
Well, turns out there\(aqs quite a bit we can do about it!
.SS Step 2 \- LALR(1)
.sp
So far we\(aqve been using the Earley algorithm, which is the default in Lark. Earley is powerful but slow. But it just so happens that our grammar is LR\-compatible, and specifically LALR(1) compatible.
.sp
So let\(aqs switch to LALR(1) and see what happens:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
json_parser = Lark(json_grammar, start=\(aqvalue\(aq, parser=\(aqlalr\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ time python tutorial_json.py json_data > /dev/null

real        0m7.554s
user        0m7.352s
sys         0m0.148s
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Ah, that\(aqs much better. The resulting JSON is of course exactly the same. You can run it for yourself and see.
.sp
It\(aqs important to note that not all grammars are LR\-compatible, and so you can\(aqt always switch to LALR(1). But there\(aqs no harm in trying! If Lark lets you build the grammar, it means you\(aqre good to go.
.SS Step 3 \- Tree\-less LALR(1)
.sp
So far, we\(aqve built a full parse tree for our JSON, and then transformed it. It\(aqs a convenient method, but it\(aqs not the most efficient in terms of speed and memory. Luckily, Lark lets us avoid building the tree when parsing with LALR(1).
.sp
Here\(aqs the way to do it:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
json_parser = Lark(json_grammar, start=\(aqvalue\(aq, parser=\(aqlalr\(aq, transformer=TreeToJson())

if __name__ == \(aq__main__\(aq:
    with open(sys.argv[1]) as f:
        print( json_parser.parse(f.read()) )
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
We\(aqve used the transformer we\(aqve already written, but this time we plug it straight into the parser. Now it can avoid building the parse tree, and just send the data straight into our transformer. The \fIparse()\fP method now returns the transformed JSON, instead of a tree.
.sp
Let\(aqs benchmark it:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
real	0m4.866s
user	0m4.722s
sys 	0m0.121s
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
That\(aqs a measurable improvement! Also, this way is more memory efficient. Check out the benchmark table at the end to see just how much.
.sp
As a general practice, it\(aqs recommended to work with parse trees, and only skip the tree\-builder when your transformer is already working.
.SS Step 4 \- PyPy
.sp
PyPy is a JIT engine for running Python, and it\(aqs designed to be a drop\-in replacement.
.sp
Lark is written purely in Python, which makes it very suitable for PyPy.
.sp
Let\(aqs get some free performance:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ time pypy tutorial_json.py json_data > /dev/null

real	0m1.397s
user	0m1.296s
sys 	0m0.083s
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
PyPy is awesome!
.SS Conclusion
.sp
We\(aqve brought the run\-time down from 36 seconds to 1.1 seconds, in a series of small and simple steps.
.sp
Now let\(aqs compare the benchmarks in a nicely organized table.
.sp
I measured memory consumption using a little script called \fI\%memusg\fP
.sp
I added a few other parsers for comparison. PyParsing and funcparserlib fare pretty well in their memory usage (they don\(aqt build a tree), but they can\(aqt compete with the run\-time speed of LALR(1).
.sp
These benchmarks are for Lark\(aqs alpha version. I already have several optimizations planned that will significantly improve run\-time speed.
.sp
Once again, shout\-out to PyPy for being so effective.
.SS Afterword
.sp
This is the end of the tutorial. I hope you liked it and learned a little about Lark.
.sp
To see what else you can do with Lark, check out the \fI\%examples\fP\&.
.sp
For questions or any other subject, feel free to email me at erezshin at gmail dot com.
.SH HOW TO USE LARK - GUIDE
.SS Work process
.sp
This is the recommended process for working with Lark:
.INDENT 0.0
.IP \(bu 2
Collect or create input samples, that demonstrate key features or behaviors in the language you\(aqre trying to parse.
.IP \(bu 2
Write a grammar. Try to aim for a structure that is intuitive, and in a way that imitates how you would explain your language to a fellow human.
.IP \(bu 2
Try your grammar in Lark against each input sample. Make sure the resulting parse\-trees make sense.
.IP \(bu 2
Use Lark\(aqs grammar features to \fI\%shape the tree\fP: Get rid of superfluous rules by inlining them, and use aliases when specific cases need clarification.
.UNINDENT
.INDENT 0.0
.IP \(bu 2
You can perform steps 1\-4 repeatedly, gradually growing your grammar to include more sentences.
.UNINDENT
.INDENT 0.0
.IP \(bu 2
Create a transformer to evaluate the parse\-tree into a structure you\(aqll be comfortable to work with. This may include evaluating literals, merging branches, or even converting the entire tree into your own set of AST classes.
.UNINDENT
.sp
Of course, some specific use\-cases may deviate from this process. Feel free to suggest these cases, and I\(aqll add them to this page.
.SS Getting started
.sp
Browse the \fI\%Examples\fP to find a template that suits your purposes.
.sp
Read the tutorials to get a better understanding of how everything works. (links in the \fI\%main page\fP)
.sp
Use the \fI\%Cheatsheet (PDF)\fP for quick reference.
.sp
Use the reference pages for more in\-depth explanations. (links in the \fI\%main page\fP)
.SS LALR usage
.sp
By default Lark silently resolves Shift/Reduce conflicts as Shift. To enable warnings pass \fBdebug=True\fP\&. To get the messages printed you have to configure the \fBlogger\fP beforehand. For example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import logging
from lark import Lark, logger

logger.setLevel(logging.DEBUG)

collision_grammar = \(aq\(aq\(aq
start: as as
as: a*
a: "a"
\(aq\(aq\(aq
p = Lark(collision_grammar, parser=\(aqlalr\(aq, debug=True)
.ft P
.fi
.UNINDENT
.UNINDENT
.SH HOW TO DEVELOP LARK - GUIDE
.sp
There are many ways you can help the project:
.INDENT 0.0
.IP \(bu 2
Help solve issues
.IP \(bu 2
Improve the documentation
.IP \(bu 2
Write new grammars for Lark\(aqs library
.IP \(bu 2
Write a blog post introducing Lark to your audience
.IP \(bu 2
Port Lark to another language
.IP \(bu 2
Help me with code development
.UNINDENT
.sp
If you\(aqre interested in taking one of these on, let me know and I will provide more details and assist you in the process.
.SS Unit Tests
.sp
Lark comes with an extensive set of tests. Many of the tests will run several times, once for each parser configuration.
.sp
To run the tests, just go to the lark project root, and run the command:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
python \-m tests
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
or
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
pypy \-m tests
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For a list of supported interpreters, you can consult the \fBtox.ini\fP file.
.sp
You can also run a single unittest using its class and method name, for example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
##   test_package test_class_name.test_function_name
python \-m tests TestLalrStandard.test_lexer_error_recovering
.ft P
.fi
.UNINDENT
.UNINDENT
.SS tox
.sp
To run all Unit Tests with tox,
install tox and every Python interpreter from 2.7 up to the latest supported version (consult the file tox.ini).
Then,
run the command \fBtox\fP in the root of this project (where the main setup.py file is located).
.sp
And, for example,
if you would like to only run the Unit Tests for Python version 2.7,
you can run the command \fBtox \-e py27\fP
.SS pytest
.sp
You can also run the tests using pytest:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
pytest tests
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Using setup.py
.sp
Another way to run the tests is using setup.py:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
python setup.py test 
.ft P
.fi
.UNINDENT
.UNINDENT
.SH RECIPES
.sp
A collection of recipes to use Lark and its various features
.SS Use a transformer to parse integer tokens
.sp
Transformers are the common interface for processing matched rules and tokens.
.sp
They can be used during parsing for better performance.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark, Transformer

class T(Transformer):
    def INT(self, tok):
        "Convert the value of \(gatok\(ga from string to int, while maintaining line number & column."
        return tok.update(value=int(tok))

parser = Lark("""
start: INT*
%import common.INT
%ignore " "
""", parser="lalr", transformer=T())

print(parser.parse(\(aq3 14 159\(aq))
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Prints out:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
Tree(start, [Token(INT, 3), Token(INT, 14), Token(INT, 159)])
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Collect all comments with lexer_callbacks
.sp
\fBlexer_callbacks\fP can be used to interface with the lexer as it generates tokens.
.sp
It accepts a dictionary of the form
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
{TOKEN_TYPE: callback}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Where callback is of type \fBf(Token) \-> Token\fP
.sp
It only works with the standard and contextual lexers.
.sp
This has the same effect as using a transformer, but can also process ignored tokens.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark

comments = []

parser = Lark("""
    start: INT*

    COMMENT: /#.*/

    %import common (INT, WS)
    %ignore COMMENT
    %ignore WS
""", parser="lalr", lexer_callbacks={\(aqCOMMENT\(aq: comments.append})

parser.parse("""
1 2 3  # hello
# world
4 5 6
""")

print(comments)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Prints out:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[Token(COMMENT, \(aq# hello\(aq), Token(COMMENT, \(aq# world\(aq)]
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fINote: We don\(aqt have to return a token, because comments are ignored\fP
.SS CollapseAmbiguities
.sp
Parsing ambiguous texts with earley and \fBambiguity=\(aqexplicit\(aq\fP produces a single tree with \fB_ambig\fP nodes to mark where the ambiguity occurred.
.sp
However, it\(aqs sometimes more convenient instead to work with a list of all possible unambiguous trees.
.sp
Lark provides a utility transformer for that purpose:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark, Tree, Transformer
from lark.visitors import CollapseAmbiguities

grammar = """
    !start: x y

    !x: "a" "b"
      | "ab"
      | "abc"

    !y: "c" "d"
      | "cd"
      | "d"

"""
parser = Lark(grammar, ambiguity=\(aqexplicit\(aq)

t = parser.parse(\(aqabcd\(aq)
for x in CollapseAmbiguities().transform(t):
    print(x.pretty())
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This prints out:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
start
x
    a
    b
y
    c
    d

start
x     ab
y     cd

start
x     abc
y     d
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
While convenient, this should be used carefully, as highly ambiguous trees will soon create an exponential explosion of such unambiguous derivations.
.SS Keeping track of parents when visiting
.sp
The following visitor assigns a \fBparent\fP attribute to every node in the tree except the root, which has no parent.
.sp
If your tree nodes aren\(aqt unique (that is, the same Tree instance appears in more than one place), the assert will fail.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Tree, Visitor

class Parent(Visitor):
    def __default__(self, tree):
        for subtree in tree.children:
            if isinstance(subtree, Tree):
                assert not hasattr(subtree, \(aqparent\(aq)
                subtree.parent = tree
.ft P
.fi
.UNINDENT
.UNINDENT
.SH EXAMPLES FOR LARK
.sp
\fBHow to run the examples\fP:
.sp
After cloning the repo, open a terminal in the root directory of the
project, and run the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[lark]$ python \-m examples.<name_of_example>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For example, the following will parse all the Python files in the
standard library of your local installation:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
[lark]$ python \-m examples.python_parser
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Beginner Examples
.SS Parsing Indentation
.sp
A demonstration of parsing indentation (a “whitespace\-significant” language)
and the usage of the Indenter class.
.sp
Since indentation is context\-sensitive, a postlex stage is introduced to
manufacture INDENT/DEDENT tokens.
.sp
It is crucial for the indenter that the terminal named by NL_type also matches
the spaces (and tabs) that follow the newline, because the indenter measures the indentation from that token.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark
from lark.indenter import Indenter

tree_grammar = r"""
    ?start: _NL* tree

    tree: NAME _NL [_INDENT tree+ _DEDENT]

    %import common.CNAME \-> NAME
    %import common.WS_INLINE
    %declare _INDENT _DEDENT
    %ignore WS_INLINE

    _NL: /(\er?\en[\et ]*)+/
"""

class TreeIndenter(Indenter):
    NL_type = \(aq_NL\(aq
    OPEN_PAREN_types = []
    CLOSE_PAREN_types = []
    INDENT_type = \(aq_INDENT\(aq
    DEDENT_type = \(aq_DEDENT\(aq
    tab_len = 8

parser = Lark(tree_grammar, parser=\(aqlalr\(aq, postlex=TreeIndenter())

test_tree = """
a
    b
    c
        d
        e
    f
        g
"""

def test():
    print(parser.parse(test_tree).pretty())

if __name__ == \(aq__main__\(aq:
    test()
.ft P
.fi
.UNINDENT
.UNINDENT
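.sp
The way a postlex stage can manufacture INDENT/DEDENT tokens may be easier to see in a stripped\-down sketch written in plain Python. This is a simplified model and not Lark\(aqs Indenter: the function name \fBindent_tokens\fP is hypothetical, the input is a list of lines rather than a token stream, and tabs, parentheses and inconsistent dedents are not handled.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def indent_tokens(lines):
    """Yield (type, value) pairs, adding _INDENT/_DEDENT from leading spaces."""
    stack = [0]  # indentation widths of the currently open blocks
    for line in lines:
        text = line.lstrip(" ")
        if not text:
            continue  # blank lines do not change the indentation level
        width = len(line) - len(text)
        if width > stack[-1]:
            stack.append(width)
            yield ("_INDENT", width)
        while width < stack[-1]:
            stack.pop()
            yield ("_DEDENT", width)
        yield ("NAME", text)
    # Close every block that is still open at the end of the input.
    while len(stack) > 1:
        stack.pop()
        yield ("_DEDENT", 0)

for token in indent_tokens(["a", "    b", "    c", "        d", "    f"]):
    print(token)
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The stack is what makes the process context\-sensitive: the same four leading spaces produce an _INDENT, nothing, or a _DEDENT depending on the blocks that are open at that point.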
.SS Lark Grammar
.sp
A reference implementation of the Lark grammar (using LALR(1))
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import lark
from pathlib import Path

parser = lark.Lark.open(\(aqlark.lark\(aq, rel_to=__file__, parser="lalr")

examples_path = Path(__file__).parent
lark_path = Path(lark.__file__).parent

grammar_files = [
    examples_path / \(aqlark.lark\(aq,
    examples_path / \(aqadvanced/python2.lark\(aq,
    examples_path / \(aqadvanced/python3.lark\(aq,
    examples_path / \(aqrelative\-imports/multiples.lark\(aq,
    examples_path / \(aqrelative\-imports/multiple2.lark\(aq,
    examples_path / \(aqrelative\-imports/multiple3.lark\(aq,
    examples_path / \(aqtests/no_newline_at_end.lark\(aq,
    examples_path / \(aqtests/negative_priority.lark\(aq,
    lark_path / \(aqgrammars/common.lark\(aq,
]

def test():
    for grammar_file in grammar_files:
        tree = parser.parse(open(grammar_file).read())
    print("All grammars parsed successfully")

if __name__ == \(aq__main__\(aq:
    test()
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Handling Ambiguity
.sp
A demonstration of ambiguity
.sp
This example shows how to get explicit ambiguity from Lark\(aqs Earley parser.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import sys
from lark import Lark, tree

grammar = """
    sentence: noun verb noun        \-> simple
            | noun verb "like" noun \-> comparative

    noun: adj? NOUN
    verb: VERB
    adj: ADJ

    NOUN: "flies" | "bananas" | "fruit"
    VERB: "like" | "flies"
    ADJ: "fruit"

    %import common.WS
    %ignore WS
"""

parser = Lark(grammar, start=\(aqsentence\(aq, ambiguity=\(aqexplicit\(aq)

sentence = \(aqfruit flies like bananas\(aq

def make_png(filename):
    tree.pydot__tree_to_png( parser.parse(sentence), filename)

def make_dot(filename):
    tree.pydot__tree_to_dot( parser.parse(sentence), filename)

if __name__ == \(aq__main__\(aq:
    print(parser.parse(sentence).pretty())
    # make_png(sys.argv[1])
    # make_dot(sys.argv[1])

# Output:
#
# _ambig
#   comparative
#     noun  fruit
#     verb  flies
#     noun  bananas
#   simple
#     noun
#       fruit
#       flies
#     verb  like
#     noun  bananas
#
# (or view a nicer version at "./fruitflies.png")
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Basic calculator
.sp
A simple example of a REPL calculator
.sp
This example shows how to write a basic calculator with variables.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark, Transformer, v_args


try:
    input = raw_input   # For Python2 compatibility
except NameError:
    pass


calc_grammar = """
    ?start: sum
          | NAME "=" sum    \-> assign_var

    ?sum: product
        | sum "+" product   \-> add
        | sum "\-" product   \-> sub

    ?product: atom
        | product "*" atom  \-> mul
        | product "/" atom  \-> div

    ?atom: NUMBER           \-> number
         | "\-" atom         \-> neg
         | NAME             \-> var
         | "(" sum ")"

    %import common.CNAME \-> NAME
    %import common.NUMBER
    %import common.WS_INLINE

    %ignore WS_INLINE
"""


@v_args(inline=True)    # Affects the signatures of the methods
class CalculateTree(Transformer):
    from operator import add, sub, mul, truediv as div, neg
    number = float

    def __init__(self):
        self.vars = {}

    def assign_var(self, name, value):
        self.vars[name] = value
        return value

    def var(self, name):
        try:
            return self.vars[name]
        except KeyError:
            raise Exception("Variable not found: %s" % name)


calc_parser = Lark(calc_grammar, parser=\(aqlalr\(aq, transformer=CalculateTree())
calc = calc_parser.parse


def main():
    while True:
        try:
            s = input(\(aq> \(aq)
        except EOFError:
            break
        print(calc(s))


def test():
    print(calc("a = 1+2"))
    print(calc("1+a*\-3"))


if __name__ == \(aq__main__\(aq:
    # test()
    main()
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Turtle DSL
.sp
Implements a LOGO\-like toy language for Python’s turtle, with an interpreter.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
try:
    input = raw_input   # For Python2 compatibility
except NameError:
    pass

import turtle

from lark import Lark

turtle_grammar = """
    start: instruction+

    instruction: MOVEMENT NUMBER            \-> movement
               | "c" COLOR [COLOR]          \-> change_color
               | "fill" code_block          \-> fill
               | "repeat" NUMBER code_block \-> repeat

    code_block: "{" instruction+ "}"

    MOVEMENT: "f"|"b"|"l"|"r"
    COLOR: LETTER+

    %import common.LETTER
    %import common.INT \-> NUMBER
    %import common.WS
    %ignore WS
"""

parser = Lark(turtle_grammar)

def run_instruction(t):
    if t.data == \(aqchange_color\(aq:
        turtle.color(*t.children)   # We just pass the color names as\-is

    elif t.data == \(aqmovement\(aq:
        name, number = t.children
        { \(aqf\(aq: turtle.fd,
          \(aqb\(aq: turtle.bk,
          \(aql\(aq: turtle.lt,
          \(aqr\(aq: turtle.rt, }[name](int(number))

    elif t.data == \(aqrepeat\(aq:
        count, block = t.children
        for i in range(int(count)):
            run_instruction(block)

    elif t.data == \(aqfill\(aq:
        turtle.begin_fill()
        run_instruction(t.children[0])
        turtle.end_fill()

    elif t.data == \(aqcode_block\(aq:
        for cmd in t.children:
            run_instruction(cmd)
    else:
        raise SyntaxError(\(aqUnknown instruction: %s\(aq % t.data)


def run_turtle(program):
    parse_tree = parser.parse(program)
    for inst in parse_tree.children:
        run_instruction(inst)

def main():
    while True:
        code = input(\(aq> \(aq)
        try:
            run_turtle(code)
        except Exception as e:
            print(e)

def test():
    text = """
        c red yellow
        fill { repeat 36 {
            f200 l170
        }}
    """
    run_turtle(text)

if __name__ == \(aq__main__\(aq:
    # test()
    main()
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Simple JSON Parser
.sp
The code is short and clear, and outperforms every other parser (that\(aqs written in Python).
For an explanation, check out the JSON parser tutorial at /docs/json_tutorial.md
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import sys

from lark import Lark, Transformer, v_args

json_grammar = r"""
    ?start: value

    ?value: object
          | array
          | string
          | SIGNED_NUMBER      \-> number
          | "true"             \-> true
          | "false"            \-> false
          | "null"             \-> null

    array  : "[" [value ("," value)*] "]"
    object : "{" [pair ("," pair)*] "}"
    pair   : string ":" value

    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS

    %ignore WS
"""


class TreeToJson(Transformer):
    @v_args(inline=True)
    def string(self, s):
        return s[1:\-1].replace(\(aq\e\e"\(aq, \(aq"\(aq)

    array = list
    pair = tuple
    object = dict
    number = v_args(inline=True)(float)

    null = lambda self, _: None
    true = lambda self, _: True
    false = lambda self, _: False


### Create the JSON parser with Lark, using the Earley algorithm
# json_parser = Lark(json_grammar, parser=\(aqearley\(aq, lexer=\(aqstandard\(aq)
# def parse(x):
#     return TreeToJson().transform(json_parser.parse(x))

### Create the JSON parser with Lark, using the LALR algorithm
json_parser = Lark(json_grammar, parser=\(aqlalr\(aq,
                   # Using the standard lexer isn\(aqt required, and isn\(aqt usually recommended.
                   # But, it\(aqs good enough for JSON, and it\(aqs slightly faster.
                   lexer=\(aqstandard\(aq,
                   # Disabling propagate_positions and placeholders slightly improves speed
                   propagate_positions=False,
                   maybe_placeholders=False,
                   # Using an internal transformer is faster and more memory efficient
                   transformer=TreeToJson())
parse = json_parser.parse


def test():
    test_json = \(aq\(aq\(aq
        {
            "empty_object" : {},
            "empty_array"  : [],
            "booleans"     : { "YES" : true, "NO" : false },
            "numbers"      : [ 0, 1, \-2, 3.3, 4.4e5, 6.6e\-7 ],
            "strings"      : [ "This", [ "And" , "That", "And a \e\e"b" ] ],
            "nothing"      : null
        }
    \(aq\(aq\(aq

    j = parse(test_json)
    print(j)
    import json
    assert j == json.loads(test_json)


if __name__ == \(aq__main__\(aq:
    # test()
    with open(sys.argv[1]) as f:
        print(parse(f.read()))
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Advanced Examples
.SS LALR’s contextual lexer
.sp
Demonstrates the power of LALR’s contextual lexer on a toy configuration language.
.sp
The tokens NAME and VALUE match the same input. A standard lexer would arbitrarily
choose one over the other, which would lead to a (confusing) parse error.
However, due to the unambiguous structure of the grammar, Lark\(aqs LALR(1) algorithm knows
which one of them to expect at each point during the parse.
The lexer then only matches the tokens that the parser expects.
The result is a correct parse, something that is impossible with a regular lexer.
.sp
Another approach is to discard the lexer altogether and use the Earley algorithm.
It will handle more cases than the contextual lexer, but at the cost of performance.
See examples/conf_earley.py for an example of that approach.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark

parser = Lark(r"""
        start: _NL? section+
        section: "[" NAME "]" _NL item+
        item: NAME "=" VALUE? _NL
        VALUE: /./+

        %import common.CNAME \-> NAME
        %import common.NEWLINE \-> _NL
        %import common.WS_INLINE
        %ignore WS_INLINE
    """, parser="lalr")


sample_conf = """
[bla]
a=Hello
this="that",4
empty=
"""

print(parser.parse(sample_conf).pretty())
.ft P
.fi
.UNINDENT
.UNINDENT
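.sp
The idea of a lexer that only tries the terminals the parser expects can be sketched in plain Python with the \fBre\fP module. This is a toy model and not Lark\(aqs implementation: the hand\-written \fBACCEPTS\fP table stands in for the LALR parser state, it covers a single \fBname=value\fP line only, and all names in it are hypothetical.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
import re

# Terminal patterns; NAME and VALUE overlap on input such as "Hello".
PATTERNS = {
    "NAME": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
    "EQUAL": re.compile(r"="),
    "VALUE": re.compile(r".+"),
}

# Hypothetical stand in for the parser: terminals accepted after each terminal.
ACCEPTS = {
    "START": ["NAME"],
    "NAME": ["EQUAL"],
    "EQUAL": ["VALUE"],
    "VALUE": [],
}

def contextual_lex(line):
    """Tokenize one "name=value" line, trying only the expected terminals."""
    tokens, state, pos = [], "START", 0
    while pos < len(line):
        for name in ACCEPTS[state]:
            match = PATTERNS[name].match(line, pos)
            if match:
                tokens.append((name, match.group()))
                state, pos = name, match.end()
                break
        else:
            raise ValueError("unexpected input at column %d" % pos)
    return tokens

print(contextual_lex("a=Hello"))
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Because VALUE is only tried after an EQUAL, the text "Hello" is never mistaken for a NAME, and the leading "a" is never swallowed by VALUE.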
.SS Templates
.sp
This example shows how to use Lark\(aqs templates to achieve cleaner grammars
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark

grammar = r"""
start: list | dict

list: "[" _seperated{atom, ","} "]"
dict: "{" _seperated{key_value, ","} "}"
key_value: atom ":" atom

_seperated{x, sep}: x (sep x)*  // Define a sequence of \(aqx sep x sep x ...\(aq

atom: NUMBER | ESCAPED_STRING

%import common (NUMBER, ESCAPED_STRING, WS)
%ignore WS
"""


parser = Lark(grammar)

print(parser.parse(\(aq[1, "a", 2]\(aq))
print(parser.parse(\(aq{"a": 2, "b": 6}\(aq))
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Earley’s dynamic lexer
.sp
Demonstrates the power of Earley’s dynamic lexer on a toy configuration language
.sp
Using a lexer for configuration files is tricky, because values don\(aqt
have to be surrounded by delimiters. Using a standard lexer for this just won\(aqt work.
.sp
In this example we use a dynamic lexer and let the Earley parser resolve the ambiguity.
.sp
Another approach is to use the contextual lexer with LALR. It is less powerful than Earley,
but it can handle some ambiguity when lexing and it\(aqs much faster.
See examples/conf_lalr.py for an example of that approach.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark

parser = Lark(r"""
        start: _NL? section+
        section: "[" NAME "]" _NL item+
        item: NAME "=" VALUE? _NL
        VALUE: /./+

        %import common.CNAME \-> NAME
        %import common.NEWLINE \-> _NL
        %import common.WS_INLINE
        %ignore WS_INLINE
    """, parser="earley")

def test():
    sample_conf = """
[bla]

a=Hello
this="that",4
empty=
"""

    r = parser.parse(sample_conf)
    print (r.pretty())

if __name__ == \(aq__main__\(aq:
    test()
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Error handling with a puppet
.sp
This example demonstrates error handling using a parsing puppet in LALR
.sp
When the parser encounters an UnexpectedToken exception, it creates a
parsing puppet with the current parse\-state, and lets you control how
to proceed step\-by\-step. When you\(aqve achieved the correct parse\-state,
you can resume the run by returning True.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Token

from _json_parser import json_parser

def ignore_errors(e):
    if e.token.type == \(aqCOMMA\(aq:
        # Skip comma
        return True
    elif e.token.type == \(aqSIGNED_NUMBER\(aq:
        # Try to feed a comma and retry the number
        e.puppet.feed_token(Token(\(aqCOMMA\(aq, \(aq,\(aq))
        e.puppet.feed_token(e.token)
        return True

    # Unhandled error. Will stop parse and raise exception
    return False


def main():
    s = "[0 1, 2,, 3,,, 4, 5 6 ]"
    res = json_parser.parse(s, on_error=ignore_errors)
    print(res)      # prints [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

main()
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Reconstruct a JSON
.sp
Demonstrates the experimental text\-reconstruction feature
.sp
The Reconstructor takes a parse tree (already filtered of punctuation, of course)
and reconstructs it into valid text that can be parsed correctly.
It can be useful for creating "hooks" to alter data before handing it to other parsers. You can also use it to generate samples from scratch.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import json

from lark import Lark
from lark.reconstruct import Reconstructor

from _json_parser import json_grammar

test_json = \(aq\(aq\(aq
    {
        "empty_object" : {},
        "empty_array"  : [],
        "booleans"     : { "YES" : true, "NO" : false },
        "numbers"      : [ 0, 1, \-2, 3.3, 4.4e5, 6.6e\-7 ],
        "strings"      : [ "This", [ "And" , "That", "And a \e\e"b" ] ],
        "nothing"      : null
    }
\(aq\(aq\(aq

def test_earley():

    json_parser = Lark(json_grammar, maybe_placeholders=False)
    tree = json_parser.parse(test_json)

    new_json = Reconstructor(json_parser).reconstruct(tree)
    print (new_json)
    print (json.loads(new_json) == json.loads(test_json))


def test_lalr():

    json_parser = Lark(json_grammar, parser=\(aqlalr\(aq, maybe_placeholders=False)
    tree = json_parser.parse(test_json)

    new_json = Reconstructor(json_parser).reconstruct(tree)
    print (new_json)
    print (json.loads(new_json) == json.loads(test_json))

test_earley()
test_lalr()
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Custom lexer
.sp
Demonstrates using a custom lexer to parse a non\-textual stream of data
.sp
You can use a custom lexer to tokenize text when the lexers offered by Lark
are too slow, or not flexible enough.
.sp
You can also use it (as shown in this example) to tokenize streams of objects.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark, Transformer, v_args
from lark.lexer import Lexer, Token

class TypeLexer(Lexer):
    def __init__(self, lexer_conf):
        pass

    def lex(self, data):
        for obj in data:
            if isinstance(obj, int):
                yield Token(\(aqINT\(aq, obj)
            elif isinstance(obj, (type(\(aq\(aq), type(u\(aq\(aq))):
                yield Token(\(aqSTR\(aq, obj)
            else:
                raise TypeError(obj)

parser = Lark("""
        start: data_item+
        data_item: STR INT*

        %declare STR INT
        """, parser=\(aqlalr\(aq, lexer=TypeLexer)


class ParseToDict(Transformer):
    @v_args(inline=True)
    def data_item(self, name, *numbers):
        return name.value, [n.value for n in numbers]

    start = dict


def test():
    data = [\(aqalice\(aq, 1, 27, 3, \(aqbob\(aq, 4, \(aqcarrie\(aq, \(aqdan\(aq, 8, 6]

    print(data)

    tree = parser.parse(data)
    res = ParseToDict().transform(tree)

    print(\(aq\-\->\(aq)
    print(res) # prints {\(aqalice\(aq: [1, 27, 3], \(aqbob\(aq: [4], \(aqcarrie\(aq: [], \(aqdan\(aq: [8, 6]}


if __name__ == \(aq__main__\(aq:
    test()
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Transform a Forest
.sp
This example demonstrates how to subclass \fBTreeForestTransformer\fP to
directly transform a SPPF.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark
from lark.parsers.earley_forest import TreeForestTransformer, handles_ambiguity, Discard

class CustomTransformer(TreeForestTransformer):

    @handles_ambiguity
    def sentence(self, trees):
        return next(tree for tree in trees if tree.data == \(aqsimple\(aq)

    def simple(self, children):
        children.append(\(aq.\(aq)
        return self.tree_class(\(aqsimple\(aq, children)

    def adj(self, children):
        raise Discard()

    def __default_token__(self, token):
        return token.capitalize()

grammar = """
    sentence: noun verb noun        \-> simple
            | noun verb "like" noun \-> comparative

    noun: adj? NOUN
    verb: VERB
    adj: ADJ

    NOUN: "flies" | "bananas" | "fruit"
    VERB: "like" | "flies"
    ADJ: "fruit"

    %import common.WS
    %ignore WS
"""

parser = Lark(grammar, start=\(aqsentence\(aq, ambiguity=\(aqforest\(aq)
sentence = \(aqfruit flies like bananas\(aq
forest = parser.parse(sentence)

tree = CustomTransformer(resolve_ambiguity=False).transform(forest)
print(tree.pretty())

# Output:
#
# simple
#   noun  Flies
#   verb  Like
#   noun  Bananas
#   .
#
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Simple JSON Parser
.sp
The code is short and clear, and outperforms every other parser (that\(aqs written in Python).
For an explanation, check out the JSON parser tutorial at /docs/json_tutorial.md
.sp
(this is here for use by the other examples)
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import sys

from lark import Lark, Transformer, v_args

json_grammar = r"""
    ?start: value

    ?value: object
          | array
          | string
          | SIGNED_NUMBER      \-> number
          | "true"             \-> true
          | "false"            \-> false
          | "null"             \-> null

    array  : "[" [value ("," value)*] "]"
    object : "{" [pair ("," pair)*] "}"
    pair   : string ":" value

    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS

    %ignore WS
"""


class TreeToJson(Transformer):
    @v_args(inline=True)
    def string(self, s):
        return s[1:\-1].replace(\(aq\e\e"\(aq, \(aq"\(aq)

    array = list
    pair = tuple
    object = dict
    number = v_args(inline=True)(float)

    null = lambda self, _: None
    true = lambda self, _: True
    false = lambda self, _: False


### Create the JSON parser with Lark, using the LALR algorithm
json_parser = Lark(json_grammar, parser=\(aqlalr\(aq,
                   # Using the standard lexer isn\(aqt required, and isn\(aqt usually recommended.
                   # But, it\(aqs good enough for JSON, and it\(aqs slightly faster.
                   lexer=\(aqstandard\(aq,
                   # Disabling propagate_positions and placeholders slightly improves speed
                   propagate_positions=False,
                   maybe_placeholders=False,
                   # Using an internal transformer is faster and more memory efficient
                   transformer=TreeToJson())
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Custom SPPF Prioritizer
.sp
This example demonstrates how to subclass \fBForestVisitor\fP to make a custom
SPPF node prioritizer to be used in conjunction with \fBTreeForestTransformer\fP\&.
.sp
Our prioritizer will count the number of descendants of a node that are tokens.
By negating this count, our prioritizer will prefer nodes with fewer token
descendants. Thus, we choose the more specific parse.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark
from lark.parsers.earley_forest import ForestVisitor, TreeForestTransformer

class TokenPrioritizer(ForestVisitor):

    def visit_symbol_node_in(self, node):
        # visit the entire forest by returning node.children
        return node.children

    def visit_packed_node_in(self, node):
        return node.children

    def visit_symbol_node_out(self, node):
        priority = 0
        for child in node.children:
            # Tokens do not have a priority attribute
            # count them as \-1
            priority += getattr(child, \(aqpriority\(aq, \-1)
        node.priority = priority

    def visit_packed_node_out(self, node):
        priority = 0
        for child in node.children:
            priority += getattr(child, \(aqpriority\(aq, \-1)
        node.priority = priority

    def on_cycle(self, node, path):
        raise Exception("Oops, we encountered a cycle.")

grammar = """
start: hello " " world | hello_world
hello: "Hello"
world: "World"
hello_world: "Hello World"
"""

parser = Lark(grammar, parser=\(aqearley\(aq, ambiguity=\(aqforest\(aq)
forest = parser.parse("Hello World")

print("Default prioritizer:")
tree = TreeForestTransformer(resolve_ambiguity=True).transform(forest)
print(tree.pretty())

forest = parser.parse("Hello World")

print("Custom prioritizer:")
tree = TreeForestTransformer(resolve_ambiguity=True, prioritizer=TokenPrioritizer()).transform(forest)
print(tree.pretty())

# Output:
#
# Default prioritizer:
# start
#   hello Hello
#
#   world World
#
# Custom prioritizer:
# start
#   hello_world   Hello World
.ft P
.fi
.UNINDENT
.UNINDENT
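.sp
The scoring rule itself can be tried out without Lark. The sketch below applies it to plain nested tuples; it assumes a simplified model in which a node is a \fB(rule, children)\fP pair and a token is a string, and the names \fBpriority\fP, \fBsplit\fP and \fBjoined\fP are hypothetical.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
def priority(node):
    """Score a node as minus the number of token leaves beneath it."""
    if isinstance(node, str):
        return -1  # a token counts as -1, as in the visitor above
    name, children = node
    return sum(priority(child) for child in children)

# Two candidate derivations of "Hello World", as (rule, children) pairs.
split = ("start", [("hello", ["Hello"]), " ", ("world", ["World"])])
joined = ("start", [("hello_world", ["Hello World"])])

# The highest priority wins, so the derivation with fewer tokens is chosen.
best = max([split, joined], key=priority)
print(priority(split), priority(joined), best[1][0][0])
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The three\-token derivation scores \-3 and the single\-token one scores \-1, so taking the maximum selects \fBhello_world\fP.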
.SS Compile Python to Bytecode
.sp
A toy example that compiles Python directly to bytecode, without generating an AST.
It currently only works for very simple Python code.
.sp
It requires the \(aqbytecode\(aq library. You can install it with:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ pip install bytecode
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark, Transformer, v_args
from lark.indenter import Indenter

from bytecode import Instr, Bytecode

class PythonIndenter(Indenter):
    NL_type = \(aq_NEWLINE\(aq
    OPEN_PAREN_types = [\(aqLPAR\(aq, \(aqLSQB\(aq, \(aqLBRACE\(aq]
    CLOSE_PAREN_types = [\(aqRPAR\(aq, \(aqRSQB\(aq, \(aqRBRACE\(aq]
    INDENT_type = \(aq_INDENT\(aq
    DEDENT_type = \(aq_DEDENT\(aq
    tab_len = 8


@v_args(inline=True)
class Compile(Transformer):
    def number(self, n):
        return [Instr(\(aqLOAD_CONST\(aq, int(n))]
    def string(self, s):
        return [Instr(\(aqLOAD_CONST\(aq, s[1:\-1])]
    def var(self, n):
        return [Instr(\(aqLOAD_NAME\(aq, n)]

    def arith_expr(self, a, op, b):
        # TODO support chain arithmetic
        assert op == \(aq+\(aq
        return a + b + [Instr(\(aqBINARY_ADD\(aq)]

    def arguments(self, args):
        return args

    def funccall(self, name, args):
        return name + args + [Instr(\(aqCALL_FUNCTION\(aq, 1)]

    @v_args(inline=False)
    def file_input(self, stmts):
        return sum(stmts, []) + [Instr("RETURN_VALUE")]

    def expr_stmt(self, lval, rval):
        # TODO more complicated than that
        name ,= lval
        assert name.name == \(aqLOAD_NAME\(aq # XXX avoid with another layer of abstraction
        return rval + [Instr("STORE_NAME", name.arg)]

    def __default__(self, *args):
        assert False, args


python_parser3 = Lark.open(\(aqpython3.lark\(aq, rel_to=__file__, start=\(aqfile_input\(aq,
                           parser=\(aqlalr\(aq, postlex=PythonIndenter(),
                           transformer=Compile(), propagate_positions=False)

def compile_python(s):
    insts = python_parser3.parse(s+"\en")
    return Bytecode(insts).to_code()

code = compile_python("""
a = 3
b = 5
print("Hello World!")
print(a+(b+2))
print((a+b)+2)
""")
exec(code)
# \-\- Output \-\-
# Hello World!
# 10
# 10
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Grammar\-complete Python Parser
.sp
A fully\-working Python 2 & 3 parser (but not production ready yet!)
.sp
This example demonstrates usage of the included Python grammars
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import sys
import os, os.path
from io import open
import glob, time

from lark import Lark
from lark.indenter import Indenter

# __path__ = os.path.dirname(__file__)

class PythonIndenter(Indenter):
    NL_type = \(aq_NEWLINE\(aq
    OPEN_PAREN_types = [\(aqLPAR\(aq, \(aqLSQB\(aq, \(aqLBRACE\(aq]
    CLOSE_PAREN_types = [\(aqRPAR\(aq, \(aqRSQB\(aq, \(aqRBRACE\(aq]
    INDENT_type = \(aq_INDENT\(aq
    DEDENT_type = \(aq_DEDENT\(aq
    tab_len = 8

kwargs = dict(rel_to=__file__, postlex=PythonIndenter(), start=\(aqfile_input\(aq)

python_parser2 = Lark.open(\(aqpython2.lark\(aq, parser=\(aqlalr\(aq, **kwargs)
python_parser3 = Lark.open(\(aqpython3.lark\(aq,parser=\(aqlalr\(aq, **kwargs)
python_parser2_earley = Lark.open(\(aqpython2.lark\(aq, parser=\(aqearley\(aq, lexer=\(aqstandard\(aq, **kwargs)

try:
    xrange
except NameError:
    chosen_parser = python_parser3
else:
    chosen_parser = python_parser2


def _read(fn, *args):
    kwargs = {\(aqencoding\(aq: \(aqiso\-8859\-1\(aq}
    with open(fn, *args, **kwargs) as f:
        return f.read()

def _get_lib_path():
    if os.name == \(aqnt\(aq:
        if \(aqPyPy\(aq in sys.version:
            return os.path.join(sys.prefix, \(aqlib\-python\(aq, sys.winver)
        else:
            return os.path.join(sys.prefix, \(aqLib\(aq)
    else:
        return [x for x in sys.path if x.endswith(\(aq%s.%s\(aq % sys.version_info[:2])][0]

def test_python_lib():
    path = _get_lib_path()

    start = time.time()
    files = glob.glob(path+\(aq/*.py\(aq)
    for f in files:
        print( f )
        chosen_parser.parse(_read(os.path.join(path, f)) + \(aq\en\(aq)

    end = time.time()
    print( "test_python_lib (%d files), time: %s secs"%(len(files), end\-start) )

def test_earley_equals_lalr():
    path = _get_lib_path()

    files = glob.glob(path+\(aq/*.py\(aq)
    for f in files:
        print( f )
        tree1 = python_parser2.parse(_read(os.path.join(path, f)) + \(aq\en\(aq)
        tree2 = python_parser2_earley.parse(_read(os.path.join(path, f)) + \(aq\en\(aq)
        assert tree1 == tree2


if __name__ == \(aq__main__\(aq:
    test_python_lib()
    # test_earley_equals_lalr()
    # python_parser3.parse(_read(sys.argv[1]) + \(aq\en\(aq)
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Example\-Driven Error Reporting
.sp
A demonstration of example\-driven error reporting with the LALR parser
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark, UnexpectedInput

from _json_parser import json_grammar   # Using the grammar from the json_parser example

json_parser = Lark(json_grammar, parser=\(aqlalr\(aq)

class JsonSyntaxError(SyntaxError):
    def __str__(self):
        context, line, column = self.args
        return \(aq%s at line %s, column %s.\en\en%s\(aq % (self.label, line, column, context)

class JsonMissingValue(JsonSyntaxError):
    label = \(aqMissing Value\(aq

class JsonMissingOpening(JsonSyntaxError):
    label = \(aqMissing Opening\(aq

class JsonMissingClosing(JsonSyntaxError):
    label = \(aqMissing Closing\(aq

class JsonMissingComma(JsonSyntaxError):
    label = \(aqMissing Comma\(aq

class JsonTrailingComma(JsonSyntaxError):
    label = \(aqTrailing Comma\(aq


def parse(json_text):
    try:
        j = json_parser.parse(json_text)
    except UnexpectedInput as u:
        exc_class = u.match_examples(json_parser.parse, {
            JsonMissingOpening: [\(aq{"foo": ]}\(aq,
                                 \(aq{"foor": }}\(aq,
                                 \(aq{"foo": }\(aq],
            JsonMissingClosing: [\(aq{"foo": [}\(aq,
                                 \(aq{\(aq,
                                 \(aq{"a": 1\(aq,
                                 \(aq[1\(aq],
            JsonMissingComma: [\(aq[1 2]\(aq,
                               \(aq[false 1]\(aq,
                               \(aq["b" 1]\(aq,
                               \(aq{"a":true 1:4}\(aq,
                               \(aq{"a":1 1:4}\(aq,
                               \(aq{"a":"b" 1:4}\(aq],
            JsonTrailingComma: [\(aq[,]\(aq,
                                \(aq[1,]\(aq,
                                \(aq[1,2,]\(aq,
                                \(aq{"foo":1,}\(aq,
                                \(aq{"foo":false,"bar":true,}\(aq]
        }, use_accepts=True)
        if not exc_class:
            raise
        raise exc_class(u.get_context(json_text), u.line, u.column)


def test():
    try:
        parse(\(aq{"example1": "value"\(aq)
    except JsonMissingClosing as e:
        print(e)

    try:
        parse(\(aq{"example2": ] \(aq)
    except JsonMissingOpening as e:
        print(e)


if __name__ == \(aq__main__\(aq:
    test()
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Syntax Highlighting
.sp
This example shows how to write a syntax\-highlighted editor with Qt and Lark
.sp
Requirements:
.INDENT 0.0
.INDENT 3.5
PyQt5==5.10.1
QScintilla==2.10.4
.UNINDENT
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import sys
import textwrap

from PyQt5.Qt import *  # noqa

from PyQt5.Qsci import QsciScintilla
from PyQt5.Qsci import QsciLexerCustom

from lark import Lark


class LexerJson(QsciLexerCustom):

    def __init__(self, parent=None):
        super().__init__(parent)
        self.create_parser()
        self.create_styles()

    def create_styles(self):
        deeppink = QColor(249, 38, 114)
        khaki = QColor(230, 219, 116)
        mediumpurple = QColor(174, 129, 255)
        mediumturquoise = QColor(81, 217, 205)
        yellowgreen = QColor(166, 226, 46)
        lightcyan = QColor(213, 248, 232)
        darkslategrey = QColor(39, 40, 34)

        styles = {
            0: mediumturquoise,
            1: mediumpurple,
            2: yellowgreen,
            3: deeppink,
            4: khaki,
            5: lightcyan
        }

        for style, color in styles.items():
            self.setColor(color, style)
            self.setPaper(darkslategrey, style)
            self.setFont(self.parent().font(), style)

        self.token_styles = {
            "COLON": 5,
            "COMMA": 5,
            "LBRACE": 5,
            "LSQB": 5,
            "RBRACE": 5,
            "RSQB": 5,
            "FALSE": 0,
            "NULL": 0,
            "TRUE": 0,
            "STRING": 4,
            "NUMBER": 1,
        }

    def create_parser(self):
        grammar = \(aq\(aq\(aq
            anons: ":" "{" "}" "," "[" "]"
            TRUE: "true"
            FALSE: "false"
            NULL: "null"
            %import common.ESCAPED_STRING \-> STRING
            %import common.SIGNED_NUMBER  \-> NUMBER
            %import common.WS
            %ignore WS
        \(aq\(aq\(aq

        self.lark = Lark(grammar, parser=None, lexer=\(aqstandard\(aq)
        # All tokens: print([t.name for t in self.lark.parser.lexer.tokens])

    def defaultPaper(self, style):
        return QColor(39, 40, 34)

    def language(self):
        return "Json"

    def description(self, style):
        return {v: k for k, v in self.token_styles.items()}.get(style, "")

    def styleText(self, start, end):
        self.startStyling(start)
        text = self.parent().text()[start:end]
        last_pos = 0

        try:
            for token in self.lark.lex(text):
                ws_len = token.pos_in_stream \- last_pos
                if ws_len:
                    self.setStyling(ws_len, 0)    # whitespace

                token_len = len(bytearray(token, "utf\-8"))
                self.setStyling(
                    token_len, self.token_styles.get(token.type, 0))

                last_pos = token.pos_in_stream + token_len
        except Exception as e:
            print(e)


class EditorAll(QsciScintilla):

    def __init__(self, parent=None):
        super().__init__(parent)

        # Set font defaults
        font = QFont()
        font.setFamily(\(aqConsolas\(aq)
        font.setFixedPitch(True)
        font.setPointSize(8)
        font.setBold(True)
        self.setFont(font)

        # Set margin defaults
        fontmetrics = QFontMetrics(font)
        self.setMarginsFont(font)
        self.setMarginWidth(0, fontmetrics.width("000") + 6)
        self.setMarginLineNumbers(0, True)
        self.setMarginsForegroundColor(QColor(128, 128, 128))
        self.setMarginsBackgroundColor(QColor(39, 40, 34))
        self.setMarginType(1, self.SymbolMargin)
        self.setMarginWidth(1, 12)

        # Set indentation defaults
        self.setIndentationsUseTabs(False)
        self.setIndentationWidth(4)
        self.setBackspaceUnindents(True)
        self.setIndentationGuides(True)

        # self.setFolding(QsciScintilla.CircledFoldStyle)

        # Set caret defaults
        self.setCaretForegroundColor(QColor(247, 247, 241))
        self.setCaretWidth(2)

        # Set selection color defaults
        self.setSelectionBackgroundColor(QColor(61, 61, 52))
        self.resetSelectionForegroundColor()

        # Set multiselection defaults
        self.SendScintilla(QsciScintilla.SCI_SETMULTIPLESELECTION, True)
        self.SendScintilla(QsciScintilla.SCI_SETMULTIPASTE, 1)
        self.SendScintilla(
            QsciScintilla.SCI_SETADDITIONALSELECTIONTYPING, True)

        lexer = LexerJson(self)
        self.setLexer(lexer)


EXAMPLE_TEXT = textwrap.dedent("""\e
        {
            "_id": "5b05ffcbcf8e597939b3f5ca",
            "about": "Excepteur consequat commodo esse voluptate aute aliquip ad sint deserunt commodo eiusmod irure. Sint aliquip sit magna duis eu est culpa aliqua excepteur ut tempor nulla. Aliqua ex pariatur id labore sit. Quis sit ex aliqua veniam exercitation laboris anim adipisicing. Lorem nisi reprehenderit ullamco labore qui sit ut aliqua tempor consequat pariatur proident.",
            "address": "665 Malbone Street, Thornport, Louisiana, 243",
            "age": 23,
            "balance": "$3,216.91",
            "company": "BULLJUICE",
            "email": "elisekelley@bulljuice.com",
            "eyeColor": "brown",
            "gender": "female",
            "guid": "d3a6d865\-0f64\-4042\-8a78\-4f53de9b0707",
            "index": 0,
            "isActive": false,
            "isActive2": true,
            "latitude": \-18.660714,
            "longitude": \-85.378048,
            "name": "Elise Kelley",
            "phone": "+1 (808) 543\-3966",
            "picture": "http://placehold.it/32x32",
            "registered": "2017\-09\-30T03:47:40 \-02:00",
            "tags": [
                "et",
                "nostrud",
                "in",
                "fugiat",
                "incididunt",
                "labore",
                "nostrud"
            ]
        }\e
    """)

def main():
    app = QApplication(sys.argv)
    ex = EditorAll()
    ex.setWindowTitle(__file__)
    ex.setText(EXAMPLE_TEXT)
    ex.resize(800, 600)
    ex.show()
    sys.exit(app.exec_())


if __name__ == "__main__":
    main()
.ft P
.fi
.UNINDENT
.UNINDENT
.SH GRAMMAR REFERENCE
.SS Definitions
.sp
A \fBgrammar\fP is a list of rules and terminals that together define a language.
.sp
Terminals define the alphabet of the language, while rules define its structure.
.sp
In Lark, a terminal may be a string, a regular expression, or a concatenation of these and other terminals.
.sp
Each rule is a list of terminals and rules, whose location and nesting define the structure of the resulting parse\-tree.
.sp
A \fBparsing algorithm\fP is an algorithm that takes a grammar definition and a sequence of symbols (members of the alphabet), and matches the entirety of the sequence by searching for a structure that is allowed by the grammar.
.SS General Syntax and notes
.sp
Grammars in Lark are based on \fI\%EBNF\fP syntax, with several enhancements.
.sp
EBNF is basically a short\-hand for common BNF patterns.
.sp
Optionals are expanded:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
  a b? c    \->    (a c | a b c)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Repetition is extracted into a recursion:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
  a: b*    \->    a: _b_tag
                 _b_tag: (_b_tag b)?
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
And so on.
.sp
Lark grammars are composed of a list of definitions and directives, each on its own line. A definition is either a named rule, or a named terminal, with the following syntax, respectively:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
  rule: <EBNF EXPRESSION>
      | etc.

  TERM: <EBNF EXPRESSION>   // Rules aren\(aqt allowed
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBComments\fP start with \fB//\fP and last to the end of the line (C++ style).
.sp
Lark begins the parse with the rule \(aqstart\(aq, unless specified otherwise in the options.
.sp
Names of rules are always in lowercase, while names of terminals are always in uppercase. This distinction has practical effects, for the shape of the generated parse\-tree, and the automatic construction of the lexer (aka tokenizer, or scanner).
.SS Terminals
.sp
Terminals are used to match text into symbols. They can be defined as a combination of literals and other terminals.
.sp
\fBSyntax:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
<NAME> [. <priority>] : <literals\-and\-or\-terminals>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Terminal names must be uppercase.
.sp
Literals can be one of:
.INDENT 0.0
.IP \(bu 2
\fB"string"\fP
.IP \(bu 2
\fB/regular expression+/\fP
.IP \(bu 2
\fB"case\-insensitive string"i\fP
.IP \(bu 2
\fB/re with flags/imulx\fP
.IP \(bu 2
Literal range: \fB"a".."z"\fP, \fB"1".."9"\fP, etc.
.UNINDENT
.sp
Terminals also support grammar operators, such as \fB|\fP, \fB+\fP, \fB*\fP and \fB?\fP\&.
.sp
Terminals are a linear construct, and therefore may not contain themselves (recursion isn\(aqt allowed).
.SS Templates
.sp
Templates are expanded when preprocessing the grammar.
.sp
Definition syntax:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
  my_template{param1, param2, ...}: <EBNF EXPRESSION>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Use syntax:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
some_rule: my_template{arg1, arg2, ...}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
_separated{x, sep}: x (sep x)*  // Define a sequence of \(aqx sep x sep x ...\(aq

num_list: "[" _separated{NUMBER, ","} "]"   // Will match "[1, 2, 3]" etc.
.ft P
.fi
.UNINDENT
.UNINDENT
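.sp
As a rough illustration, the template above can be exercised from Python. This is a minimal sketch rather than an official example: the \fBstart\fP rule and the variable names are assumptions made for the demonstration, and it assumes a Lark version with template support.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark

# Hypothetical grammar: reuses the _separated template for a number list.
grammar = """
start: "[" _separated{NUMBER, ","} "]"
_separated{x, sep}: x (sep x)*

%import common.NUMBER
%import common.WS
%ignore WS
"""

parser = Lark(grammar, parser="lalr")
tree = parser.parse("[1, 2, 3]")

# The template name starts with an underscore, so its matches are
# inlined into start; the anonymous "," and brackets are filtered out.
numbers = [str(token) for token in tree.children]
print(", ".join(numbers))
.ft P
.fi
.UNINDENT
.UNINDENT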
.SS Priority
.sp
Terminals can be assigned priority only when using a lexer (future versions may support Earley\(aqs dynamic lexing).
.sp
Priority can be either positive or negative. If not specified for a terminal, it defaults to 1.
.sp
Highest priority terminals are always matched first.
.SS Regexp Flags
.sp
You can use flags on regexps and strings. For example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
SELECT: "select"i     //# Will ignore case, and match SELECT or Select, etc.
MULTILINE_TEXT: /.+/s
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Supported flags are any combination of: \fBimslu\fP\&. See Python\(aqs regex documentation for more details on each one.
.sp
Regexps/strings of different flags can only be concatenated in Python 3.6+.
.SS Notes for when using a lexer:
.sp
When using a lexer (standard or contextual), it is the grammar\-author\(aqs responsibility to make sure the literals don\(aqt collide, or that if they do, they are matched in the desired order. Literals are matched according to the following precedence:
.INDENT 0.0
.IP \(bu 2
Highest priority first (priority is specified as: TERM.number: ...)
.IP \(bu 2
Length of match (for regexps, the longest theoretical match is used)
.IP \(bu 2
Length of literal / pattern definition
.IP \(bu 2
Name
.UNINDENT
.sp
\fBExamples:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
IF: "if"
INTEGER : /[0\-9]+/
INTEGER2 : ("0".."9")+          //# Same as INTEGER
DECIMAL.2: INTEGER? "." INTEGER  //# Will be matched before INTEGER
WHITESPACE: (" " | /\et/ )+
SQL_SELECT: "select"i
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Regular expressions & Ambiguity
.sp
Each terminal is eventually compiled to a regular expression. All the operators and references inside it are mapped to their respective expressions.
.sp
For example, in the following grammar, \fBA1\fP and \fBA2\fP, are equivalent:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
A1: "a" | "b"
A2: /a|b/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This means that inside terminals, Lark cannot detect or resolve ambiguity, even when using Earley.
.sp
For example, for this grammar:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
start           : (A | B)+
A               : "a" | "ab"
B               : "b"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
We get this behavior:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
>>> p.parse("ab")
Tree(start, [Token(A, \(aqa\(aq), Token(B, \(aqb\(aq)])
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This is happening because Python\(aqs regex engine always returns the first matching option.
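.sp
The first\-match behavior can be observed with the standard \fBre\fP module alone. The sketch below is only an approximation: it assumes the terminal \fBA\fP compiles to something like the pattern \fBa|ab\fP, which simplifies what Lark actually generates.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
import re

# Alternation tries the options left to right and keeps the first hit.
first = re.match("a|ab", "ab").group()
print(first)   # a

# With the longer option listed first, the whole input is consumed.
second = re.match("ab|a", "ab").group()
print(second)  # ab
.ft P
.fi
.UNINDENT
.UNINDENT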
.sp
If you find yourself in this situation, the recommended solution is to use rules instead.
.sp
Example:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
>>> p = Lark("""start: (a | b)+
\&...             !a: "a" | "ab"
\&...             !b: "b"
\&...             """, ambiguity="explicit")
>>> print(p.parse("ab").pretty())
_ambig
  start
    a   ab
  start
    a   a
    b   b
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Rules
.sp
\fBSyntax:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
<name> : <items\-to\-match>  [\-> <alias> ]
       | ...
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Names of rules and aliases are always in lowercase.
.sp
Rule definitions can be extended to the next line by using the OR operator (signified by a pipe: \fB|\fP ).
.sp
An alias is a name for the specific rule alternative. It affects tree construction.
.sp
Each item is one of:
.INDENT 0.0
.IP \(bu 2
\fBrule\fP
.IP \(bu 2
\fBTERMINAL\fP
.IP \(bu 2
\fB"string literal"\fP or \fB/regexp literal/\fP
.IP \(bu 2
\fB(item item ..)\fP \- Group items
.IP \(bu 2
\fB[item item ..]\fP \- Maybe. Same as \fB(item item ..)?\fP, but when \fBmaybe_placeholders=True\fP, generates \fBNone\fP if there is no match.
.IP \(bu 2
\fBitem?\fP \- Zero or one instances of item ("maybe")
.IP \(bu 2
\fBitem*\fP \- Zero or more instances of item
.IP \(bu 2
\fBitem+\fP \- One or more instances of item
.IP \(bu 2
\fBitem ~ n\fP \- Exactly \fIn\fP instances of item
.IP \(bu 2
\fBitem ~ n..m\fP \- Between \fIn\fP and \fIm\fP instances of item (not recommended for wide ranges, due to performance issues)
.UNINDENT
.sp
\fBExamples:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
hello_world: "hello" "world"
mul: (mul "*")? number     //# Left\-recursion is allowed and encouraged!
expr: expr operator expr
    | value               //# Multi\-line, belongs to expr

four_words: word ~ 4
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Priority
.sp
Rules can be assigned priority only when using Earley (future versions may support LALR as well).
.sp
Priority can be either positive or negative. If not specified for a rule, it\(aqs assumed to be 1 (i.e. the default).
.sp

.SS Directives
.SS %ignore
.sp
All occurrences of the terminal will be ignored, and won\(aqt be part of the parse.
.sp
Using the \fB%ignore\fP directive results in a cleaner grammar.
.sp
It\(aqs especially important for the LALR(1) algorithm, because adding whitespace (or comments, or other extraneous elements) explicitly in the grammar harms its predictive abilities, which are based on a lookahead of 1.
.sp
\fBSyntax:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
%ignore <TERMINAL>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
\fBExamples:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
%ignore " "

COMMENT: "#" /[^\en]/*
%ignore COMMENT
.ft P
.fi
.UNINDENT
.UNINDENT
.SS %import
.sp
Allows one to import terminals and rules from lark grammars.
.sp
When importing rules, all their dependencies will be imported into a namespace, to avoid collisions. It\(aqs not possible to override their dependencies (e.g. like you would when inheriting a class).
.sp
\fBSyntax:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
%import <module>.<TERMINAL>
%import <module>.<rule>
%import <module>.<TERMINAL> \-> <NEWTERMINAL>
%import <module>.<rule> \-> <newrule>
%import <module> (<TERM1>, <TERM2>, <rule1>, <rule2>)
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
If the module path is absolute, Lark will attempt to load it from the built\-in directory (currently, only \fBcommon.lark\fP is available).
.sp
If the module path is relative, such as \fB\&.path.to.file\fP, Lark will attempt to load it from the current working directory. Grammars must have the \fB\&.lark\fP extension.
.sp
The rule or terminal can be imported under another name with the \fB\->\fP syntax.
.sp
\fBExample:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
%import common.NUMBER

%import .terminals_file (A, B, C)

%import .rules_file.rulea \-> ruleb
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Note that \fB%ignore\fP directives cannot be imported. Imported rules will abide by the \fB%ignore\fP directives declared in the main grammar.
.SS %declare
.sp
Declare a terminal without defining it. Useful for plugins.
.SH TREE CONSTRUCTION REFERENCE
.sp
Lark builds a tree automatically based on the structure of the grammar, where each rule that is matched becomes a branch (node) in the tree, and its children are its matches, in the order of matching.
.sp
For example, the rule \fBnode: child1 child2\fP will create a tree node with two children. If it is matched as part of another rule (i.e. if it isn\(aqt the root), the new rule\(aqs tree node will become its parent.
.sp
Using \fBitem+\fP or \fBitem*\fP will result in a list of items, equivalent to writing \fBitem item item ..\fP\&.
.sp
Using \fBitem?\fP will return the item if it matched, or nothing.
.sp
If \fBmaybe_placeholders=False\fP (the default), then \fB[]\fP behaves like \fB()?\fP\&.
.sp
If \fBmaybe_placeholders=True\fP, then using \fB[item]\fP will return the item if it matched, or the value \fBNone\fP, if it didn\(aqt.
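.sp
The difference between the two settings can be sketched as follows. The grammar and variable names are made up for illustration; only the \fBmaybe_placeholders\fP option comes from the text above.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Lark

# Hypothetical grammar: an optional B between two filtered literals.
grammar = """
start: "a" [B] "c"
B: "b"
"""

with_none = Lark(grammar, parser="lalr", maybe_placeholders=True)
without = Lark(grammar, parser="lalr", maybe_placeholders=False)

missing_as_none = with_none.parse("ac").children
missing_dropped = without.parse("ac").children
present = with_none.parse("abc").children

print(missing_as_none)   # [None]
print(missing_dropped)   # []
print(present == ["b"])  # True
.ft P
.fi
.UNINDENT
.UNINDENT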
.SS Terminals
.sp
Terminals are always values in the tree, never branches.
.sp
Lark filters out certain types of terminals by default, considering them punctuation:
.INDENT 0.0
.IP \(bu 2
Terminals that won\(aqt appear in the tree are:
.INDENT 2.0
.IP \(bu 2
Unnamed literals (like \fB"keyword"\fP or \fB"+"\fP)
.IP \(bu 2
Terminals whose name starts with an underscore (like \fB_DIGIT\fP)
.UNINDENT
.IP \(bu 2
Terminals that \fIwill\fP appear in the tree are:
.INDENT 2.0
.IP \(bu 2
Unnamed regular expressions (like \fB/[0\-9]/\fP)
.IP \(bu 2
Named terminals whose name starts with a letter (like \fBDIGIT\fP)
.UNINDENT
.UNINDENT
.sp
Note: Terminals composed of literals and other terminals always include the entire match without filtering any part.
.sp
\fBExample:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
start:  PNAME pname

PNAME:  "(" NAME ")"
pname:  "(" NAME ")"

NAME:   /\ew+/
%ignore /\es+/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Lark will parse "(Hello) (World)" as:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
start
    (Hello)
    pname World
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Rules prefixed with \fB!\fP will retain all their literals regardless.
.sp
\fBExample:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
    expr: "(" expr ")"
        | NAME+

    NAME: /\ew+/

    %ignore " "
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Lark will parse "((hello world))" as:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
expr
    expr
        expr
            "hello"
            "world"
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The brackets do not appear in the tree by design. The words appear because they are matched by a named terminal.
.SS Shaping the tree
.sp
Users can alter the automatic construction of the tree using a collection of grammar features.
.INDENT 0.0
.IP \(bu 2
Rules whose name begins with an underscore will be inlined into their containing rule.
.UNINDENT
.sp
\fBExample:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
    start: "(" _greet ")"
    _greet: /\ew+/ /\ew+/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Lark will parse "(hello world)" as:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
start
    "hello"
    "world"
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP \(bu 2
Rules that receive a question mark (?) at the beginning of their definition will be inlined if they have a single child, after filtering.
.UNINDENT
.sp
\fBExample:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
    start: greet greet
    ?greet: "(" /\ew+/ ")"
          | /\ew+/ /\ew+/
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Lark will parse "hello world (planet)" as:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
start
    greet
        "hello"
        "world"
    "planet"
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP \(bu 2
Rules that begin with an exclamation mark will keep all their terminals (they won\(aqt get filtered).
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
    !expr: "(" expr ")"
         | NAME+
    NAME: /\ew+/
    %ignore " "
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Will parse "((hello world))" as:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
expr
  (
  expr
    (
    expr
      hello
      world
    )
  )
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Using the \fB!\fP prefix is usually a "code smell", and may point to a flaw in your grammar design.
.INDENT 0.0
.IP \(bu 2
Aliases \- options in a rule can receive an alias. It will then be used as the branch name for the option, instead of the rule name.
.UNINDENT
.sp
\fBExample:\fP
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
    start: greet greet
    greet: "hello"
         | "world" \-> planet
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Lark will parse "hello world" as:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
start
    greet
    planet
.ft P
.fi
.UNINDENT
.UNINDENT
.SH API REFERENCE
.SS Lark
.INDENT 0.0
.TP
.B class lark.Lark(grammar, **options)
Main interface for the library.
.sp
It\(aqs mostly a thin wrapper for the many different parsers, and for the tree constructor.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBgrammar\fP \-\- a string or file\-object containing the grammar spec (using Lark\(aqs EBNF syntax)
.IP \(bu 2
\fBoptions\fP \-\- a dictionary controlling various aspects of Lark.
.UNINDENT
.UNINDENT
.sp
Example
.sp
.nf
.ft C
>>> Lark(r\(aq\(aq\(aqstart: "foo" \(aq\(aq\(aq)
Lark(...)
.ft P
.fi
.sp
\fB===  General Options  ===\fP
.INDENT 7.0
.TP
.B start
The start symbol. Either a string, or a list of strings for multiple possible starts (Default: "start")
.TP
.B debug
Display debug information, such as warnings (default: False)
.TP
.B transformer
Applies the transformer to every parse tree (equivalent to applying it after the parse, but faster)
.TP
.B propagate_positions
Propagates (line, column, end_line, end_column) attributes into all tree branches.
.TP
.B maybe_placeholders
When True, the \fB[]\fP operator returns \fBNone\fP when not matched.
.sp
When \fBFalse\fP,  \fB[]\fP behaves like the \fB?\fP operator, and returns no value at all.
(default= \fBFalse\fP\&. Recommended to set to \fBTrue\fP)
.TP
.B cache
Cache the results of the Lark grammar analysis, for 2x to 3x faster loading. LALR only for now.
.INDENT 7.0
.IP \(bu 2
When \fBFalse\fP, does nothing (default)
.IP \(bu 2
When \fBTrue\fP, caches to a temporary file in the local directory
.IP \(bu 2
When given a string, caches to the path pointed by the string
.UNINDENT
.TP
.B regex
When True, uses the \fBregex\fP module instead of the stdlib \fBre\fP\&.
.TP
.B g_regex_flags
Flags that are applied to all terminals (both regex and strings)
.TP
.B keep_all_tokens
Prevent the tree builder from automagically removing "punctuation" tokens (default: False)
.TP
.B tree_class
Lark will produce trees comprised of instances of this class instead of the default \fBlark.Tree\fP\&.
.UNINDENT
.sp
\fB=== Algorithm Options ===\fP
.INDENT 7.0
.TP
.B parser
Decides which parser engine to use. Accepts "earley" or "lalr". (Default: "earley").
(there is also a "cyk" option for legacy)
.TP
.B lexer
Decides whether or not to use a lexer stage
.INDENT 7.0
.IP \(bu 2
"auto" (default): Choose for me based on the parser
.IP \(bu 2
"standard": Use a standard lexer
.IP \(bu 2
"contextual": Stronger lexer (only works with parser="lalr")
.IP \(bu 2
"dynamic": Flexible and powerful (only with parser="earley")
.IP \(bu 2
"dynamic_complete": Same as dynamic, but tries \fIevery\fP variation of tokenizing possible.
.UNINDENT
.TP
.B ambiguity
Decides how to handle ambiguity in the parse. Only relevant if parser="earley"
.INDENT 7.0
.IP \(bu 2
"resolve": The parser will automatically choose the simplest derivation
(it chooses consistently: greedy for tokens, non\-greedy for rules)
.IP \(bu 2
"explicit": The parser will return all derivations wrapped in "_ambig" tree nodes (i.e. a forest).
.IP \(bu 2
"forest": The parser will return the root of the shared packed parse forest.
.UNINDENT
.UNINDENT
.sp
\fB=== Misc. / Domain Specific Options ===\fP
.INDENT 7.0
.TP
.B postlex
Lexer post\-processing (Default: None) Only works with the standard and contextual lexers.
.TP
.B priority
How priorities should be evaluated \- auto, none, normal, invert (Default: auto)
.TP
.B lexer_callbacks
Dictionary of callbacks for the lexer. May alter tokens during lexing. Use with caution.
.TP
.B use_bytes
Accept an input of type \fBbytes\fP instead of \fBstr\fP (Python 3 only).
.TP
.B edit_terminals
A callback for editing the terminals before parse.
.UNINDENT
.sp
\fB=== End Options ===\fP
.INDENT 7.0
.TP
.B save(f)
Saves the instance into the given file object
.sp
Useful for caching and multiprocessing.
.UNINDENT
.INDENT 7.0
.TP
.B classmethod load(f)
Loads an instance from the given file object
.sp
Useful for caching and multiprocessing.
.UNINDENT
.INDENT 7.0
.TP
.B classmethod open(grammar_filename, rel_to=None, **options)
Create an instance of Lark with the grammar given by its filename
.sp
If \fBrel_to\fP is provided, the function will find the grammar filename in relation to it.
.sp
Example
.sp
.nf
.ft C
>>> Lark.open("grammar_file.lark", rel_to=__file__, parser="lalr")
Lark(...)
.ft P
.fi
.UNINDENT
.INDENT 7.0
.TP
.B parse(text, start=None, on_error=None)
Parse the given text, according to the options provided.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBtext\fP (\fIstr\fP) \-\- Text to be parsed.
.IP \(bu 2
\fBstart\fP (\fIstr\fP\fI, \fP\fIoptional\fP) \-\- Required if Lark was given multiple possible start symbols (using the start option).
.IP \(bu 2
\fBon_error\fP (\fIfunction\fP\fI, \fP\fIoptional\fP) \-\- if provided, will be called on UnexpectedToken error. Return true to resume parsing.
LALR only. See examples/error_puppet.py for an example of how to use on_error.
.UNINDENT
.TP
.B Returns
If a transformer is supplied to \fB__init__\fP, returns whatever is the
result of the transformation. Otherwise, returns a Tree instance.
.UNINDENT
.UNINDENT
.UNINDENT
.SS Using Unicode character classes with \fBregex\fP
.sp
Python\(aqs builtin \fBre\fP module has a few persistent known bugs and also won\(aqt parse
advanced regex features such as Unicode character classes.
With \fBpip install lark\-parser[regex]\fP, the \fBregex\fP module will be
installed alongside lark and can act as a drop\-in replacement to \fBre\fP\&.
.sp
Any instance of Lark instantiated with \fBregex=True\fP will use the \fBregex\fP module instead of \fBre\fP\&.
.sp
For example, we can use character classes to match PEP\-3131 compliant Python identifiers:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
>>> from lark import Lark
>>> g = Lark(r"""
                    ?start: NAME
                    NAME: ID_START ID_CONTINUE*
                    ID_START: /[\ep{Lu}\ep{Ll}\ep{Lt}\ep{Lm}\ep{Lo}\ep{Nl}_]+/
                    ID_CONTINUE: ID_START | /[\ep{Mn}\ep{Mc}\ep{Nd}\ep{Pc}·]+/
                """, regex=True)

>>> g.parse(\(aqவணக்கம்\(aq)
\(aqவணக்கம்\(aq
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Tree
.INDENT 0.0
.TP
.B class lark.Tree(data, children, meta=None)
The main tree class.
.sp
Creates a new tree, and stores "data" and "children" in attributes of the same name.
Trees can be hashed and compared.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBdata\fP \-\- The name of the rule or alias
.IP \(bu 2
\fBchildren\fP \-\- List of matched sub\-rules and terminals
.IP \(bu 2
\fBmeta\fP \-\- Line & Column numbers (if \fBpropagate_positions\fP is enabled).
meta attributes: line, column, start_pos, end_line, end_column, end_pos
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B pretty(indent_str=\(aq  \(aq)
Returns an indented string representation of the tree.
.sp
Great for debugging.
.UNINDENT
.INDENT 7.0
.TP
.B iter_subtrees()
Depth\-first iteration.
.sp
Iterates over all the subtrees, never returning to the same node twice (Lark\(aqs parse\-tree is actually a DAG).
.UNINDENT
.INDENT 7.0
.TP
.B find_pred(pred)
Returns all nodes of the tree that evaluate pred(node) as true.
.UNINDENT
.INDENT 7.0
.TP
.B find_data(data)
Returns all nodes of the tree whose data equals the given data.
.UNINDENT
.INDENT 7.0
.TP
.B iter_subtrees_topdown()
Top\-down iteration (depth\-first, pre\-order).
.sp
Iterates over all the subtrees, returning nodes in the same order as pretty() does.
.UNINDENT
.UNINDENT
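.sp
The traversal methods above can be tried on a tree built by hand. This is a minimal sketch: the node names \fBstart\fP, \fBpair\fP and \fBNAME\fP are invented for the example, and the ordering noted in the comments is what the iteration methods are understood to produce.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Tree, Token

# Manually built tree, standing in for the result of a parse.
tree = Tree("start", [
    Tree("pair", [Token("NAME", "a")]),
    Tree("pair", [Token("NAME", "b")]),
])

# find_data yields every subtree whose data matches.
pairs = list(tree.find_data("pair"))
print(len(pairs))  # 2

# iter_subtrees visits children before parents, so the root comes last.
bottom_up = [t.data for t in tree.iter_subtrees()]
print(bottom_up[len(bottom_up) + ~0])  # start

# iter_subtrees_topdown starts at the root.
top_down = [t.data for t in tree.iter_subtrees_topdown()]
print(top_down[0])  # start
.ft P
.fi
.UNINDENT
.UNINDENT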
.SS Token
.INDENT 0.0
.TP
.B class lark.Token(type_, value, pos_in_stream=None, line=None, column=None, end_line=None, end_column=None, end_pos=None)
A string with meta\-information, that is produced by the lexer.
.sp
When parsing text, the resulting chunks of the input that haven\(aqt been discarded,
will end up in the tree as Token instances. The Token class inherits from Python\(aqs \fBstr\fP,
so normal string comparisons and operations will work as expected.
.INDENT 7.0
.TP
.B type
Name of the token (as specified in grammar)
.UNINDENT
.INDENT 7.0
.TP
.B value
Value of the token (redundant, as \fBtoken.value == token\fP will always be true)
.UNINDENT
.INDENT 7.0
.TP
.B pos_in_stream
The index of the token in the text
.UNINDENT
.INDENT 7.0
.TP
.B line
The line of the token in the text (starting with 1)
.UNINDENT
.INDENT 7.0
.TP
.B column
The column of the token in the text (starting with 1)
.UNINDENT
.INDENT 7.0
.TP
.B end_line
The line where the token ends
.UNINDENT
.INDENT 7.0
.TP
.B end_column
The next column after the end of the token. For example,
if the token is a single character with a column value of 4,
end_column will be 5.
.UNINDENT
.INDENT 7.0
.TP
.B end_pos
The index where the token ends (basically \fBpos_in_stream + len(token)\fP)
.UNINDENT
.UNINDENT
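.sp
Because \fBToken\fP inherits from \fBstr\fP, ordinary string operations apply to it. The sketch below builds a token by hand purely for illustration; the type name \fBNAME\fP and the value are arbitrary choices.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Token

# Build a token by hand; the lexer normally does this.
tok = Token("NAME", "hello")

print(isinstance(tok, str))  # True
print(tok == "hello")        # True
print(tok.upper())           # HELLO
print(tok.type)              # NAME
print(tok.value == tok)      # True
.ft P
.fi
.UNINDENT
.UNINDENT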
.SS Transformer, Visitor & Interpreter
.sp
See visitors\&.
.SS ForestVisitor, ForestTransformer, & TreeForestTransformer
.sp
See forest\&.
.SS UnexpectedInput
.INDENT 0.0
.TP
.B class lark.exceptions.UnexpectedInput
UnexpectedInput Error.
.sp
Used as a base class for the following exceptions:
.INDENT 7.0
.IP \(bu 2
\fBUnexpectedToken\fP: The parser received an unexpected token
.IP \(bu 2
\fBUnexpectedCharacters\fP: The lexer encountered an unexpected string
.UNINDENT
.sp
After catching one of these exceptions, you may call the following helper methods to create a nicer error message.
.INDENT 7.0
.TP
.B get_context(text, span=40)
Returns a pretty string pinpointing the error in the text,
with span amount of context characters around it.
.sp
\fBNOTE:\fP
.INDENT 7.0
.INDENT 3.5
The parser doesn\(aqt hold a copy of the text it has to parse,
so you have to provide it again
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B match_examples(parse_fn, examples, token_type_match_fallback=False, use_accepts=False)
Allows you to detect what\(aqs wrong in the input text by matching
against example errors.
.sp
Given a parser instance and a dictionary mapping each label to
a list of malformed syntax examples, it\(aqll return the label for the
example that best matches the current error. The function will
iterate the dictionary until it finds a matching error, and
return the corresponding label.
.sp
For an example usage, see \fIexamples/error_reporting_lalr.py\fP
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBparse_fn\fP \-\- parse function (usually \fBlark_instance.parse\fP)
.IP \(bu 2
\fBexamples\fP \-\- dictionary of \fB{label: [\(aqexample_string\(aq, ...]}\fP\&.
.IP \(bu 2
\fBuse_accepts\fP \-\- Recommended to call this with \fBuse_accepts=True\fP\&.
The default is \fBFalse\fP for backwards compatibility.
.UNINDENT
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B class lark.exceptions.UnexpectedToken(token, expected, considered_rules=None, state=None, puppet=None)
When the parser throws UnexpectedToken, it instantiates a puppet
with its internal state. Users can then interactively set the puppet to
the desired puppet state, and resume regular parsing.
.sp
see: \fI\%ParserPuppet\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B class lark.exceptions.UnexpectedCharacters(seq, lex_pos, line, column, allowed=None, considered_tokens=None, state=None, token_history=None)
.UNINDENT
.SS ParserPuppet
.INDENT 0.0
.TP
.B class lark.parsers.lalr_puppet.ParserPuppet(parser, state_stack, value_stack, start, stream, set_state)
ParserPuppet gives you advanced control over error handling when parsing with LALR.
.sp
For a simpler, more streamlined interface, see the \fBon_error\fP argument to \fBLark.parse()\fP\&.
.INDENT 7.0
.TP
.B feed_token(token)
Feed the parser with a token, and advance it to the next state, as if it received it from the lexer.
.sp
Note that \fBtoken\fP has to be an instance of \fBToken\fP\&.
.UNINDENT
.INDENT 7.0
.TP
.B copy()
Create a new puppet with a separate state.
.sp
Calls to feed_token() won\(aqt affect the old puppet, and vice\-versa.
.UNINDENT
.INDENT 7.0
.TP
.B pretty()
Print the output of \fBchoices()\fP in a way that\(aqs easier to read.
.UNINDENT
.INDENT 7.0
.TP
.B choices()
Returns a dictionary of token types, matched to their action in the parser.
.sp
Only returns token types that are accepted by the current state.
.sp
Updated by \fBfeed_token()\fP\&.
.UNINDENT
.INDENT 7.0
.TP
.B resume_parse()
Resume parsing from the current puppet state.
.UNINDENT
.UNINDENT
.SH TRANSFORMERS & VISITORS
.sp
Transformers & Visitors provide a convenient interface to process the
parse\-trees that Lark returns.
.sp
They are used by inheriting from the correct class (visitor or transformer),
and implementing methods corresponding to the rule you wish to process. Each
method accepts the children as an argument. That can be modified using the
\fBv_args\fP decorator, which allows one to inline the arguments (akin to \fB*args\fP),
or add the tree \fBmeta\fP property as an argument.
.sp
See: \fI\%visitors.py\fP
.SS Visitor
.sp
Visitors visit each node of the tree, and run the appropriate method on it according to the node\(aqs data.
.sp
They work bottom\-up, starting with the leaves and ending at the root of the tree.
.sp
There are two classes that implement the visitor interface:
.INDENT 0.0
.IP \(bu 2
\fBVisitor\fP: Visit every node (without recursion)
.IP \(bu 2
\fBVisitor_Recursive\fP: Visit every node using recursion. Slightly faster.
.UNINDENT
.INDENT 0.0
.TP
.B Example:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Visitor

class IncreaseAllNumbers(Visitor):
    def number(self, tree):
        assert tree.data == "number"
        tree.children[0] += 1

# parse_tree is a Tree returned by Lark.parse()
IncreaseAllNumbers().visit(parse_tree)
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B class lark.visitors.Visitor
Bottom\-up visitor, non\-recursive.
.sp
Visits the tree starting with the leaves and ending at the root (bottom\-up).
Calls its methods (provided by user via inheritance) according to \fBtree.data\fP\&.
.UNINDENT
.INDENT 0.0
.TP
.B class lark.visitors.Visitor_Recursive
Bottom\-up visitor, recursive.
.sp
Visits the tree starting with the leaves and ending at the root (bottom\-up).
Calls its methods (provided by user via inheritance) according to \fBtree.data\fP\&.
.UNINDENT
.SS Interpreter
.INDENT 0.0
.TP
.B class lark.visitors.Interpreter
Interpreter walks the tree starting at the root.
.sp
Visits the tree starting with the root and ending at the leaves (top\-down).
.sp
For each tree node, it calls its methods (provided by user via inheritance) according to \fBtree.data\fP\&.
.sp
Unlike \fBTransformer\fP and \fBVisitor\fP, the Interpreter doesn\(aqt automatically visit its sub\-branches.
The user has to explicitly call \fBvisit\fP, \fBvisit_children\fP, or use the \fB@visit_children_decor\fP\&.
This allows the user to implement branching and loops.
.UNINDENT
.INDENT 0.0
.TP
.B Example:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
from lark.visitors import Interpreter

class IncreaseSomeOfTheNumbers(Interpreter):
    def number(self, tree):
        tree.children[0] += 1

    def skip(self, tree):
        # skip this subtree. don\(aqt change any number node inside it.
        pass

IncreaseSomeOfTheNumbers().visit(parse_tree)
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
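.sp
The difference between the bottom\-up walk of a visitor and the top\-down, opt\-in walk of an interpreter can be illustrated without Lark itself. The following is a minimal sketch under stated assumptions: \fBNode\fP, \fBvisit_bottom_up\fP and \fBvisit_top_down\fP are hypothetical stand\-ins written for this illustration, not Lark classes or functions.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Stand-in sketch: Node and both walkers are hypothetical, not Lark classes.
class Node:
    def __init__(self, data, children=()):
        self.data = data
        self.children = list(children)


def visit_bottom_up(node, handlers, log):
    """Visitor-style walk: children first, then the node itself."""
    for child in node.children:
        if isinstance(child, Node):
            visit_bottom_up(child, handlers, log)
    log.append(node.data)
    handler = handlers.get(node.data)
    if handler:
        handler(node)


def visit_top_down(node, handlers, log):
    """Interpreter-style walk: a handler decides whether to descend."""
    log.append(node.data)
    handler = handlers.get(node.data)
    if handler:
        handler(node)  # no automatic descent when a handler exists
    else:
        for child in node.children:
            if isinstance(child, Node):
                visit_top_down(child, handlers, log)


def bump(node):
    node.children[0] += 1


def build():
    return Node("start", [Node("number", [1]),
                          Node("skip", [Node("number", [10])])])


tree_a, log_a = build(), []
visit_bottom_up(tree_a, {"number": bump}, log_a)

tree_b, log_b = build(), []
visit_top_down(tree_b, {"number": bump, "skip": lambda n: None}, log_b)
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In this sketch the bottom\-up walk reaches both \fBnumber\fP nodes before their parents, while the top\-down walk stops at \fBskip\fP and leaves the number nested inside it unchanged.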
.SS Transformer
.INDENT 0.0
.TP
.B class lark.visitors.Transformer(visit_tokens=True)
Transformers visit each node of the tree, and run the appropriate method on it according to the node\(aqs data.
.sp
Calls its methods (provided by user via inheritance) according to \fBtree.data\fP\&.
The returned value replaces the old one in the structure.
.sp
They work bottom\-up (or depth\-first), starting with the leaves and ending at the root of the tree.
Because nodes are reduced from leaf to root, at any point the callbacks may assume the children
have already been transformed (if applicable). This makes transformers suitable for implementing map & reduce patterns.
.sp
\fBTransformer\fP can do anything \fBVisitor\fP can do, but because it reconstructs the tree,
it is slightly less efficient.
.sp
All these classes implement the transformer interface:
.INDENT 7.0
.IP \(bu 2
\fBTransformer\fP \- Recursively transforms the tree. This is the one you probably want.
.IP \(bu 2
\fBTransformer_InPlace\fP \- Non\-recursive. Changes the tree in\-place instead of returning new instances
.IP \(bu 2
\fBTransformer_InPlaceRecursive\fP \- Recursive. Changes the tree in\-place instead of returning new instances
.UNINDENT
.INDENT 7.0
.TP
.B Parameters
\fBvisit_tokens\fP \-\- If \fBTrue\fP (the default in the signature above),
the transformer visits tokens as well as rules. Set it to \fBFalse\fP
to visit rules only. Visiting tokens is a slightly slower alternative
to lexer_callbacks, but it\(aqs easier to maintain and works for all
algorithms (even when there isn\(aqt a lexer).
.UNINDENT
.INDENT 7.0
.TP
.B __default__(data, children, meta)
Default operation on tree (for override)
.sp
Called when no method with a corresponding name has been found.
By default, it reconstructs the Tree.
.UNINDENT
.INDENT 7.0
.TP
.B __default_token__(token)
Default operation on token (for override)
.sp
Called when no method with a corresponding name has been found.
By default, it returns the argument unchanged.
.UNINDENT
.UNINDENT
.INDENT 0.0
.TP
.B Example:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
from lark import Tree, Transformer

class EvalExpressions(Transformer):
    def expr(self, args):
        return eval(args[0])

t = Tree(\(aqa\(aq, [Tree(\(aqexpr\(aq, [\(aq1+2\(aq])])
print(EvalExpressions().transform( t ))

# Prints: Tree(a, [3])
.ft P
.fi
.UNINDENT
.UNINDENT
.TP
.B Example:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
class T(Transformer):
    INT = int
    NUMBER = float
    def NAME(self, name):
        return lookup_dict.get(name, name)

T(visit_tokens=True).transform(tree)
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
.SS v_args
.INDENT 0.0
.TP
.B lark.visitors.v_args(inline=False, meta=False, tree=False, wrapper=None)
A convenience decorator factory for modifying the behavior of user\-supplied visitor methods.
.sp
By default, callback methods of transformers/visitors accept one argument \- a list of the node\(aqs children.
.sp
\fBv_args\fP can modify this behavior. When used on a transformer/visitor class definition,
it applies to all the callback methods inside it.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBinline\fP \-\- Children are provided as \fB*args\fP instead of a list argument (not recommended for very long lists).
.IP \(bu 2
\fBmeta\fP \-\- Provides two arguments: \fBchildren\fP and \fBmeta\fP (instead of just the first)
.IP \(bu 2
\fBtree\fP \-\- Provides the entire tree as the argument, instead of the children.
.UNINDENT
.UNINDENT
.sp
Example
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
@v_args(inline=True)
class SolveArith(Transformer):
    def add(self, left, right):
        return left + right


class ReverseNotation(Transformer_InPlace):
    @v_args(tree=True)
    def tree_node(self, tree):
        tree.children = tree.children[::\-1]
.ft P
.fi
.UNINDENT
.UNINDENT
.UNINDENT
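.sp
To make the two calling conventions concrete, here is a minimal sketch of what inlining the children amounts to, together with a bottom\-up transformation in which each callback\(aqs return value replaces its node. It deliberately does not import Lark: \fBNode\fP, \fBinline\fP and \fBtransform\fP are hypothetical helpers written for this illustration, and Lark\(aqs own implementation may differ.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Stand-in sketch: Node, inline and transform are hypothetical helpers,
# written only to mirror the behaviour described above.
import functools


class Node:
    def __init__(self, data, children):
        self.data = data
        self.children = children


def inline(method):
    """Call method(self, *children) instead of method(self, children)."""
    @functools.wraps(method)
    def wrapper(self, children):
        return method(self, *children)
    return wrapper


class SolveArith:
    @inline
    def add(self, left, right):
        return left + right

    def neg(self, children):  # default convention: one list argument
        return -children[0]


def transform(callbacks, node):
    """Transform the children first, then replace the node with the callback result."""
    if not isinstance(node, Node):
        return node
    children = [transform(callbacks, c) for c in node.children]
    return getattr(callbacks, node.data)(children)


tree = Node("add", [Node("neg", [2]), Node("add", [3, 4])])
result = transform(SolveArith(), tree)
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Here \fBresult\fP is 5: the inner nodes are reduced to \-2 and 7 before the outer \fBadd\fP callback receives them as separate arguments.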
.SS Discard
.INDENT 0.0
.TP
.B class lark.visitors.Discard
When raising the Discard exception in a transformer callback,
that node is discarded and won\(aqt appear in the parent.
.UNINDENT
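.sp
The effect of raising \fBDiscard\fP can be sketched as follows. This is an illustration under stated assumptions, not Lark\(aqs code: the local \fBDiscard\fP class, \fBtransform_children\fP and \fBdrop_comments\fP are hypothetical stand\-ins that only mimic the documented behaviour.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
class Discard(Exception):
    """Stand-in for lark.visitors.Discard (hypothetical local copy)."""


def transform_children(children, callback):
    """Apply callback to each child, dropping those that raise Discard."""
    kept = []
    for child in children:
        try:
            kept.append(callback(child))
        except Discard:
            continue  # the discarded child never reaches the parent
    return kept


def drop_comments(item):
    if item.startswith("#"):
        raise Discard
    return item.upper()


result = transform_children(["a", "# note", "b"], drop_comments)
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The parent ends up with two children, \fBA\fP and \fBB\fP; the comment item is omitted rather than replaced by a placeholder.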
.SH WORKING WITH THE SPPF
.sp
When parsing with Earley, Lark provides the \fBambiguity=\(aqforest\(aq\fP option
to obtain the shared packed parse forest (SPPF) produced by the parser as
an alternative to it being automatically converted to a tree.
.sp
Lark provides a few tools to facilitate working with the SPPF. Here are some
things to consider when deciding whether or not to use the SPPF.
.sp
\fBPros\fP
.INDENT 0.0
.IP \(bu 2
Efficient storage of highly ambiguous parses
.IP \(bu 2
Precise handling of ambiguities
.IP \(bu 2
Custom rule prioritizers
.IP \(bu 2
Ability to handle infinite ambiguities
.IP \(bu 2
Directly transform forest \-> object instead of forest \-> tree \-> object
.UNINDENT
.sp
\fBCons\fP
.INDENT 0.0
.IP \(bu 2
More complex than working with a tree
.IP \(bu 2
SPPF may contain nodes corresponding to rules generated internally
.IP \(bu 2
Loss of Lark grammar features:
.INDENT 2.0
.IP \(bu 2
Rules starting with \(aq_\(aq are not inlined in the SPPF
.IP \(bu 2
Rules starting with \(aq?\(aq are never inlined in the SPPF
.IP \(bu 2
All tokens will appear in the SPPF
.UNINDENT
.UNINDENT
.SS SymbolNode
.INDENT 0.0
.TP
.B class lark.parsers.earley_forest.SymbolNode(s, start, end)
A Symbol Node represents a symbol (or Intermediate LR0).
.sp
Symbol nodes are keyed by the symbol (s). For intermediate nodes
s will be an LR0, stored as a tuple of (rule, ptr). For completed symbol
nodes, s will be a string representing the non\-terminal origin (i.e.
the left hand side of the rule).
.sp
The children of a Symbol or Intermediate Node will always be Packed Nodes;
with each Packed Node child representing a single derivation of a production.
.sp
Hence a Symbol Node with a single child is unambiguous.
.INDENT 7.0
.TP
.B Variables
.INDENT 7.0
.IP \(bu 2
\fBs\fP \-\- A Symbol, or a tuple of (rule, ptr) for an intermediate node.
.IP \(bu 2
\fBstart\fP \-\- The index of the start of the substring matched by this
symbol (inclusive).
.IP \(bu 2
\fBend\fP \-\- The index of the end of the substring matched by this
symbol (exclusive).
.IP \(bu 2
\fBis_intermediate\fP \-\- True if this node is an intermediate node.
.IP \(bu 2
\fBpriority\fP \-\- The priority of the node\(aqs symbol.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property is_ambiguous
Returns True if this node is ambiguous.
.UNINDENT
.INDENT 7.0
.TP
.B property children
Returns a list of this node\(aqs children sorted from greatest to
least priority.
.UNINDENT
.UNINDENT
.SS PackedNode
.INDENT 0.0
.TP
.B class lark.parsers.earley_forest.PackedNode(parent, s, rule, start, left, right)
A Packed Node represents a single derivation in a symbol node.
.INDENT 7.0
.TP
.B Variables
.INDENT 7.0
.IP \(bu 2
\fBrule\fP \-\- The rule associated with this node.
.IP \(bu 2
\fBparent\fP \-\- The parent of this node.
.IP \(bu 2
\fBleft\fP \-\- The left child of this node. \fBNone\fP if one does not exist.
.IP \(bu 2
\fBright\fP \-\- The right child of this node. \fBNone\fP if one does not exist.
.IP \(bu 2
\fBpriority\fP \-\- The priority of this node.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B property children
Returns a list of this node\(aqs children.
.UNINDENT
.UNINDENT
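.sp
The relationship between symbol nodes and packed nodes can be pictured with a small model. This is a minimal sketch, assuming nothing beyond the attributes documented above: the \fBSymbol\fP and \fBPacked\fP dataclasses are hypothetical stand\-ins and do not reproduce Lark\(aqs implementation.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Stand-in sketch: these dataclasses are hypothetical and only mirror the
# attributes documented above; they are not Lark's implementation.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Packed:
    rule: str
    left: Optional[object] = None
    right: Optional[object] = None
    priority: int = 0

    @property
    def children(self):
        # Only the children that exist.
        return [c for c in (self.left, self.right) if c is not None]


@dataclass
class Symbol:
    s: str
    start: int
    end: int
    packed: List[Packed] = field(default_factory=list)

    @property
    def is_ambiguous(self):
        # More than one packed child means more than one derivation.
        return len(self.packed) > 1

    @property
    def children(self):
        # Greatest priority first.
        return sorted(self.packed, key=lambda p: p.priority, reverse=True)


# "1+2+3" can be split as (1+2)+3 or 1+(2+3): two derivations of "sum".
node = Symbol("sum", 0, 5, [
    Packed("sum: sum PLUS sum", left="1+2", right="3", priority=1),
    Packed("sum: sum PLUS sum", left="1", right="2+3", priority=2),
])
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In this model \fBnode.is_ambiguous\fP is true because the symbol has two packed children, and \fBnode.children\fP lists the derivation with priority 2 first.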
.SS ForestVisitor
.INDENT 0.0
.TP
.B class lark.parsers.earley_forest.ForestVisitor
An abstract base class for building forest visitors.
.sp
This class performs a controllable depth\-first walk of an SPPF.
The visitor will not enter cycles and will backtrack if one is encountered.
Subclasses are notified of cycles through the \fBon_cycle\fP method.
.sp
Behavior for visit events is defined by overriding the
\fBvisit*node*\fP functions (the \fB*\fP is a wildcard, as in \fBvisit_symbol_node_in\fP).
.sp
The walk is controlled by the return values of the \fBvisit*node_in\fP
methods. Returning one or more nodes schedules them to be visited. The visitor
begins to backtrack if no nodes are returned.
.INDENT 7.0
.TP
.B visit_token_node(node)
Called when a \fBToken\fP is visited. \fBToken\fP nodes are always leaves.
.UNINDENT
.INDENT 7.0
.TP
.B visit_symbol_node_in(node)
Called when a symbol node is visited. Nodes that are returned
will be scheduled to be visited. If \fBvisit_intermediate_node_in\fP
is not implemented, this function will be called for intermediate
nodes as well.
.UNINDENT
.INDENT 7.0
.TP
.B visit_symbol_node_out(node)
Called after all nodes returned from a corresponding \fBvisit_symbol_node_in\fP
call have been visited. If \fBvisit_intermediate_node_out\fP
is not implemented, this function will be called for intermediate
nodes as well.
.UNINDENT
.INDENT 7.0
.TP
.B visit_packed_node_in(node)
Called when a packed node is visited. Nodes that are returned
will be scheduled to be visited.
.UNINDENT
.INDENT 7.0
.TP
.B visit_packed_node_out(node)
Called after all nodes returned from a corresponding \fBvisit_packed_node_in\fP
call have been visited.
.UNINDENT
.INDENT 7.0
.TP
.B on_cycle(node, path)
Called when a cycle is encountered.
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBnode\fP \-\- The node that causes a cycle.
.IP \(bu 2
\fBpath\fP \-\- The list of nodes being visited: nodes that have been
entered but not exited. The first element is the root in a forest
visit, and the last element is the node visited most recently.
\fBpath\fP should be treated as read\-only.
.UNINDENT
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B get_cycle_in_path(node, path)
A utility function for use in \fBon_cycle\fP to obtain a slice of
\fBpath\fP that only contains the nodes that make up the cycle.
.UNINDENT
.UNINDENT
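.sp
The cycle handling described above can be sketched with a plain depth\-first walk. This is an illustration under stated assumptions: the dictionary\-based graph and the \fBwalk\fP function are hypothetical, and the \fBget_cycle_in_path\fP shown here is a local stand\-in that only mirrors the documented purpose of the method of the same name.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
```python
# Stand-in sketch: the graph, walk and helper below are hypothetical.
def get_cycle_in_path(node, path):
    """Return the part of path that forms the cycle ending at node."""
    return path[path.index(node):]


def walk(graph, node, path, cycles, visited):
    if node in path:
        # Entered but not yet exited: report the cycle and backtrack.
        cycles.append(get_cycle_in_path(node, path))
        return
    if node in visited:
        return
    visited.add(node)
    path.append(node)
    for child in graph.get(node, []):
        walk(graph, child, path, cycles, visited)
    path.pop()


graph = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
cycles = []
walk(graph, "a", [], cycles, set())
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The walk reports a single cycle made up of \fBb\fP and \fBc\fP. The root \fBa\fP is on the path when the cycle is found, but it is excluded from the slice because it is not part of the cycle.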
.SS ForestTransformer
.INDENT 0.0
.TP
.B class lark.parsers.earley_forest.ForestTransformer
The base class for a bottom\-up forest transformation. Most users will
want to use \fBTreeForestTransformer\fP instead as it has a friendlier
interface and covers most use cases.
.sp
Transformations are applied via inheritance and overriding of the
\fBtransform*node\fP methods.
.sp
\fBtransform_token_node\fP receives a \fBToken\fP as an argument.
All other methods receive the node that is being transformed and
a list of the results of the transformations of that node\(aqs children.
The return values of these methods are the resulting transformations.
.sp
If \fBDiscard\fP is raised in a node\(aqs transformation, no data from that node
will be passed to its parent\(aqs transformation.
.INDENT 7.0
.TP
.B transform(root)
Perform a transformation on an SPPF.
.UNINDENT
.INDENT 7.0
.TP
.B transform_symbol_node(node, data)
Transform a symbol node.
.UNINDENT
.INDENT 7.0
.TP
.B transform_intermediate_node(node, data)
Transform an intermediate node.
.UNINDENT
.INDENT 7.0
.TP
.B transform_packed_node(node, data)
Transform a packed node.
.UNINDENT
.INDENT 7.0
.TP
.B transform_token_node(node)
Transform a \fBToken\fP\&.
.UNINDENT
.UNINDENT
.SS TreeForestTransformer
.INDENT 0.0
.TP
.B class lark.parsers.earley_forest.TreeForestTransformer(tree_class=<class \(aqlark.tree.Tree\(aq>, prioritizer=<lark.parsers.earley_forest.ForestSumVisitor object>, resolve_ambiguity=True)
A \fBForestTransformer\fP with a tree \fBTransformer\fP\-like interface.
By default, it will construct a tree.
.sp
Methods provided via inheritance are called based on the rule/symbol
names of nodes in the forest.
.sp
Methods that act on rules will receive a list of the results of the
transformations of the rule\(aqs children. By default, these results are
trees and tokens.
.sp
Methods that act on tokens will receive a token.
.sp
Alternatively, methods that act on rules may be annotated with
\fBhandles_ambiguity\fP\&. In this case, the function will receive a list
of all the transformations of all the derivations of the rule.
By default, this is a list of trees where each tree.data is equal to the
rule name or one of its aliases.
.sp
Non\-tree transformations are made possible by overriding
\fB__default__\fP, \fB__default_token__\fP, and \fB__default_ambig__\fP\&.
.sp
\fBNOTE:\fP
.INDENT 7.0
.INDENT 3.5
Tree shaping features such as inlined rules and token filtering are
not built into the transformation. Positions are also not
propagated.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B Parameters
.INDENT 7.0
.IP \(bu 2
\fBtree_class\fP \-\- The tree class to use for construction
.IP \(bu 2
\fBprioritizer\fP \-\- A \fBForestVisitor\fP that manipulates the priorities of
nodes in the SPPF.
.IP \(bu 2
\fBresolve_ambiguity\fP \-\- If True, ambiguities will be resolved based on
priorities.
.UNINDENT
.UNINDENT
.INDENT 7.0
.TP
.B __default__(name, data)
Default operation on tree (for override).
.sp
Returns a tree named \fBname\fP with \fBdata\fP as its children.
.UNINDENT
.INDENT 7.0
.TP
.B __default_ambig__(name, data)
Default operation on ambiguous rule (for override).
.sp
Wraps data in an \(aq_ambig_\(aq node if it contains more than
one element.
.UNINDENT
.INDENT 7.0
.TP
.B __default_token__(node)
Default operation on \fBToken\fP (for override).
.sp
Returns \fBnode\fP\&.
.UNINDENT
.UNINDENT
.SS handles_ambiguity
.INDENT 0.0
.TP
.B lark.parsers.earley_forest.handles_ambiguity(func)
Decorator for methods of subclasses of \fBTreeForestTransformer\fP\&.
Denotes that the method should receive a list of transformed derivations.
.UNINDENT
.SH IMPORTING GRAMMARS FROM NEARLEY
.sp
Lark comes with a tool to convert grammars from \fI\%Nearley\fP, a popular Earley library for JavaScript. It uses \fI\%Js2Py\fP to convert and run the JavaScript postprocessing code segments.
.SS Requirements
.INDENT 0.0
.IP \(bu 2
Install Lark with the \fBnearley\fP component:
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
pip install lark\-parser[nearley]
.ft P
.fi
.UNINDENT
.UNINDENT
.INDENT 0.0
.IP \(bu 2
Acquire a copy of the nearley codebase. This can be done using:
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
git clone https://github.com/Hardmath123/nearley
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Usage
.sp
Here\(aqs an example of how to import nearley\(aqs calculator example into Lark:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
git clone https://github.com/Hardmath123/nearley
python \-m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley > ncalc.py
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
You can use the output as a regular python module:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
>>> import ncalc
>>> ncalc.parse(\(aqsin(pi/4) ^ e\(aq)
0.38981434460254655
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The Nearley converter also has experimental support for newer JavaScript (ES6+), enabled with the \fB\-\-es6\fP flag:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
git clone https://github.com/Hardmath123/nearley
python \-m lark.tools.nearley nearley/examples/calculator/arithmetic.ne main nearley \-\-es6 > ncalc.py
.ft P
.fi
.UNINDENT
.UNINDENT
.SS Notes
.INDENT 0.0
.IP \(bu 2
Lark currently cannot import templates from Nearley
.IP \(bu 2
Lark currently cannot export grammars to Nearley
.UNINDENT
.sp
These might get added in the future, if enough users ask for them.
.sp
Lark is a modern parsing library for Python. Lark can parse any context\-free grammar.
.sp
Lark provides:
.INDENT 0.0
.IP \(bu 2
Advanced grammar language, based on EBNF
.IP \(bu 2
Three parsing algorithms to choose from: Earley, LALR(1) and CYK
.IP \(bu 2
Automatic tree construction, inferred from your grammar
.IP \(bu 2
Fast unicode lexer with regexp support, and automatic line\-counting
.UNINDENT
.SH INSTALL LARK
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
$ pip install lark\-parser
.ft P
.fi
.UNINDENT
.UNINDENT
.SH SYNTAX HIGHLIGHTING
.INDENT 0.0
.IP \(bu 2
\fI\%Sublime Text & TextMate\fP
.IP \(bu 2
\fI\%Visual Studio Code\fP (Or install through the vscode plugin system)
.IP \(bu 2
\fI\%Intellij & PyCharm\fP
.IP \(bu 2
\fI\%Vim\fP
.UNINDENT
.SH RESOURCES
.INDENT 0.0
.IP \(bu 2
philosophy
.IP \(bu 2
features
.IP \(bu 2
\fI\%Examples\fP
.IP \(bu 2
\fI\%Online IDE\fP
.IP \(bu 2
Tutorials
.INDENT 2.0
.IP \(bu 2
\fI\%How to write a DSL\fP \- Implements a toy LOGO\-like language with
an interpreter
.IP \(bu 2
json_tutorial \- Teaches you how to use Lark
.IP \(bu 2
Unofficial
.INDENT 2.0
.IP \(bu 2
\fI\%Program Synthesis is Possible\fP \- Creates a DSL for Z3
.UNINDENT
.UNINDENT
.IP \(bu 2
Guides
.INDENT 2.0
.IP \(bu 2
how_to_use
.IP \(bu 2
how_to_develop
.UNINDENT
.IP \(bu 2
Reference
.INDENT 2.0
.IP \(bu 2
grammar
.IP \(bu 2
tree_construction
.IP \(bu 2
visitors
.IP \(bu 2
forest
.IP \(bu 2
classes
.IP \(bu 2
nearley
.IP \(bu 2
\fI\%Cheatsheet (PDF)\fP
.UNINDENT
.IP \(bu 2
Discussion
.INDENT 2.0
.IP \(bu 2
\fI\%Gitter\fP
.IP \(bu 2
\fI\%Forum (Google Groups)\fP
.UNINDENT
.UNINDENT
.SH AUTHOR
Erez Shinan
.SH COPYRIGHT
2020, Erez Shinan
.\" Generated by docutils manpage writer.
.