Interpreter - Base structure
Compilers and interpreters are pretty complex programs.
So it makes sense to split them into smaller parts.
The base structure
The base structure of an interpreter can be split into three main parts:
- Lexer
- Parser
- Interpreter
Diagram
data:image/s3,"s3://crabby-images/4ee7c/4ee7cfbee46fc9ab6a976a403fc6b295128f7010" alt="Diagram showing the three main parts of an interpreter: lexer, parser and interpreter itself"
1. Lexer
The lexer traverses every single character from the given source file and splits it into tokens.
Each token contains a token type and the string.
The list of tokens is the result of the lexer that is passed to the parser.
<?php
if (true) {
return 42;
}
// Token list
$tokens = [
[ 'type' => 'KEYWORD', 'value' => 'if' ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '(' ],
[ 'type' => 'KEYWORD', 'value' => 'true' ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => ')' ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '{' ],
[ 'type' => 'KEYWORD', 'value' => 'return' ],
[ 'type' => 'NUMBER', 'value' => 42 ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => ';' ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '}' ],
];
2. Parser
The parser traverses the array of tokens and creates an abstract syntax tree, short AST.
While traversing the tokens, the parser checks if the syntax is correct and creates the AST.
The AST represents the structure of the source code in a tree structure.
The root of the tree is the program itself.
The children of the root are the statements of the program.
The children of the statements are the expressions.
And so on.
The deeper a leaf/node is in the tree, the higher is its priority. This means it is processed before its parent node.
The AST is the result of the parser that is passed to the interpreter.
<?php
$tokens = [
[ 'type' => 'KEYWORD', 'value' => 'if' ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '(' ],
[ 'type' => 'KEYWORD', 'value' => 'true' ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => ')' ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '{' ],
[ 'type' => 'KEYWORD', 'value' => 'return' ],
[ 'type' => 'NUMBER', 'value' => 42 ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => ';' ],
[ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '}' ],
];
// Check syntax of if statement
if (currToken('type') !== 'KEYWORD' && currToken('value') !== 'if') {
readToken();
if (currToken('type') !== 'OPERATOR_OR_PUNCTUATOR' || currToken('value') !== '(') {
throw new Exception('Syntax error: Expected "("');
}
$condition = parseExpression();
if (currToken('type') !== 'OPERATOR_OR_PUNCTUATOR' || currToken('value') !== ')') {
throw new Exception('Syntax error: Expected ")"');
}
$body = parseBlock();
return NewIfStatement($condition, $body);
}
// Node of the AST:
$ifStatement = [
'type' => 'IF_STATEMENT',
'condition' => [
'type' => 'BOOLEAN_EXPRESSION',
'value' => true,
],
'body' => [
'type' => 'BLOCK',
'statements' => [
[
'type' => 'RETURN_STATEMENT',
'value' => [
'type' => 'NUMBER',
'value' => 42,
],
],
],
],
];
3. Interpreter
The interpreter is the executing part of the "interpreter" program. It traverses the AST and executes the statements and expressions in the correct order. It keeps track of the variables, constants, etc. in the current scope inside of environment objects.
Sources
On my still ongoing journey to learn more about compilers and interpreters, I found/used the following sources:
1. Video playlist
Build a Custom Scripting Language In Typescript by tylerlaceby
This playlist was my first steps into interpreters. By following the videos you can create an interpreter for a simple scripting language.
Advantages:
- Quick progress
- Only a few hours to get a simple interpreter up and running
- Theory is missing
- Scripting language is very simple
2. Writing an Interpreter in Go
Writing An Interpreter In Go by Thorsten Ball
This book is a step by step guide to create an interpreter in the Go programming language for the "Monkey" programming language.
Advantages:
- More complex than the scripting language from playlist
- More theory
- Still missing structures like classes
- Still a bit light on the theory side
3. Compilers
Compiler by Jeffrey D. Ullman, Monica S. Lam, Ravi Sethi, Alfred V. Aho (German book)
This book is primary about the theory of compilers. A lot of this can be applied to interpreters as well.
Automata theory, transformation & optimizing algorithms, ...
If you want getting started with compilers, this book is not the best choice.
This book is a very deep dive into the theory of compilers, e.g. how to optimize your interpreter/compiler.
It is written in an academic style and requires a lot of time to read and understand.
Advantages:
- A lot of theory
- Academic writing style
4. Crafting Interpreters
Crafting Interpreters by Robert Nystorm
So far the best combination of theory and practice I have read.
The author has a very entertaining writing style and explains the theory in a very understandable way.
And you can read the book online for free.
Advantages:
- Good combination of theory and practice
- Details about different existing programing languages
- Implemented language is very similar to existing languages with classes, inheritance, etc.
- None so far