David's Coding Blog

Interpreter - Base structure

Compilers and interpreters are pretty complex programs.
So it makes sense to split them into smaller parts.

The base structure

The base structure of an interpreter can be split into three main parts:

Lexer
Parser
Interpreter

Diagram

1. Lexer

The lexer traverses every single character from the given source file and splits it into tokens. Each token contains a token type and the string.
The list of tokens is the result of the lexer that is passed to the parser.


<?php
if (true) {
    return 42;
}

// Token list
$tokens = [
    [ 'type' => 'KEYWORD', 'value' => 'if' ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '(' ],
    [ 'type' => 'KEYWORD', 'value' => 'true' ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => ')' ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '{' ],
    [ 'type' => 'KEYWORD', 'value' => 'return' ],
    [ 'type' => 'NUMBER', 'value' => 42 ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => ';' ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '}' ],
];

2. Parser

The parser traverses the array of tokens and creates an abstract syntax tree, short AST. While traversing the tokens, the parser checks if the syntax is correct and creates the AST.

The AST represents the structure of the source code in a tree structure.
The root of the tree is the program itself.
The children of the root are the statements of the program.
The children of the statements are the expressions.
And so on.

The deeper a leaf/node is in the tree, the higher is its priority. This means it is processed before its parent node.

The AST is the result of the parser that is passed to the interpreter.


<?php
$tokens = [
    [ 'type' => 'KEYWORD', 'value' => 'if' ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '(' ],
    [ 'type' => 'KEYWORD', 'value' => 'true' ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => ')' ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '{' ],
    [ 'type' => 'KEYWORD', 'value' => 'return' ],
    [ 'type' => 'NUMBER', 'value' => 42 ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => ';' ],
    [ 'type' => 'OPERATOR_OR_PUNCTUATOR', 'value' => '}' ],
];

// Check syntax of if statement
if (currToken('type') !== 'KEYWORD' && currToken('value') !== 'if') {
    readToken();

    if (currToken('type') !== 'OPERATOR_OR_PUNCTUATOR' || currToken('value') !== '(') {
        throw new Exception('Syntax error: Expected "("');
    }

    $condition = parseExpression();

    if (currToken('type') !== 'OPERATOR_OR_PUNCTUATOR' || currToken('value') !== ')') {
        throw new Exception('Syntax error: Expected ")"');
    }

    $body = parseBlock();

    return NewIfStatement($condition, $body);
}

// Node of the AST:
$ifStatement = [
    'type' => 'IF_STATEMENT',
    'condition' => [
        'type' => 'BOOLEAN_EXPRESSION',
        'value' => true,
    ],
    'body' => [
        'type' => 'BLOCK',
        'statements' => [
            [
                'type' => 'RETURN_STATEMENT',
                'value' => [
                    'type' => 'NUMBER',
                    'value' => 42,
                ],
            ],
        ],
    ],
];

3. Interpreter

The interpreter is the executing part of the "interpreter" program. It traverses the AST and executes the statements and expressions in the correct order. It keeps track of the variables, constants, etc. in the current scope inside of environment objects.

Sources

On my still ongoing journey to learn more about compilers and interpreters, I found/used the following sources:

1. Video playlist

Build a Custom Scripting Language In Typescript by tylerlaceby

This playlist was my first steps into interpreters. By following the videos you can create an interpreter for a simple scripting language.

Advantages:

Quick progress
Only a few hours to get a simple interpreter up and running

Disadvantages:

Theory is missing
Scripting language is very simple

2. Writing an Interpreter in Go

Writing An Interpreter In Go by Thorsten Ball

This book is a step by step guide to create an interpreter in the Go programming language for the "Monkey" programming language.

Advantages:

More complex than the scripting language from playlist
More theory

Disadvantages:

Still missing structures like classes
Still a bit light on the theory side

3. Compilers

Compiler by Jeffrey D. Ullman, Monica S. Lam, Ravi Sethi, Alfred V. Aho (German book)

This book is primary about the theory of compilers. A lot of this can be applied to interpreters as well.
Automata theory, transformation & optimizing algorithms, ...

If you want getting started with compilers, this book is not the best choice. This book is a very deep dive into the theory of compilers, e.g. how to optimize your interpreter/compiler.
It is written in an academic style and requires a lot of time to read and understand.

Advantages:

A lot of theory

Disadvantages:

Academic writing style

4. Crafting Interpreters

Crafting Interpreters by Robert Nystorm

So far the best combination of theory and practice I have read. The author has a very entertaining writing style and explains the theory in a very understandable way. And you can read the book online for free.

Advantages:

Good combination of theory and practice
Details about different existing programing languages
Implemented language is very similar to existing languages with classes, inheritance, etc.

Disadvantages:

None so far