Interpreter in Go - 2
Writing an Interpreter In Go by Thorsten Ball will be my personal introduction to writing an interpreter. I’ve never taken a comp sci class before, so I know nothing about compilers. On a lark, I decided to explore this area now, nearly 20 years after I started to learn computer programming.
If you are interested in this book as well, you might find the AST Explorer a useful companion.
I was told at some point in the past that compilation can be broken down into four stages:
- Lexical analysis
- Syntactic analysis
- Type checking
- Code generation
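For a tiny statement, my rough mental model of how those stages transform the code looks like the sketch below (Monkey itself is dynamically typed, so Ball's interpreter has no separate type-checking stage):

```
source:      let x = 1 + 2;
lexing:      LET  IDENT("x")  ASSIGN  INT("1")  PLUS  INT("2")  SEMICOLON
parsing:     LetStatement{ Name: "x", Value: InfixExpression{ Left: 1, Op: "+", Right: 2 } }
checking:    x : Integer   (in a statically typed language)
generation:  machine code or bytecode; a tree-walking interpreter evaluates the AST directly instead
```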
An interpreter is not very different. Naturally, Ball starts with lexical analysis and discusses design decisions in lexers, such as the treatment of whitespace.
In other languages, like Python, the length of whitespace is significant. That means the lexer can’t just “eat up” the whitespace and newline characters. It has to output the whitespace characters as tokens so the parser can later on make sense of them.
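By contrast, a lexer for a whitespace-insensitive language like Monkey can simply discard the blanks before producing each token. Here is a standalone sketch of that "eat up" step (a hypothetical helper for illustration, not Ball's exact code):

```go
package main

import "fmt"

// skipWhitespace returns the index of the next non-whitespace
// character at or after pos. A whitespace-insensitive lexer runs a
// step like this before emitting each token, so spaces, tabs, and
// newlines never become tokens at all.
func skipWhitespace(input string, pos int) int {
	for pos < len(input) {
		switch input[pos] {
		case ' ', '\t', '\n', '\r':
			pos++
		default:
			return pos
		}
	}
	return pos
}

func main() {
	src := "let   \n x"
	fmt.Println(skipWhitespace(src, 3)) // prints 8, the index of 'x'
}
```

A Python-style lexer could not do this: it would have to measure the leading whitespace and emit INDENT/DEDENT tokens for the parser.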
In Ball’s design, a token has a value and a type (similarly, Bob Nystrom says “A token is just a chunk of meaningful code with a type and a string associated with it”). Monkey has the following token types: ASSIGN, PLUS, COMMA, SEMICOLON, FUNCTION, and LET.
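In Go, that design is only a few lines. The field names below follow my reading of the book (a `TokenType` string plus a literal), but treat this as a sketch rather than the book's verbatim code:

```go
package main

import "fmt"

// TokenType is a plain string in the book's design, which trades a
// little performance for tokens that are easy to print and debug.
type TokenType string

// Token pairs a type with the chunk of source text it came from.
type Token struct {
	Type    TokenType
	Literal string
}

// A few of Monkey's token types mentioned in the chapter.
const (
	ASSIGN    = "="
	PLUS      = "+"
	COMMA     = ","
	SEMICOLON = ";"
	FUNCTION  = "FUNCTION"
	LET       = "LET"
)

func main() {
	tok := Token{Type: LET, Literal: "let"}
	fmt.Println(tok.Type, tok.Literal) // prints: LET let
}
```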
Ball’s lexer looks like:
type Lexer struct {
	input        string
	ch           byte // current character under examination
	position     int  // points to ch
	readPosition int  // points to the next ch
}
Most of the fields in Lexer are pretty self-explanatory. The ones that might cause some confusion right now are position and readPosition. Both will be used to access characters in input by using them as an index, e.g.: l.input[l.readPosition]. The reason for these two “pointers” pointing into our input string is the fact that we will need to be able to “peek” further into the input and look after the current character to see what comes up next. readPosition always points to the “next” character in the input. position points to the character in the input that corresponds to the ch byte.
I’m afraid this wasn’t a clear-enough explanation.
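Seeing the two pointers move is what made it click for me. Below is my reconstruction of the book's readChar, plus a peekChar helper, on a trimmed-down Lexer (close to the book's version, but rebuilt from memory, so treat it as a sketch):

```go
package main

import "fmt"

// A trimmed-down Lexer with just the fields the explanation covers.
type Lexer struct {
	input        string
	ch           byte // current character under examination
	position     int  // index of ch in input
	readPosition int  // index of the *next* character in input
}

// readChar advances both pointers: position catches up to
// readPosition, and readPosition moves one past it.
func (l *Lexer) readChar() {
	if l.readPosition >= len(l.input) {
		l.ch = 0 // 0 signals "end of input"
	} else {
		l.ch = l.input[l.readPosition]
	}
	l.position = l.readPosition
	l.readPosition++
}

// peekChar looks at the upcoming character without consuming it,
// which is the whole reason two pointers exist: it lets the lexer
// tell "=" apart from "==" before committing to a token.
func (l *Lexer) peekChar() byte {
	if l.readPosition >= len(l.input) {
		return 0
	}
	return l.input[l.readPosition]
}

func main() {
	l := &Lexer{input: "=="}
	l.readChar() // prime the lexer: ch='=', position=0, readPosition=1
	fmt.Printf("ch=%q peek=%q\n", l.ch, l.peekChar()) // prints ch='=' peek='='
}
```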
I was facing a chicken-and-egg problem when writing this section. Is it better to explain this algorithm in abstract terms and then show the implementation, possibly causing you to jump back and forth between pages, or to show the implementation with the explanation following, causing you to probably skip over the implementation and not get a lot out of the explanation?
Evaluation is also where interpreter implementations (regardless of which language they’re interpreting) diverge the most. There are a lot of different strategies to choose from when evaluating source code. I’ve already hinted at this in the introduction of this book, where we took a brief look at different interpreter architectures. Now that we’re here, AST in hand, the question of what to do with it and how to evaluate this shiny tree of ours is more relevant than ever, so looking at different options again is worthwhile.
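The strategy Ball ultimately picks for Monkey is a tree-walking interpreter: evaluate the AST directly, with no bytecode or compilation step in between. Here is a toy sketch of that idea using hypothetical node types of my own (not Monkey's actual ast package):

```go
package main

import "fmt"

// A hypothetical minimal AST, just enough to show the shape of a
// tree-walking evaluator.
type Node interface{}

type IntLiteral struct{ Value int64 }

type Infix struct {
	Op          string
	Left, Right Node
}

// Eval walks the tree recursively and computes a value on the spot.
// This is the "tree-walking" strategy: simple and slow, but the
// evaluator mirrors the structure of the AST one-to-one.
func Eval(n Node) int64 {
	switch n := n.(type) {
	case IntLiteral:
		return n.Value
	case Infix:
		l, r := Eval(n.Left), Eval(n.Right)
		switch n.Op {
		case "+":
			return l + r
		case "*":
			return l * r
		}
	}
	return 0
}

func main() {
	// (1 + 2) * 3
	tree := Infix{
		Op:    "*",
		Left:  Infix{Op: "+", Left: IntLiteral{1}, Right: IntLiteral{2}},
		Right: IntLiteral{3},
	}
	fmt.Println(Eval(tree)) // prints 9
}
```

The alternatives hinted at in the book's introduction (compiling to bytecode for a virtual machine, or JIT-compiling to machine code) trade this simplicity for speed.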
Now I’m interested in all sorts of articles on parsers, like this one I found on Lobsters.