Interpreter in Go - 2
Writing an Interpreter In Go by Thorsten Ball will be my personal introduction to writing an interpreter. I’ve never taken a comp sci class before, so I know nothing about compilers. On a lark, I decided to explore this area now, nearly 20 years after I started to learn computer programming.
If you are interested in this book as well, you might find the AST Explorer a useful companion.
I was told at some point in the past that compilation can be broken down into four stages:
- Lexical analysis
- Syntactic analysis
- Type checking
- Code generation
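For a tiny statement, my rough mental model of how those stages transform the code looks like the sketch below (Monkey itself is dynamically typed, so Ball's interpreter has no separate type-checking stage):

```
source:      let x = 1 + 2;
lexing:      LET  IDENT("x")  ASSIGN  INT("1")  PLUS  INT("2")  SEMICOLON
parsing:     LetStatement{ Name: "x", Value: InfixExpression{ Left: 1, Op: "+", Right: 2 } }
checking:    x : Integer   (in a statically typed language)
generation:  machine code or bytecode; a tree-walking interpreter evaluates the AST directly instead
```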
An interpreter is not very different. Naturally, Ball starts with lexical analysis and discusses design decisions in lexers, such as the treatment of whitespace.
In other languages, like Python, the length of whitespace is significant. That means the lexer can’t just “eat up” the whitespace and newline characters. It has to output the whitespace characters as tokens so the parser can later on make sense of them.
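By contrast, a lexer for a whitespace-insensitive language like Monkey can simply discard the blanks before producing each token. Here is a standalone sketch of that "eat up" step (a hypothetical helper for illustration, not Ball's exact code):

```go
package main

import "fmt"

// skipWhitespace returns the index of the next non-whitespace
// character at or after pos. A whitespace-insensitive lexer runs a
// step like this before emitting each token, so spaces, tabs, and
// newlines never become tokens at all.
func skipWhitespace(input string, pos int) int {
	for pos < len(input) {
		switch input[pos] {
		case ' ', '\t', '\n', '\r':
			pos++
		default:
			return pos
		}
	}
	return pos
}

func main() {
	src := "let   \n x"
	fmt.Println(skipWhitespace(src, 3)) // prints 8, the index of 'x'
}
```

A Python-style lexer could not do this: it would have to measure the leading whitespace and emit INDENT/DEDENT tokens for the parser.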
In Ball’s design, a token has a value and a type (similarly, Bob Nystrom says “A token is just a chunk of meaningful code with a type and a string associated with it”). Monkey has the following token types: ASSIGN, PLUS, COMMA, SEMICOLON, FUNCTION, and LET.
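In Go, that design is only a few lines. The field names below follow my reading of the book (a `TokenType` string plus a literal), but treat this as a sketch rather than the book's verbatim code:

```go
package main

import "fmt"

// TokenType is a plain string in the book's design, which trades a
// little performance for tokens that are easy to print and debug.
type TokenType string

// Token pairs a type with the chunk of source text it came from.
type Token struct {
	Type    TokenType
	Literal string
}

// A few of Monkey's token types mentioned in the chapter.
const (
	ASSIGN    = "="
	PLUS      = "+"
	COMMA     = ","
	SEMICOLON = ";"
	FUNCTION  = "FUNCTION"
	LET       = "LET"
)

func main() {
	tok := Token{Type: LET, Literal: "let"}
	fmt.Println(tok.Type, tok.Literal) // prints: LET let
}
```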
Ball’s lexer looks like:
type Lexer struct {
	input        string
	ch           byte // current character under examination
	position     int  // points to ch
	readPosition int  // points to the next ch
}
Most of the fields in Lexer are pretty self-explanatory. The ones that might cause some confusion right now are position and readPosition. Both will be used to access characters in input by using them as an index, e.g.: l.input[l.readPosition]. The reason for these two “pointers” pointing into our input string is the fact that we will need to be able to “peek” further into the input and look after the current character to see what comes up next. readPosition always points to the “next” character in the input. position points to the character in the input that corresponds to the ch byte.
I’m afraid this wasn’t a clear-enough explanation.
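Seeing the two pointers move is what made it click for me. Below is my reconstruction of the book's readChar, plus a peekChar helper, on a trimmed-down Lexer (close to the book's version, but rebuilt from memory, so treat it as a sketch):

```go
package main

import "fmt"

// A trimmed-down Lexer with just the fields the explanation covers.
type Lexer struct {
	input        string
	ch           byte // current character under examination
	position     int  // index of ch in input
	readPosition int  // index of the *next* character in input
}

// readChar advances both pointers: position catches up to
// readPosition, and readPosition moves one past it.
func (l *Lexer) readChar() {
	if l.readPosition >= len(l.input) {
		l.ch = 0 // 0 signals "end of input"
	} else {
		l.ch = l.input[l.readPosition]
	}
	l.position = l.readPosition
	l.readPosition++
}

// peekChar looks at the upcoming character without consuming it,
// which is the whole reason two pointers exist: it lets the lexer
// tell "=" apart from "==" before committing to a token.
func (l *Lexer) peekChar() byte {
	if l.readPosition >= len(l.input) {
		return 0
	}
	return l.input[l.readPosition]
}

func main() {
	l := &Lexer{input: "=="}
	l.readChar() // prime the lexer: ch='=', position=0, readPosition=1
	fmt.Printf("ch=%q peek=%q\n", l.ch, l.peekChar()) // prints ch='=' peek='='
}
```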
I was facing a chicken-and-egg problem when writing this section. Is it better to explain this algorithm in abstract terms and then show the implementation, possibly causing you to jump back and forth between pages, or to show the implementation with the explanation following, causing you to probably skip over the implementation and not get a lot out of the explanation?
Evaluation is also where interpreter implementations (regardless of which language they’re interpreting) diverge the most. There are a lot of different strategies to choose from when evaluating source code. I’ve already hinted at this in the introduction of this book, where we took a brief look at different interpreter architectures. Now that we’re here, AST in hand, the question of what to do with it and how to evaluate this shiny tree of ours is more relevant than ever, so looking at different options again is worthwhile.
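The strategy Ball ultimately picks for Monkey is a tree-walking interpreter: evaluate the AST directly, with no bytecode or compilation step in between. Here is a toy sketch of that idea using hypothetical node types of my own (not Monkey's actual ast package):

```go
package main

import "fmt"

// A hypothetical minimal AST, just enough to show the shape of a
// tree-walking evaluator.
type Node interface{}

type IntLiteral struct{ Value int64 }

type Infix struct {
	Op          string
	Left, Right Node
}

// Eval walks the tree recursively and computes a value on the spot.
// This is the "tree-walking" strategy: simple and slow, but the
// evaluator mirrors the structure of the AST one-to-one.
func Eval(n Node) int64 {
	switch n := n.(type) {
	case IntLiteral:
		return n.Value
	case Infix:
		l, r := Eval(n.Left), Eval(n.Right)
		switch n.Op {
		case "+":
			return l + r
		case "*":
			return l * r
		}
	}
	return 0
}

func main() {
	// (1 + 2) * 3
	tree := Infix{
		Op:    "*",
		Left:  Infix{Op: "+", Left: IntLiteral{1}, Right: IntLiteral{2}},
		Right: IntLiteral{3},
	}
	fmt.Println(Eval(tree)) // prints 9
}
```

The alternatives hinted at in the book's introduction (compiling to bytecode for a virtual machine, or JIT-compiling to machine code) trade this simplicity for speed.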
Now I’m interested in all sorts of articles on parsers, like this one I found on Lobsters.