Understanding Interpreters

How programming language interpreters work with scanners, parsers, and ASTs

From the user’s perspective, as long as the resulting contraption faithfully follows the language’s specification, how the implementation is organized internally is all implementation detail.

![[Pasted image 20250620223827.png]]

A scanner (or lexer) takes in the linear stream of characters and chunks them together into a series of something more akin to “words”. In programming languages, each of these words is called a token.
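
To make the chunking concrete, here is a minimal scanner sketch in Python. The token kinds (`NUMBER`, `IDENT`, `OP`) and the `scan` function are made up for illustration and cover only a tiny expression language, not any real one.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # hypothetical category name, e.g. "NUMBER"
    text: str  # the exact characters that were grouped together


# Illustrative token kinds; a real language would define many more.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[-+*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def scan(source: str) -> list[Token]:
    """Walk the character stream left to right and group characters into tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `scan("avg = (min + max) / 2")` yields nine tokens: identifiers, operators, and a number, with the whitespace dropped.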

A parser takes the flat sequence of tokens and builds a tree structure that mirrors the nested nature of the grammar. These trees have a couple of different names—parse tree or abstract syntax tree—depending on how close to the bare syntactic structure of the source language they are. In practice, language hackers usually call them syntax trees, ASTs, or often just trees.

![[Pasted image 20250620224021.png]]
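
As a rough sketch of how a flat token sequence becomes a tree, the Python below uses recursive descent over a small arithmetic grammar. The grammar, the `Num` and `Binary` node classes, and the `parse` helper are assumptions for illustration; tokens are plain strings to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Num:
    value: float


@dataclass
class Binary:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Binary]


class Parser:
    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self) -> Expr:
        # expression -> term (("+" | "-") term)*
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = Binary(op, node, self.term())
        return node

    def term(self) -> Expr:
        # term -> primary (("*" | "/") primary)*
        node = self.primary()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = Binary(op, node, self.primary())
        return node

    def primary(self) -> Expr:
        # primary -> NUMBER | "(" expression ")"
        tok = self.advance()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        if tok == "(":
            node = self.expression()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        try:
            return Num(float(tok))
        except ValueError:
            raise SyntaxError(f"unexpected token {tok!r}") from None


def parse(tokens: list[str]) -> Expr:
    parser = Parser(tokens)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree
```

Because `term` is called from inside `expression`, multiplication ends up deeper in the tree than addition, so `parse(["1", "+", "2", "*", "3"])` nests the `*` node under the `+` node. The nesting of the grammar rules is what gives the tree its shape.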

The first bit of analysis that most languages do is called binding or resolution. For each identifier, we find out where that name is defined and wire the two together. This is where scope comes into play—the region of source code where a certain name can be used to refer to a certain declaration.
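
One common way to model this, sketched below under simplifying assumptions, is a chain of scopes where each scope maps names to declarations and points at its enclosing scope. The `Scope` class and its `declare` and `resolve` methods are hypothetical names for illustration, and declarations are plain strings standing in for real AST nodes.

```python
class Scope:
    """One region of source code, linked to the region that encloses it."""

    def __init__(self, parent=None):
        self.parent = parent
        self.declarations = {}

    def declare(self, name, declaration):
        self.declarations[name] = declaration

    def resolve(self, name):
        """Return (declaration, depth), where depth counts enclosing scopes walked."""
        scope = self
        depth = 0
        while scope is not None:
            if name in scope.declarations:
                return scope.declarations[name], depth
            scope = scope.parent
            depth += 1
        raise NameError(f"undefined name {name!r}")


# Three nested scopes: global, a block, and a block inside that block.
global_scope = Scope()
global_scope.declare("x", "global x")

block = Scope(parent=global_scope)
block.declare("y", "local y")

inner = Scope(parent=block)
inner.declare("x", "shadowing x")
```

Resolving `x` from `block` walks out one level to the global declaration, while resolving `x` from `inner` stops at the innermost, shadowing declaration. The same identifier is wired to different declarations depending on where it is used.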