Creating a Sinclair BASIC interpreter
Given my new website design, I figured I’d also have a go at making an interpreter for Sinclair BASIC so that I can run my own ZX Spectrum programs. I’ve got a few aims:
- Create something that can properly parse Sinclair BASIC
- Run the interpreted software in the browser
- Allow user input
What this isn’t supposed to be:
- A perfectly-performant implementation
- An emulator
I’m not a Rust developer, and I’m only learning myself, so everything I write here will be suboptimal. If anyone reading wants to give me some pointers (pun intended), I’d be forever grateful.
I’m going to use the ZX Basic Instruction Manual as my main reference for this project.
Source code for the project is available on Github
Getting started
I’m using Rust for this, so I create a new Cargo project:
cargo new basic-interpreter
And I know I’m going to need to parse input, so I’m going to use nom for parsing:
cargo add nom
Hello, World
Okay, so to begin with we’re going to implement the simplest program we can: Hello World. It’ll be a single line program that just prints out the string. Here’s the program in question:
10 PRINT "Hello, World"
There are three parts to this statement:
- The line number - this is in theory optional, but we’ll handle that later
- The command, in this case
PRINT
- The input. There is some different variations of input, but for now we’re just going to handle single strings
Parsing
Okay so let’s get started with our parser! We’ll start by writing a test for a line to make sure it parses okay:
#[test]
fn it_parses_a_print_command() {
let input = "10 PRINT \"Hello, world\"";
let expected = (10, super::Command::Print(String::from("Hello, world")));
let (_, result) = super::parse_line(input).unwrap();
assert_eq!(expected, result);
}
And let’s create our types:
pub type Line = (u32, Command);
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
Print(String),
None
}
To start with, we’ll extract the line number:
pub fn parse_line(line: &str) -> IResult<&str, Line> {
let (i, line_number) = terminated(ccu32, tag(" "))(line)?;
Ok((line, (line_number, Command::None)))
}
Then we need to parse the command:
fn read_string(i: &str) -> IResult<&str, &str> {
take_until("\"")(i)
}
fn parse_command(i: &str) -> IResult<&str, Command> {
let (i, (command, _)) = tuple((take_until(" "), tag(" ")))(i)?;
let (i, cmd) = match command {
"PRINT" => map(delimited(tag("\""), read_string, tag("\"")), Command::Print)(i)?,
_ => (i , Command::None)
};
Ok((i, cmd))
}
pub fn parse_line(line: &str) -> IResult<&str, Line> {
let (i, line_number) = terminated(ccu32, tag(" "))(line)?;
let (i, command) = parse_command(i)?;
Ok((i, (line_number, command)))
}
Finally, let’s write some code to quickly run our program:
use std::fs;
mod basic;
fn main() {
let file = fs::read_to_string("./src/inputs/hello_world.bas").unwrap();
let lines = file.lines().next().unwrap();
let (_, (_, command)) = basic::parse_line(lines).unwrap();
match command {
basic::Command::Print(input) => {
println!("{}", input);
}
_ => {
panic!("Command not recognised");
}
};
}
And we can run it:
$ cargo run
Compiling basic-interpreter v0.1.0 (/Users/lewis/development/personal/basic-interpreter)
Finished dev [unoptimized + debuginfo] target(s) in 0.51s
Running `target/debug/basic-interpreter`
Hello, world
Cool, that works!
Escaped characters
Okay, but what about if I change my program to print quote characters ("
)?. To do this, we need to escape the strings:
10 PRINT "Hello, \"World\""
Which we would expect to result in:
Hello, "World"
However because we’re using take_until
, our parser stops at the first escaped quote, resulting in:
Hello, \
To fix this, we need to use the escaped_transform
parser:
fn read_string(i: &str) -> IResult<&str, &str> {
delimited(
tag("\""),
escaped_transform(
none_of("\\\""),
'\\',
alt((value("\\", tag("\\")), value("\"", tag("\"")))),
),
tag("\""),
)(i)
}
What we’re saying here is accept any character that doesn’t match either \
or "
(none_of("\\\"")
), where \
is our escape character. Finally, we match escaped quote characters and escaped backslashes and un-escape them so that they print properly (otherwise our output will include escape characters when printed).
Basic looping
Alright, next is everybody’s favourite command: GO TO
, the lets us jump to a a different part of the program.: Here’s a short program using our two commands that will print “Hello World” infintely:
10 PRINT "Hello World"
20 GO TO 10
Parsing
The first thing that leaps out to me from this command is that GO TO
contains a space. That won’t work with our current parser, which reads a string until it meets a space. Instead, we should try and be more specific:
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
Print(String),
GoTo(usize),
None,
}
fn match_command(i: &str) -> IResult<&str, &str> {
alt((
tag("PRINT"),
tag("GO TO")
))(i)
}
fn parse_command(i: &str) -> IResult<&str, Command> {
let (i, command): (&str, &str) = match_command(i).unwrap_or((i, ""));
println!("{}", command);
let (i, _) = tag(" ")(i)?;
let (i, cmd) = match command {
"PRINT" => map(read_string, Command::Print)(i)?,
"GO TO" => map(ccu64, |line| Command::GoTo(line as usize))(i)?,
_ => (i, Command::None),
};
Ok((i, cmd))
}
Building a program
For GO TO
to function, we need a structure to actually store our program. We need to:
- Store a command as a line
- Easily move to the next line
- Search a program for a line by number
(A real compiler would do lots of clever things here that would enable it to drop unreachable code and optimise things, but that’s not what we’re here for).
We… might need a Linked List. They’re pretty notoriously a headache in Rust due to ownership rules - but we can use Box
to help mitigate this:
pub enum Node {
None,
Link { item: Line, next: Box<Node> }
}
We’ll need to add the Copy
trait to our Command
enum for this to work:
#[derive(Debug, PartialEq, Eq, Copy)]
pub enum Command {
Print(String),
GoTo(usize),
None,
}
And then implement a (very basic) Linked List:
#[derive(Debug, PartialEq, Eq, Clone)]
pub enum Node {
None,
Link { item: Line, next: Box<Node> }
}
impl Node {
fn push(&mut self, val: Line) {
*self = match self {
Self::Link { item, next } => {
next.push(val);
Self::Link { item: item.clone(), next: next.clone() }
},
Self::None => Self::Link { item: val, next: Box::new(Self::None) }
}
}
}
We also want to be able to find a line, so we’ll write a simple find_line
function too:
fn find_line(&self, line: usize) -> Option<Node> {
if let Self::Link { item, next } = self {
if item.0 == line {
Some(self.clone())
} else {
next.find(line)
}
} else {
None
}
}
Finally, build a parser to read every line and store it in a list of nodes:
pub fn read_program(i: &str) -> IResult<&str, Node> {
let (i, lines) = separated_list0(tag("\n"), parse_line)(i)?;
let mut node = Node::None;
for line in lines.iter() {
node.push(line.clone());
}
Ok((i, node))
}
Running the program
We have a list of instructions. Now we need to martial them and provide an interface to run them. To do this, I’ve created a Program
struct that holds a reference to the complete program, and a cursor for the instruction currently being executed:
#[derive(Debug, PartialEq, Eq, Clone)]
pub struct Program {
nodes: Node,
current: Node,
}
I’m also going to implement Iterator
for the struct, so that we can easily loop over all of the instructions:
impl Iterator for Program {
type Item = Node;
fn next(&mut self) -> Option<Self::Item> {
let curr = self.current.clone();
match &self.current {
Node::Link { item: _, next } => {
self.current = *next.clone();
Some(curr)
}
Node::None => None,
}
}
}
Then we add an execute
function, as well as a function to jump to a line:
impl Program {
pub fn new(node: Node) -> Self {
Program {
nodes: node.clone(),
current: node,
}
}
pub fn to_line(&mut self, line: usize) {
if let Some(node) = self.nodes.find_line(line) {
self.current = node;
} else {
panic!("Cannot jump to line {}, it does not exist", line);
}
}
pub fn execute(&mut self) {
let mut iter = self.clone();
while let Some(node) = iter.next() {
match node {
Node::Link { item, next } => {
match item.1 {
Command::Print(line) => println!("{}", line),
Command::GoTo(line) => iter.to_line(line),
_ => panic!("Unrecognised command")
}
},
_ => ()
};
}
}
}
Now, we can run a new sample program, using the provided struct:
fn main() {
let file = fs::read_to_string("./inputs/simple_program.bas").unwrap();
let (_, mut program) = basic::read_program(&file).unwrap();
program.execute();
}
$ cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/basic-interpreter`
Hello World
Hello World
Hello World
Hello World
Hello World
(truncated ∞ lines)
I think that’s where I’ll leave this (surprisingly long) post. This was loads of fun, to be honest. I think for my next post I’ll be adding some logic statements (IF
) - aim to get this as close to a functioning piece of software as quickly as possible, and then work on some of the more fun commands.
I’m also going to refactor things a bit, because my basic.rs
file got quite messy towards the end.