StreASM Retrospective

2018/03/02

There haven’t been many projects that I enjoyed doing whilst at university, and I enjoyed even fewer that were pair or group working tasks. Most university projects are some brand of tedious, and often for pair/group projects, you can’t choose your team members - I usually end up doing the lion’s share of the work. :(

StreASM was completed as part of my Programming Language Concepts module in the second semester of my second year (Spring 2016), and I really enjoyed it. I worked on it with Daniel Lockyer, and we worked well to create a programming language that I’m still very proud of. You can see Lockyer’s writeup about it here, read the report/user manual here, or browse the implementation code here.

The specification required us to create a programming language that would operate on streams of numbers. We were given 5 problems (eg: sum two streams of numbers, prefix a stream with zero, convert one stream into two copies) that our programming language must be able to solve. A further 5 problems would be given to solve with our programming language after it had been designed and submitted. The catch, which made things particularly interesting, was that if we needed to adapt our programming language to solve those additional problems, we’d lose 25% of the marks. This meant that we had to design a robust language that could solve any stream problem that we could think of, and that the interpreter for that language had to work properly in all cases.

StreASM concept

We strategically decided to design our language to be similar to Assembly programming, for various reasons:

Assembly is a very simple language and the concept of streams as described in the specification was a simple domain - they were a natural fit.
Assembly doesn’t worry about things like scoping - everything is global. Designing a language that requires accurate scoping rules such as parameterised function calls, variables which reuse the same names, etc, would have greatly increased the complexity of the interpreter.
A rich instruction set of Assembly is able to compute anything that a high-level programming language is able to compute. This meant that if the second set of problems were much more complex than the first, we would theoretically be able to implement them, even if we didn’t have language specific features to support them.
The simplicity of Assembly would allow us to easily keep our grammar unambiguous, ensuring that our programs would run deterministically.
The simplicity of Assembly would also save us time, allowing us to implement things faster and spend longer debugging.
Assembly and stream processing could be used to make a nifty name, ‘StreASM’, a portmanteau of ‘Stream’ and ‘ASM’ (the common abbreviation for Assembly).
Most importantly - I was very passionate about Assembly from my work with Frequency Central. Lockyer thought Assembly was cool too.

Assembly features in StreASM

We implemented a set of 25 standard assembly instructions, allowing mathematical operations such as addition and subtraction; bitwise instructions such as bit setting/clearing, AND, XOR, and complement; conditional branching such as test if zero, test if one register is greater than another, test if two registers are equal; as well as unconditional jumping, and jump and return.

Our control flow commands use labels as in regular assembly. We also implemented the @NEXT and @END labels, which jump to the next instruction and the end of the program, respectively. On reflection, a flaw here is that we don’t allow arbitrary jumping based on the instruction pointer. Eg: JMP $-2 to jump two instructions backwards isn’t possible.

We also implemented features for programmer convenience. We allowed the programmer to add aliases (we called them ‘Nice Names’ at the time) to registers (eg: #DEF count r0 to allow you to write MOV count, 1 instead of MOV r0, 1. We also added commenting! - for such an easy feature to implement, this seemed like a rarity among our class!

StreASM special features

As we were designing a language to be interpreted, and not something which would convert directly to machine code, we were able to implement some convenience features that are not present in regular assembly. For example, if r0 is a register containing the value 22, and r22 is a register containing the value 64, you can indirectly refer to register r22 with the syntax r[r0], which, when evaluated, has the value of 64. Programmers should recognise that our ‘registers’ are also essentially arrays.

Our language needed to be able to read from stdin and write to stdout, so we created the NXT instruction, which can be used to read or write the next value of each stream to or from a specified identifier (representing a block of registers). As multiple streams can be input or output at once, the zeroth register in the identifier (again, think arrays) notes how many streams there are, whilst the first register contains the value from the first stream, and so on. So for example, if the incoming streams has 21 as the first value for the first stream and 74 for the second stream, running NXT i, stdin will fill i0 with 2 (because there are two streams), i1 with 21, and i2 with 74.

I feel like these two design features neatly filled the gap between the range of Assembly and the domain of stream processing.

Implementation

Lockyer and I are both skilled programmers, and we worked together to solve the problems, so we didn’t have too many problems (though I remember hating the OCaml syntax!). We sat side by side in the ECS labs pair-programming. At the time, we didn’t expect our repository to become public-facing, so there are some interesting commit messages among the jumble.

As we were implementing quite a simple language (as Lockyer notes in his write up, our language was a total of only 377 lines of code), we finished well in advance of the deadline. We used our remaining time to harden our language, test for bugs, and thought about the types of problems we might have difficulty solving.

It turned out to be a shrewd decision to designed a simple language, as some of our more competent coursemates had engineered highly complex languages leading them to overrun the deadline when they encountered problems (Losing a total of 10% every day past the deadline!). Some teams cut their losses and submitted code that didn’t fully work, and had to forfeit the 25% marks for adapting their programming language during the second half.

Assessment

The second set of problems were quite trivial compared to some of the problems we had been expecting, and I was able to implement solutions for them using StreASM within two hours. It turned out, implementing the bitwise operations was completely unnecessary! We didn’t use them!

Our lecturer had this to say about StreASM:

StreASM — an assembler. Amazing! I can’t say that assembler programming was what I had in mind, but full marks for originality. The manual is very nicely done. Obviously “programmer convenience” is something alien to any real assembler programmer, but comments were a nice touch. Great work.
You get 25/30 for the qualitative aspects (don’t tell anyone, otherwise people will spit at me at programming language conferences).
Your total coursework mark is 95/100 — well done!
– Dr. Pawel Sobocinski

Sorry Pawel, but I think I just leaked that to the internet! I hope the consequences aren’t too severe!

What made this coursework fun

As I mentioned at the start of this post, I generally dislike University courseworks.

This one wasn’t like that. This coursework was actually fun; it was a notable coursework that I can reflect on and am still proud of the work I produced.

Most University projects give you a specification for what you must achieve and essentially lays out exactly what you must do, without any room for creativity - sometimes, creativity is outright punished. Specific implementations may be different but everyone will produce pretty much the same thing. This project wasn’t like that. The inputs and outputs were described, but everything else was left to the student. This let us have creativity to solve the problem, and a lot of the teams put in a lot of effort. Every team produced unique works, and I’m sure most are proud of what they made. One team used strange keywords for their language, so their programs were like some kind of tribal war chant, which I thought was particularly cool.

The fact that the work we produced in the first part would have an effect on our ability to complete the second part gave a reason to try to create something that worked well and that could solve the problems posed. Often coursework is some kind of pointless throw away project that doesn’t really matter how well it works or how functional it is, as long as the ‘learning objectives’ are met.

The examiner (Dr. Pawel Sobocinski) showed that he had actually examined our works with his specific comments - some feedback I have received in different courseworks from different lecturers could easily have been sent to a different team and still have made sense, because they’re often woefully unspecific. This extra attention makes it feel like the special effort you put in mattered.

The very nature of everyone doing their own thing made it really fun to see what other teams were doing, and how they were solving the problems. Similarly, because everyone was doing their own thing, I’m sure plagiarism was very easy to detect and didn’t really occur much.

Doing your own thing means that you produce a unique work that you can be proud of later - I’m pretty much doing that now, writing this post two years on. The end result wasn’t just another student apartments booking website that is near identical to that produced by your peers.

Example StreASM code

;Stream arithmetic
;Take two sequences a1 a2 a3 a4
;               and b1 b2 b3 b4
; and produce the single sequence
; a1 + 3b1
; a2 + 3b2
; a3 + 3b3
; a4 + 3b4

main:
	NXT i, stdin		;Get the two streams
	TSTE i0, 2, @NEXT, @END	;Check we had two streams, otherwise terminate
	MUL r2, i2, 3		;Multiply the second stream value by 3
	ADD r3, i1, r2		;Add the first stream value and the multiplied value
	MOV o1, r3		;Output the calculated value
	NXT stdout, o
	JMP main		;Loop

;Natural Numbers
;Take a sequence a1 a2 a3 a4 a5 ... as an input
;and output 
;a1
;2a1 + a2
;3a1 + 2a2 + a3
;4a1 + 3a2 + 2a3 + a4
;5a1 + 4a2 + 3a3 + 2a4 + a5

	MOV r0, 0		;This is the current overall total
	MOV r1, 0		;This is the sum of a1, a2, a3, ...
main:
	NXT i, stdin		;Get the input
	TSTZ i0, @END, @NEXT	;Test if we reached the end of the stream, if so terminate
	ADD r1, r1, i1		;Add the new value to the sum of 1 of each values
	ADD r0, r0, r1		;Add the sum to the current total
	MOV o1, r0		;Output the new total
	NXT stdout, o
	JMP main		;Loop

;Reverse Single Finite Stream
;Take a sequence a1 a2 a3 a4 a5 as an input
;and output      a5 a4 a3 a2 a1

	MOV r0, 0		;Counter of total number of values
collect:
	NXT i, stdin		;Get the input
	TSTZ i0, reverse, @NEXT	;If we reached the end of the stream, reverse it
	INCR r0			;Increment counter
	MOV r[r0], i1		;Move the new stream value to it's own register
	JMP collect		;Loop until we reach end of stream
reverse:
	TSTZ r0, @END, @NEXT	;Terminate if we outputted everything
	MOV o1m r[r0]		;Output the current latest value
	NXT stdout, o
	DECR r0			;Decrease the counter to previous value in sequence
	JMP reverse		;Loop until we outputted everything

P.S. I'm late to the party, but I recently got a twitter account that you can follow here.