TD4 4-bit DIY CPU – Part 7
Once I’d floated the idea in Part 6 of creating an Arduino “direct to ROM” assembler, I just had to do it, so this post is a small diversion from the hardware discussion into how that could work.
- Part 1 – Introduction, Discussion and Analysis
- Part 2 – Building and Hardware
- Part 3 – Programming and Simple Programs
- Part 4 – Some hardware enhancements
- Part 5 – My own PCB version
- Part 6 – Replacing the ROM with a microcontroller
- Part 7 – Creating an Arduino “assembler” for the TD4
- Part 8 – Extending the address space to 5-bits and an Arduino ROM PCB
Basic Concepts
This relies on using an Arduino as the ROM as described in Part 6, but the Arduino can now also change the ROM contents independently of the TD4 itself.
The Arduino sketch will do the following:
- Run the TD4 ROM routine off a timer interrupt so that it is always running and responsive.
- Take input over the Arduino serial port to allow basic control, e.g. list, clear, etc.
- Allow the direct input of assembler instructions, such as MOVE A,B or OUT B and so on.
- Provide a means of selecting which line of the program to change.
The code will thus have a number of key sections:
- The TD4 ROM routine.
- Some kind of serial-port command-line interpreter.
- Handler routines for all the commands.
- An assembler.
- A disassembler.
The TD4 ROM routine has already been fully described in Part 6. The only difference is that the scanning routine is now driven from a 1 ms timer using the TimerOne library.
As I still want to support a built-in demo, I now treat “ROM” as the demo code and “RAM” as the “live” code that is passed on to the TD4. The Arduino initialises the RAM from the ROM on startup.
As far as the TD4 is concerned of course, this is all still ROM.
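To make the ROM/RAM split concrete, here is a minimal host-side C++ sketch. It is not the actual Arduino code. The names demoRom, ram, restoreRam, clearRam and romRead are hypothetical, and the demo bytes are simply the ones shown in the first RAM listing of the session log, used as example data. On the real board, romRead() stands in for the timer-driven routine from Part 6 that reads the address lines and drives the data pins.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Example data only: the bytes from the first RAM listing in the session log.
const uint8_t demoRom[16] = {
  0xE8, 0xF3, 0xA2, 0x51, 0xA4, 0x01, 0xA8, 0x51,
  0xA4, 0x01, 0xA2, 0x51, 0xF0, 0x00, 0x00, 0x00
};

// The "live" program that the TD4 actually sees.
uint8_t ram[16];

// Startup and the R (restore) command: copy the demo into working memory.
void restoreRam() {
  std::memcpy(ram, demoRom, sizeof ram);
}

// The C (clear) command: every location becomes 0x00.
void clearRam() {
  std::memset(ram, 0, sizeof ram);
}

// Stand-in for the timer-driven ROM routine: the TD4 only has four
// address lines, so any address above 15 is a bug in the caller.
uint8_t romRead(uint8_t address) {
  assert(address < 16);
  return ram[address];
}
```

Keeping the demo in its own const array means R can always get back to a known-good program, whatever has been typed in since.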
Command Line Interpreter
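The standard Arduino Serial routines are used to scan for input on the serial port, with line-oriented input handled as follows:
// Assumed declarations (CMD_BUFFER is defined elsewhere in the sketch)
char cmdInput[CMD_BUFFER];   // characters received so far
char cmdSaved[CMD_BUFFER];   // the last complete line
int  cmdIdx = 0;             // next free position in cmdInput

// Returns true once a complete line has been received.
bool cmdRunner (void) {
  while (Serial.available()) {
    char c = Serial.read();
    if (c == '\n') {
      // End of line: hand a copy over for processing
      strcpy(cmdSaved, cmdInput);
      cmdIdx = 0;
      return true;
    }
    else if (cmdIdx < CMD_BUFFER-1) {
      // Append and keep the buffer NUL-terminated;
      // anything beyond the buffer size is dropped
      cmdInput[cmdIdx++] = c;
      cmdInput[cmdIdx] = '\0';
    }
  }
  return false;
}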
This keeps adding received characters to the cmdInput buffer until a newline arrives (characters beyond the buffer size are quietly dropped). At that point the command is copied into cmdSaved and the routine returns true to indicate that a complete line is ready to be processed.
Once a complete line has been received, a processing function parses it.
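Because cmdRunner() reads straight from Serial, it can’t be tried out off-target. The following C++ sketch keeps the same buffering logic but takes characters one at a time through a hypothetical cmdFeed() function. It also adds two changes that are my own assumptions rather than part of the original: a '\r' is ignored, so a terminal sending CR+LF still works, and the input buffer is emptied after each line, so a blank line doesn’t resubmit the previous command. CMD_BUFFER is set to an assumed 12.

```cpp
#include <cassert>
#include <cstring>

#define CMD_BUFFER 12   // assumed size for this sketch

char cmdInput[CMD_BUFFER];   // characters received so far
char cmdSaved[CMD_BUFFER];   // last complete line
int  cmdIdx = 0;             // next free position in cmdInput

// Same buffering as cmdRunner(), fed one character at a time.
// Returns true when a complete line has been copied to cmdSaved.
bool cmdFeed(char c) {
  assert(cmdIdx >= 0 && cmdIdx < CMD_BUFFER);
  if (c == '\n') {
    std::strcpy(cmdSaved, cmdInput);
    cmdIdx = 0;
    cmdInput[0] = '\0';      // assumption: a blank line stays blank
    return true;
  }
  if (c == '\r') {
    return false;            // assumption: tolerate CR+LF line endings
  }
  // Characters beyond the buffer are silently dropped
  if (cmdIdx < CMD_BUFFER - 1) {
    cmdInput[cmdIdx++] = c;
    cmdInput[cmdIdx] = '\0'; // keep the buffer NUL-terminated
  }
  return false;
}

// Convenience wrapper: feeds a whole string and reports whether its
// last character completed a line.
bool cmdFeedString(const char *s) {
  bool done = false;
  for (; *s != '\0'; s++) {
    done = cmdFeed(*s);
  }
  return done;
}
```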
Key to the processing of commands is a command table that stores the text to match and the handler function to call on finding a valid command. There is an additional parameter that will be passed into the handler function to allow the same handler function to support several commands. This will be used in the assembler itself later.
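// hdlr_t is the handler function type described under Handler Routines
struct cmd_t {
  char cmd[CMD_BUFFER+1];  // command text to match
  hdlr_t pFn;              // handler to call
  uint8_t idx;             // extra value passed to the handler
};

const cmd_t PROGMEM cmdTable[NUM_CMDS] = {
  {"H", hdlrHelp, 0},
  {"L", hdlrList, 0},
  {"G", hdlrGoto, 0},
};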
The algorithm for parsing commands is as follows:
cmdProcess:
  Look for a space or the end of the string
  IF found a space THEN
    The parameter starts after the space
  Look for the command in the command table
  IF command found THEN
    Call the handler function with the table's idx value and the parameter
The implementation is a bit complex, as it uses string pointers and has to chop and parse strings as it goes. It also has to deal with the command table being stored in the Arduino’s PROGMEM, which adds a further complication.
To use the same command line interpreter for entering assembler instructions, I’ve had to simplify the syntax: opcodes contain no spaces, and there must be a space between the opcode and the immediate value, if one is used.
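Leaving the PROGMEM complications aside, the parsing algorithm can be sketched in plain C++ with the table held in RAM. This is a hypothetical version, not the code from the sketch: it splits the line at the first space, compares the command word exactly against each entry and calls the matching handler with the entry’s idx value and the parameter (an empty string if there isn’t one). On the Arduino, where the table lives in flash, the comparison would need something like strcmp_P() and the function pointer would have to be fetched with pgm_read_word() or memcpy_P() rather than read directly. The handlers here only record how they were called.

```cpp
#include <cassert>
#include <cstring>

typedef void (*hdlr_t)(int idx, char *param);

struct cmd_t {
  const char *cmd;   // text to match exactly
  hdlr_t pFn;        // handler to call
  int idx;           // extra value passed through to the handler
};

// Hypothetical handlers that just record how they were called.
const char *lastHandler = "";
int lastIdx = -1;
const char *lastParam = "";
void hdlrAsm(int idx, char *p)  { lastHandler = "asm";  lastIdx = idx; lastParam = p; }
void hdlrGoto(int idx, char *p) { lastHandler = "goto"; lastIdx = idx; lastParam = p; }

// A cut-down table held in RAM rather than PROGMEM.
const cmd_t cmdTable[] = {
  {"ADDA", hdlrAsm, 0},
  {"OUT",  hdlrAsm, 10},
  {"OUT2", hdlrAsm, 11},
  {"JNC",  hdlrAsm, 14},
  {"G",    hdlrGoto, 0},
};
const int NUM_CMDS = sizeof cmdTable / sizeof cmdTable[0];

// Splits the line at the first space (modifying it in place), looks the
// command word up in the table and calls its handler. Returns false if
// the command is unknown.
bool cmdProcess(char *line) {
  assert(line != nullptr);
  char *param = std::strchr(line, ' ');
  if (param != nullptr) {
    *param++ = '\0';                  // terminate the command word
    while (*param == ' ') param++;    // skip any extra spaces
  } else {
    param = line + std::strlen(line); // no parameter: pass ""
  }
  for (int i = 0; i < NUM_CMDS; i++) {
    if (std::strcmp(line, cmdTable[i].cmd) == 0) {
      cmdTable[i].pFn(cmdTable[i].idx, param);
      return true;
    }
  }
  return false;
}
```

Exact matching matters with this table: with a prefix-style match, OUT2 could be picked up by the OUT entry and silently assembled as the wrong opcode.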
Here are some examples:
IN A -> INA
MOVE A,B -> MOVAB
OUT im -> OUT im
JNC im -> JNC im
ADD A,im -> ADDA im
Handler Routines
All handler routines have the following prototype:
typedef void (*hdlr_t)(int idx, char *param);

void hdlrHelp(int idx, char *pParam) {
  Serial.print("\nHelp\n----\n");
  Serial.println("H: Help");
  // ... the other commands are listed in the same way
}
The idx parameter is the number in the last field of the command table. pParam will be a pointer to the parameter string for the command (if used).
As we’re dealing with strings all the time, there are a number of helper functions to do things like convert strings to numbers as well as others to print numbers in various formats.
Number formats are assumed to be as follows (13, 0xD and b1101 are all the same value):
0..9 - decimal digits
0x0..F - hex digits, with a 0x prefix
b0..1 - binary digits, with a b prefix
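A parser for these three formats could look something like this. It is a hypothetical stand-in for str2num(), whose real signature isn’t shown here; I’ve assumed it returns the value, or -1 if the string isn’t a valid number.

```cpp
#include <cassert>

// Hypothetical stand-in for str2num(): returns the value, or -1 if the
// string isn't a valid decimal, 0x-prefixed hex or b-prefixed binary number.
int str2num(const char *s) {
  assert(s != nullptr);
  int base = 10;
  if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
    base = 16;
    s += 2;
  } else if (s[0] == 'b') {
    base = 2;
    s += 1;
  }
  if (*s == '\0') return -1;          // empty, or a prefix with no digits
  int value = 0;
  for (; *s != '\0'; s++) {
    int digit;
    if (*s >= '0' && *s <= '9')      digit = *s - '0';
    else if (*s >= 'a' && *s <= 'f') digit = *s - 'a' + 10;
    else if (*s >= 'A' && *s <= 'F') digit = *s - 'A' + 10;
    else return -1;                   // not a digit at all
    if (digit >= base) return -1;     // e.g. a 2 in a binary number
    value = value * base + digit;
    if (value > 0xFFFF) return -1;    // far more than a 4-bit CPU needs
  }
  return value;
}
```

The b prefix can’t clash with a hex digit, because hex values always carry the 0x prefix.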
The code provides the following:
- str2num – the basic string parsing routine to recognise all three number formats as strings.
- printbin – print a number in b0..1 format.
- printhex – print a number in 0x0..F format, allowing for a possible leading zero if required.
- printins – print an instruction in textual format.
- printop – print an instruction in binary and hex opcode format.
- printline – print a line number in a consistent binary and hex format.
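The formatting side can be sketched the same way. These hypothetical helpers (fmtbin, fmthex, fmtline and fmtop are my names, not the sketch’s) write into a buffer instead of printing to Serial, which makes them easy to check off-target. They reproduce the layouts that appear in the session log: a line number shown as b1101 [D], and an opcode laid out as printop() does it, with the two binary fields followed by a tab and the hex value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Low 'bits' bits of value, most significant first, e.g. (5, 4) -> "0101".
void fmtbin(char *out, uint8_t value, int bits) {
  assert(bits >= 1 && bits <= 8);
  for (int i = bits - 1; i >= 0; i--) {
    *out++ = ((value >> i) & 1) ? '1' : '0';
  }
  *out = '\0';
}

// Upper-case hex, zero-padded to 'digits' characters, e.g. (0x0A, 2) -> "0A".
void fmthex(char *out, uint8_t value, int digits) {
  assert(digits == 2 || (digits == 1 && value < 16));
  std::snprintf(out, digits + 1, "%0*X", digits, (unsigned)value);
}

// Line number in the listing style, e.g. 13 -> "b1101 [D]".
void fmtline(char *out, uint8_t line) {
  char bin[9], hex[3];
  fmtbin(bin, line, 4);
  fmthex(hex, line, 1);
  std::sprintf(out, "b%s [%s]", bin, hex);
}

// Opcode in the printop() layout, e.g. 0xE8 -> "b1110 1000\t0xE8".
void fmtop(char *out, uint8_t op) {
  char cmd[9], im[9], hex[3];
  fmtbin(cmd, op >> 4, 4);
  fmtbin(im, op & 0x0F, 4);
  fmthex(hex, op, 2);
  std::sprintf(out, "b%s %s\t0x%s", cmd, im, hex);
}
```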
The code supports the following commands, so each has its own handler function:
- H – help – show the list of commands.
- L – list – show the disassembly of the whole working memory (RAM).
- G – goto – set the working line number.
- C – clear – reset all working memory (RAM) to zeros.
- R – restore – restore the working memory (RAM) to the pre-built demo code (ROM).
- O – opcodes – list the supported opcodes.
Assembler
As already mentioned, I’m using the same command line interpreter code to create the assembler. To do this, each opcode has an entry in the command table:
const cmd_t PROGMEM cmdTable[NUM_CMDS] = {
// Assembly commands - must be first
{"ADDA", hdlrAsm, 0},
{"MOVAB", hdlrAsm, 1},
{"INA", hdlrAsm, 2},
{"MOVA", hdlrAsm, 3},
{"MOVBA", hdlrAsm, 4},
{"ADDB", hdlrAsm, 5},
{"INB", hdlrAsm, 6},
{"MOVB", hdlrAsm, 7},
{"OUTB", hdlrAsm, 8},
{"OUT2B", hdlrAsm, 9},
{"OUT", hdlrAsm, 10},
{"OUT2", hdlrAsm, 11},
{"JNCB", hdlrAsm, 12},
{"JMPB", hdlrAsm, 13},
{"JNC", hdlrAsm, 14},
{"JMP", hdlrAsm, 15},
// Other commands
{"H", hdlrHelp, 0},
{"L", hdlrList, 0},
{"G", hdlrGoto, 0},
{"C", hdlrClear, 0},
{"R", hdlrRestore, 0},
{"O", hdlrOpcodes, 0},
};
The order corresponds to the opcode command value, as does the parameter. As these entries are at the start of the table, I can assume that an entry’s position in the table is the same as its command value. This does mean that the duplicated instructions (such as OUT2B and OUT2) also need entries, even if I don’t need to use them.
I’m making the following design decisions:
- There is the concept of a “current line” which can be set with the G (goto) command.
- Entering a valid opcode automatically moves the current line on by 1.
- No line information is entered as part of the opcode.
The main logic of the assembler handler is as follows:
Assembler:
  cmd = the index parameter provided by the command table
  im = value parsed from the string parameter (0 if there isn't one)
  RAM[line] = (cmd << 4) | im
  Increment current line
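Written out in C++, the shift needs its own parentheses, because + binds more tightly than <<: cmd << 4 + im would actually compute cmd << (4 + im). The sketch below is a hypothetical hdlrAsm() for a host build. It copies the HASIM macro described in the disassembler section, parses the immediate with std::strtol, rejects values that don’t fit in 4 bits, and, as an assumption of mine, wraps the current line from 15 back to 0.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Opcodes that take an immediate value (same macro as the disassembler).
#define HASIM(op) (op==0||op==3||op==5||op==7||op>9)

uint8_t ram[16];       // working memory
uint8_t curLine = 0;   // set by the G (goto) command

// Decimal, 0x-prefixed hex or b-prefixed binary; -1 if invalid or > 15.
static int parseIm(const char *s) {
  int base = 10;
  if (s[0] == 'b') {
    base = 2;
    s++;
  } else if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
    base = 16;  // strtol accepts the 0x prefix itself in base 16
  }
  char *end = nullptr;
  long v = std::strtol(s, &end, base);
  if (end == s || *end != '\0') return -1;
  return (v < 0 || v > 15) ? -1 : (int)v;
}

// Hypothetical hdlrAsm(): idx is the opcode (its position in the table),
// pParam the optional immediate. Returns false, leaving RAM and the
// current line alone, if the immediate isn't a valid 4-bit value.
bool hdlrAsm(int idx, const char *pParam) {
  assert(idx >= 0 && idx < 16);
  int im = 0;                                // no immediate: im = 0
  if (HASIM(idx) && pParam != nullptr && pParam[0] != '\0') {
    im = parseIm(pParam);
    if (im < 0) return false;                // bad or out-of-range value
  }
  ram[curLine] = (uint8_t)((idx << 4) | im); // shift needs its own brackets
  curLine = (curLine + 1) & 0x0F;            // assumption: wrap 15 -> 0
  return true;
}
```

Entering OUTB on line 13, as in the session log, stores 0x80 and moves the current line on to 14.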
Disassembler
Disassembly is largely a look-up table matching opcode command values to text. This is all hidden away behind the two print routines printins() and printop().
void printins (uint8_t ins) {
  uint8_t cmd = ins >> 4;
  uint8_t im = ins & 0x0F;
  Serial.print(FSH(cmdTable[cmd].cmd));
  if (HASIM(cmd)) {
    Serial.print(" b");
    printbin(im,4);
  } else {
    Serial.print(" ");
  }
}

void printop (uint8_t op) {
  uint8_t cmd = op >> 4;
  uint8_t im = op & 0x0F;
  Serial.print("b");
  printbin(cmd,4);
  Serial.print(" ");
  printbin(im,4);
  Serial.print("\t0x");
  printhex(op,2);
}
The main complexity is pulling the strings out of the command table. I’ve had to include a macro to provide access to the strings from the Arduino’s PROGMEM:
#define FSH(x) ((const __FlashStringHelper *)x)
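This feels like a bit of a hack, but apparently this is how it should be done for the kind of thing I need to do! The cast tells Serial.print() that the pointer refers to flash rather than RAM, so the flash-reading version of print() is used. Writing cmdTable[cmd].cmd doesn’t read flash itself, it only computes an address, so this works even though the table lives in PROGMEM.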
There is another macro here that needs explaining:
#define HASIM(op) (op==0||op==3||op==5||op==7||op>9)
This evaluates to true for every command that takes an immediate value, and is used in a few places to decide how to parse and print a command.
Whilst in principle all commands could use the immediate value, the “official” statement of how they work assumes im=0 in many cases. So, for example, OUT B does not require an immediate value, but if one is provided then OUT B becomes OUT B+im.
I’m not really supporting that with this code at the moment.
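The disassembly direction can be sketched in the same off-target style: a 16-entry mnemonic table in opcode order (mirroring the assembler half of cmdTable), plus HASIM to decide whether to show the immediate. The function name disasm() is hypothetical, and it writes to a buffer rather than to Serial.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Opcodes that take an immediate value.
#define HASIM(op) (op==0||op==3||op==5||op==7||op>9)

// Mnemonics in opcode order, matching the assembler part of cmdTable.
const char *const mnemonics[16] = {
  "ADDA", "MOVAB", "INA", "MOVA", "MOVBA", "ADDB", "INB",  "MOVB",
  "OUTB", "OUT2B", "OUT", "OUT2", "JNCB",  "JMPB", "JNC",  "JMP"
};

// Text form of one instruction byte, e.g. 0xE8 -> "JNC b1000", 0x80 -> "OUTB".
void disasm(char *out, std::size_t size, uint8_t ins) {
  assert(out != nullptr && size > 0);
  uint8_t cmd = ins >> 4;    // top four bits select the instruction
  uint8_t im = ins & 0x0F;   // bottom four bits are the immediate
  if (HASIM(cmd)) {
    std::snprintf(out, size, "%s b%d%d%d%d", mnemonics[cmd],
                  (im >> 3) & 1, (im >> 2) & 1, (im >> 1) & 1, im & 1);
  } else {
    std::snprintf(out, size, "%s", mnemonics[cmd]);
  }
}
```

Unlike printins(), this drops the trailing space for instructions without an immediate; that padding only matters for lining up the listing.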
Putting it all together
Here is a serial output log of a session using the assembler.
> H
Help
----
H: Help
L: List
G: Goto
C: Clear
R: Restore
O: Opcodes
OpCode
OpCode im
Current line: b0000 [0]
> L
RAM Disassembly
b0000 [0]: JNC b1000   b1110 1000  0xE8
b0001 [1]: JMP b0011   b1111 0011  0xF3
b0010 [2]: OUT b0010   b1010 0010  0xA2
b0011 [3]: ADDB b0001  b0101 0001  0x51
b0100 [4]: OUT b0100   b1010 0100  0xA4
b0101 [5]: ADDA b0001  b0000 0001  0x01
b0110 [6]: OUT b1000   b1010 1000  0xA8
b0111 [7]: ADDB b0001  b0101 0001  0x51
b1000 [8]: OUT b0100   b1010 0100  0xA4
b1001 [9]: ADDA b0001  b0000 0001  0x01
b1010 [A]: OUT b0010   b1010 0010  0xA2
b1011 [B]: ADDB b0001  b0101 0001  0x51
b1100 [C]: JMP b0000   b1111 0000  0xF0
b1101 [D]: ADDA b0000  b0000 0000  0x00
b1110 [E]: ADDA b0000  b0000 0000  0x00
b1111 [F]: ADDA b0000  b0000 0000  0x00
Current line: b0010 [2]
> G 13
Goto line 13
Current line: b1101 [D]
> OUTB
Assemble:
b1101 [D] OUTB        b1000 0000  0x80
Current line: b1110 [E]
> L
RAM Disassembly
b0000 [0]: JNC b1000   b1110 1000  0xE8
b0001 [1]: JMP b0011   b1111 0011  0xF3
b0010 [2]: OUT b0010   b1010 0010  0xA2
b0011 [3]: ADDB b0001  b0101 0001  0x51
b0100 [4]: OUT b0100   b1010 0100  0xA4
b0101 [5]: ADDA b0001  b0000 0001  0x01
b0110 [6]: OUT b1000   b1010 1000  0xA8
b0111 [7]: ADDB b0001  b0101 0001  0x51
b1000 [8]: OUT b0100   b1010 0100  0xA4
b1001 [9]: ADDA b0001  b0000 0001  0x01
b1010 [A]: OUT b0010   b1010 0010  0xA2
b1011 [B]: ADDB b0001  b0101 0001  0x51
b1100 [C]: JMP b0000   b1111 0000  0xF0
b1101 [D]: OUTB        b1000 0000  0x80
b1110 [E]: ADDA b0000  b0000 0000  0x00
b1111 [F]: ADDA b0000  b0000 0000  0x00
Current line: b1110 [E]
> O
Supported OpCodes:
b0000 data  ADDA im
b0001 0000  MOVAB
b0010 0000  INA
b0011 data  MOVA im
b0100 0000  MOVBA
b0101 data  ADDB im
b0110 0000  INB
b0111 data  MOVB im
b1000 0000  OUTB
b1001 0000  OUT2B
b1010 data  OUT im
b1011 data  OUT2 im
b1100 data  JNCB im
b1101 data  JMPB im
b1110 data  JNC im
b1111 data  JMP im
> C
Clearing RAM ... Done
Find the code on GitHub here.
Conclusion
The basics for this actually came together fairly quickly, but I must admit to spending a fair bit of time fiddling about with output formats and refactoring various bits of code to try to give some consistency in terms of when newlines are applied, what is shown in binary, what in hex, and so on.
I can’t guarantee everything has been caught, but I’ve typed in all the programs from Part 3 (using the newer, limited syntax) and they all seem to work.
It would be nice to be able to automatically reset the TD4 from the Arduino, but for now, pressing the button when required is fine.
For the most part, unless there is a loop to get caught in, the code will cycle back to the start anyway.
In terms of possible updates and enhancements, there are a few on my mind:
- It would be nice to support the undocumented use of immediate values somehow.
- It might be nice to have a way to save/load the code. It only needs to be a string of 16 two-digit hex codes (one byte per line).
- It might be nice to have several demo programs to choose from.
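For the save/load idea, one possible shape (purely hypothetical, since nothing like it exists in the sketch yet) is to dump working memory as 32 hex characters and parse the same format back, leaving RAM untouched if the string is malformed. The names saveHex, loadHex and hexDigit are illustrative only.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Hypothetical save format: 16 bytes as 32 upper-case hex characters.
void saveHex(const uint8_t ram[16], char out[33]) {
  for (int i = 0; i < 16; i++) {
    std::snprintf(out + 2 * i, 3, "%02X", (unsigned)ram[i]);
  }
}

// Value of one hex character, or -1 if it isn't one.
static int hexDigit(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Hypothetical load: accepts exactly 32 hex characters and only updates
// RAM once the whole string has been validated.
bool loadHex(const char *s, uint8_t ram[16]) {
  assert(s != nullptr);
  uint8_t tmp[16];
  for (int i = 0; i < 16; i++) {
    int hi = hexDigit(s[2 * i]);
    if (hi < 0) return false;          // also stops at a short string
    int lo = hexDigit(s[2 * i + 1]);
    if (lo < 0) return false;
    tmp[i] = (uint8_t)((hi << 4) | lo);
  }
  if (s[32] != '\0') return false;     // trailing junk
  std::memcpy(ram, tmp, sizeof tmp);
  return true;
}
```

Validating into a temporary buffer first means a mistyped paste can’t leave the TD4 running half of one program and half of another.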
If I expand the instruction set and architecture, then I’ll have to think again about chunks of this code, but for now, it seems to work pretty well.
Kevin