Hello world

Our CPU is finally ready to run the program every programmer must run first: hello world.

Building on the assembly from the last post, we will create the init sequence for the LCD and send characters one by one.

The CPU simulator is at the end of the page.

LCD init

More complicated devices like the LCD have a whole protocol that you need to use to communicate with them. The steps we need to go through to get our display running are not long:


    // clear the screen
    send 0x401
    // return the pointer
    send 0x402
    // set pointer direction 
    // to the right
    send 0x406
    // turn cursor on
    send 0x40F
    // turn shift on
    send 0x41C
    // make LCD write 2 lines 
    // in simple font
    send 0x438

    //now we can send whatever
    //we want

The sequence has to be in this order.

The LCD input is connected directly to the bus, and in Micromemory programming we created the "PRINT" instruction to send data to it directly.

"PRINT" sends the data that is in the accumulator, so to send the above instructions to the LCD we first load each one into the ACC and then use the "PRINT" instruction.


 // clear the screen
 LDAI 0x401
 PRINT 0
 // return the pointer
 LDAI 0x402
 PRINT 0
 // set pointer direction 
 // to the right
 LDAI 0x406
 PRINT 0
 // turn cursor on
 LDAI 0x40F
 PRINT 0
 // turn shift on
 LDAI 0x41C
 PRINT 0
 // make LCD write 2 lines 
 // in simple font
 LDAI 0x438
 PRINT 0

 //now we can send whatever
 //we want
loop 
 B loop

Loading is done using LDAI. Remember, LDAI loads the number next to the instruction into the ACC. You can refresh your memory of the supported instructions here.
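The LDAI/PRINT pairing above can be modelled in a few lines of Python. This is only a sketch of the idea, not the real simulator: the class name `TinyCpu`, its `lcd` list and the method names are made up for illustration, and we assume that PRINT ignores its operand.

```python
# Toy model of the two instructions used so far (hypothetical names).
class TinyCpu:
    def __init__(self):
        self.acc = 0      # accumulator
        self.lcd = []     # every word PRINT has sent to the LCD input

    def ldai(self, value):
        """LDAI: load the immediate operand into ACC."""
        self.acc = value

    def print_(self, _operand=0):
        """PRINT: send the current ACC to the LCD (operand assumed unused)."""
        self.lcd.append(self.acc)


INIT_SEQUENCE = [0x401, 0x402, 0x406, 0x40F, 0x41C, 0x438]

cpu = TinyCpu()
for word in INIT_SEQUENCE:
    cpu.ldai(word)    # LDAI <word>
    cpu.print_(0)     # PRINT 0
```

After the loop, `cpu.lcd` holds the six init words in the order they were sent.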

Printing letters

To print a letter we need to send its ASCII representation to the LCD. If we were to follow the LCD protocol we would also need to set another pin of the LCD high, but that is done automatically in my processor. So to write an 'A' we just send its bitwise representation via the PRINT instruction.

Here is a simple way to write 'Hello World!' on the LCD (you can always test this assembly in the simulator at the bottom of the page).


loop 

 LDAI 0x401 
 PRINT 0 
 LDAI 0x402 
 PRINT 0 
 LDAI 0x406 
 PRINT 0 
 LDAI 0x40F 
 PRINT 0 
 LDAI 0x41C 
 PRINT 0 
 LDAI 0x438
 PRINT 0
 LDAI 'H' 
 PRINT 0 
 LDAI 'e' 
 PRINT 0 
 LDAI 'l' 
 PRINT 0 
 LDAI 'l' 
 PRINT 0 
 LDAI 'o' 
 PRINT 0 
 LDAI 32
 PRINT 0 
 LDAI 'W' 
 PRINT 0 
 LDAI 'o' 
 PRINT 0 
 LDAI 'r' 
 PRINT 0 
 LDAI 'l'  
 PRINT 0 
 LDAI 'd' 
 PRINT 0 
 LDAI '!' 
 PRINT 0 
 B loop

Note that the simulator does not support the character literal for space (' '), so we insert its numeric value (32) instead.
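Typing these LDAI/PRINT pairs by hand gets tedious, so here is a small Python sketch that generates the unrolled listing for any text. The function names `char_operand` and `naive_program` are hypothetical helpers, not part of the simulator. It emits 32 for a space, as described above, and does not try to handle other characters the simulator might reject.

```python
INIT_SEQUENCE = [0x401, 0x402, 0x406, 0x40F, 0x41C, 0x438]


def char_operand(ch):
    """Return the LDAI operand for one character."""
    if ch == " ":
        return str(ord(ch))       # ' ' is not supported, so emit 32
    return f"'{ch}'"


def naive_program(text):
    """Build the unrolled LDAI/PRINT listing for `text` as a list of lines."""
    lines = ["loop"]
    for word in INIT_SEQUENCE:
        lines += [f" LDAI 0x{word:X}", " PRINT 0"]
    for ch in text:
        lines += [f" LDAI {char_operand(ch)}", " PRINT 0"]
    lines.append(" B loop")
    return lines


program = naive_program("Hello World!")
print("\n".join(program))
```

The listing grows by two lines for every extra character.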

We are manually loading characters into the ACC and sending them to the LCD. After we do all of that, we do an unconditional branch to the beginning and do all of that again. And again. And again. Is there a better way?

Using SP

If we were writing in C, we would write something like this:


    int data[18] = {
        0x401, 0x402, 0x406,
        0x40F, 0x41C, 0x438,
        'H', 'e', 'l', 'l', 'o',
        ' ',
        'W', 'o', 'r', 'l', 'd',
        '!'
    };
    while (true) {
        int i = 0;
        for (i = 0; i < 18; i++)
            print(data[i]);
    }

The code separates data and instructions. Data is stored in an array called "data" and we send its elements with a for loop. We go from 0 to its length and send the data pieces one by one.

We would like to implement something like this in our assembly, but we somehow need to implement the for (and while) loop. Another important thing is that if we are using purely the accumulator, there is no way to dereference it. What does that mean? There is no way to interpret its value as a memory address and get the contents of that address.

In other words, how do we do the "data[i]" operation if i is stored in ACC and we do not have an instruction that fetches the value at the address held in ACC? We have to use the stack pointer.

data[i]

We created 4 nice instructions that work with the stack pointer: "SP2ACC", "ACC2SP", "LDAFS" and "STA2SP". To fetch the contents of memory at the address held in the ACC we then do:


    ACC2SP 0
    LDAFS 0

Note that we do not need the address parameters of these instructions, but we have to include them so that all instructions are the same size. The value of ACC will stay inside SP.
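The two-step dereference can be pictured with a small Python sketch. The `state` dictionary and the memory layout are made up for illustration, and the functions only mirror what the text says the instructions do.

```python
def acc2sp(state):
    """ACC2SP: copy ACC into SP."""
    state["sp"] = state["acc"]


def ldafs(state, memory):
    """LDAFS: load ACC from the memory cell that SP points at."""
    state["acc"] = memory[state["sp"]]


# Hypothetical layout: three unused cells, then data from address 3 onward.
memory = [0, 0, 0, 0x401, 0x402, 0x406]
state = {"acc": 4, "sp": 0}   # ACC holds the address we want to read

acc2sp(state)                 # ACC2SP 0
ldafs(state, memory)          # LDAFS 0
```

ACC now holds the word stored at address 4 (0x402 in this layout), and SP still holds the address, which is what lets us step through the data later.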

for loop

Let us try to do this part:


    int i=0;
    for (i=0; i<18; i++)
        print(data[i]);

How do we do the for loop? We need to use conditional branching. If the condition is fulfilled we go back to the top of the loop, and in that way create the looping behaviour. If the condition is not met we exit the loop, which in our case just means treating that instruction as a NOP.


    LDAI 0
loop
    ADDI 1
    if_ACC_is_not_18_goto_loop

This code will loop 18 times. ACC will start at 0 and be incremented by 1 in every iteration until it reaches 18, at which point if_ACC_is_not_18_goto_loop will not do anything (it behaves as a NOP) and execution will continue. In this way we successfully created the for loop. Now what is underneath "if_ACC_is_not_18_goto_loop"?

We need to use a conditional branching instruction. We have 2 of them: "BEQ", which branches when ACC is 0, and "BNEQ", which branches when ACC is not 0. But how do we make the branch depend on whether ACC is 18?

Easy, 18 is just an offset. We can subtract 18 from ACC and check if the result is 0:


    LDAI 0
loop
    ADDI 1
    ADDI -18
    BNEQ loop

This way, if ACC was 18 then after the subtraction (addition of -18) it will be 0. The only problem is that we are destroying the ACC value when we do ADDI. We can save the value and recover it:


    B begin
    0
begin
    LDA 2
    ADDI 1
    STA 2
    ADDI -18
    BNEQ begin

This will run 18 times. We are using memory location 2 as a temporary store for the ACC. In the real solution we do not have to load and store ACC many times because we will use SP to hold the index value. ACC will then serve as a messenger between the different parts.
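To convince ourselves that the saved-counter version really runs 18 times, we can step through it in Python. This is a sketch that mirrors the listing instruction by instruction. `run_counter_loop` is a hypothetical helper, and word size and overflow are ignored.

```python
def run_counter_loop(limit=18):
    """Mimic LDA 2 / ADDI 1 / STA 2 / ADDI -limit / BNEQ begin."""
    memory = {2: 0}           # address 2 is the temporary counter, initially 0
    iterations = 0
    while True:
        acc = memory[2]       # LDA 2
        acc += 1              # ADDI 1
        memory[2] = acc       # STA 2   (save before ACC is destroyed)
        acc -= limit          # ADDI -18
        iterations += 1
        if acc == 0:          # BNEQ begin falls through once ACC is 0
            break
    return iterations, memory[2]


iterations, counter = run_counter_loop()
```

The loop ends with both `iterations` and the counter in memory equal to 18, because the subtraction only ever touches the copy in ACC.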

Putting it together

So if we combine the data, SP, looping and branching we get this:


 B hop
 3
 0x401
 0x402
 0x406
 0x40F
 0x41C
 0x438
 'H'
 'e'
 'l'
 'l'
 'o'
 32
 'W'
 'o'
 'r'
 'l'
 'd'
 '!'
hop
 LDAI 3
 ACC2SP 0
loop
 LDAFS 0
 STA 2
 ADDISP 1
 ACC2SP 0
 LDA 2
 PRINT 0
 SP2ACC 0
 ADDI -21
 BNEQ loop
 B hop

So to recap, this:


    LDAI 3
    ACC2SP 0

is just loading the address at which our data starts, which is 3. Addresses 0 and 1 hold B hop and address 2 is the temporary variable. And this is the meat of the program:


loop
// get value from memory
 LDAFS 0

// increment SP (i in the C program);
// we save ACC to address 2 and restore it afterwards
 STA 2
 ADDISP 1
 ACC2SP 0
 LDA 2

// print to LCD
 PRINT 0

// check if SP (i) == 21, which is one past the last
// memory address we need to print (20)
 SP2ACC 0
 ADDI -21
 BNEQ loop

We also have "B hop", which enables us to do "while (true)".
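If you want to check the address arithmetic without the simulator, here is a Python sketch of a toy model that runs the program for one pass. The instruction semantics are inferred from this post. In particular, we assume ADDISP puts SP plus its operand into ACC, which is why ACC is saved to address 2 first. Word size and overflow are ignored, and the exit constant is computed as one past the last data address.

```python
TEXT = "Hello World!"
DATA = [0x401, 0x402, 0x406, 0x40F, 0x41C, 0x438] + [ord(c) for c in TEXT]
DATA_START = 3                       # addresses 0-1: B hop, address 2: temp
DATA_END = DATA_START + len(DATA)    # one past the last data address

# Strings are labels, ints are raw data words, tuples are instructions
# that occupy two addresses (mnemonic, operand).
SOURCE = (
    [("B", "hop"), 3]
    + DATA
    + ["hop", ("LDAI", DATA_START), ("ACC2SP", 0),
       "loop", ("LDAFS", 0), ("STA", 2), ("ADDISP", 1), ("ACC2SP", 0),
       ("LDA", 2), ("PRINT", 0), ("SP2ACC", 0), ("ADDI", -DATA_END),
       ("BNEQ", "loop"), ("B", "hop")]
)


def assemble(source):
    """Lay the source out in memory and replace label names by addresses."""
    labels, memory = {}, []
    for item in source:
        if isinstance(item, str):
            labels[item] = len(memory)
        elif isinstance(item, tuple):
            memory.extend(item)
        else:
            memory.append(item)
    return [labels.get(w, w) if isinstance(w, str) else w for w in memory]


def run_one_pass(memory):
    """Run until the first backward B (the final B hop); return LCD words."""
    memory = list(memory)
    acc = sp = pc = 0
    lcd = []
    while True:
        op, arg = memory[pc], memory[pc + 1]
        pc += 2
        if op == "LDAI":
            acc = arg
        elif op == "LDA":
            acc = memory[arg]
        elif op == "STA":
            memory[arg] = acc
        elif op == "ADDI":
            acc += arg
        elif op == "ADDISP":      # assumed: ACC = SP + operand
            acc = sp + arg
        elif op == "ACC2SP":
            sp = acc
        elif op == "SP2ACC":
            acc = sp
        elif op == "LDAFS":
            acc = memory[sp]
        elif op == "PRINT":
            lcd.append(acc)
        elif op == "BNEQ":
            if acc != 0:
                pc = arg
        elif op == "B":
            if arg < pc:          # backward jump: one full pass is done
                return lcd
            pc = arg
        else:
            raise ValueError(f"unknown instruction {op!r} at {pc - 2}")


lcd = run_one_pass(assemble(SOURCE))
print("".join(chr(w) for w in lcd[6:]))
```

In this model the program occupies 45 addresses, the label hop lands at address 21, and one pass sends all 18 data words to the LCD.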

Check your understanding

How much faster or slower is our final approach than the naive (first) one for large strings (for example 'Hello world'*1000)?

We need to see how many instructions are needed to print a single character. We can approximate that every instruction takes the same amount of time (most of the time is spent in the fetch phase, so this is a valid thing to do). The naive way needs 2 instructions per character and the complicated one needs 9. So for large sequences the complicated one is 9/2 = 4.5 times slower.
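The 4.5 figure is the limit for long strings. A quick Python sketch, counting instructions per pass from the two listings, shows how the fixed overhead fades. The setup and branch counts are our own tally, and every instruction is assumed to take equally long.

```python
INIT_WORDS = 6        # LCD init commands sent before the text
NAIVE_PER_WORD = 2    # LDAI + PRINT
LOOP_PER_WORD = 9     # LDAFS, STA, ADDISP, ACC2SP, LDA, PRINT, SP2ACC, ADDI, BNEQ


def instructions_per_pass(chars):
    """Return (naive, looped) instruction counts for one pass over the text."""
    words = INIT_WORDS + chars
    naive = NAIVE_PER_WORD * words + 1       # plus the closing B loop
    looped = 2 + LOOP_PER_WORD * words + 1   # LDAI/ACC2SP setup, loop, B hop
    return naive, looped


def slowdown(chars):
    naive, looped = instructions_per_pass(chars)
    return looped / naive


short = slowdown(12)       # 'Hello World!'
long_ = slowdown(11000)    # 'Hello world' * 1000
```

For the 12-character string the counts are 37 and 165 instructions, and for the long string the ratio is within a hair of 4.5.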

How much more memory efficient is our final approach compared to the naive one (again for large strings with many characters)?

To print one character in the simple case we need 2 instructions, each of which takes 2 memory addresses, which is 2*2 = 4 memory addresses. The complicated one keeps the data separate, so it uses 1 address per character. So the complicated one is 4/1 = 4 times more memory efficient.
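The same kind of Python sketch works for memory. The fixed costs (branches, temporary variable, loop body) are our own count from the listings, at two addresses per instruction and one per data word.

```python
INIT_WORDS = 6   # LCD init commands stored alongside the text


def memory_words(chars):
    """Return (naive, looped) program sizes in memory addresses."""
    words = INIT_WORDS + chars
    naive = 2 * 2 * words + 2    # LDAI + PRINT per word, plus B loop
    looped = (
        2          # B hop at the top
        + 1        # temporary variable at address 2
        + words    # one address per data word
        + 2 * 2    # LDAI 3, ACC2SP 0
        + 9 * 2    # the nine loop instructions
        + 2        # final B hop
    )
    return naive, looped


def memory_ratio(chars):
    naive, looped = memory_words(chars)
    return naive / looped


short = memory_ratio(12)       # 'Hello World!'
long_ = memory_ratio(11000)    # 'Hello world' * 1000
```

For 'Hello World!' the two programs take 74 and 45 addresses, so the saving is modest. The ratio only approaches 4 once the string is long enough to dwarf the fixed loop code.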

Simulator

[Interactive simulator: displays acc, pc, tmp, sp, ir and dc, with a clock rate (Hz) setting.]