Adder

We will end the ALU exploration with the most complicated operation, addition.

We start this post with general addition overview. In the end we will go into how adder is implemented with NAND gates and we will connect it to the bus.

We will explore the concept of adders, starting with the basic operation of adding two 4-bit numbers and gradually delving into more advanced topics like Carry Lookahead Adders.

Adding Two 4-Bit Numbers

Let's begin by understanding how we can add two 4-bit binary numbers. For this example, let's add "A = 1001" and "B = 1011". These two binary numbers represent 9 and 11, so their sum should be 20 or "10100" in binary. Here are the steps to perform this addition:

1. Bitwise Addition

We start by adding the rightmost bits of both numbers, just like how we do in decimal addition. We than move to the next bit and so on:

100 1 + 101 1 ------- 10 Everything normal, adding 1+1 = 10. The 1 is carry and the 0 is sum for that digit.

2. Carry Propagation

In binary addition, if the sum of two bits is '2' (or "10" in binary) or greater, a carry is generated and propagated to the next higher bit. In our example, we have a carry from the addition from the first step. We now add the second rightmost bits and add the carry we generated.

We write the carry bit above the next bit in line.

1 <- generated carry 10 01 + 10 11 ------- 100

As you can see the result is 01+11=100. We again have one bit extra (carry).

3. Continue Addition

Repeat the above steps for each pair of bits from right to left, taking into account any carry generated in the previous step. Here's the complete addition:

11 <- carry bits 1001 + 1011 ------- 10100

So, the sum of `1001` and `1011` is `10100` in binary, which is equivalent to 20 in decimal.

1-Bit Adders

The process we just followed for adding two 4-bit numbers can be broken down into smaller 1-bit additions with carry in and carry out. By chaining several of those small adders we can create a whole number adder.

We will break up the addition vertically, one of the small adders will add the bits at the same place and the carry from one step before. This is the small adder circuit we are looking for:

1 bit adder
Loading the circuit ....

Setting either A or B gives sum=1 and carry_out 0. If there are at least two out of {A, B, carry_in} active that means we have at least two 1s in our sum. Which means that the result is at least 2 or in other words carry_out is 1.

To create the adder for 3-bit numbers we can chain these small adders.

3 bit adder
Loading the circuit ....

You can see how the carry_out of one adder is connected to carry in of another. To make a 4-bit adder we would just create another small adder and connect A3 and B3 to its A and B and carry out of 3rd bit to its CarryIn.

To sum 2 + 7 (10 + 111) we would set:

A2 = 0 A1 = 1 A0 = 0 B2 = 1 B1 = 1 B0 = 1

The result should be 9 or 1001.

This small adder has 3 inputs, bit A, bit B and carry, and creates two outputs, sum and carry. This is how it works:

Carry In Bit A Bit B Sum Carry Out
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1

We basically just sum first three columns, the most significant digit of the result is "Carry out" and least "Sum". Interesting thing is that we can decompose the summation of those three numbers (Carry in, bit A, bit B) into three summations:

A + B -> sumAB, carryAB carry_in + sumAB -> sumCAB, carryCAB carryAB + carryCAB -> sum2, carry_total the result is sumCAB, carry_total
Here is one fulladder decomposed into 3 halfadders
Loading the circuit ....

We will call a circuit that takes two bits and outputs sum and carry "HalfAdder". Why half? Why not third? Yes currently we need 3 of them, but we will show we only need 2 of them. This means one of the circuits is half the whole adder, or "HalfAdder".

So why do we need only 2 of them? Let us look at the role of the last halfadder. It does:

carryAB + carryCAB -> sum2, carry_total

The thing is carry_total is always 0. For it to be 1, carryAB and carryCAB have to be 1, but both cannot be 1. You can look at the table above, but if carryAB is 1 than sumAB is 0 which means summing that with carry_in cannot generate carryCAB=1.

We just deduced that carryAB and carryCAB cannot both be 1, so to get their sum we can just do OR operation:

Full adder out of half adders
Loading the circuit ....

Half Adders

Let us see how we can construct the half adder:

Half adder
Loading the circuit ....

A Half Adder is a digital circuit that takes two 1-bit binary inputs, `A` and `B`, and produces two outputs: the sum (`S`) and the carry out (`carry_out`). Here's a truth table for a Half Adder:

A B Sum Carry Out
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1

If you read logic gates chapter, you can easily figure out which logic gates we can use. Carry Out is 1 only when both of the A and B are 1, this is the definition of AND gate.

Sum is 1 when only on of {A, B} is 1. This is the definition of XOR gate, so this is our circuit:

Half adder
Loading the circuit ....

While it may seem small, it is not. XOR gate hides a lot of complexity, we need 4 NAND gates for it, converting the AND gate to NAND we get 1 NAND and 1 NOT. With the total of 5 NAND and 1 NOT gate.

Let us plug this into the fulladder and see can we shorten something.

Full adder
Loading the circuit ....

We need to turn the 2 AND and one OR gate to things we can implement (NAND, NOR and NOT):

Full adder
Loading the circuit ....

The right part, arount NOR gate can be optimized using De Morgan law:

NOT NOR(NOT A, NOT B) = NOT NOT OR(NOT A, NOT B) // remove double NOT = OR(NOT A, NOT B) // this is by De Morgan = NOT AND (A, B) // this is just a NAND = NAND(A, B)

This is the result:

Full adder
Loading the circuit ....

We can actually remove 2 more NAND gates. If we recall how we implemented the XOR gate like this:

XOR gate
Loading the circuit ....

In the half adder NAND gate and XOR gate have the same inputs. If we look again at the XOR gate implementation we can see that we already have a NAND gate that has the same inputs as the one in the half adder.

We can than just remove the NAND gate in the half adder and use the one in the XOR gate implementation. The final full adder circuit is than:

Full adder from NAND gates
Loading the circuit ....

As we can see there are in total 9 NAND gates. The cluster in the top left is the first XOR gate. The cluster in the bottom right is the second XOR gate. This means we can implement it in 18 transistors.

Connecting to the bus

To let the output of the adder be connected to the TMP registers input bus (learn more about it in How is it all connected) we need to add bus connectors.

This is relatively easy with the standard connector we developed:

2 bit adder wit bus gating
Loading the circuit ....

Carry Lookahead Adders

I do not have Carry Lookahead Adder but it seems important to mention.

The problem with this Serial adder we have is that it is slow. To calculate the last sum, we need to calculate the carry of the one before. To calculate that carry we need the one before and so on. There is a short delay in every one of those caluclations.

Carry Lookahead Adders (CLA) are a more advanced type of adder that reduces the propagation delay associated with traditional adders. They basically calculate all the carries in parallel.

This eliminates the need for the carries to ripple through the entire adder, making CLAs much faster for large numbers. This requires more transistors but is faster.

Check your understanding

How many transistors does my 11bit adder have (include the gating transistors)? If the processor has 2000 transistors, what is the percentage that is allocated to the adder?
Full adder has 9 NAND gates, we need to add 1 more NOT and NAND for the gating. So each bit has 10 NAND and 1 NOT gate. That is 10*2+1=21 transistors per bit. 11 bits is 11*21 = 231, which is 231/2000 = 11.5% of transistors.
If we are always doing a simple addition between 2 numbers, can we simplify the first full adder?
If we are doing only addition we can remove the first carry_in and replace the full adder with half adder. I actually did not do that in my processor. The reason is because carry_in can be used to do subtraction.
How much slower is 32 bit adder compared to 4 bit one?
In our serial adder design the last bit of calculation has the highest latency. It is only calculated after every other bit is calculated. The one before it is also calculated when all of the ones before it are. This means that the last full adder waits nbits-1 times. So when we include it that is nbits*full_adder_latency time needed. This means 32 bit adder is 32/4=8 times slower.