Introduction

The Am2900 Family

The Am2900 Family consists of a series of LSI building blocks designed for use in microprogrammed computers and controllers. Each device is designed to be expandable and sufficiently flexible to be suitable for emulation of many existing machines.

Figure 1 illustrates a typical system architecture. There are two "sides" to the system. At the left is the control circuitry and on the right is the data manipulation circuitry. The block labeled "2901 array" consists of the ALU, scratchpad registers, and data steering logic (all internal to the Am2901's), plus left/right shift control and carry lookahead circuit. Data is processed by moving it from main memory (not shown) into the 2901 registers, performing the required operations on it, and returning the result to main memory. Memory addresses may also be generated in the 2901's and sent out to the memory address register (MAR). The four status bits from the 2901's ALU are captured in the status register after each operation.

The logic on the left side is the control section of the computer. This is where the Am2909 is used. The entire system is controlled by a memory, usually PROM, which contains long words called microinstructions. Each microinstruction contains bits to control each of the data manipulation elements in the system. There are, for example, 9 bits for the 2901 instruction lines, 8 bits for the A and B register addresses, 2 or 3 bits to control the shifting multiplexers at the ends of the 2901 array, and bits to control the register enables on the MAR, instruction register, and various bus transceivers. When the bits in a microinstruction are applied to all the data elements and everything is clocked, then one small operation (such as a data transfer or a register-to-register add) will occur.

Each microinstruction contains not only bits to control the data hardware, but also bits to define the location in PROM of the next microinstruction to be executed. The fields are labeled in Fig. 1 as I, CC, and BA. The I field controls the sequencer. It indicates where the next address is located (the \( \mu \)PC, the stack, or the direct inputs) and whether the stack is to be pushed or popped.

The CC field contains bits indicating the conditions under which the I field applies. These are compared with the condition codes in the status register and may cause modification to the I field. The comparison and modification occur in the block labeled "control logic." Frequently this is just a PROM. The BA field is a branch address or the address of a subroutine.

Pipelining

The address for the microinstructions is generated by the sequencer, starting from a clock edge. The address goes from the sequencer to the ROM, and an access time later, the microinstruction is at the ROM outputs.

A pipeline register is a register placed on the output of the microprogram memory to essentially split the system in two. The pipeline register contains the microinstruction currently being executed ①. (Refer to the circled numbers in Fig. 1.) The data manipulation control bits go out to the system elements and a portion of the microinstruction is returned to the sequencer to determine the address of the next microinstruction to be executed. That address ② is sent to the ROM, and the next microinstruction ③ sits at the input of the pipeline register. So while the 2901's are executing one instruction, the next instruction is being fetched from ROM. Note that there is no sequential logic in the sequencer between the select lines and the output. This is important because the loop ④ to ⑤ to ⑥ to ⑦ must occur during a single clock cycle. During the same time, the loop from ① to ⑧ must occur in the 2901's. These two paths are roughly the same (around 200 ns worst case for a 16-bit system). The presence of the pipeline register allows the microinstruction fetch to occur in parallel with the data operation rather than serially, allowing the clock frequency to be doubled.
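
The doubling claim can be checked with simple arithmetic. The sketch below assumes both paths take exactly 200 ns (the text says only "around 200 ns"); the function name `min_clock_period_ns` is illustrative and not part of any Am2900 documentation.

```python
def min_clock_period_ns(fetch_ns, execute_ns, pipelined):
    """Shortest usable clock period for one microinstruction.

    Without a pipeline register the fetch and the data operation happen
    one after the other; with it they overlap, so the slower path wins.
    """
    if pipelined:
        return max(fetch_ns, execute_ns)
    return fetch_ns + execute_ns


# Assumed figures: both paths at 200 ns.
serial = min_clock_period_ns(200, 200, pipelined=False)
overlapped = min_clock_period_ns(200, 200, pipelined=True)
speedup = serial / overlapped
```

With equal path delays the period halves. If one path were much longer than the other, the gain from pipelining would be correspondingly smaller.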

The emulation of an existing machine by Fig. 1 works as follows. A sequence of microinstructions in the PROM is executed to fetch an instruction from main memory. This requires that the program counter, often in a 2901 working register, be sent to the memory address register and incremented. The data returned from memory is loaded into the instruction register. The contents of the instruction register are passed through a PROM or PLA to generate the address of the first microinstruction which must be executed to perform the required function. A branch to this address occurs through the sequencer. Several microinstructions may be executed to fetch data from memory, perform ALU operations, test for overflow, and so forth. Then a branch will be made back to the instruction fetch cycle. At this point, there may be branches to other sections of microcode. For example, the machine might test for an interrupt here and obtain an interrupt service routine address from another mapping ROM rather than start on the next machine instruction.

Am2901: Four-Bit Bipolar Microprocessor Slice

The device, as shown in Fig. 2, consists of a 16-word by 4-bit two-port RAM, a high-speed ALU, and the associated shifting, decoding, and multiplexing circuitry. The 9-bit microinstruction word is organized into three groups of 3 bits each and selects the ALU source operands, the ALU function, and the ALU destination register. The microprocessor is cascadable with full lookahead or with ripple carry, has three-state outputs, and provides various status flag outputs from the ALU. Advanced low-power Schottky processing is used to fabricate this 40-lead LSI chip.

Architecture

A detailed block diagram of the bipolar microprogrammable microprocessor structure is shown in Fig. 3. The circuit is a 4-bit slice cascadable to any number of bits. Therefore, all data paths within the circuit are 4 bits wide. The two key elements in the Fig. 3 block diagram are the 16-word by 4-bit two-port RAM and the high-speed ALU.

Data in any of the 16 words of the random-access memory (RAM) can be read from the A port of the RAM as controlled by the 4-bit A address field input. Likewise, data in any of the 16 words of the RAM as defined by the B address field input can be simultaneously read from the B port of the RAM. The same code can be applied to the A select field and B select field, in which case the identical file data will appear at both the RAM A port and B port outputs simultaneously.

When enabled by the RAM write enable (RAM EN), new data is always written into the file (word) defined by the B address field of the RAM. The RAM data-input field is driven by a three-input multiplexer. This configuration is used to shift the ALU output data (F) if desired. This three-input multiplexer scheme allows the data to be shifted up one bit position, shifted down one bit position, or not shifted in either direction.

The RAM A port data outputs and RAM B port data outputs drive separate 4-bit latches. These latches hold the RAM data while the clock input is LOW. This eliminates any possible race conditions that could occur while new data is being written into the RAM.

The high-speed Arithmetic Logic Unit (ALU) can perform three binary arithmetic and five logic operations on the two 4-bit words R and S. The R input field is driven from a two-input multiplexer, while the S input field is driven from a three-input multiplexer. Both multiplexers also have an inhibit capability; that is, no data is passed. This is equivalent to a zero source operand.

In Fig. 3, the ALU R-input multiplexer has the RAM A port and the direct data inputs (D) connected as inputs. Likewise, the ALU S-input multiplexer has the RAM A port, the RAM B port, and the Q register connected as inputs.

The two source operands not fully described as yet are the D input and Q input. The D input is the 4-bit-wide direct data-field input. This port is used to insert all data into the working registers inside the device. Likewise, this input can be used in the ALU to modify any of the internal data files. The Q register is a separate 4-bit file intended primarily for multiplication and division routines, but it can also be used as an accumulator or holding register for some applications.

This multiplexer scheme gives the capability of selecting various pairs of the A, B, D, Q, and 0 (zero) inputs as source operands to the ALU. These five inputs, when taken two at a time, result in ten possible combinations of source operand pairs. These combinations include AB, AD, AQ, A0, BD, BQ, B0, DQ, D0, and Q0. It is apparent that AD, AQ, and A0 are somewhat redundant with BD, BQ, and B0 in that if the A address and B address are the same, the identical function results. Thus, there are only seven completely non-redundant source operand pairs for the ALU. The Am2901 microprocessor implements eight of these pairs. The microinstruction inputs used to select the ALU source operands are the I0, I1, and I2 inputs. The definitions of I0, I1, and I2 for the eight source operand combinations are as shown in Table 1. Also shown is the octal code for each selection.
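
The counting argument above can be reproduced mechanically. This is a minimal sketch; the `canonical` helper is an illustrative name, and it encodes the assumption stated in the text that a pair using B is equivalent to the same pair using A whenever the two addresses are equal.

```python
from itertools import combinations

inputs = ["A", "B", "D", "Q", "0"]          # "0" is the zero (inhibited) operand
pairs = list(combinations(inputs, 2))       # five inputs taken two at a time


def canonical(pair):
    """Fold B onto A, except for the genuine two-register pair (A, B)."""
    if set(pair) == {"A", "B"}:
        return pair
    return tuple("A" if name == "B" else name for name in pair)


non_redundant = {canonical(pair) for pair in pairs}
```

Here `len(pairs)` is 10 and `len(non_redundant)` is 7, matching the counts in the paragraph.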

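The I3, I4, and I5 microinstruction inputs are used to select the ALU function. The definition of these inputs is shown in Table 2. The octal code is also shown for reference. The normal technique for cascading the ALU of several devices is in a lookahead carry mode. Carry generate, G, and carry propagate, P, are outputs of the device for use with a carry-lookahead generator such as the Am2902 ('182). A carry-out, C_{n+4}, is also generated and is available as an output for use as the carry flag in a status register. Both carry-in (C_n) and carry-out (C_{n+4}) are active HIGH.

**Table 1 ALU Source Operand Control**

<table>
<thead>
<tr><th colspan="4">Microcode</th><th colspan="2">ALU source operands</th></tr>
<tr><th>$I_2$</th><th>$I_1$</th><th>$I_0$</th><th>Octal code</th><th>R</th><th>S</th></tr>
</thead>
<tbody>
<tr><td>L</td><td>L</td><td>L</td><td>0</td><td>A</td><td>Q</td></tr>
<tr><td>L</td><td>L</td><td>H</td><td>1</td><td>A</td><td>B</td></tr>
<tr><td>L</td><td>H</td><td>L</td><td>2</td><td>0</td><td>Q</td></tr>
<tr><td>L</td><td>H</td><td>H</td><td>3</td><td>0</td><td>B</td></tr>
<tr><td>H</td><td>L</td><td>L</td><td>4</td><td>0</td><td>A</td></tr>
<tr><td>H</td><td>L</td><td>H</td><td>5</td><td>D</td><td>A</td></tr>
<tr><td>H</td><td>H</td><td>L</td><td>6</td><td>D</td><td>Q</td></tr>
<tr><td>H</td><td>H</td><td>H</td><td>7</td><td>D</td><td>0</td></tr>
</tbody>
</table>

**Table 2 ALU Function Control**

<table>
<thead>
<tr><th colspan="4">Microcode</th><th rowspan="2">ALU function</th><th rowspan="2">Symbol</th></tr>
<tr><th>$I_5$</th><th>$I_4$</th><th>$I_3$</th><th>Octal code</th></tr>
</thead>
<tbody>
<tr><td>L</td><td>L</td><td>L</td><td>0</td><td>R plus S</td><td>$R + S$</td></tr>
<tr><td>L</td><td>L</td><td>H</td><td>1</td><td>S minus R</td><td>$S - R$</td></tr>
<tr><td>L</td><td>H</td><td>L</td><td>2</td><td>R minus S</td><td>$R - S$</td></tr>
<tr><td>L</td><td>H</td><td>H</td><td>3</td><td>R OR S</td><td>$R \lor S$</td></tr>
<tr><td>H</td><td>L</td><td>L</td><td>4</td><td>R AND S</td><td>$R \land S$</td></tr>
<tr><td>H</td><td>L</td><td>H</td><td>5</td><td>$\overline{R}$ AND S</td><td>$\overline{R} \land S$</td></tr>
<tr><td>H</td><td>H</td><td>L</td><td>6</td><td>R EX-OR S</td><td>$R \oplus S$</td></tr>
<tr><td>H</td><td>H</td><td>H</td><td>7</td><td>R EX-NOR S</td><td>$\overline{R \oplus S}$</td></tr>
</tbody>
</table>

Fig. 3. Detailed Am2901 microprocessor block diagram.

Note: LSB is numbered “0”, MSB is numbered “7”.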

The ALU has three other status-oriented outputs. These are F_3, F = 0, and overflow (OVR). The F_3 output is the most significant (sign) bit of the ALU and can be used to determine positive or negative results without enabling the three-state data outputs. F_3 is non-inverted with respect to the sign bit output Y_3. The F = 0 output is used for zero detect. It is an open-collector output and can be wire ORed between microprocessor slices. F = 0 is HIGH when all F outputs are LOW. The overflow output (OVR) is used to flag arithmetic operations that exceed the available 2’s complement number range. The overflow output (OVR) is HIGH when overflow exists; that is, when C_{n+3} and C_{n+4} are not the same polarity.
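
The four status outputs of a single slice can be sketched for the add function as follows. The function and dictionary key names are illustrative; only the definitions (OVR as the exclusive-OR of the carry into and out of bit 3, F = 0 HIGH when all F outputs are LOW) come from the text.

```python
def add_with_status(r, s, cn=0):
    """Add two 4-bit operands and derive the slice status outputs."""
    low = (r & 0x7) + (s & 0x7) + cn     # sum of bits 0..2
    c3 = low >> 3                        # C_{n+3}: carry into the sign bit
    total = (r & 0xF) + (s & 0xF) + cn
    c4 = total >> 4                      # C_{n+4}: carry out of the slice
    f = total & 0xF
    return {
        "F": f,
        "C_n+4": c4,
        "OVR": c3 ^ c4,                  # HIGH when the two carries differ
        "F3": f >> 3,                    # sign bit of the result
        "F=0": int(f == 0),              # HIGH when all F outputs are LOW
    }


overflowed = add_with_status(0b0111, 0b0001)   # +7 plus +1 does not fit in 4 bits
wrapped = add_with_status(0b1111, 0b0001)      # -1 plus +1 gives zero with carry
```

In the first case OVR is HIGH and the sign bit is wrong; in the second the carry-out is HIGH but OVR stays LOW, which is why the two flags are kept separate.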

The ALU data output is routed to several destinations. It can be a data output of the device and it can also be stored in the RAM or the Q register. Eight possible combinations of ALU destination functions are available as defined by the I_6, I_7, and I_8 microinstruction inputs. These combinations are shown in Table 3.

The 4-bit data output field (Y) features three-state outputs and can be directly bus-organized. An output control (OE) is used to enable the three-state outputs. When OE is HIGH, the Y outputs are in the high-impedance state.

A two-input multiplexer is also used at the data output such that either the A port of the RAM or the ALU outputs (F) are selected at the device Y outputs. This selection is controlled by the I_6, I_7, and I_8 microinstruction inputs. Refer to Table 3 for the selected output for each microinstruction code combination.

As was discussed previously, the RAM inputs are driven from a three-input multiplexer. This allows the ALU outputs to be entered non-shifted, shifted up one position (multiplied by 2), or shifted down one position (divided by 2). The shifter has two ports; one is labeled RAM_0 and the other is labeled RAM_3. Both of these ports consist of a buffer-driver with a three-state output and an input to the multiplexer. Thus, in the shift-up mode, the RAM_3 buffer is enabled and the RAM_0 multiplexer input is enabled. Likewise, in the shift-down mode, the RAM_0 buffer and RAM_3 input are enabled. In the no-shift mode, both buffers are in the high-impedance state and the multiplexer inputs are not selected. This shifter is controlled from the I_6, I_7, and I_8 microinstruction inputs as defined in Table 3.
### Table 3 ALU Destination Control

<table>
<thead>
<tr>
<th rowspan="2">Microcode $I_8$ $I_7$ $I_6$, octal code</th>
<th colspan="2">RAM function</th>
<th colspan="2">Q-register function</th>
</tr>
<tr>
<th>Shift</th>
<th>Load</th>
<th>Shift</th>
<th>Load</th>
</tr>
</thead>
<tbody>
<tr>
<td>L L L 0</td>
<td>X</td>
<td>None</td>
<td>None</td>
<td>F→Q</td>
</tr>
<tr>
<td>L L H 1</td>
<td>X</td>
<td>None</td>
<td>X</td>
<td>None</td>
</tr>
<tr>
<td>L H L 2</td>
<td>None</td>
<td>F→B</td>
<td>X</td>
<td>None</td>
</tr>
<tr>
<td>L H H 3</td>
<td>None</td>
<td>F→B</td>
<td>X</td>
<td>None</td>
</tr>
<tr>
<td>H L L 4</td>
<td>Down</td>
<td>F/2→B</td>
<td>Down</td>
<td>Q/2→Q</td>
</tr>
<tr>
<td>H L H 5</td>
<td>Down</td>
<td>F/2→B</td>
<td>X</td>
<td>None</td>
</tr>
<tr>
<td>H H L 6</td>
<td>Up</td>
<td>2F→B</td>
<td>Up</td>
<td>2Q→Q</td>
</tr>
<tr>
<td>H H H 7</td>
<td>Up</td>
<td>2F→B</td>
<td>X</td>
<td>None</td>
</tr>
</tbody>
</table>

X = Don't care. Electrically, the shift pin is a TTL input internally connected to a three-state output which is in the high-impedance state.

B = Register addressed by B inputs.

Up is toward MSB. Down is toward LSB.

Similarly, the Q register is driven from a three-input multiplexer. In the no-shift mode, the multiplexer enters the ALU data into the Q register. In either the shift-up or shift-down mode, the multiplexer selects the Q register data appropriately shifted up or down. The Q shifter also has two ports; one is labeled Q₀ and the other is Q₃. The operation of these two ports is similar to the RAM shifter and is also controlled from I₆, I₇, and I₈ as shown in Table 3.
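
The RAM and Q-register columns of Table 3 can be read as a small decode function. The sketch below models one 4-bit slice; the name `destination` and the `ram_in`/`q_in` parameters (the bits arriving on the shift pins) are illustrative assumptions, and the Y output selection is not modeled.

```python
def destination(code, f, q, ram_in=0, q_in=0):
    """Return (ram_write, new_q) for destination code I8,I7,I6 (octal).

    ram_write is None when the RAM is not loaded. ram_in and q_in are the
    bits shifted in at the vacated end (MSB on a down shift, LSB on an up shift).
    """
    if code == 0:                                   # F -> Q, no RAM load
        return None, f
    if code == 1:                                   # no load at all
        return None, q
    if code in (2, 3):                              # F -> B
        return f, q
    if code == 4:                                   # F/2 -> B, Q/2 -> Q
        return (f >> 1) | (ram_in << 3), (q >> 1) | (q_in << 3)
    if code == 5:                                   # F/2 -> B
        return (f >> 1) | (ram_in << 3), q
    if code == 6:                                   # 2F -> B, 2Q -> Q
        return ((f << 1) & 0xF) | ram_in, ((q << 1) & 0xF) | q_in
    if code == 7:                                   # 2F -> B
        return ((f << 1) & 0xF) | ram_in, q
    raise ValueError("destination code must be 0..7")


down = destination(4, 0b1011, 0b0110, ram_in=1, q_in=0)
up = destination(6, 0b1011, 0b0110, ram_in=1, q_in=1)
```

Keeping the shifted-in bits as explicit arguments mirrors the hardware, where those bits come from the neighboring slice or from the end-of-array multiplexers rather than from the slice itself.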

The clock input to the Am2901 controls the RAM, the Q register, and the A and B data latches. When enabled, data is clocked into the Q register on the LOW-to-HIGH transition of the clock. When the clock input is HIGH, the A and B latches are open and will pass whatever data is present at the RAM outputs. When the clock input is LOW, the latches are closed and will retain the last data entered. If the RAM EN is enabled, new data will be written into the RAM file (word) defined by the B address field when the clock input is LOW.

There are eight source operand pairs available to the ALU as selected by the I₀, I₁, and I₂ instruction inputs. The ALU can perform eight functions—five logic and three arithmetic. The I₃, I₄, and I₅ instruction inputs control this function selection. The carry input, Cₙ, also affects the ALU results when in the arithmetic mode. The Cₙ input has no effect in the logic mode. When I₀ through I₅ and Cₙ are viewed together, the matrix of Table 4 results. This matrix fully defines the ALU/source operand function for each state.

The ALU functions can also be examined on a "task" basis, i.e., add, subtract, AND, OR, etc. In the arithmetic mode, the carry will affect the function performed; while in the logic mode, the carry will have no bearing on the ALU output. Table 5 defines the various logic operations that the Am2901 can perform, and Table 6 shows the arithmetic functions of the device. Both carry-in LOW (Cₙ = 0) and carry-in HIGH (Cₙ = 1) are defined in these operations.
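
A compact behavioral model makes the source/function matrix easy to regenerate. This is a sketch for one 4-bit slice, assuming the usual adder identities (S minus R is computed as S plus the complement of R plus the carry-in); the function names are illustrative.

```python
MASK = 0xF  # one slice is 4 bits wide


def select_operands(i210, a, b, d, q):
    """Return (R, S) for source code I2,I1,I0 (octal)."""
    table = {0: (a, q), 1: (a, b), 2: (0, q), 3: (0, b),
             4: (0, a), 5: (d, a), 6: (d, q), 7: (d, 0)}
    return table[i210]


def alu(i543, r, s, cn=0):
    """Return the 4-bit F output for function code I5,I4,I3 (octal)."""
    if i543 == 0:
        f = r + s + cn                   # R plus S
    elif i543 == 1:
        f = s + (~r & MASK) + cn         # S minus R (minus 1 if carry-in LOW)
    elif i543 == 2:
        f = r + (~s & MASK) + cn         # R minus S (minus 1 if carry-in LOW)
    elif i543 == 3:
        f = r | s
    elif i543 == 4:
        f = r & s
    elif i543 == 5:
        f = ~r & s
    elif i543 == 6:
        f = r ^ s
    elif i543 == 7:
        f = ~(r ^ s)
    else:
        raise ValueError("function code must be 0..7")
    return f & MASK


r, s = select_operands(5, a=3, b=0, d=5, q=0)   # source code 5 selects D, A
difference = alu(2, r, s, cn=1)                 # R minus S = D - A
```

The carry-in only enters the three arithmetic branches, which is the code-level counterpart of the statement that Cₙ has no effect in the logic mode.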

**Logic Functions for G, P, Cₙ₊₄, and OVR**

The four signals, G, P, Cₙ₊₄, and OVR are designed to indicate carry and overflow conditions when the Am2901 is in the add or subtract mode. Table 7 indicates the logic equations for these four signals for each of the eight ALU functions. The R and S inputs are the two inputs selected according to Table 1.

### Table 4 Source Operand and ALU Function Matrix

<table>
<thead>
<tr><th rowspan="2">Octal $I_{5,4,3}$</th><th rowspan="2">ALU function</th><th rowspan="2">$C_n$</th><th colspan="8">$I_{2,1,0}$ octal and ALU source ($R$, $S$)</th></tr>
<tr><th>0 $A, Q$</th><th>1 $A, B$</th><th>2 $0, Q$</th><th>3 $0, B$</th><th>4 $0, A$</th><th>5 $D, A$</th><th>6 $D, Q$</th><th>7 $D, 0$</th></tr>
</thead>
<tbody>
<tr><td rowspan="2">0</td><td rowspan="2">$R$ plus $S$</td><td>L</td><td>$A + Q$</td><td>$A + B$</td><td>$Q$</td><td>$B$</td><td>$A$</td><td>$D + A$</td><td>$D + Q$</td><td>$D$</td></tr>
<tr><td>H</td><td>$A + Q + 1$</td><td>$A + B + 1$</td><td>$Q + 1$</td><td>$B + 1$</td><td>$A + 1$</td><td>$D + A + 1$</td><td>$D + Q + 1$</td><td>$D + 1$</td></tr>
<tr><td rowspan="2">1</td><td rowspan="2">$S$ minus $R$</td><td>L</td><td>$Q - A - 1$</td><td>$B - A - 1$</td><td>$Q - 1$</td><td>$B - 1$</td><td>$A - 1$</td><td>$A - D - 1$</td><td>$Q - D - 1$</td><td>$-D - 1$</td></tr>
<tr><td>H</td><td>$Q - A$</td><td>$B - A$</td><td>$Q$</td><td>$B$</td><td>$A$</td><td>$A - D$</td><td>$Q - D$</td><td>$-D$</td></tr>
<tr><td rowspan="2">2</td><td rowspan="2">$R$ minus $S$</td><td>L</td><td>$A - Q - 1$</td><td>$A - B - 1$</td><td>$-Q - 1$</td><td>$-B - 1$</td><td>$-A - 1$</td><td>$D - A - 1$</td><td>$D - Q - 1$</td><td>$D - 1$</td></tr>
<tr><td>H</td><td>$A - Q$</td><td>$A - B$</td><td>$-Q$</td><td>$-B$</td><td>$-A$</td><td>$D - A$</td><td>$D - Q$</td><td>$D$</td></tr>
<tr><td>3</td><td>$R$ OR $S$</td><td>X</td><td>$A \lor Q$</td><td>$A \lor B$</td><td>$Q$</td><td>$B$</td><td>$A$</td><td>$D \lor A$</td><td>$D \lor Q$</td><td>$D$</td></tr>
<tr><td>4</td><td>$R$ AND $S$</td><td>X</td><td>$A \land Q$</td><td>$A \land B$</td><td>$0$</td><td>$0$</td><td>$0$</td><td>$D \land A$</td><td>$D \land Q$</td><td>$0$</td></tr>
<tr><td>5</td><td>$\overline{R}$ AND $S$</td><td>X</td><td>$\overline{A} \land Q$</td><td>$\overline{A} \land B$</td><td>$Q$</td><td>$B$</td><td>$A$</td><td>$\overline{D} \land A$</td><td>$\overline{D} \land Q$</td><td>$0$</td></tr>
<tr><td>6</td><td>$R$ EX-OR $S$</td><td>X</td><td>$A \oplus Q$</td><td>$A \oplus B$</td><td>$Q$</td><td>$B$</td><td>$A$</td><td>$D \oplus A$</td><td>$D \oplus Q$</td><td>$D$</td></tr>
<tr><td>7</td><td>$R$ EX-NOR $S$</td><td>X</td><td>$\overline{A \oplus Q}$</td><td>$\overline{A \oplus B}$</td><td>$\overline{Q}$</td><td>$\overline{B}$</td><td>$\overline{A}$</td><td>$\overline{D \oplus A}$</td><td>$\overline{D \oplus Q}$</td><td>$\overline{D}$</td></tr>
</tbody>
</table>

$+$ = Plus, $-$ = Minus, $\lor$ = OR, $\land$ = AND, $\oplus$ = EX-OR

### Table 5 ALU Logic Mode Functions ($C_n$ Irrelevant)

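<table>
<thead>
<tr><th>Octal $I_{5,4,3}$, $I_{2,1,0}$</th><th>Group</th><th>Function</th></tr>
</thead>
<tbody>
<tr><td>4 0</td><td>AND</td><td>$A \land Q$</td></tr>
<tr><td>4 1</td><td>AND</td><td>$A \land B$</td></tr>
<tr><td>4 5</td><td>AND</td><td>$D \land A$</td></tr>
<tr><td>4 6</td><td>AND</td><td>$D \land Q$</td></tr>
<tr><td>3 0</td><td>OR</td><td>$A \lor Q$</td></tr>
<tr><td>3 1</td><td>OR</td><td>$A \lor B$</td></tr>
<tr><td>3 5</td><td>OR</td><td>$D \lor A$</td></tr>
<tr><td>3 6</td><td>OR</td><td>$D \lor Q$</td></tr>
<tr><td>6 0</td><td>EX-OR</td><td>$A \oplus Q$</td></tr>
<tr><td>6 1</td><td>EX-OR</td><td>$A \oplus B$</td></tr>
<tr><td>6 5</td><td>EX-OR</td><td>$D \oplus A$</td></tr>
<tr><td>6 6</td><td>EX-OR</td><td>$D \oplus Q$</td></tr>
<tr><td>7 0</td><td>EX-NOR</td><td>$\overline{A \oplus Q}$</td></tr>
<tr><td>7 1</td><td>EX-NOR</td><td>$\overline{A \oplus B}$</td></tr>
<tr><td>7 5</td><td>EX-NOR</td><td>$\overline{D \oplus A}$</td></tr>
<tr><td>7 6</td><td>EX-NOR</td><td>$\overline{D \oplus Q}$</td></tr>
<tr><td>7 2</td><td>INVERT</td><td>$\overline{Q}$</td></tr>
<tr><td>7 3</td><td>INVERT</td><td>$\overline{B}$</td></tr>
<tr><td>7 4</td><td>INVERT</td><td>$\overline{A}$</td></tr>
<tr><td>7 7</td><td>INVERT</td><td>$\overline{D}$</td></tr>
<tr><td>6 2</td><td>PASS</td><td>$Q$</td></tr>
<tr><td>6 3</td><td>PASS</td><td>$B$</td></tr>
<tr><td>6 4</td><td>PASS</td><td>$A$</td></tr>
<tr><td>6 7</td><td>PASS</td><td>$D$</td></tr>
<tr><td>3 2</td><td>PASS</td><td>$Q$</td></tr>
<tr><td>3 3</td><td>PASS</td><td>$B$</td></tr>
<tr><td>3 4</td><td>PASS</td><td>$A$</td></tr>
<tr><td>3 7</td><td>PASS</td><td>$D$</td></tr>
<tr><td>4 2</td><td>"ZERO"</td><td>$0$</td></tr>
<tr><td>4 3</td><td>"ZERO"</td><td>$0$</td></tr>
<tr><td>4 4</td><td>"ZERO"</td><td>$0$</td></tr>
<tr><td>4 7</td><td>"ZERO"</td><td>$0$</td></tr>
<tr><td>5 0</td><td>MASK</td><td>$\overline{A} \land Q$</td></tr>
<tr><td>5 1</td><td>MASK</td><td>$\overline{A} \land B$</td></tr>
<tr><td>5 5</td><td>MASK</td><td>$\overline{D} \land A$</td></tr>
<tr><td>5 6</td><td>MASK</td><td>$\overline{D} \land Q$</td></tr>
</tbody>
</table>

$+$ = Plus, $-$ = Minus, $\lor$ = OR, $\land$ = AND, $\oplus$ = EX-OR

**Pin Definitions**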

A0-3  The four address inputs to the register stack used to select one register whose contents are displayed through the A port.

B0-3  The four address inputs to the register stack used to select one register whose contents are displayed through the B port and into which new data can be written when the clock goes LOW.

I0-8  The nine instruction control lines to the Am2901, used to determine what data sources will be applied to the ALU (I0,1,2), what functions the ALU will perform (I3,4,5), and what data is to be deposited in the Q register or the register stack (I6,7,8).

Q3, RAM3  A shift line at the MSB of the Q register (Q3) and the register stack (RAM3). Electrically these lines are three-state outputs connected to TTL inputs internal to the Am2901. When the destination code on I6,7,8 indicates an up shift (octal 6 or 7), the three-state outputs are enabled and the MSB of the Q register is available on the Q3 pin and the MSB of the ALU output is available on the RAM3 pin. Otherwise, the three-state outputs are OFF (high-impedance) and the pins are electrically LS-TTL inputs. When the destination code calls for a down shift, the pins are used as the data inputs to the MSB of the Q register (octal 4) and RAM (octal 4 or 5).

Q0, RAM0  Shift lines like Q3 and RAM3, but at the LSB of the Q register and RAM. These pins are tied to the Q3 and RAM3 pins of the adjacent device to transfer data between devices for up and down shifts of the Q register and ALU data.

D0-3  Direct data inputs. A 4-bit data field which may be selected as one of the ALU data sources for entering data into the Am2901. D0 is the LSB.

Y0-3  The four data outputs of the Am2901. These are three-state output lines. When enabled, they display either the four outputs of the ALU or the data on the A port of the register stack, as determined by the destination code I6,7,8.

OE  Output enable. When OE is HIGH, the Y outputs are OFF; when OE is LOW, the Y outputs are active (HIGH or LOW).

G, P  The carry generate and propagate outputs of the Am2901's ALU. These signals are used with the Am2902 for carry-lookahead. See Table 7 for the logic equations.

OVR  Overflow. This pin is logically the Exclusive-OR of the carry-in and carry-out of the MSB of the ALU. At the most significant end of the word, this pin indicates that the result of an arithmetic 2's complement operation has overflowed into the sign bit. See Table 7 for logic equation.

F = 0  This is an open-collector output which goes **HIGH (OFF)** if the four ALU outputs F0-3 are all LOW. In positive logic, it indicates the result of an ALU operation is 0.

C_n  The carry-in to the Am2901's ALU.

C_{n+4}  The carry-out of the Am2901's ALU. See Table 7 for equations.

CP  The clock to the Am2901. The Q register and register stack outputs change on the clock LOW-to-HIGH transition. The clock LOW time is internally the write enable to the 16 × 4 RAM which comprises the "master" latches of the register stack. While the clock is LOW, the "slave" latches on the RAM outputs are closed, storing the data previously on the RAM outputs. This allows synchronous master-slave operation of the register stack.

### Expansion of The Am2901

Any number of Am2901's can be interconnected to form CPU's of 12, 16, 24, 36, or more bits, in 4-bit increments. Figure 4 illustrates the interconnection of three Am2901's to form a 12-bit CPU, using ripple carry. Figure 5 illustrates a 16-bit CPU using carry lookahead, and Fig. 6 is the general carry lookahead scheme for long words.

With the exception of the carry interconnection, all expansion schemes are the same. The Q₃ and RAM₃ pins are bidirectional left/right shift lines at the MSB of the device. For all devices except the most significant, these lines are connected to the Q₀ and RAM₀ pins of the adjacent more significant device. These connections allow the Q registers of all Am2901's to be shifted left or right as a contiguous n-bit register, and also allow the ALU output data to be shifted.

---

### Table 6: ALU Arithmetic Mode Functions

<table>
<thead>
<tr><th rowspan="2">Octal $I_{5,4,3}$</th><th rowspan="2">Octal $I_{2,1,0}$</th><th colspan="2">$C_n = 0$ (LOW)</th><th colspan="2">$C_n = 1$ (HIGH)</th></tr>
<tr><th>Group</th><th>Function</th><th>Group</th><th>Function</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>0</td><td rowspan="4">ADD</td><td>A + Q</td><td rowspan="4">ADD plus one</td><td>A + Q + 1</td></tr>
<tr><td>0</td><td>1</td><td>A + B</td><td>A + B + 1</td></tr>
<tr><td>0</td><td>5</td><td>D + A</td><td>D + A + 1</td></tr>
<tr><td>0</td><td>6</td><td>D + Q</td><td>D + Q + 1</td></tr>
<tr><td>0</td><td>2</td><td rowspan="4">PASS</td><td>Q</td><td rowspan="4">Increment</td><td>Q + 1</td></tr>
<tr><td>0</td><td>3</td><td>B</td><td>B + 1</td></tr>
<tr><td>0</td><td>4</td><td>A</td><td>A + 1</td></tr>
<tr><td>0</td><td>7</td><td>D</td><td>D + 1</td></tr>
<tr><td>1</td><td>2</td><td rowspan="4">Decrement</td><td>Q - 1</td><td rowspan="4">PASS</td><td>Q</td></tr>
<tr><td>1</td><td>3</td><td>B - 1</td><td>B</td></tr>
<tr><td>1</td><td>4</td><td>A - 1</td><td>A</td></tr>
<tr><td>2</td><td>7</td><td>D - 1</td><td>D</td></tr>
<tr><td>2</td><td>2</td><td rowspan="4">1's compl</td><td>-Q - 1</td><td rowspan="4">2's compl (negate)</td><td>-Q</td></tr>
<tr><td>2</td><td>3</td><td>-B - 1</td><td>-B</td></tr>
<tr><td>2</td><td>4</td><td>-A - 1</td><td>-A</td></tr>
<tr><td>1</td><td>7</td><td>-D - 1</td><td>-D</td></tr>
<tr><td>1</td><td>0</td><td rowspan="8">Subtract (1's compl)</td><td>Q - A - 1</td><td rowspan="8">Subtract (2's compl)</td><td>Q - A</td></tr>
<tr><td>1</td><td>1</td><td>B - A - 1</td><td>B - A</td></tr>
<tr><td>1</td><td>5</td><td>A - D - 1</td><td>A - D</td></tr>
<tr><td>1</td><td>6</td><td>Q - D - 1</td><td>Q - D</td></tr>
<tr><td>2</td><td>0</td><td>A - Q - 1</td><td>A - Q</td></tr>
<tr><td>2</td><td>1</td><td>A - B - 1</td><td>A - B</td></tr>
<tr><td>2</td><td>5</td><td>D - A - 1</td><td>D - A</td></tr>
<tr><td>2</td><td>6</td><td>D - Q - 1</td><td>D - Q</td></tr>
</tbody>
</table>

### Table 7: Definitions (× = OR)

<table>
<thead>
<tr>
<th>Iₐux</th>
<th>Function</th>
<th>Iₐux</th>
<th>Function</th>
<th>Iₐux</th>
<th>Function</th>
<th>Iₐux</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>R × S</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
</tr>
<tr>
<td>1</td>
<td>R × S</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
</tr>
<tr>
<td>2</td>
<td>0 × S</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
</tr>
<tr>
<td>3</td>
<td>G × S</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
</tr>
<tr>
<td>4</td>
<td>R × S</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
</tr>
<tr>
<td>5</td>
<td>R × S</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
</tr>
<tr>
<td>6</td>
<td>0 × S</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
</tr>
<tr>
<td>7</td>
<td>R × S</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
<td>0</td>
<td>0 × A</td>
</tr>
</tbody>
</table>

Note: [f₁, f₂, f₃, f₄] = 0, f₅ = 0, f₆, f₇, f₈, f₉, f₁₀, f₁₁, f₁₂, f₁₃
left or right as a contiguous n-bit word prior to storage in the RAM. At the LSB and MSB of
the CPU, the shift pins should be connected to three-state multiplexers which can be
controlled by the microcode to select the appropriate input signals to the shift inputs. (See Fig.
7.)

The open-collector F = 0 outputs of all the Am2901's are connected together and to a pull-up
resistor. This line will go HIGH if and only if the output of the ALU contains all zeros. Most
systems will use this line as the Z (zero) bit of the processor status word.

The overflow and F3 pins are generally used only at the most significant end of the array, and
are meaningful only when 2's complement signed arithmetic is used. The overflow pin is the
Exclusive-OR of the carry-in and carry-out of the sign bit (MSB). It will go HIGH when the
result of an arithmetic operation is a number requiring more bits than are available, causing
the sign bit to be erroneous. This is the overflow (V) bit of the processor status word. The F3
pin is the MSB of the ALU output. It is the sign of the result in 2's complement notation, and
should be used as the negative (N) bit of the processor status word.

The carry-out from the most significant Am2901 (Cn+4 pin) is the carry-out from the array,
and is used as the carry (C) bit of the processor status word.

Carry interconnections between devices may use either ripple carry or carry lookahead. For
ripple carry, the carry-out (Cn+4) of each device is connected to the carry-in (Cn) of the next
more significant device. Carry lookahead uses the Am2902 lookahead carry generator. The
scheme is identical with that used with the 74181/74182. Figures 5 and 6 illustrate single- and
multiple-level lookahead.
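
The ripple-carry connection can be sketched by chaining 4-bit slice adders, least significant slice first. The function names are illustrative, and only the add function is modeled; carry lookahead produces the same sums and differs only in speed.

```python
def slice_add(r, s, cn):
    """One 4-bit slice: return (F, C_{n+4}) for R plus S with carry-in cn."""
    total = (r & 0xF) + (s & 0xF) + cn
    return total & 0xF, total >> 4


def ripple_add(a, b, slices, carry_in=0):
    """Add two words using `slices` cascaded 4-bit slices with ripple carry."""
    result = 0
    carry = carry_in
    for i in range(slices):                      # slice 0 is least significant
        r = (a >> (4 * i)) & 0xF
        s = (b >> (4 * i)) & 0xF
        f, carry = slice_add(r, s, carry)        # C_{n+4} feeds the next C_n
        result |= f << (4 * i)
    return result, carry                         # carry is the array carry-out


twelve_bit = ripple_add(0x0FF, 0x001, slices=3)
wrap_around = ripple_add(0xFFF, 0x001, slices=3)
```

The final carry returned here corresponds to the Cn+4 pin of the most significant device, which the text uses as the C bit of the status word.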

**Shift I/O Lines at the End of the Array**

The Q-register and RAM left/right shift data transfers occur between devices over
bidirectional lines. At the ends of the array, three-state multiplexers are used to select what
the new inputs to the registers should be during shifting. Figure 7 shows two Am25LS253
dual four-input multiplexers connected to provide four shift modes. Instruction bit I7 (from
the Am2901 microinstruction) is used to select whether the left-shift multiplexer or the
right-shift multiplexer is active. (See Table 8.) The four shift modes in this example are:

<table>
<thead>
<tr>
<th>Mode</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>Zero</td>
<td>A LOW is shifted into the MSB of the RAM on a down shift. If the Q register is also shifted, then a LOW is deposited in the Q-register MSB. If the RAM or both registers are shifted up, LOWs are placed in the LSBs.</td>
</tr>
<tr>
<td>One</td>
<td>Same as zero, but a HIGH level is deposited in the LSB or MSB.</td>
</tr>
<tr>
<td>Rotate</td>
<td>A single-precision rotate. The RAM MSB shifts into the LSB on an up shift and the LSB shifts into the MSB on a down shift. The Q register, if shifted, will rotate in the same manner.</td>
</tr>
<tr>
<td>Arithmetic</td>
<td>A double-length arithmetic shift if Q is also shifted. On an up shift a zero is loaded into the Q-register LSB and the Q-register MSB is loaded into the RAM LSB. On a down shift, the RAM LSB is loaded into the Q-register MSB and the ALU output MSB ($F_n$, the sign bit) is loaded into the RAM MSB. (This same bit will also be in the next less significant RAM bit.)</td>
</tr>
</tbody>
</table>

Fig. 4. Three Am2901's used to construct 12-bit CPU with ripple carry. Corresponding A, B, and I pins on all devices are connected together.

Fig. 5. Four Am2901's in a 16-bit CPU using the Am2902 for carry lookahead.

Fig. 6. Carry lookahead scheme for 48-bit CPU using 12 Am2901's. The carry-out flag (C48) should be taken from the lower Am2902 rather than the rightmost Am2901 for higher speed.

Fig. 7. Three-state multiplexers used on shift I/O lines.
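
The single-register modes can be sketched as a choice of fill bit at the vacated end of the word. The function name `shift` and its parameters are illustrative; the double-length arithmetic mode, which couples the RAM word and the Q register, is deliberately left out of this sketch.

```python
def shift(word, direction, mode, bits=16):
    """Shift an n-bit word one place, filling the vacated bit by mode.

    direction is "up" (toward the MSB) or "down" (toward the LSB);
    mode is "zero", "one", or "rotate".
    """
    mask = (1 << bits) - 1
    msb = (word >> (bits - 1)) & 1
    lsb = word & 1
    if direction == "down":
        fill = {"zero": 0, "one": 1, "rotate": lsb}[mode]
        return (word >> 1) | (fill << (bits - 1))
    if direction == "up":
        fill = {"zero": 0, "one": 1, "rotate": msb}[mode]
        return ((word << 1) & mask) | fill
    raise ValueError("direction must be 'up' or 'down'")


rotated_down = shift(0b0111, "down", "rotate", bits=4)
rotated_up = shift(0b1001, "up", "rotate", bits=4)
```

In hardware the fill bit is what the end-of-array multiplexer drives onto the RAM0 or RAM3 pin; here it is simply selected from a dictionary.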

**Hardware Multiplication**

Figure 8 illustrates the interconnections for a hardware multiplication using the Am2901. The system shown uses two devices for $8 \times 8$ multiplication, but the expansion to more bits is simple: the significant connections are at the LSB and MSB only.

The basic technique used is the "add and shift" algorithm. One clock cycle is required for each bit of the multiplier. On each cycle, the LSB of the multiplier is examined; if it is a 1, then the multiplicand is added to the partial product to generate a new partial product. The partial product is then shifted one place toward the LSB, and the multiplier is also shifted one place toward the LSB. The old LSB of the multiplier is discarded. The cycle is then repeated on the new LSB of the multiplier available at $Q_0$.

The multiplier is in the Am2901 Q register. The multiplicand is in one of the registers in the register stack, $R_a$. The product will be developed in another of the registers in the stack, $R_b$.

The A address inputs are used to address the multiplicand in $R_a$, and the B address inputs are used to address the partial product in $R_b$. On each cycle, $R_a$ is conditionally added to $R_b$, depending on the LSB of Q as read from the $Q_0$ output, and both the Q and the ALU output are shifted left one place. The instruction lines to the Am2901 on every cycle will be:

- $I_{8,7,6} = 4$ (shift register stack input and Q register left)
- $I_{5,4,3} = 0$ (Add)
- $I_{2,1,0} = 1$ or $3$ (select A, B or 0, B as ALU sources)

Figure 8 shows the connections for multiplication. The circled numbers refer to the paragraphs below.

1. The adjacent pins of the Q register and RAM shifters are connected together so that the Q registers of both (or all) Am2901's shift left or right as a unit. Similarly, the entire 8-bit (or more) ALU output can be shifted as a unit prior to storage in the register stack.

**Table 8**

<table>
<thead>
<tr><th colspan="3">Code</th><th rowspan="2">Source of new data</th><th rowspan="2">Type</th></tr>
<tr><th>$I_7$</th><th>$S_1$</th><th>$S_0$</th></tr>
</thead>
<tbody>
<tr><td>$H$</td><td>$L$</td><td>$L$</td><td></td><td></td></tr>
<tr><td>$H$</td><td>$L$</td><td>$H$</td><td></td><td></td></tr>
<tr><td>$H$</td><td>$H$</td><td>$L$</td><td></td><td></td></tr>
<tr><td>$H$</td><td>$H$</td><td>$H$</td><td></td><td></td></tr>
</tbody>
</table>

2. The shift output at the LSB of the Q register determines whether the ALU source operands will be A and B (add multiplicand to partial product) or 0 and B (add nothing to partial product). Instruction bit $I_1$ can select between A, B or 0, B as the source operands; it can be driven directly from the complement of the LSB of the multiplier.

3. As the new partial product appears at the input to the register stack, it is shifted left by the RAM shifter. The new LSB of the partial product, which is complete and will not be affected by future operations, is available on the $RAM_0$ pin. This signal is returned to the MSB of the Q register. On each cycle then, the just-completed LSB of the product is deposited in the MSB of the Q register; the Q register fills with the least significant half of the product.

4. As the ALU output is shifted down on each cycle, the sign bit of the new partial product should be inserted in the RAM MSB shift input. The $F_3$ flag will be the correct sign of the partial product unless overflow has occurred. If overflow occurs during an addition or subtraction, the OVR flag will go HIGH and $F_3$ is not the sign of the result. The sign of the result must then be the complement of $F_3$. The correct sign bit to shift into the MSB of the partial product is therefore $F_3 \oplus OVR$; that is, $F_3$ if overflow has not occurred and $\overline{F_3}$ if overflow has occurred. On the last cycle, when the MSB of the multiplier is examined, a conditional subtraction rather than addition should be performed, because the sign bit of the multiplier carries negative rather than positive arithmetic weight.

\[ Y = -Y_i 2^i + Y_{i-1} 2^{i-1} + \ldots + Y_0 2^0 \]
This scheme will produce a correct 2’s complement product for all multiplicands and multipliers in 2’s complement notation.
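The add-and-shift procedure, including the $F_3 \oplus OVR$ sign correction and the final conditional subtraction, can be sketched at the word level in Python. This is only an illustrative model under stated assumptions: it treats the whole Am2901 array as one `bits`-wide unit and does not model individual pins, and the names `to_signed` and `multiply_2c` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def multiply_2c(multiplicand, multiplier, bits=8):
    """Add-and-shift multiply of two `bits`-wide 2's complement numbers."""
    mask = (1 << bits) - 1
    msb = 1 << (bits - 1)
    r_a = multiplicand & mask      # multiplicand (register R_a)
    r_b = 0                        # partial product (register R_b)
    q = multiplier & mask          # multiplier (Q register)

    for cycle in range(bits):
        # Q0 selects "A, B" (add multiplicand) or "0, B" (add nothing).
        operand = r_a if q & 1 else 0
        if cycle == bits - 1:
            # Last cycle: the multiplier sign bit has negative weight.
            true_sum = to_signed(r_b, bits) - to_signed(operand, bits)
        else:
            true_sum = to_signed(r_b, bits) + to_signed(operand, bits)
        f = true_sum & mask                          # ALU output F
        f_sign = 1 if f & msb else 0                 # sign bit of F
        ovr = 1 if true_sum != to_signed(f, bits) else 0
        sign_in = f_sign ^ ovr                       # bit for the RAM MSB
        # Shift F and Q one place toward the LSB as a double-length unit.
        q = (q >> 1) | ((f & 1) << (bits - 1))       # RAM LSB -> Q MSB
        r_b = (f >> 1) | (sign_in << (bits - 1))

    # R_b holds the upper half of the product, Q the lower half.
    return to_signed((r_b << bits) | q, 2 * bits)
```

For example, `multiply_2c(-3, 5)` returns `-15`, and the overflow path is exercised by `multiply_2c(-128, -128)`, which returns `16384`.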

Figure 9 is a table showing the input states of the Am2901’s for each step of a signed 2’s complement multiplication.

Am2909 Microprogram Sequencer

General Description

The Am2909 is a 4-bit-wide address controller intended for sequencing through a series of microinstructions contained in a ROM or PROM. Two Am2909’s may be interconnected to generate an 8-bit address (256 words), and three may be used to generate a 12-bit address (4096 words). Figure 10 is a block diagram of the Am2909.

The Am2909 can select an address from any of four sources. They are: (1) a set of external direct inputs (D); (2) external data from the R inputs, stored in an internal register; (3) a 4-word-deep push/pop stack; or (4) a program counter register (which usually contains the last address plus one). The push/pop stack includes certain control lines so that it can efficiently execute nested subroutine linkages. Each of the four outputs can be ORed with an external input for conditional skip or branch instructions, and a separate line forces the outputs to all zeros. The outputs are three-state.

Architecture of the Am2909

A detailed logic diagram is shown in Fig. 11. The device contains a four-input multiplexer that is used to select either the address register, direct inputs, microprogram counter, or file as the source of the next microinstruction address. This multiplexer is controlled by the S₀ and S₁ inputs.

The address register consists of four D-type, edge-triggered flip-flops with a common clock enable. When the address register enable is LOW, new data is entered into the register on the clock LOW-to-HIGH transition. The address register is available at the multiplexer as a source for the next microinstruction address. The direct input is a 4-bit field of inputs to the multiplexer and can be selected as the next microinstruction address.

The Am2909 contains a microprogram counter (μPC) that is composed of a 4-bit incrementer followed by a 4-bit register. The incrementer has carry-in (Cₙ) and carry-out (Cₙ₊₄) such that cascading to larger word lengths is straightforward. The μPC can be used in either of two ways. When the least significant carry-in to the incrementer is HIGH, the microprogram register is loaded on the next clock cycle with the current Y output word plus one (Y + 1 → μPC). Thus sequential microinstructions can be executed. If this least significant Cₙ is LOW, the incrementer passes the Y output word unmodified and the microprogram register is loaded with the same Y word on the next cycle (Y → μPC). Thus, the same microinstruction can be executed any number of times by using the least significant Cₙ as the control.
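The cascading of the incrementer through the carry-in and carry-out pins can be illustrated with a short Python sketch. It assumes three slices forming a 12-bit address with the carry rippling from the least significant slice upward; the function names `incrementer_slice` and `next_upc` are hypothetical.

```python
def incrementer_slice(y, c_n):
    """4-bit incrementer of one slice: returns (value for uPC, carry-out)."""
    total = (y & 0xF) + (1 if c_n else 0)
    return total & 0xF, total > 0xF


def next_upc(y_address, slices=3, c_in=True):
    """Ripple the carry through `slices` cascaded 4-bit incrementers."""
    result = 0
    carry = c_in                             # carry-in of the lowest slice
    for i in range(slices):                  # slice 0 is least significant
        nibble = (y_address >> (4 * i)) & 0xF
        value, carry = incrementer_slice(nibble, carry)
        result |= value << (4 * i)
    return result
```

With the least significant carry-in HIGH, `next_upc(0x0FF)` gives `0x100`; with it LOW, `next_upc(0x0FF, c_in=False)` gives `0x0FF`, so the same microinstruction address is held.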

The last source available at the multiplexer input is the 4 × 4 file (stack). The file is used to provide return address linkage when executing microsubroutines. The file contains a built-in stack pointer (SP) which always points to the last file word written. This allows stack reference operations (looping) to be performed without a push or pop.

The stack pointer operates as an up/down counter with separate push/pop and file enable inputs. When the file enable input is LOW and the push/pop input is HIGH, the PUSH operation is enabled. This causes the stack pointer to increment and the file to be written with the required return linkage—the next microinstruction address following the subroutine jump which initiated the PUSH.

If the file enable input is LOW and the push/pop control is LOW, a POP operation occurs. This implies the usage of the return linkage during this cycle and thus a return from subroutine. The next LOW-to-HIGH clock transition causes the stack pointer to decrement. If the file enable is HIGH, no action is taken by the stack pointer regardless of any other input.
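The push and pop behavior described above can be modeled in a few lines of Python. This is a minimal sketch, assuming the stack pointer is a 2-bit counter that wraps around; the class name `StackFile` and its methods are hypothetical.

```python
class StackFile:
    """4 x 4 file with a stack pointer that addresses the last word written."""

    def __init__(self):
        self.words = [0, 0, 0, 0]
        self.sp = 0

    def top(self):
        """Stack reference: read the addressed word without push or pop."""
        return self.words[self.sp]

    def clock(self, fe, pup, return_address=0):
        """Apply one LOW-to-HIGH clock edge; fe and pup are levels 0 or 1."""
        if fe == 1:
            return                        # file disabled: nothing changes
        if pup == 1:                      # PUSH: increment SP, then write
            self.sp = (self.sp + 1) % 4
            self.words[self.sp] = return_address & 0xF
        else:                             # POP: decrement SP; data stays put
            self.sp = (self.sp - 1) % 4
```

Because only the pointer moves, a loop can read `top()` on every pass and pop once when the loop ends.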

The stack pointer linkage is such that any combination of pushes, pops, and stack references can be achieved. One-microinstruction subroutines can be performed. Since the stack is 4 words deep, up to four microsubroutines can be nested.

The ZERO input is used to force the four outputs to the binary zero state. When the ZERO input is LOW, all Y outputs are LOW regardless of any other inputs (except OE). Each Y output bit also has a separate OR input such that a conditional logic 1 can be forced at each Y output. This allows jumping to different microinstructions on programmed conditions.

The Am2909 features three-state Y outputs. These can be particularly useful in military designs requiring external ground support equipment (GSE) to provide automatic checkout of the microprocessor. The internal control can be placed in the high-impedance state, and preprogrammed sequences of microinstructions can be executed via external access to the control ROM/PROM.
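The combined effect of the OR inputs, the ZERO input, and the output enable on the Y outputs can be summarized as a small Python function. It is a sketch only: the function name `y_outputs` is hypothetical, logic levels are written as 0 and 1, and `None` stands for the high-impedance state.

```python
def y_outputs(selected, or_inputs, zero, oe):
    """Return the 4-bit Y value, or None when the outputs are off.

    selected  : 4-bit word chosen by the multiplexer
    or_inputs : 4-bit word applied to the OR inputs
    zero      : level on ZERO (0 forces all outputs LOW)
    oe        : level on OE (1 puts the outputs in high impedance)
    """
    if oe == 1:
        return None                   # three-state outputs disabled
    if zero == 0:
        return 0                      # ZERO overrides the OR inputs
    return (selected | or_inputs) & 0xF
```

A conditional branch can then be formed by placing the test condition on one OR input, so that the next address is either the selected word or that word with one bit forced HIGH.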

**Definition of Terms**

A set of symbols is used to represent various internal and external registers and signals used with the Am2909. Since its principal application is as a controller for a microprogram store, it is necessary to define some signals associated with the microcode itself. Figure 12 illustrates the basic interconnection of Am2909, memory, and microinstruction register. The definitions here apply to this architecture.

*Inputs to Am2909*

- $S_1S_0$: Control lines for address source selection.
- FE, PUP: Control lines for push/pop stack.
- RE: Enable line for internal address register.
- OR: Logic OR inputs on each address output line.
- ZERO: Logic AND input on the output lines.
- OE: Output enable. When OE is HIGH, the Y outputs are OFF (high impedance).
- $C_n$: Carry-in to the incrementer.
- $R_i$: Inputs to the internal address register.
- $D_i$: Direct inputs to the multiplexer.
- CP: Clock input to the AR and $\mu$ PC register and push-pop stack.

*Outputs from the Am2909*

- $Y_i$: Address outputs from Am2909 (address inputs to control memory).
- $C_{n+4}$: Carry-out from the incrementer.

*Internal Signals*

- $\mu$PC: Contents of the microprogram counter.
- REG: Contents of the register.
- STK0-STK3: Contents of the push/pop stack. By definition, the word in the $4 \times 4$ file addressed by the stack pointer is STK0. Conceptually data is pushed into the stack at STK0; a subsequent push moves STK0 to STK1; a pop implies STK3 $\rightarrow$ STK2 $\rightarrow$ STK1 $\rightarrow$ STK0. Physically, only the stack pointer changes when a push or pop is performed. The data does not move. I/O occurs at STK0.
- SP: Contents of the stack pointer.

*External to the Am2909*

- A: Address to the control memory.
- I(A): Instruction in control memory at address A.
- $\mu$WR: Contents of the microword register (at output of control memory). The microword register contains the instruction currently being executed.
- $T_n$: Time period (cycle) n.

Operation of the Am2909

Figure 13 lists the select codes for the multiplexer. The two bits applied from the microword register (and additional combinational logic for branching) determine which data source contains the address for the next microinstruction. The contents of the selected source will appear on the Y outputs. Figure 13 also shows the truth table for the output control and for the control of the push/pop stack. Table 9 shows in detail the effect of $S_0$, $S_1$, FE, and PUP on the Am2909. These four signals define what address appears on the Y outputs and what the state of all the internal registers will be following the clock LOW-to-HIGH edge. In this illustration, the microprogram counter is assumed to contain initially some word J, the address register some word K, and the four words in the push/pop stack $R_a$ through $R_d$.

Figure 14 illustrates the execution of a subroutine using the Am2909. The configuration of Fig. 11 is assumed. The instruction being executed at any given time is the one contained in the microword register ($\mu$ WR). The contents of the $\mu$ WR also control (indirectly, perhaps) the four signals $S_0$, $S_1$, FE, and PUP. The starting address of the subroutine is applied to the D inputs of the Am2909 at the appropriate time.

In the columns on the left is the sequence of microinstructions to be executed. At address $J + 2$, the sequence control portion of the microinstruction contains the command "Jump to subroutine at A." At the time $T_2$, this is in the $\mu$ WR, and the Am2909 inputs are set up to execute the jump and save the return address. The subroutine address $A$ is applied to the D inputs from the $\mu$ WR and appears on the Y outputs. The first instruction of the subroutine, $I(A)$, is accessed and is at the inputs of the $\mu$ WR. On the next clock transition, $I(A)$ is loaded into the $\mu$ WR for execution, and the return address $J + 3$ is pushed onto the stack. The return instruction is executed at $T_5$. 

<table>
<thead>
<tr>
<th>OCTAL</th>
<th>$S_1$</th>
<th>$S_0$</th>
<th>SOURCE FOR Y OUTPUTS</th>
<th>SYMBOL</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>L</td>
<td>L</td>
<td>Microprogram Counter</td>
<td>µPC</td>
</tr>
<tr>
<td>1</td>
<td>L</td>
<td>H</td>
<td>Register</td>
<td>REG</td>
</tr>
<tr>
<td>2</td>
<td>H</td>
<td>L</td>
<td>Push/Pop stack</td>
<td>STK0</td>
</tr>
<tr>
<td>3</td>
<td>H</td>
<td>H</td>
<td>Direct Inputs</td>
<td>$D_i$</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr><th>$OR_i$</th><th>ZERO</th><th>OE</th><th>$Y_i$</th></tr>
</thead>
<tbody>
<tr><td>X</td><td>X</td><td>H</td><td>Z</td></tr>
<tr><td>X</td><td>L</td><td>L</td><td>L</td></tr>
<tr><td>H</td><td>H</td><td>L</td><td>H</td></tr>
<tr><td>L</td><td>H</td><td>L</td><td>Source selected by $S_0$, $S_1$</td></tr>
</tbody>
</table>

Z = High Impedance, H = High, L = Low, X = Don't Care

### Table 9 Output and Internal Next-Cycle Register States for Am2909

<table>
<thead>
<tr><th>Cycle</th><th>$S_1$, $S_0$, FE, PUP</th><th>$\mu$PC</th><th>REG</th><th>STK0</th><th>STK1</th><th>STK2</th><th>STK3</th><th>$Y_{OUT}$</th><th>Comment</th><th>Principal use</th></tr>
</thead>
<tbody>
<tr><td>N</td><td>0 0 0 0</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>J</td><td>Pop stack</td><td>End loop</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>J+1</td><td>K</td><td>Rb</td><td>Rc</td><td>Rd</td><td>Ra</td><td>. .</td><td></td><td></td></tr>
<tr><td>N</td><td>0 0 0 1</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>J</td><td>Push $\mu$PC</td><td>Setup loop</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>J+1</td><td>K</td><td>J</td><td>Ra</td><td>Rb</td><td>Rc</td><td>. .</td><td></td><td></td></tr>
<tr><td>N</td><td>0 0 1 X</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>J</td><td>Continue</td><td></td></tr>
<tr><td>N+1</td><td>. . . .</td><td>J+1</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>. .</td><td></td><td></td></tr>
<tr><td>N</td><td>0 1 0 0</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>K</td><td>Pop stack;</td><td>End loop</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>K+1</td><td>K</td><td>Rb</td><td>Rc</td><td>Rd</td><td>Ra</td><td>. .</td><td>Use AR for address</td><td></td></tr>
<tr><td>N</td><td>0 1 0 1</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>K</td><td>Push $\mu$PC;</td><td>JSR AR</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>K+1</td><td>K</td><td>J</td><td>Ra</td><td>Rb</td><td>Rc</td><td>. .</td><td>Jump to address in AR</td><td></td></tr>
<tr><td>N</td><td>0 1 1 X</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>K</td><td>Jump to address in AR</td><td>JMP AR</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>K+1</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>. .</td><td></td><td></td></tr>
<tr><td>N</td><td>1 0 0 0</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>Ra</td><td>Jump to address in STK0;</td><td>RTS</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>Ra+1</td><td>K</td><td>Rb</td><td>Rc</td><td>Rd</td><td>Ra</td><td>. .</td><td>Pop stack</td><td></td></tr>
<tr><td>N</td><td>1 0 0 1</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>Ra</td><td>Jump to address in STK0;</td><td></td></tr>
<tr><td>N+1</td><td>. . . .</td><td>Ra+1</td><td>K</td><td>J</td><td>Ra</td><td>Rb</td><td>Rc</td><td>. .</td><td>Push $\mu$PC</td><td></td></tr>
<tr><td>N</td><td>1 0 1 X</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>Ra</td><td>Jump to address in STK0</td><td>Stack ref (loop)</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>Ra+1</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>. .</td><td></td><td></td></tr>
<tr><td>N</td><td>1 1 0 0</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>D</td><td>Pop stack;</td><td>End loop</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>D+1</td><td>K</td><td>Rb</td><td>Rc</td><td>Rd</td><td>Ra</td><td>. .</td><td>Jump to address on D</td><td></td></tr>
<tr><td>N</td><td>1 1 0 1</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>D</td><td>Jump to address on D;</td><td>JSR D</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>D+1</td><td>K</td><td>J</td><td>Ra</td><td>Rb</td><td>Rc</td><td>. .</td><td>Push $\mu$PC</td><td></td></tr>
<tr><td>N</td><td>1 1 1 X</td><td>J</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>D</td><td>Jump to address on D</td><td>JMP D</td></tr>
<tr><td>N+1</td><td>. . . .</td><td>D+1</td><td>K</td><td>Ra</td><td>Rb</td><td>Rc</td><td>Rd</td><td>. .</td><td></td><td></td></tr>
</tbody>
</table>

X = Don't care, 0 = LOW, 1 = HIGH, Assume Cn = HIGH

Note: STK0 is the location addressed by the stack pointer.
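The next-state behavior in Table 9 and the subroutine sequence of Fig. 14 can be reproduced with a small behavioral model in Python. This is a sketch under stated assumptions, not a gate-level description: it models a single 4-bit slice, ignores the register enable, the OR inputs, ZERO, and OE, and the class name `Am2909Model` and its methods are hypothetical.

```python
class Am2909Model:
    """Behavioral sketch of one 4-bit sequencer slice (illustrative only)."""

    def __init__(self, upc=0, reg=0):
        self.upc = upc                 # microprogram counter register
        self.reg = reg                 # address register (never reloaded here)
        self.stack = [0, 0, 0, 0]      # 4 x 4 file
        self.sp = 0                    # stack pointer, addresses STK0

    def y(self, s1, s0, d):
        """Multiplexer: select the source of the next address."""
        sources = {(0, 0): self.upc, (0, 1): self.reg,
                   (1, 0): self.stack[self.sp], (1, 1): d}
        return sources[(s1, s0)] & 0xF

    def clock(self, s1, s0, fe, pup, d=0, c_n=1):
        """Return Y for this cycle, then update state as on the clock edge."""
        y = self.y(s1, s0, d)
        if fe == 0 and pup == 1:       # push the current uPC
            self.sp = (self.sp + 1) % 4
            self.stack[self.sp] = self.upc
        elif fe == 0 and pup == 0:     # pop
            self.sp = (self.sp - 1) % 4
        self.upc = (y + c_n) & 0xF     # Y + 1 (or Y when c_n is 0)
        return y


seq = Am2909Model(upc=0)                                 # J = 0
trace = [seq.clock(0, 0, 1, 0) for _ in range(3)]        # J, J+1, J+2
trace.append(seq.clock(1, 1, 0, 1, d=8))                 # JSR to A = 8
trace.append(seq.clock(0, 0, 1, 0))                      # continue at A+1
trace.append(seq.clock(1, 0, 0, 0))                      # RTS
trace.append(seq.clock(0, 0, 1, 0))                      # continue at J+4
print(trace)   # prints [0, 1, 2, 8, 9, 3, 4]
```

The return address pushed by the jump is J + 3, because the microprogram counter already holds the address following the jump instruction when the push occurs.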
**APPENDIX 1 AM2909 ISP DESCRIPTION**

[ISP listing describing the Am2909 bit-slice microprogram sequencer; the listing itself is not recoverable from this copy.]

**APPENDIX 2 AM2901 ISP DESCRIPTION**

[ISP listing describing the Am2901; the listing itself is not recoverable from this copy.]

The Am2903/2910

General Description of the Am2903

The Am2903 is a 4-bit expandable bipolar microprocessor slice. The Am2903 performs all functions performed by the industry standard Am2901A and, in addition, provides a number of significant enhancements that are especially useful in arithmetic-oriented processors. Infinitely expandable memory and three-port, three-address architecture are provided by the Am2903. In addition to its complete arithmetic and logic instruction set, the Am2903 provides a special set of instructions which facilitate the implementation of multiplication, division, normalization, and other previously time-consuming operations. The Am2903 is supplied in a 48-pin dual in-line package.

Architecture of the Am2903

The Am2903 is a high-performance cascadable 4-bit bipolar microprocessor slice designed for use in CPU's, peripheral controllers, microprogrammable machines, and numerous other applications. The 9-bit microinstruction selects the ALU sources, function, and destination. The Am2903 is cascadable with full lookahead or ripple carry, has three-state outputs, and provides various ALU status flag outputs. Advanced low-power Schottky processing is used to fabricate this 48-pin LSI circuit.

All data paths within the device are 4 bits wide. As shown in Fig. 1, the device consists of a 16-word by 4-bit two-port RAM with latches on both output ports, a high-performance ALU and shifter, a multi-purpose Q register with shifter input, and a 9-bit instruction decoder.

Two-Port RAM

Any two RAM words addressed at the A and B address ports can be read simultaneously at the respective RAM A and B output ports. Identical data appears at the two output ports when the same address is applied to both address ports. The latches at the RAM output ports are transparent when the clock input, CP, is HIGH, and they hold the RAM output data when CP is LOW. Under control of the OEB three-state output enable, RAM data can be read directly at the Am2903 DB I/O port.

External data at the Am2903 Y I/O port can be written directly into the RAM, or ALU shifter output data can be enabled onto the Y I/O port and entered into the RAM. Data is written into the RAM at the B address when the write enable input, WE, is LOW and the clock input, CP, is LOW.

Arithmetic Logic Unit

The Am2903 high-performance ALU can perform seven arithmetic and nine logic operations on two 4-bit operands. Multiplexers at the ALU inputs provide the capability to select various pairs of ALU source operands. The EA input selects either the DA external data input or RAM output port A for use as one ALU operand, and the OEB and I0 inputs select RAM output port B, DB external data input, or the Q-register content for use as the second ALU operand. Also, during some ALU operations, zeros are forced at the ALU operand inputs. Thus, the Am2903 ALU can operate on data from two external sources, from an internal and external source, or from two internal sources. Table 1 shows all possible pairs of ALU source operands as a function of the EA, OEB, and I0 inputs.

When instruction bits I4, I3, I2, I1, and I0 are LOW, the Am2903 executes special functions. Table 4 defines these special functions and the operation which the ALU performs for each. When the Am2903 executes instructions other than the nine special functions, the ALU operation is determined by instruction bits I4, I3, I2, and I1. Table 2 defines the ALU operation as a function of these four instruction bits.
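The arithmetic entries of Table 2 (codes 1 through 7) can all be read as a binary addition in which one operand is complemented or replaced by zero, with $C_n$ as the carry-in. The following Python sketch illustrates this for a single 4-bit slice; it returns only F, leaves the status outputs to Table 5, and uses the hypothetical name `alu_arithmetic`.

```python
MASK = 0xF  # one slice is 4 bits wide


def alu_arithmetic(code, r, s, c_n):
    """Return F (4 bits) for the arithmetic codes 1 to 7 of I4..I1."""
    r &= MASK
    s &= MASK
    not_r = ~r & MASK
    not_s = ~s & MASK
    operands = {
        1: (not_r, s),    # S minus R minus 1 plus Cn
        2: (r, not_s),    # R minus S minus 1 plus Cn
        3: (r, s),        # R plus S plus Cn
        4: (0, s),        # S plus Cn
        5: (0, not_s),    # complement of S plus Cn
        6: (r, 0),        # R plus Cn
        7: (not_r, 0),    # complement of R plus Cn
    }
    a, b = operands[code]
    return (a + b + (1 if c_n else 0)) & MASK
```

With the carry-in HIGH, code 1 yields the true difference S minus R; for example `alu_arithmetic(1, 3, 9, 1)` returns `6`, while a LOW carry-in gives `5`.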

Am2903's may be cascaded in either a ripple carry or lookahead carry fashion. When a number of Am2903's are cascaded, each slice must be programmed to be a most significant slice (MSS), intermediate slice (IS), or least significant slice (LSS) of the array. The carry generate, G, and carry propagate, P, signals required for a lookahead carry scheme are generated by the Am2903 and are available as outputs of the least significant and intermediate slices.

The Am2903 also generates a carry-out signal, Cn+4, which is generally available as an output of each slice. Both the carry-in, Cn, and carry-out, Cn+4, signals are active HIGH. The ALU generates two other status outputs. These are negative, N, and overflow, OVR. The N output is generally the most significant (sign) bit of the ALU output and can be used to determine positive or negative results. The OVR output indicates that the arithmetic operation being performed exceeds the available 2's complement number range. The N and OVR signals are available as outputs of the most significant slice. Thus the multi-purpose G/N and P/OVR outputs indicate G and P at the least significant and intermediate slices, and sign and overflow at the most significant slice. To some extent, the meanings of the Cn+4, P/OVR, and G/N signals vary with the ALU function being performed. Refer to Table 5 for an exact definition of these four signals as a function of the Am2903 instruction.

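**Table 1 ALU Operand Sources**

<table>
<thead>
<tr><th>$E_A$</th><th>$I_0$</th><th>$OE_B$</th><th>ALU operand R</th><th>ALU operand S</th></tr>
</thead>
<tbody>
<tr><td>L</td><td>L</td><td>L</td><td>RAM output A</td><td>RAM output B</td></tr>
<tr><td>L</td><td>L</td><td>H</td><td>RAM output A</td><td>$DB_{0-3}$</td></tr>
<tr><td>L</td><td>H</td><td>X</td><td>RAM output A</td><td>Q Register</td></tr>
<tr><td>H</td><td>L</td><td>L</td><td>$DA_{0-3}$</td><td>RAM output B</td></tr>
<tr><td>H</td><td>L</td><td>H</td><td>$DA_{0-3}$</td><td>$DB_{0-3}$</td></tr>
<tr><td>H</td><td>H</td><td>X</td><td>$DA_{0-3}$</td><td>Q Register</td></tr>
</tbody>
</table>

L = LOW  H = HIGH  X = don't care

**ALU Shifter**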

**Table 2 ALU Functions**

<table>
<thead>
<tr><th>I₄</th><th>I₃</th><th>I₂</th><th>I₁</th><th>Hex code</th><th>ALU functions</th></tr>
</thead>
<tbody>
<tr><td>L</td><td>L</td><td>L</td><td>L</td><td>0</td><td>I₀ = L: special functions; I₀ = H: Fᵢ = HIGH</td></tr>
<tr><td>L</td><td>L</td><td>L</td><td>H</td><td>1</td><td>F = S Minus R Minus 1 Plus Cₙ</td></tr>
<tr><td>L</td><td>L</td><td>H</td><td>L</td><td>2</td><td>F = R Minus S Minus 1 Plus Cₙ</td></tr>
<tr><td>L</td><td>L</td><td>H</td><td>H</td><td>3</td><td>F = R Plus S Plus Cₙ</td></tr>
<tr><td>L</td><td>H</td><td>L</td><td>L</td><td>4</td><td>F = S Plus Cₙ</td></tr>
<tr><td>L</td><td>H</td><td>L</td><td>H</td><td>5</td><td>F = ~S Plus Cₙ</td></tr>
<tr><td>L</td><td>H</td><td>H</td><td>L</td><td>6</td><td>F = R Plus Cₙ</td></tr>
<tr><td>L</td><td>H</td><td>H</td><td>H</td><td>7</td><td>F = ~R Plus Cₙ</td></tr>
<tr><td>H</td><td>L</td><td>L</td><td>L</td><td>8</td><td>Fᵢ = LOW</td></tr>
<tr><td>H</td><td>L</td><td>L</td><td>H</td><td>9</td><td>Fᵢ = ~Rᵢ AND Sᵢ</td></tr>
<tr><td>H</td><td>L</td><td>H</td><td>L</td><td>A</td><td>Fᵢ = Rᵢ Exclusive-NOR Sᵢ</td></tr>
<tr><td>H</td><td>L</td><td>H</td><td>H</td><td>B</td><td>Fᵢ = Rᵢ Exclusive-OR Sᵢ</td></tr>
<tr><td>H</td><td>H</td><td>L</td><td>L</td><td>C</td><td>Fᵢ = Rᵢ AND Sᵢ</td></tr>
<tr><td>H</td><td>H</td><td>L</td><td>H</td><td>D</td><td>Fᵢ = Rᵢ NOR Sᵢ</td></tr>
<tr><td>H</td><td>H</td><td>H</td><td>L</td><td>E</td><td>Fᵢ = Rᵢ NAND Sᵢ</td></tr>
<tr><td>H</td><td>H</td><td>H</td><td>H</td><td>F</td><td>Fᵢ = Rᵢ OR Sᵢ</td></tr>
</tbody>
</table>

L = LOW  H = HIGH  i = 0 to 3

Under instruction control, the ALU shifter passes the ALU output (F) non-shifted, shifts it up one bit position (2F), or shifts it down one bit position (F/2). Both arithmetic and logical shift operations are possible. An arithmetic shift operation shifts data around the most significant (sign) bit position of the most significant slice, and a logical shift operation shifts data through this bit position (see Fig. 2). SIO0 and SIO3 are bidirectional serial shift inputs/outputs. During a shift-up operation, SIO0 is generally a serial shift input and SIO3 a serial shift output. During a shift-down operation, SIO3 is generally a serial shift input and SIO0 a serial shift output.

To some extent, the meaning of the SIO0 and SIO3 signals is instruction-dependent. Refer to Tables 3 and 4 for an exact definition of these pins.

The ALU shifter also provides the capability to sign-extend at slice boundaries. Under instruction control, the SIO0 (sign) input can be extended through Y0, Y1, Y2, and Y3 and propagated to the SIO3 output.

A cascadable 5-bit parity generator/checker is designed into the Am2903 ALU shifter and provides ALU error detection capability. Parity for the F0, F1, F2, and F3 ALU outputs and SIO3 input is generated and, under instruction control, is made available at the SIO0 output.
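The cascading of the parity generator can be sketched in Python. The example assumes that the SIO0 output of each slice feeds the SIO3 input of the next less significant slice and that the SIO3 input of the most significant slice is held LOW; the function names `slice_parity` and `word_parity` are hypothetical.

```python
def slice_parity(f, sio3_in):
    """Exclusive-OR of F3, F2, F1, F0 and the SIO3 input of one slice."""
    parity = sio3_in & 1
    for bit in range(4):
        parity ^= (f >> bit) & 1
    return parity


def word_parity(word, slices=4):
    """Chain the slices from most to least significant."""
    parity = 0                               # assumed LOW at the top slice
    for i in reversed(range(slices)):
        parity = slice_parity((word >> (4 * i)) & 0xF, parity)
    return parity                            # SIO0 of the lowest slice
```

The value returned for a 16-bit word is 1 when the word contains an odd number of 1 bits and 0 otherwise.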

The instruction inputs determine the ALU shifter operation. Table 4 defines the special functions and the operation the ALU shifter performs for each. When the Am2903 executes instructions other than the nine special functions, the ALU shifter operation is determined by instruction bits I8I7I6I5. Table 3 defines the ALU shifter operation as a function of these four bits.

**Q Register**

The Q register is an auxiliary 4-bit register. It is intended primarily for use in multiplication and division operations; however, it can also be used as an accumulator or holding register for some applications. The ALU output, F, can be loaded into the Q register, and/or the Q register can be selected as the source for the ALU S operand. The shifter at the input to the Q register provides the capability to shift the Q-register contents up one bit position (2Q) or down one bit position (Q/2). Only logical shifts are performed. QIO0 and QIO3 are bidirectional shift serial inputs/outputs. During a Q-register shift-up operation, QIO0 is a serial shift input and QIO3 is a serial shift output. During a shift-down operation, QIO3 is a serial shift input and QIO0 is a serial shift output.

Double-length arithmetic and logical shifting capability is provided by the Am2903. The double-length shift is performed by connecting QIO3 of the most significant slice to SIO0 of the least significant slice, and executing an instruction which shifts both the ALU output and the Q register.

The Q register and shifter are controlled by the instruction inputs. Table 4 defines the Am2903 special functions and the operations which the Q register and shifter perform for each. When the Am2903 executes instructions other than the nine special functions, the Q register and shifter operation is controlled by instruction bits I8I7I6I5. Table 3 defines the Q register and shifter operation as a function of these four bits.

Output Buffers

The DB and Y ports are bidirectional I/O ports driven by three-state output buffers with external output enable controls. The Y output buffers are enabled when the OEY input is LOW and are in the high-impedance state when OEY is HIGH. Likewise, the DB output buffers are enabled when the OEB is LOW and in the high-impedance state when OEB is HIGH.

The zero, Z, pin is an open-collector input/output that can be wired ORed between slices. As an output it can be used as a zero detect status flag and generally indicates that the Y0-3 pins are all LOW, whether they are driven from the Y output buffers or from an external source connected to the Y0-3 pins. To some extent the meaning of this signal varies with the instruction being performed. Refer to Table 5 for an exact definition of this signal as a function of the Am2903 instruction.

Instruction Decoder

The Instruction Decoder generates required internal control signals as a function of the nine instruction inputs, I0-8; the Instruction Enable input, IEN; the LSS input; and the WRITE/MSS input/output.

The WRITE output is LOW when an instruction which writes data into the RAM is being executed. Refer to Tables 3 and 4 for a definition of the WRITE output as a function of the Am2903 instruction inputs.

When IEN is HIGH, the WRITE output is forced HIGH and the Q register and Sign Compare Flip-Flop contents are preserved.

When IEN is LOW, the WRITE output is enabled and the Q register and Sign Compare Flip-Flop can be written according to the Am2903 instruction. The Sign Compare Flip-Flop is an on-chip flip-flop which is used during an Am2903 divide operation (see Fig. 3).

Programming the Am2903 Slice Position

Tying the LSS input LOW programs the slice to operate as a least significant slice (LSS) and enables the WRITE output signal onto the WRITE/MSS bidirectional I/O pin. When LSS is tied HIGH, the WRITE/MSS pin becomes an input pin. Tying the WRITE/MSS pin HIGH programs the slice to operate as an intermediate slice (IS), and tying it LOW programs the slice to operate as a most significant slice (MSS).
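The strapping rules above amount to a two-input decode, which can be written out as a short Python function. The name `slice_position` is hypothetical, and logic levels are written as 0 (LOW) and 1 (HIGH).

```python
def slice_position(lss, write_mss):
    """Decode the slice position from the LSS and WRITE/MSS pin levels.

    write_mss is ignored when lss is LOW, because the WRITE/MSS pin is
    then an output carrying the WRITE signal.
    """
    if lss == 0:
        return "LSS"                   # least significant slice
    return "IS" if write_mss == 1 else "MSS"
```

A 16-bit array of four slices would therefore use one slice strapped as LSS, two as IS, and one as MSS.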

Am2903 Special Functions

The Am2903 provides nine special functions which facilitate the implementation of the following operations:

- Single- and double-length normalization
- 2's complement division
- Unsigned and 2's complement multiplication
- Conversion between 2's complement and sign/magnitude representation
- Incrementation by 1 or 2

Table 4 defines these special functions.

The single-length and double-length normalization functions can be used to adjust a single-precision or double-precision floating-point number in order to bring its mantissa within a specified range.

Three special functions which can be used to perform a 2's complement, non-restoring divide operation are provided by the Am2903. These functions provide both single- and double-precision divide operations and can be performed in \( n \) clock cycles, where \( n \) is the number of bits in the quotient.

The unsigned multiply special function and the two 2's complement multiply special functions can be used to multiply two n-bit unsigned or 2's complement numbers in \( n \) clock cycles. These functions utilize the conditional add and shift algorithm. During the last cycle of the 2's complement multiplication, a conditional subtraction, rather than addition, is performed because the sign bit of the multiplier carries negative weight.

The sign/magnitude-2's complement special function can be used to convert number representation systems. A number expressed in sign/magnitude representation can be converted to the 2's complement representation, and vice-versa, in one clock cycle.

The increment by 1 and increment by 2 special functions can be used to increment an unsigned or 2's complement number by 1 or 2. This is useful in 16-bit-word, byte-addressable machines, where the word addresses are multiples of 2.

**Pin Definitions**

\( A_{0-3} \) Four RAM address inputs which contain the address of the RAM word appearing at the RAM A output port.

\( B_{0-3} \) Four RAM address inputs which contain the address of the RAM word appearing at the RAM B output port and into which new data is written when the WE input and the CP input are LOW.

**Table 3 ALU Destination Control for \( I_0 \) OR \( I_1 \) OR \( I_2 \) OR \( I_3 \) OR \( I_4 \) HIGH, IEN = LOW**

<table>
<thead>
<tr>
<th>\( I_8 \)</th>
<th>\( I_7 \)</th>
<th>\( I_6 \)</th>
<th>\( I_5 \)</th>
<th>Hex code</th>
<th>ALU shifter function</th>
<th>SIO3</th>
<th>Y3</th>
</tr>
</thead>
<tbody>
<tr>
<td>L</td>
<td>L</td>
<td>L</td>
<td>L</td>
<td>0</td>
<td>Arith. \( F/2 \rightarrow Y \)</td>
<td>Input</td>
<td>Input</td>
</tr>
<tr>
<td>L</td>
<td>L</td>
<td>L</td>
<td>H</td>
<td>1</td>
<td>Log. \( F/2 \rightarrow Y \)</td>
<td>Input</td>
<td>Input</td>
</tr>
</tbody>
</table>

Parity = $F_3 \oplus F_2 \oplus F_1 \oplus F_0 \oplus SIO_3$

$\oplus$ = Exclusive OR

**WE**  The RAM write enable input. If WE is LOW, data at the Y I/O port is written into the RAM when the CP input is LOW. When WE is HIGH, writing data into the RAM is inhibited.

**DA_{0-3}**  A 4-bit external data input which can be selected as one of the Am2903 ALU operand sources; DA_0 is the least significant bit.

**EA**  A control input which, when HIGH, selects DA_{0-3} and, when LOW, selects RAM output A as the ALU R operand.

**DB_{0-3}**  A 4-bit external data input/output. Under control of the OE_B input, RAM output port B can be directly read on these lines, or input data on these lines can be selected as the ALU S operand.

**OE_B**  A control input which, when LOW, enables RAM output B onto the DB_{0-3} lines and, when HIGH, disables the RAM output B three-state buffers.

**C_n**  The carry-in input to the Am2903 ALU.

**I_{0-8}**  The nine instruction inputs used to select the Am2903 operation to be performed.

**IEN**  The instruction enable input which, when LOW, enables the WRITE output and allows the Q register and the Sign Compare Flip-Flop to be written. When IEN is HIGH, the WRITE output is forced HIGH and the Q register and Sign Compare Flip-Flop are in the hold mode.

**C_{n+4}**  This output generally indicates the carry-out of the Am2903 ALU. Refer to Table 5 for an exact definition of this pin.

**G/N**  A multi-purpose pin which indicates the carry generate, G, function at the least significant and intermediate slices, and generally indicates the sign, N, of the ALU result at the most significant slice. Refer to Table 5 for an exact definition of this pin.

**P/OVR**  A multi-purpose pin which indicates the carry propagate, P, function at the least significant and intermediate slices, and indicates the conventional 2's complement overflow, OVR, signal at the most significant slice. Refer to Table 5 for an exact definition of this pin.

**Table 3 (continued)**

<table>
<thead>
<tr><th>Hex code</th><th>Y_2 (most sig. slice)</th><th>Y_2 (other slices)</th><th>Y_1</th><th>Y_0</th><th>SIO_0</th><th>WRITE</th><th>Q Reg. &amp; shifter function</th><th>QIO_3</th><th>QIO_0</th></tr>
</thead>
<tbody>
<tr><td>0</td><td>SIO_3</td><td>F_3</td><td>F_2</td><td>F_1</td><td>F_0</td><td>L</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>1</td><td>F_3</td><td>F_3</td><td>F_2</td><td>F_1</td><td>F_0</td><td>L</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>2</td><td>SIO_3</td><td>F_3</td><td>F_2</td><td>F_1</td><td>F_0</td><td>L</td><td>Log. Q/2 → Q</td><td>Input</td><td>Q_0</td></tr>
<tr><td>3</td><td>F_3</td><td>F_3</td><td>F_2</td><td>F_1</td><td>F_0</td><td>L</td><td>Log. Q/2 → Q</td><td>Input</td><td>Q_0</td></tr>
<tr><td>4</td><td>F_2</td><td>F_2</td><td>F_1</td><td>F_0</td><td>Parity</td><td>L</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>5</td><td>F_2</td><td>F_2</td><td>F_1</td><td>F_0</td><td>Parity</td><td>H</td><td>Log. Q/2 → Q</td><td>Input</td><td>Q_0</td></tr>
<tr><td>6</td><td>F_2</td><td>F_2</td><td>F_1</td><td>F_0</td><td>Parity</td><td>H</td><td>F → Q</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>7</td><td>F_2</td><td>F_2</td><td>F_1</td><td>F_0</td><td>Parity</td><td>L</td><td>F → Q</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>8</td><td>F_1</td><td>F_1</td><td>F_0</td><td>SIO_0</td><td>Input</td><td>L</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>9</td><td>F_1</td><td>F_1</td><td>F_0</td><td>SIO_0</td><td>Input</td><td>L</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>A</td><td>F_1</td><td>F_1</td><td>F_0</td><td>SIO_0</td><td>Input</td><td>L</td><td>Log. 2Q → Q</td><td>Q_3</td><td>Input</td></tr>
<tr><td>B</td><td>F_1</td><td>F_1</td><td>F_0</td><td>SIO_0</td><td>Input</td><td>L</td><td>Log. 2Q → Q</td><td>Q_3</td><td>Input</td></tr>
<tr><td>C</td><td>F_2</td><td>F_2</td><td>F_1</td><td>F_0</td><td>Hi-Z</td><td>H</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>D</td><td>F_2</td><td>F_2</td><td>F_1</td><td>F_0</td><td>Hi-Z</td><td>H</td><td>Log. 2Q → Q</td><td>Q_3</td><td>Input</td></tr>
<tr><td>E</td><td>SIO_0</td><td>SIO_0</td><td>SIO_0</td><td>SIO_0</td><td>Input</td><td>L</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td></tr>
<tr><td>F</td><td>F_2</td><td>F_2</td><td>F_1</td><td>F_0</td><td>Hi-Z</td><td>L</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td></tr>
</tbody>
</table>

L = LOW
H = HIGH
Hi-Z = high-impedance

Z An open-collector input/output pin which, when HIGH, generally indicates
the Y0-3 outputs are all LOW. For some special functions, Z is used as an input
pin. Refer to Table 5 for an exact definition of this pin.

SIO0, SIO3 Bidirectional serial shift inputs/outputs for the ALU shifter. During a shift-up
operation, SIO0 is an input and SIO3 an output. During a shift-down operation,
SIO3 is an input and SIO0 is an output. Refer to Tables 3 and 4 for an exact
definition of these pins.

QIO0, QIO3 Bidirectional serial shift inputs/outputs for the Q shifter which operate like
SIO0 and SIO3. Refer to Tables 3 and 4 for an exact definition of these pins.

LSS An input pin which, when tied LOW, programs the chip to act as the least
significant slice (LSS) of an Am2903 array and enables the WRITE output
onto the WRITE/MSS pin. When LSS is tied HIGH, the chip is programmed
to operate as either an intermediate or most significant slice and the WRITE
output buffer is disabled.

WRITE/MSS When LSS is tied LOW, the WRITE output signal appears at this pin; the
WRITE signal is LOW when an instruction which writes data into the RAM is
being executed. When LSS is tied HIGH, WRITE/MSS is an input pin; tying it
HIGH programs the chip to operate as an intermediate slice (IS) and tying it
LOW programs the chip to operate as the most significant slice (MSS).

Y0-3 Four data inputs/outputs of the Am2903. Under control of the OEY input, the
ALU shifter output data can be enabled onto these lines, or these lines can be
used as data inputs when external data is written directly into the RAM.

**Table 4 Special Functions: \( I_0 = I_1 = I_2 = I_3 = I_4 \) = LOW, IEN = LOW**

<table>
<thead>
<tr><th>$I_8$</th><th>$I_7$</th><th>$I_6$</th><th>$I_5$</th><th>Hex code</th><th>Special function</th><th>ALU function</th><th>ALU shifter function</th><th>SIO_3 (most sig. slice)</th><th>SIO_3 (other slices)</th><th>SIO_0</th><th>Q Reg. &amp; shifter function</th><th>QIO_3</th><th>QIO_0</th><th>WRITE</th></tr>
</thead>
<tbody>
<tr><td>L</td><td>L</td><td>L</td><td>L</td><td>0</td><td>Unsigned Multiply</td><td>$F = S + C_n$ if Z = L; $F = R + S + C_n$ if Z = H</td><td>Log. $F/2 \rightarrow Y$ (Note 1)</td><td>Hi-Z</td><td>Input</td><td>$F_0$</td><td>Log. $Q/2 \rightarrow Q$</td><td>Input</td><td>$Q_0$</td><td>L</td></tr>
<tr><td>L</td><td>L</td><td>H</td><td>L</td><td>2</td><td>Two's Complement Multiply</td><td>$F = S + C_n$ if Z = L; $F = R + S + C_n$ if Z = H</td><td>Log. $F/2 \rightarrow Y$ (Note 2)</td><td>Hi-Z</td><td>Input</td><td>$F_0$</td><td>Log. $Q/2 \rightarrow Q$</td><td>Input</td><td>$Q_0$</td><td>L</td></tr>
<tr><td>L</td><td>H</td><td>L</td><td>L</td><td>4</td><td>Increment by One or Two</td><td>$F = S + 1 + C_n$</td><td>$F \rightarrow Y$</td><td>Input</td><td>Input</td><td>Parity</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td><td>L</td></tr>
<tr><td>L</td><td>H</td><td>L</td><td>H</td><td>5</td><td>Sign/Magnitude-Two's Complement</td><td>$F = S + C_n$ if Z = L; $F = \bar{S} + C_n$ if Z = H</td><td>$F \rightarrow Y$ (Note 3)</td><td>Input</td><td>Input</td><td>Parity</td><td>Hold</td><td>Hi-Z</td><td>Hi-Z</td><td>L</td></tr>
<tr><td>L</td><td>H</td><td>H</td><td>L</td><td>6</td><td>Two's Complement Multiply, Last Cycle</td><td>$F = S + C_n$ if Z = L; $F = S - R - 1 + C_n$ if Z = H</td><td>Log. $F/2 \rightarrow Y$ (Note 2)</td><td>Hi-Z</td><td>Input</td><td>$F_0$</td><td>Log. $Q/2 \rightarrow Q$</td><td>Input</td><td>$Q_0$</td><td>L</td></tr>
<tr><td>H</td><td>L</td><td>L</td><td>L</td><td>8</td><td>Single Length Normalize</td><td>$F = S + C_n$</td><td>$F \rightarrow Y$</td><td>$F_3$</td><td>$F_3$</td><td>Hi-Z</td><td>Log. $2Q \rightarrow Q$</td><td>$Q_3$</td><td>Input</td><td>L</td></tr>
<tr><td>H</td><td>L</td><td>H</td><td>L</td><td>A</td><td>Double Length Normalize and First Divide Op.</td><td>$F = S + C_n$</td><td>Log. $2F \rightarrow Y$</td><td>$R_3 \oplus F_3$</td><td>$F_3$</td><td>Input</td><td>Log. $2Q \rightarrow Q$</td><td>$Q_3$</td><td>Input</td><td>L</td></tr>
<tr><td>H</td><td>H</td><td>L</td><td>L</td><td>C</td><td>Two's Complement Divide</td><td>$F = S + R + C_n$ if Z = L; $F = S - R - 1 + C_n$ if Z = H</td><td>Log. $2F \rightarrow Y$</td><td>$R_3 \oplus F_3$</td><td>$F_3$</td><td>Input</td><td>Log. $2Q \rightarrow Q$</td><td>$Q_3$</td><td>Input</td><td>L</td></tr>
<tr><td>H</td><td>H</td><td>H</td><td>L</td><td>E</td><td>Two's Complement Divide, Correction and Remainder</td><td>$F = S + R + C_n$ if Z = L; $F = S - R - 1 + C_n$ if Z = H</td><td>$F \rightarrow Y$</td><td>$F_3$</td><td>$F_3$</td><td>Hi-Z</td><td>Log. $2Q \rightarrow Q$</td><td>$Q_3$</td><td>Input</td><td>L</td></tr>
</tbody>
</table>

**NOTES:**
1. At the most significant slice only, the $C_{n+4}$ signal is internally gated to the $Y_3$ output.
2. At the most significant slice only, $F_3 \oplus OVR$ is internally gated to the $Y_3$ output.
3. At the most significant slice only, $S_3 \oplus F_3$ is generated at the $Y_3$ output.

**OE_Y**  A control input which, when LOW, enables the ALU shifter output data onto the Y0-3 lines and, when HIGH, disables the Y0-3 three-state output buffers.

**CP**  The clock input to the Am2903. The Q Register and Sign Compare Flip-Flop are clocked on the LOW-to-HIGH transition of the CP signal. When enabled by WE, data is written in the RAM when CP is LOW.

Using the Am2903

Am2903 Applications

The Am2903 is designed to be used in microprogrammed systems. Figure 4 illustrates a recommended architecture. The control and data inputs to the Am2903 normally will all come from registers clocked at the same time as the Am2903. The register inputs come from a ROM or PROM—the "microprogram store." This memory contains sequences of microinstructions which apply the proper control signals to the Am2903’s and other circuits to execute the desired operation.

The address lines of the microprogram store are driven from the Am2910 Microprogram Sequencer. This device has facilities for storing an address, incrementing an address, jumping to any address, and linking subroutines. The Am2910 is controlled by some of the bits coming from the microprogram store. Essentially, these bits are the "next instruction" control.

Note that with the microprogram register in between the microprogram memory store and the Am2903’s, a microinstruction accessed on one cycle is executed on the next cycle. As one microinstruction is executed, the next microinstruction is being read from microprogram memory. In this configuration, system speed is improved because the execution time in the Am2903’s occurs in parallel with the access time of the microprogram store. Without the "pipeline register," these two functions must occur serially.

Expansion of the Am2903

The Am2903 is a 4-bit CPU slice. Any number of Am2903’s can be interconnected to form CPU’s of 8, 16, 32, or more bits, in 4-bit increments. Figure 5 illustrates the interconnection of four Am2903’s to form a 16-bit CPU, using ripple carry.

With the exception of the carry interconnection, all expansion schemes are the same. The QIO₃ and SIO₃ pins are bidirectional left/right shift lines at the MSB of the device. For all devices except the most significant, these lines are connected to the QIO₀ and SIO₀ pins of the adjacent more significant device. These connections allow the Q registers of all Am2903’s to be shifted left or right as a contiguous n-bit register, and also allow the ALU output data to be shifted left or right as a contiguous n-bit word prior to storage in the RAM. At the LSB and MSB of the CPU, the shift pins should be connected to a shift multiplexer which can be controlled by the microcode to select the appropriate input signals to the shift inputs.

Device 1 has been defined as the least significant slice (LSS) and its LSS pin has accordingly been grounded. The Write/Most Significant Slice (WRITE/MSS) pin of device 1 is now defined as being the WRITE output, which may now be used to drive the write enable (WE) signal common to the four devices. Devices 2 and 3 are designated as intermediate slices and hence their LSS and WRITE/MSS pins are tied HIGH. Device 4 is designated the most significant slice (MSS) with the LSS pin tied HIGH and the WRITE/MSS pin held LOW. The open-collector, bidirectional Z pins are tied together for detecting zero or for interchip communication during some special instructions. The carry-out (C_{n+4}) is connected to the carry-in (C_n) of the next chip in the case of ripple carry. For a faster carry scheme, an Am2902 may be employed (as shown in Fig. 6) so that the G and P outputs of the Am2903 are connected to the appropriate G and P inputs of the Am2902, while the C_{n+x}, C_{n+y}, and C_{n+z} outputs of the Am2902 are connected to the C_n input of the appropriate Am2903. Note that the G/N and P/OVR pin functions are device-dependent. The most significant slice outputs N and OVR while all other slices output G and P.
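The two carry schemes can be compared with a short behavioral model. This is a sketch under stated assumptions, not a description of the silicon: the helper names are hypothetical, the generate and propagate signals are modeled as active-HIGH logical values (pin polarity is ignored), and only the function F = R + S + C_n is modeled.

```python
def slice_add(r, s, c_in):
    """One 4-bit slice computing F = R + S + Cn.

    Returns (F, carry-out, G, P), where G means the slice generates a carry
    on its own and P means it passes an incoming carry along.
    """
    total = r + s + c_in
    g = int(r + s > 0xF)
    p = int(r + s == 0xF)
    return total & 0xF, total >> 4, g, p


def nibble(word, k):
    """4-bit field handled by device k + 1 (k = 0 is the least significant slice)."""
    return (word >> (4 * k)) & 0xF


def ripple_add16(a, b, c_in=0):
    """16-bit add with the carry-out of each slice wired to the next carry-in."""
    result, carry = 0, c_in
    for k in range(4):
        f, carry, _, _ = slice_add(nibble(a, k), nibble(b, k), carry)
        result |= f << (4 * k)
    return result, carry


def lookahead_carries(a, b, c_in=0):
    """Carry into each slice derived from G and P only, as a lookahead unit would."""
    carries, carry = [], c_in
    for k in range(4):
        carries.append(carry)
        _, _, g, p = slice_add(nibble(a, k), nibble(b, k), 0)
        carry = g | (p & carry)
    return carries
```

Both schemes produce the same carries; the lookahead version simply avoids waiting for each slice's sum before the next carry is known.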

The IEN pin of the Am2903 allows the option of conditional instruction execution. If IEN is LOW, all internal clocking is enabled, allowing the latches, RAM, and Q register to function. If IEN is HIGH, the RAM and Q register are disabled. The RAM is controlled by IEN only if WE is connected to the WRITE output.

It would be appropriate at this point to mention that the Am2903 may be microcoded to work in either two- or three-address architecture modes. The two-address mode allows A + B → B, while the three-address mode makes possible A + B → C. Implementation of a three-address architecture is made possible by varying the timing of IEN in relationship to the external clock and changing the B address. This technique is discussed in more detail under Memory Expansion.

**Parity**

The Am2903 computes parity on a chosen word when the instruction bits I5-8 have the values of 4₁₆ to 7₁₆, as shown in Table 3. The computed parity is the result of the Exclusive-OR of the individual ALU outputs and SIO3. Parity output is found on SIO0. Parity between devices may be cascaded by the interconnection of the SIO0 and SIO3 ports of the devices as shown in Fig. 6. The equation for the parity output at the SIO0 port of device 1 is given by

\[ \text{SIO}_0 = F_{15} \oplus F_{14} \oplus \ldots \oplus F_1 \oplus F_0 \oplus \text{SIO}_{15}. \]
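The cascade can be illustrated with a small model in which each slice folds its four ALU bits into the parity arriving on its SIO3 pin. The function names are hypothetical, and the model covers only the parity path.

```python
def slice_parity(f, sio3):
    """Parity at SIO0 of one slice: F3 xor F2 xor F1 xor F0 xor SIO3."""
    return (bin(f & 0xF).count("1") + sio3) & 1


def word_parity(word, sio15=0):
    """Parity at SIO0 of device 1 for a 16-bit ALU output spread over four slices.

    sio15 is the value applied to the SIO3 pin of the most significant slice.
    """
    chain = sio15
    for k in (3, 2, 1, 0):  # device 4 first, device 1 last
        chain = slice_parity((word >> (4 * k)) & 0xF, chain)
    return chain
```

With `sio15` held at 0 the result is 1 for a word containing an odd number of ones and 0 otherwise; driving `sio15` with 1 inverts that sense.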

**Sign Extend**

Sign extend across any number of Am2903 devices can be done in one microcycle. Referring again to the table of instructions (Table 3), the sign extend instruction (Hex instruction E) on I5-8 causes the sign present at the SIO0 port of a device to be extended across the device and appear at the SIO3 port and at the Y outputs. If the least significant bit of the instruction (bit I5) is HIGH, Hex instruction F is present on I5-8, commanding a shifter pass instruction. At this time, F3 of the ALU is present on the SIO3 output pin. It is then possible to control the extension of the sign across chip boundaries by controlling the state of I5 when I6-8 are HIGH. Figure 7 outlines the Am2903 in sign extend mode. With I6-8 held HIGH, the individual chip sign extend is controlled by I5A-D. If, for example, I5A and I5B are HIGH while I5C and I5D are LOW, the signal present at the boundaries of devices 2 and 3 (F3 of device 2) will be extended across devices 3 and 4 at the SIO3 pin of device 4. The outputs of the four devices will be available at their respective Y data ports. The next positive edge of the clock will load the Y outputs into the address selected by the B port. Hence, the results of the sign extension are stored in the RAM.
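The per-slice control of I5 can be sketched as follows. The model is an assumption-laden illustration: `sign_extend` is a hypothetical helper, each slice is reduced to its 4-bit ALU output F, and only the Y outputs and the SIO0/SIO3 chain described above are modeled.

```python
def sign_extend(f_slices, i5):
    """Return the Y outputs of each slice, least significant device first.

    f_slices[k] is the 4-bit ALU output of device k + 1 and i5[k] is that
    device's I5 input (I6-8 are assumed HIGH on every device).
    """
    y = []
    sio0 = 0  # SIO0 of device 1; irrelevant when device 1 passes F
    for f, pass_f in zip(f_slices, i5):
        if pass_f:
            # Hex F: the slice passes F to Y and drives F3 onto SIO3.
            y.append(f & 0xF)
            sio3 = (f >> 3) & 1
        else:
            # Hex E: the slice copies SIO0 to every Y bit and on to SIO3.
            y.append(0xF if sio0 else 0x0)
            sio3 = sio0
        sio0 = sio3  # SIO3 feeds SIO0 of the next more significant device
    return y


# Extend the sign of the byte 0x84 across the upper two devices.
outputs = sign_extend([0x4, 0x8, 0x0, 0x0], [1, 1, 0, 0])
word = sum(nib << (4 * k) for k, nib in enumerate(outputs))
```

Here `word` becomes 0xFF84, which matches the example in the text: F3 of device 2 is extended across devices 3 and 4.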

**Special Functions**

When I0-4 = 0, the Am2903 is in the special function mode. In this mode, both the source and destination are controlled by I5-8. The special functions are in essence special microinstructions that are used to reduce the number of microcycles needed to execute certain functions in the Am2903.

**Normalization, Single- and Double-Length**

Normalization is used as a means of referencing a number to a fixed radix point. Normalization strips out all leading sign bits such that the two bits immediately adjacent to the radix point are of opposite polarity.

Normalization is commonly used in such operations as fixed-to-floating point conversion and division. The Am2903 provides for normalization by using the Single-Length and Double-Length Normalize commands. Figure 8a represents the Q register of a 16-bit processor which contains a positive number. When the Single-Length Normalize command is applied, each positive edge of the clock will cause the bits to shift toward the most significant bit (bit 15) of the Q register. Zeros are shifted in via the QIO0 port. When the bits on either side of the radix point (bits 14 and 15) are of opposite value, the number is considered to be normalized, as shown in Fig. 8b. The event of normalization is externally indicated by a HIGH level on the C_{n+4} pin of the most significant slice (C_{n+4} MSS = Q_3 MSS ⊕ Q_2 MSS).

There are also provisions made for a normalization indication via the OVR pin one microcycle before the same indication is available on the C_{n+4} pin (OVR = Q_2 MSS ⊕ Q_1 MSS). This is for use in applications that require a stage of register buffering of the normalization indication.

Since a number consisting of all zeros is not considered for normalization, the Am2903 indicates when such a condition arises. If the Q register is zero and the Single-Length Normalization command is given, a HIGH level will be present on the Z line. The sign output, N, indicates the sign of the number stored in the Q register, Q₃ MSS. An unnormalized negative number (Fig. 9a) is normalized in the same manner as a positive number. The results of single-length normalization are shown in Fig. 9b. The device interconnection for single-length normalization is outlined in Fig. 10. During single-length normalization, the number of shifts performed to achieve normalization can be counted and stored in one of the working registers. This can be achieved by forcing a HIGH at the Cₙ input of the least significant slice, since during this special function the ALU performs the function \( B + C_n \) and the result is stored in B.
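A behavioral sketch of single-length normalization, including the shift count that the text suggests accumulating in a working register, might look like this. The function names and the software loop are illustrative assumptions; in hardware the microprogram would repeat the instruction and test the normalization indication instead.

```python
def top_bits_differ(q, width):
    """True when the two most significant bits of q are of opposite polarity."""
    return ((q >> (width - 1)) & 1) != ((q >> (width - 2)) & 1)


def single_length_normalize(q, width=16):
    """Shift q up until it is normalized; return (normalized q, shift count).

    An all-zero word cannot be normalized and is returned unchanged, which
    corresponds to the Z line being HIGH.
    """
    mask = (1 << width) - 1
    q &= mask
    shifts = 0
    if q == 0:
        return q, shifts
    while not top_bits_differ(q, width):
        q = (q << 1) & mask  # a zero enters at the least significant end
        shifts += 1          # models B + Cn with Cn forced HIGH
    return q, shifts
```

For example, the positive word 0x0001 needs 14 shifts to reach 0x4000, and the negative word 0xFFF0 needs 11 shifts to reach 0x8000.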

Normalizing a double-length word can be done with the Double-Length Normalize command, which assumes that a user-selected RAM register contains the most significant portion of the word to be normalized while the Q register holds the least significant half (Fig. 11). The device interconnection for double-length normalization is shown in Fig. 12. The C_{n+4}, OVR, N, and Z outputs of the most significant slice perform the same functions in double-length normalization as they did in single-length normalization, except that C_{n+4}, OVR, and N are derived from the output of the ALU of the most significant slice in the case of double-length normalization, instead of the Q register of the most significant slice as in single-length normalization. A HIGH level on the Z line in double-length normalization reveals that the outputs of the ALU and Q register are both zero, hence indicating that the double-length word is zero.

When double-length normalization is being performed, shift counting is done either with an extra microcycle or with an external counter.

**Sign/Magnitude-2's Complement Conversion**

As part of the special instruction set, the Am2903 can convert between 2's complement and sign/magnitude representations. Figure 13 illustrates the interconnection needed for sign/magnitude-2's complement conversion. The \( C_n \) input of device 1 is connected to the \( Z \) pin. The sign bit (\( S_3 \) MSS) is brought out on the \( Z \) line and informs the other ALUs whether the conversion is being performed on a negative or a positive number. If the number to be converted is the most negative number in 2's complement [i.e., 100 . . . 00 (\(-2^n\))], an overflow indication will occur. This is because the magnitude of \(-2^n\) is 1 greater than the largest magnitude that can be represented in sign/magnitude notation, and hence an attempted conversion of \(-2^n\) to sign/magnitude will cause an overflow. When minus zero in sign/magnitude notation (100 . . . 0) is converted to 2's complement notation, the correct result is obtained (0 . . . 0).
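The two directions of the conversion, together with the two special cases just described, can be checked with a purely arithmetic model. The function names are hypothetical, a 16-bit word is assumed, and the overflow case returns no value because the text does not state what the device leaves on its outputs.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)


def sm_to_tc(word):
    """Sign/magnitude to 2's complement; minus zero maps to zero."""
    magnitude = word & (SIGN - 1)
    return (-magnitude) & MASK if word & SIGN else magnitude


def tc_to_sm(word):
    """2's complement to sign/magnitude; returns (result, overflow)."""
    word &= MASK
    if not word & SIGN:
        return word, False
    magnitude = (-word) & MASK
    if magnitude == SIGN:
        # Most negative number: its magnitude does not fit beside the sign bit.
        return None, True
    return SIGN | magnitude, False
```

As a usage example, -5 is 0x8005 in sign/magnitude and 0xFFFB in 2's complement, and the two functions map each form to the other.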

**Increment by 1 or 2**

Incrementation by 1 or 2 is made possible by the special function of the same name. This command is quite useful in the case of byte-addressable words. A word may be incremented by 1 if \( C_n \) is LOW or incremented by 2 if \( C_n \) is HIGH.
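In arithmetic terms the function adds 1 plus the carry input. The sketch below assumes a 16-bit word and a hypothetical helper name, and shows the two step sizes.

```python
def increment(word, c_n, width=16):
    """Increment by 1 when c_n is 0 (LOW) and by 2 when c_n is 1 (HIGH)."""
    return (word + 1 + c_n) & ((1 << width) - 1)


next_byte_address = increment(0x1000, 0)  # step to the next byte
next_word_address = increment(0x1000, 1)  # step to the next 16-bit word
```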

**Unsigned Multiply**

This special function allows for easy implementation of unsigned multiplication. Figure 14 is the multiply flow chart. The algorithm dictates that initially the RAM word addressed by address port B be zero, the multiplier be in the Q register, and the multiplicand be in the register addressed by address port A. The initial conditions for the execution of the algorithm are that (1) register \( R_0 \) be reset to zero; (2) the multiplicand be in \( R_1 \); and (3) the multiplier be in \( R_2 \). The first operation transfers the multiplier \( R_2 \) to the Q register. The Unsigned Multiply (2's Complement Multiply) instruction is then executed 16 (15) times. During the Multiply instruction, \( R_0 \) is addressed by RAM address port B and the multiplicand is addressed by RAM address port A.

When the Unsigned Multiply command is given, the Z pin of device 1 becomes an output while the Z pins of the remaining devices are specified as inputs, as shown in Fig. 15. The Z output of device 1 is the same state as the least significant bit of the Q register during the Unsigned Multiply instruction; therefore, the Z output of device 1 informs the ALUs of all the slices, via their Z pins, to output the sum of the partial product (referenced by the B address port) plus the multiplicand (referenced by the A address port) if Z = 1. If Z = 0, the output of the ALU is simply the partial product (referenced by the B address port). Since Cn is held LOW, it is not a factor in the computation. Each positive-going edge of the clock will internally shift the ALU outputs toward the least significant bit and simultaneously store the shifted results in the register selected by the B address port, thus becoming the new partial sum. During the down-shifting process, the Cn+4 generated in device 4 is internally shifted into the Y3 position of device 4. At this time, one bit of the multiplier will down-shift out of the QIO0 port of each device into the QIO3 port of the next less significant slice. The partial product is shifted down between chips in a like manner, between the SIO0 and SIO3 ports, with SIO0 of device 1 being connected to QIO3 of device 4 for purposes of constructing a 32-bit-long register to hold the 32-bit product. At the finish of the 16 × 16 multiply, the most significant 16 bits of the product will be found in the register referenced by the B address lines while the least significant 16 bits are stored in the Q register. Using a typical computer control unit (CCU), as shown in Fig. 16, the unsigned multiply operation requires only two lines of microcode, as shown in Fig. 17, and is executed in 17 microcycles.
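The conditional add-and-shift sequence can be followed in a word-level model. This is a sketch only: the function name is hypothetical, the four slices are collapsed into single 16-bit quantities, and each loop pass stands for one execution of the Unsigned Multiply instruction.

```python
def unsigned_multiply(multiplicand, multiplier, width=16):
    """Conditional add and shift; returns the double-length product."""
    mask = (1 << width) - 1
    multiplicand &= mask
    partial = 0            # register addressed by the B port, initially zero
    q = multiplier & mask  # Q register holds the multiplier
    for _ in range(width):
        z = q & 1          # Z output of device 1: least significant bit of Q
        total = partial + (multiplicand if z else 0)  # may carry out of the top bit
        # The bit leaving the partial product enters the top of Q,
        # while the multiplier moves down one place.
        q = (q >> 1) | ((total & 1) << (width - 1))
        # Shifting down brings the carry-out into the most significant position.
        partial = total >> 1
    return (partial << width) | q
```

After the loop, `partial` holds the upper half of the product and `q` the lower half, mirroring the B-addressed register and the Q register in the text.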

**2's Complement Multiplication**

The algorithm for 2's complement multiplication is illustrated by Fig. 14. The initial conditions for 2's complement multiplication are the same as for the unsigned multiply operation. The 2's Complement Multiply command is applied for 15 clock cycles in the case of a $16 \times 16$ multiply. During the down-shifting process the term $N \oplus OVR$ generated in device 4 is internally shifted into the $Y_3$ position of device 4. The data flow shown in Fig. 16 is still valid. After 15 cycles, the sign bit of the multiplier is present at the Z output of device 1. At this time, the user must place the 2's Complement Multiply Last Cycle command on the instruction lines. The interconnection for this instruction is shown in Fig. 18. On the next positive edge of the clock, the Am2903 will adjust the partial product, if the sign of the multiplier is negative, by subtracting out the 2's complement representation of the multiplicand. If the sign bit is positive, the partial product is not adjusted. At this point, 2's complement multiplication is complete. Using a typical CCU, the 2's complement multiply operation requires only three lines of microcode, as shown in Fig. 19, and is executed in 17 microcycles.
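The same word-level model extends to signed operands by subtracting on the last cycle. Again this is a sketch under assumptions: the names are hypothetical, and Python's unbounded signed integers stand in for the hardware's use of the true sign of the sum when shifting down.

```python
def to_signed(value, width):
    """Interpret a width-bit pattern as a 2's complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def twos_complement_multiply(a, b, width=16):
    """Return the double-length 2's complement product of two width-bit patterns."""
    mask = (1 << width) - 1
    m = to_signed(a, width)  # multiplicand
    q = b & mask             # Q register holds the multiplier
    partial = 0
    for cycle in range(width):
        last = cycle == width - 1
        if q & 1:
            # The first width - 1 cycles add; the last cycle subtracts because
            # the multiplier's sign bit carries negative weight.
            partial = partial - m if last else partial + m
        q = (q >> 1) | ((partial & 1) << (width - 1))
        partial >>= 1        # arithmetic shift keeps the true sign
    return ((partial & mask) << width) | q
```

For instance, multiplying 0xFFFD (-3) by 0xFFFE (-2) yields 0x00000006.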

**2’s Complement Division**

The division process is accomplished by using a four-quadrant non-restoring algorithm which yields an algebraically correct answer such that the divisor times the quotient plus the remainder equals the dividend. The algorithm works for both single-precision and multi-precision divide operations. The only condition that needs to be met is that the absolute magnitude of the divisor be greater than the absolute magnitude of the dividend. For multi-precision divide operations the least significant bit of the dividend is truncated. This is necessary if the answer is to be algebraically correct. Bias correction is automatically provided by forcing the least significant bit of the quotient to a 1, yet an algebraically correct answer is still maintained. Once the algorithm is completed, the answer may be modified to meet the user's format requirements, such as rounding off or converting the remainder so that its sign is the same as the dividend's. These format modifications are accomplished using the standard Am2903 instructions.
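The core idea of non-restoring division, adding or subtracting the divisor according to the sign of the running remainder instead of undoing a failed subtraction, can be shown with a simplified model. This sketch is deliberately not the Am2903 sequence: it handles unsigned operands only, uses a conventional final correction rather than forcing the quotient's least significant bit to 1, and its function name is hypothetical.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Unsigned non-restoring division producing an n-bit quotient.

    Requires 0 <= dividend < divisor * 2**n. Returns (quotient, remainder).
    """
    if divisor <= 0 or not 0 <= dividend < (divisor << n):
        raise ValueError("operands must satisfy 0 <= dividend < divisor * 2**n")
    d = divisor << n  # divisor aligned with the upper half of the dividend
    r = dividend
    q = 0
    for _ in range(n):
        # Shift up, then subtract if the remainder was non-negative, else add.
        r = 2 * r - d if r >= 0 else 2 * r + d
        q = (q << 1) | (1 if r >= 0 else 0)
    if r < 0:
        r += d        # final correction so the remainder is non-negative
    return q, r >> n  # the remainder is held scaled up by 2**n
```

The final `r >> n` reflects the same point the text makes about the remainder: the working value is scaled and must be shifted down to obtain its true value.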

The true value of the remainder is equal to the value stored in the working register \(2^{n-1}\) when \(n\) is the number of quotient digits.

The following paragraphs describe a double-precision divide operation. The double-precision flow chart is based upon the use of the architecture detailed in Fig. 18.

Referring to the flow chart outlined in Fig. 20, we begin the algorithm with the assumption that the divisor is contained in R0, while the most significant and least significant halves of the dividend reside in R1 and R4, respectively. The first step is to duplicate the divisor by copying the contents of R0 into R3. Next the most significant half of the dividend is copied by transferring the contents of R1 into R2 while simultaneously checking to ascertain if the divisor (R0) is zero. If the divisor is zero then division is aborted. If the divisor is not zero, the copy of the most significant half of the dividend in R2 is converted from its 2's complement to its sign/magnitude representation. The divisor in R3 is converted in like manner in the next step, while a test is done to see if the results of the dividend conversion yielded an indication on the overflow pin of the Am2903. If the output of the overflow pin is a 1 then the dividend is \(-2^n\) and hence is the largest possible number, meaning that it cannot be less than the divisor. What must be done in this case is to scale the dividend by down-shifting the upper and lower halves stored in R1 and R4 respectively. After scaling, the routine requires that the algorithm be reinitiated at the beginning.

Conversely, if the output of the overflow pin is not a 1, the sign magnitude representation of the divisor (R3) is shifted up in the Am2903, removing the sign while at the same time testing the results of 2’s complement to sign/magnitude conversion of the divisor in the Am2910. If the results of the test indicate that the divisor is \(-2^n\), i.e., overflow equals 1, then the lower half of the dividend is placed in the Q register and division may proceed.

![Division flow chart—double precision divide.](image)

This is possible because the divisor is now guaranteed to be greater than the dividend. If overflow is not a 1 then we must proceed by shifting out the sign of the sign/magnitude representation of the dividend stored in R2. At this point we are able to check whether the divisor is greater than the dividend by subtracting the absolute value of the divisor (R3) from the absolute value of the upper half of the dividend (R2) and storing the results in R3. Next, the least significant half of the dividend is transferred from R4 to the Q register while simultaneously the carry from the result of the divisor-dividend subtraction is tested. If the carry (C_{n+4}) is 1, indicating the divisor is not greater than the dividend, then a scaling operation must occur. This involves either shifting up the divisor or shifting down the dividend. If the carry is not 1 then the divisor is greater than the dividend and division may now begin.

The first divide operation is used to ascertain the sign bit of the quotient. The 2's Complement Divide instruction is then executed 14 times in the case of a 16-bit divisor and a 32-bit dividend. The final step is the 2's Complement Correction command, which adjusts the quotient by allowing the least significant bit of the quotient to be set to a 1. At the end of the division algorithm the 16-bit quotient is found in the Q register while the remainder now replaces the most significant half of the dividend in R1. It should be noted that the remainder must be shifted down 15 places to represent its true value. The interconnections for these instructions are shown in Figs. 21, 22, 23. Using a typical CCU as shown in Fig. 15, the double-precision divide operation requires only 11 lines of microcode, as shown in Fig. 24.

For those applications that require truncation instead of bias correction, the same algorithm as above should be implemented except one additional 2's Complement Divide instruction should be used in lieu of the 2's Complement Divide Correction and Remainder instruction. However, this technique results in an invalid remainder.

It is possible to do multiple-precision divide operations beyond the double-precision divide shown above. For example, to do a triple-precision divide for a 16-bit CPU, the upper two-thirds of the dividend are stored in R1 and Q as in the case for double-precision divide. The lower third of the dividend is stored in a scratch register, R5. After checking that the magnitude of the divisor is greater than the magnitude of the dividend, using the same tests as defined in Fig. 20, the procedure is as follows:

1. Execute a Double-Length Normalize/First Divide Operation instruction.
2. Execute the 2's Complement Divide instruction 15 times.
3. Transfer the contents of Q, the most significant half of the quotient, to R2.
4. Transfer R5 to Q.
5. Execute the 2's Complement Divide instruction 15 times.
6. Execute the 2's Complement Divide Correction and Remainder instruction.

The upper half of the quotient is then in R2, the lower half of the quotient is in Q, and the remainder is in R1. This technique can be expanded for any precision which is required.

**Byte Swap**
The multi-port architecture of the Am2903 allows for easy implementation of high- and low-order byte swapping. Figure 25 outlines a byte-swap implementation utilizing two data ports. Initially, the lower-order 8-bit byte is stored in devices 1 and 2 while the high-order byte is in devices 3 and 4. When the user wishes to exchange the two bytes, the register location of the desired word is placed on the B address port. When the byte-swap line is brought LOW, the bytes to be swapped flow from the DB ports of the Am2903 through the Am25LS240/244 three-state buffers. The outputs of the three-state buffers are permuted so that the byte swap is achieved. The resultant permuted data is presented to the DA ports of the Am2903, where it is reloaded into the memories of the Am2903 on the next positive edge of CP using the permuted data source and function commands of $F = \bar{A} + C_n\ (C_n = 0)$ for the inverting Am25LS240 or $F = A + C_n\ (C_n = 0)$ for the non-inverting Am25LS244 and the destination command $F \to Y, B$.

A higher-speed technique for achieving the byte-swap operation uses the Y input/output ports with $OE_Y$ held HIGH rather than the DA port inputs. This technique bypasses the ALU, thus allowing faster operation. The Am2903 destination command $F \to Y, B$ should be used.
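As a rough illustration of the permutation the buffers perform, the following sketch swaps the bytes of a 16-bit word and shows the same operation on four 4-bit slices. The slice ordering (device 1 as the least significant slice) and the function names are assumptions made for the example; Fig. 25 defines the actual wiring.

```python
def byte_swap_16(word):
    """Exchange the high and low bytes of a 16-bit word."""
    word &= 0xFFFF
    return ((word << 8) | (word >> 8)) & 0xFFFF


def byte_swap_slices(slices):
    """Apply the same permutation to four 4-bit slices, least significant first.

    slices[0:2] model devices 1 and 2 (low byte); slices[2:4] model
    devices 3 and 4 (high byte). The ordering is an assumption.
    """
    d1, d2, d3, d4 = slices
    return [d3, d4, d1, d2]
```

Applying either function twice restores the original word, which is a convenient sanity check on any byte-swap wiring.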

**Memory Expansion**

The Am2903 allows for a theoretically infinite memory expansion. Figure 26 pictures a 4-bit slice of a system which has 48 words of RAM and 16 words of ROM. RAM storage is provided by the Am2903 and the Am29705’s. The 29705 RAM is functionally identical to the Am2903 RAM. The Am29751 is used to store constants and masks and is addressable from address port A only. The system is organized around five data buses. Inter-bus communication may be done through the Am29705’s or the Am2903. The memory addressing scheme specifies the data source for the R input of the ALU emanating from the register locations specified by address field A. $A_{0:3}$ address 16 memory locations in each chip while address bits $A_{4:6}$ are decoded and used for the output enable for the desired chip. The B address field is used both to select the S input of the ALU and to specify the register location where the result of the ALU operation is to be stored.

Bits $B_{0:3}$ are for source register addressing in each chip. Bits $B_4$ and $B_5$ are used for chip output enable selection. $B_{6:9}$ access the 16 destination addresses on each chip, while bits $B_{10}$ and $B_{11}$ control the Write Enable of the desired chip. The source and destination register addresses are multiplexed so that when the clock is HIGH, the source register address is presented to the B address ports of the RAM's. The Instruction Enable (IEN) is HIGH at this time. The data flows from the $Y$ port or the internal B port, as selected by the decoder whose inputs are $B_4$ and $B_5$. When the clock goes LOW, the data emanating from the selected $Y$ outputs of the Am29705's and the RAM outputs of the Am2903 are latched and the destination address is now selected for use by the RAM address lines. When the destination address stabilizes on the address lines, the IEN pin is brought LOW. The WRITE output of the Am2903 will now go LOW, enabling the decoder sourced by address bits $B_{10}$ and $B_{11}$. The selected decoder line will go LOW, allowing the desired memory location to be written into.

To switch between two- and three-address architecture, the user simply makes the source and destination addresses the same, i.e., $B_{0:3} = B_{6:9}$. For two-address architecture, the MUX is removed from the circuit.
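The field layout described for the A and B addresses can be summarized in a small decoder. This is only a sketch of the bit assignments stated in the text; treating each address as an integer with bit 0 as the least significant bit, and the type and function names, are assumptions.

```python
from typing import NamedTuple


class AField(NamedTuple):
    register: int      # A0:3, one of 16 words inside the selected chip
    chip_select: int   # A4:6, decoded to one chip's output enable


class BField(NamedTuple):
    src_register: int  # B0:3, source register within a chip
    src_chip: int      # B4:5, chip output-enable selection
    dst_register: int  # B6:9, destination register within a chip
    dst_chip: int      # B10:11, chip write-enable selection


def decode_a(a):
    return AField(a & 0xF, (a >> 4) & 0x7)


def decode_b(b):
    return BField(b & 0xF, (b >> 4) & 0x3, (b >> 6) & 0xF, (b >> 10) & 0x3)


def is_two_address(b):
    """True when source and destination register numbers coincide."""
    fields = decode_b(b)
    return fields.src_register == fields.dst_register
```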

**General Description of the Am2910**

The Am2910 microprogram controller is an address sequencer intended for controlling the sequence of execution of microinstructions stored in microprogram memory. Besides the capability of sequential access, it provides conditional branching to any microinstruction within its 4096-microword range. A last-in, first-out stack provides microsubroutine return linkage and looping capability; there are five levels of nesting of microsubroutines. Microinstruction loop-count control is provided with a count capacity of 4096.

During each microinstruction, the microprogram controller provides a 12-bit address from one of four sources: (1) the microprogram address register ($\mu$PC), which usually contains an address 1 greater than the previous address; (2) an external (direct) input (D); (3) a register/counter (R) retaining data loaded during a previous microinstruction; or (4) a five-deep last-in, first-out stack (F).

Fig. 26. Expanded memory.

Fig. 27. Am2910 block diagram.

**Architecture of the Am2910**

The Am2910 is a bipolar microprogram controller intended for use in high-speed microprocessor applications. It allows addressing of up to 4096 words of microprogram. A block diagram of the Am2910 is shown in Fig. 27, and its application in a microcomputer is depicted in Fig. 28.

The controller contains a four-input multiplexer that is used to select either the register/counter, direct input, microprogram counter, or stack as the source of the next microinstruction address.

The register/counter consists of 12 D-type, edge-triggered flip-flops, with a common clock enable. When its load control, RLD, is LOW, new data is loaded on a positive clock transition. A few instructions include load; in most systems, these instructions will be sufficient, simplifying the microcode. The output of the register/counter is available to the multiplexer as a source for the next microinstruction address. The direct input furnishes a source of data for loading the register/counter.

The Am2910 contains a microprogram counter (μPC) that is composed of a 12-bit incrementer followed by a 12-bit register. The μPC can be used in either of two ways: When the carry-in to the incrementer is HIGH, the microprogram register is loaded on the next clock cycle with the current Y output word plus one (Y + 1 → μPC). Sequential microinstructions are thus executed. When the carry-in is LOW, the incrementer passes the Y output word unmodified so that μPC is reloaded with the same Y word on the next clock cycle (Y → μPC). The same microinstruction is thus executed any number of times.
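A minimal sketch of the incrementer's two modes follows. The wrap-around at the top of the 12-bit range is an assumption about how a 12-bit incrementer behaves, and the function name is illustrative.

```python
ADDR_MASK = 0xFFF  # 12-bit microprogram address


def next_upc(y, ci):
    """Value clocked into the μPC: Y + 1 when CI is HIGH, Y when CI is LOW."""
    return (y + (1 if ci else 0)) & ADDR_MASK
```

With `ci` true the sequencer steps through consecutive words; with `ci` false it keeps presenting the same address.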

The third source for the multiplexer is the direct (D) input. This source is used for branching.

The fourth source available at the multiplexer input is a 5-word by 12-bit stack (file). The stack is used to provide return address linkage when executing microsubroutines or loops. The stack contains a built-in stack pointer (SP) which always points to the last file word written. This allows stack reference operations (looping) to be performed without a pop.

The stack pointer operates as an up/down counter. During microinstructions 1, 4, and 5, the PUSH operation may be performed. This causes the stack pointer to increment and the file to be written with the required return linkage. On the cycle following the PUSH, the return data is at the new location pointed to by the stack pointer.

During five microinstructions, a POP operation may occur. The stack pointer decrements at the next rising clock edge following a POP, effectively removing old information from the top of the stack.

The stack pointer linkage is such that any sequence of pushes, pops, or stack references can be achieved. At RESET (instruction 0), the depth of nesting becomes 0. For each PUSH, the nesting depth increases by 1; for each POP, the depth decreases by 1. The depth can grow to 5. After a depth of 5 is reached, FULL goes LOW. Any further PUSHes onto a full stack overwrite information at the top of the stack but leave the stack pointer unchanged. This operation will usually destroy useful information and is normally avoided. A POP from an empty stack may place non-meaningful data on the Y outputs but is otherwise safe. The stack pointer remains at 0 whenever a POP is attempted from a stack already empty.

The register/counter is operated during three microinstructions (8, 9, and 15) as a 12-bit down-counter, with result = zero available as a microinstruction branch test criterion. This provides efficient iteration of microinstructions. The register/counter is arranged so that if it is preloaded with a number \( n \) and then used as a loop termination counter, the sequence will be executed exactly \( n + 1 \) times. During instruction 15, a three-way branch under combined control of the loop counter and the condition code is available.
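The stack and counter behaviour described in the two preceding paragraphs can be sketched as a small behavioural model. The class and function names are illustrative, and the value returned when reading an empty stack is an arbitrary placeholder, since the text only says the data is not meaningful.

```python
class Stack2910:
    """Behavioural sketch of the five-word stack and its pointer."""

    DEPTH = 5

    def __init__(self):
        self.words = [0] * self.DEPTH
        self.depth = 0  # 0 means empty, as after RESET

    @property
    def full(self):
        return self.depth == self.DEPTH

    def push(self, value):
        if not self.full:
            self.depth += 1
        # On a full stack the top word is overwritten; the pointer is unchanged.
        self.words[self.depth - 1] = value & 0xFFF

    def top(self):
        # Stack reference without a pop; the value is arbitrary when empty.
        return self.words[self.depth - 1] if self.depth else 0

    def pop(self):
        value = self.top()
        if self.depth:
            self.depth -= 1  # the pointer stays at 0 on an empty stack
        return value


def passes_through_loop(n):
    """Count loop-body executions when the counter is preloaded with n
    and tested (then decremented) at the bottom of the loop."""
    counter, passes = n, 0
    while True:
        passes += 1        # the loop body runs
        if counter == 0:   # zero ends the loop
            return passes
        counter -= 1
```

`passes_through_loop(n)` returns `n + 1` for any non-negative `n`, matching the statement that a preload of \( n \) gives \( n + 1 \) executions.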

The device provides three-state \( Y \) outputs. These can be particularly useful in designs requiring automatic checkout of the processor. The microprogram controller outputs can be forced into the high-impedance state, and pre-programmed sequences of microinstructions can be executed via external access to the address lines.

**Operation**

Table 6 shows the result of each instruction in controlling the multiplexer which determines the \( Y \) outputs, and in controlling the three enable signals PL, MAP, and VECT. The effect on the register/counter and the stack after the next positive-going clock edge is also shown. The multiplexer determines which internal source drives the Y outputs. The value loaded into μPC is either identical to the Y output or else 1 greater, as determined by CI. For each instruction, one and only one of the three outputs PL, MAP, and VECT is LOW. If these outputs control three-state enables for the primary source of microprogram jumps (usually part of a pipeline register), a PROM which maps the instruction to a microinstruction starting location, and an optional third source (often a vector from a DMA or interrupt source), respectively, the three-state sources can drive the D inputs without further logic.
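The one-of-three enable behaviour can be sketched as follows. Table 6 is not reproduced in this section, so the assignment used here (MAP for instruction 2, VECT for instruction 6, PL for all others) is inferred from the instruction descriptions and should be checked against that table; the function name is illustrative.

```python
def enables(instruction):
    """Return the levels of the active-LOW outputs PL, MAP, and VECT.

    True stands for HIGH (disabled), False for LOW (enabled).
    """
    if not 0 <= instruction <= 15:
        raise ValueError("instruction must be 0..15")
    selected = {2: "MAP", 6: "VECT"}.get(instruction, "PL")
    return {name: name != selected for name in ("PL", "MAP", "VECT")}
```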

Several inputs, as shown in Table 7, can modify instruction execution. The combination CC HIGH and CCEN LOW is used as a test in 10 of the 16 instructions. RLD, when LOW, causes the D input to be loaded into the register/counter, overriding any HOLD or DEC operation specified in the instruction. OE, normally LOW, may be forced HIGH to remove the Am2910 Y outputs from a three-state bus.

**The Am2910 Instruction Set**

The Am2910 provides 16 instructions which select the address of the next microinstruction to be executed. Four of the instructions are unconditional; their effect depends only on the instruction. Ten of the instructions have an effect which is partially controlled by an external, data-dependent condition. Three of the instructions have an effect which is partially controlled by the contents of the internal register/counter (instruction 15 belongs to both of the last two groups). The instruction set is shown in Table 6. In this discussion it is assumed that CI is tied HIGH.

In the 10 conditional instructions, the result of the data-dependent test is applied to CC. If the CC input is LOW, the test is considered to have been passed, and the action specified in the name occurs; otherwise, the test has failed and an alternate (often simply the execution of the next sequential microinstruction) occurs. Testing of CC may be disabled for a specific microinstruction by setting CCEN HIGH, which unconditionally forces the action specified in the name; that is, it forces a pass. Other ways of using CCEN include (1) tying it HIGH, which is useful if no microinstruction is data-dependent; (2) tying it LOW if data-dependent instructions are never forced unconditionally; or (3) tying it to the source of Am2910 instruction bit I0, which leaves instructions 4, 6, and 10 as data-dependent but makes others unconditional. All of these tricks save one bit of microcode width.
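The pass/fail rule and the effect of wiring CCEN to instruction bit I0 can be checked with a few lines of code. The set of ten conditional instructions is collected from the individual instruction descriptions in this section; the function names and the use of `True` for a HIGH level are assumptions of the sketch.

```python
# The ten instructions whose effect depends on the external condition.
CONDITIONAL = {1, 3, 4, 5, 6, 7, 10, 11, 13, 15}


def condition_passed(cc, ccen):
    """PASS when CCEN is HIGH (test disabled) or CC is LOW. True = HIGH."""
    return ccen or not cc


def data_dependent_with_ccen_on_i0(instructions=CONDITIONAL):
    """Instructions that still test CC when CCEN is driven by bit I0.

    I0 LOW (even instruction numbers) gives CCEN LOW, so CC is tested.
    """
    return sorted(i for i in instructions if i & 1 == 0)
```

Calling `data_dependent_with_ccen_on_i0()` yields instructions 4, 6, and 10, in agreement with option (3) in the text.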

The effect of three instructions depends on the contents of the register/counter. Unless the counter holds a value of zero, it is decremented; if it does hold zero, it is held and a different microprogram next address is selected. These instructions are useful for executing a microinstruction loop a known number of times. Instruction 15 is affected both by the external condition code and the internal register/counter.

Perhaps the best technique for understanding the Am2910 is to simply take each instruction and review its operation. In order to provide some feel for the actual execution of these instructions, Fig. 29 is included and depicts examples of all 16 instructions.

The examples given in Fig. 29 should be interpreted in the following manner: The intent is to show microprogram flow as various microprogram memory words are executed. For example, the CONTINUE instruction, instruction 14, as shown in Fig. 29, simply means that the contents of microprogram memory word 50 are executed and then the contents of word 51 are executed. This is followed by the contents of microprogram memory word 52 and the contents of microprogram memory word 53. The microprogram addresses used in the examples were arbitrarily chosen and have no meaning other than to show instruction flow. The exception to this is the first example, JUMP ZERO, which forces the microprogram location counter to address ZERO. Each dot refers to the time that the contents of the microprogram memory word is in the pipeline register. While no special symbology is used for the conditional instructions, the text to follow will explain what the conditional choices are in each example.

It might be appropriate at this time to mention that AMD has a microprogram assembler called AMDASM, which has the capability of using the Am2910 instructions in symbolic representation. AMDASM's Am2910 instruction symbolics (or mnemonics) are given in Fig. 29 for each instruction and are also shown in Table 6.

**Instruction 0**, JZ (JUMP ZERO, or RESET), unconditionally specifies that the address of the next microinstruction is zero. Many designs use this feature for power-up sequences and provide the power-up firmware beginning at microprogram memory word location 0.

### Table 7 Pin Functions

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Name</th>
<th>Function</th>
</tr>
</thead>
<tbody>
<tr>
<td>D<sub>i</sub></td>
<td>Direct Input Bit i</td>
<td>Direct input to register/counter and multiplexer. D<sub>0</sub> is LSB.</td>
</tr>
<tr>
<td>I<sub>i</sub></td>
<td>Instruction Bit i</td>
<td>Selects one-of-sixteen instructions for the Am2910.</td>
</tr>
<tr>
<td>CC</td>
<td>Condition Code</td>
<td>Used as test criterion. Pass test is a LOW on CC.</td>
</tr>
<tr>
<td>CCEN</td>
<td>Condition Code Enable</td>
<td>Whenever the signal is HIGH, CC is ignored and the part operates as though CC were true (LOW).</td>
</tr>
<tr>
<td>CI</td>
<td>Carry-In</td>
<td>Low order carry input to incrementer for microprogram counter.</td>
</tr>
<tr>
<td>RLD</td>
<td>Register Load</td>
<td>When LOW forces loading of register/counter regardless of instruction or condition.</td>
</tr>
<tr>
<td>OE</td>
<td>Output Enable</td>
<td>Three-state control of Y<sub>i</sub> outputs.</td>
</tr>
<tr>
<td>CP</td>
<td>Clock Pulse</td>
<td>Triggers all internal state changes at LOW-to-HIGH edge.</td>
</tr>
</tbody>
</table>

**Instruction 1** is a CONDITIONAL JUMP-TO-SUBROUTINE via the address provided in the pipeline register. As shown in Fig. 29, the machine might have executed words at addresses 50, 51, and 52. When the contents of address 52 are in the pipeline register, the next address control function is the CONDITIONAL JUMP-TO-SUBROUTINE. Here, if the test is passed, the next instruction executed will be the contents of microprogram memory location 90. If the test has failed, the JUMP-TO-SUBROUTINE will not be executed; the contents of microprogram memory location 53 will be executed instead. Thus, the CONDITIONAL JUMP-TO-SUBROUTINE instruction at location 52 will cause the instruction either in location 90 or in location 53 to be executed next. If the TEST input is such that location 90 is selected, value 53 will be pushed onto the internal stack. This provides the return linkage for the machine when the subroutine beginning at location 90 is completed. In this example, the subroutine was completed at location 93 and a RETURN-FROM-SUBROUTINE was found at location 93.

**Instruction 2** is the JUMP MAP instruction. This is an unconditional instruction which causes the MAP output to be enabled so that the next microinstruction location is determined by the address supplied via the mapping PROMs. Normally, the JUMP MAP instruction is used at the end of the instruction fetch sequence for the machine. In the example of Fig. 29, microinstructions at locations 50, 51, 52, and 53 might have been the fetch sequence, and at its completion at location 53, the jump map function would be contained in the pipeline register. This example shows the mapping PROM outputs to be 90; therefore, an unconditional jump to microprogram memory address 90 is performed.

**Instruction 3**, CONDITIONAL JUMP PIPELINE, derives its branch address from the pipeline register branch address value (BR0-BR11 in Fig. 28). This instruction provides a technique for branching to various microprogram sequences depending upon the test condition inputs. Quite often, state machines are designed which simply execute tests on various inputs waiting for the condition to come true. When the true condition is reached, the machine then branches and executes a set of microinstructions to perform some function. This usually has the effect of resetting the input being tested until some point in the future. Figure 29 shows the conditional jump via the pipeline register address at location 52. When the contents of microprogram memory word 52 are in the pipeline register, the next address will be either location 53 or location 30 in this example. If the test is passed, the value currently in the pipeline register (30) will be selected. If the test fails, the next address selected will be contained in the microprogram counter, which in this example is 53.

**Instruction 4** is the PUSH/CONDITIONAL LOAD COUNTER instruction and is used primarily for setting up loops in microprogram firmware. In Figure 29, when instruction 52 is in the pipeline register, a PUSH will be made onto the stack and the counter will be loaded on the basis of the condition. When a PUSH occurs, the value pushed is always the next sequential instruction address. In this case, the address is 53. If the test fails, the counter is not loaded; if it is passed, the counter is loaded with the value contained in the pipeline register branch address field. Thus, a single microinstruction can be used to set up a loop to be executed a specific number of times. Instruction 8 will describe how to use the pushed value and the register/counter for looping.

**Instruction 5** is a CONDITIONAL JUMP-TO-SUBROUTINE via the register/counter or the contents of the pipeline register. As shown in Fig. 29, a PUSH is always performed and one of two subroutines executed. In this example, either the subroutine beginning at address 80 or the subroutine beginning at address 90 will be performed. A return-from-subroutine (instruction 10) returns the microprogram flow to address 55. In order for this microinstruction control sequence to operate correctly, both the next-address fields of instruction 53 and the next-address fields of instruction 54 have to contain the proper value. Let us assume that the branch address fields of instruction 53 contain the value 90 so that it will be in the Am2910 register/counter when the contents of address 54 are in the pipeline register. This requires that the instruction at address 53 load the register/counter. Now, during the execution of instruction 5 (at address 54), if the test fails, the contents of the register (value = 90) will select the address of the next microinstruction. If the test input passes, the pipeline register contents (value = 80) will determine the address of the next microinstruction. Therefore, this instruction provides the ability to select one of two subroutines to be executed based on a test condition.

**Instruction 6** is a CONDITIONAL JUMP VECTOR instruction which provides the capability to take the branch address from a third source heretofore not discussed. In order for this instruction to be useful, the Am2910 output, VECT, is used to control a three-state control input of a register, buffer, or PROM containing the next microprogram address. This instruction provides one technique for performing interrupt-type branching at the microprogram level. Since this instruction is conditional, a pass causes the next address to be taken from the vector source, while failure causes the next address to be taken from the microprogram counter. In the example of Fig. 29, if the CONDITIONAL JUMP VECTOR instruction is contained at location 52, execution will continue at vector address 20 if the TEST input is HIGH and the microinstruction at address 53 will be executed if the TEST input is LOW.

**Instruction 7** is a CONDITIONAL JUMP via the contents of the Am2910 register/counter or the contents of the pipeline register. This instruction is very similar to instruction 5, the CONDITIONAL JUMP-TO-SUBROUTINE via R or PL. The major difference between instruction 5 and instruction 7 is that no push onto the stack is performed with 7. Figure 29 depicts this instruction as a branch to one of two locations depending on the test condition. The example assumes the pipeline register contains the value 70 when the contents of address 52 are being executed. As the contents of address 53 are clocked into the pipeline register, the value 70 is loaded into the register/counter in the Am2910. The value 80 is available when the contents of address 53 are in the pipeline register. Thus, control is transferred to either address 70 or address 80, depending on the test condition.
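Instructions 5 and 7 share the same address selection and differ only in the push. The sketch below captures that relationship using the values from the Fig. 29 examples; a plain Python list stands in for the stack, and the function names are illustrative.

```python
def select_r_or_d(register, d, passed):
    """Next address for instructions 5 and 7: D on a pass, R on a fail."""
    return d if passed else register


def jsrp(upc, register, d, passed, stack):
    """Instruction 5: always push the return address, then select R or D."""
    stack.append(upc)
    return select_r_or_d(register, d, passed)


def jrp(register, d, passed):
    """Instruction 7: the same selection with no push."""
    return select_r_or_d(register, d, passed)
```

For the instruction 5 example, `jsrp(55, 90, 80, passed, stack)` returns 80 on a pass and 90 on a fail, and in both cases leaves 55 on the stack as the return address.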

**Instruction 8** is the REPEAT LOOP, COUNTER ≠ ZERO instruction. This microinstruction makes use of the decrementing capability of the register/counter. To be useful, some previous instruction, such as 4, must have loaded a count value into the register/counter. This instruction checks to see whether the register/counter contains a non-zero value. If so, the register/counter is decremented, and the address of the next microinstruction is taken from the top of the stack. If the register/counter contains zero, the loop exit condition has been reached: control falls through to the next sequential microinstruction by selecting μPC, and the stack is POPped by decrementing the stack pointer, discarding the contents of the top of the stack.

An example of the REPEAT LOOP, COUNTER ≠ ZERO instruction is shown in Fig. 29. In this example, location 50 most likely would contain a PUSH/CONDITIONAL LOAD COUNTER instruction which would have caused address 51 to be PUSHed onto the stack and the counter to be loaded with the proper value for looping the desired number of times.

In this example, since the loop test is made at the end of the instructions to be repeated (microaddress 54), the proper value to be loaded by the instruction at address 50 is one less than the desired number of passes through the loop. This method allows a loop to be executed 1 to 4096 times. If it is desired to execute the loop from 0 to 4095 times, the firmware should be written to make the loop exit test immediately after loop entry.
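The relationship between the loaded count and the number of passes can be traced with a small simulation of this example, with instruction 4 at address 50 and instruction 8 at address 54. The function name, the use of a Python list for the stack, and the iteration limit are assumptions of the sketch.

```python
def run_loop(count, push_at=50, rfct_at=54, limit=100_000):
    """Return how many times the loop body (push_at + 1 .. rfct_at) is entered."""
    stack, counter, addr, passes = [], 0, push_at, 0
    for _ in range(limit):
        upc = addr + 1
        if addr == push_at:        # instruction 4 with the test passed
            stack.append(upc)      # return linkage: first word of the loop
            counter = count        # counter loaded from the pipeline field
            addr = upc
        elif addr == rfct_at:      # instruction 8
            if counter != 0:
                counter -= 1
                addr = stack[-1]   # stack reference, no pop
            else:
                stack.pop()        # loop exit: discard the loop address
                return passes
        else:
            addr = upc             # CONTINUE
        if addr == push_at + 1:
            passes += 1
    raise RuntimeError("iteration limit reached")
```

Loading a count of 0 gives one pass and loading 4095 gives 4096 passes, which is the 1 to 4096 range quoted above.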

Single-microinstruction loops provide a highly efficient capability for executing a specific microinstruction a fixed number of times. Examples include fixed rotates, byte swap, fixed-point multiply, and fixed-point divide.

**Instruction 9** is the REPEAT PIPELINE REGISTER, COUNTER ≠ ZERO instruction. This instruction is similar to instruction 8 except that the branch address now comes from the pipeline register rather than the file. In some cases, this instruction may be thought of as a one-word file extension; that is, by using this instruction, a loop with the counter can still be performed when subroutines are nested five deep. This instruction's operation is very similar to that of instruction 8. The differences are that on this instruction, a failed test condition (counter not yet zero) causes the source of the next microinstruction address to be the D inputs; and, when the test condition is passed (counter equal to zero), this instruction does not perform a POP because the stack is not being used.

In the example of Fig. 29, the REPEAT PIPELINE, COUNTER ≠ ZERO instruction is instruction 52 and is shown as a single microinstruction loop. The address in the pipeline register would be 52. Instruction 51 in this example could be the LOAD COUNTER AND CONTINUE instruction (instruction 12). While the example shows a single microinstruction loop, by simply changing the address in the pipeline register, multi-instruction loops can be performed in this manner for a fixed number of times as determined by the counter.

**Instruction 10** is the conditional RETURN-FROM-SUBROUTINE instruction. As the name implies, this instruction is used to branch from the subroutine back to the next microinstruction address following the subroutine call. Since this instruction is conditional, the return is performed only if the test is passed. If the test is failed, the next sequential microinstruction is performed. The example in Fig. 29 depicts the use of the conditional RETURN-FROM-SUBROUTINE instruction in both the conditional and the unconditional modes. This example first shows a JUMP-TO-SUBROUTINE at instruction location 52, where control is transferred to location 90. At location 93, a conditional RETURN-FROM-SUBROUTINE instruction is performed. If the test is passed, the stack is accessed and the program will transfer to the next instruction at address 53. If the test is failed, the next microinstruction at address 94 will be executed. The program will continue to address 97, where the subroutine is complete. To perform an unconditional RETURN-FROM-SUBROUTINE, the conditional RETURN-FROM-SUBROUTINE instruction is executed unconditionally; the microinstruction at address 97 is programmed to force CCEN HIGH, disabling the test, and the forced PASS causes an unconditional return.
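The call and return flow of this example can be traced with a very small interpreter that understands only instructions 1, 10, and 14. It is a sketch rather than a model of the Am2910: the tuple layout of the program, the dictionary of CC levels, and the function name are assumptions, with `True` standing for a HIGH level.

```python
def run(program, cc_levels, start, steps):
    """Trace microaddresses for a program using CJS, CRTN, and CONT.

    program maps address -> (mnemonic, D field, CCEN level).
    cc_levels maps address -> CC level while that word is in the
    pipeline register. Words not listed behave as CONTINUE.
    """
    stack, addr, trace = [], start, []
    for _ in range(steps):
        trace.append(addr)
        mnemonic, d, ccen = program.get(addr, ("CONT", None, True))
        passed = ccen or not cc_levels.get(addr, True)  # CC LOW = pass
        upc = addr + 1                                  # CI tied HIGH
        if mnemonic == "CJS" and passed:
            stack.append(upc)   # return linkage
            addr = d
        elif mnemonic == "CRTN" and passed:
            addr = stack.pop()
        else:
            addr = upc
    return trace


program = {
    52: ("CJS", 90, False),     # call, test enabled
    93: ("CRTN", None, False),  # conditional return
    97: ("CRTN", None, True),   # CCEN HIGH forces the return
}
trace = run(program, {52: False, 93: True}, start=50, steps=12)
```

With CC LOW at address 52 and HIGH at address 93, the trace runs 50, 51, 52, 90 through 97, and then 53: the conditional return at 93 fails and the forced return at 97 is taken.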

**Instruction 11** is the CONDITIONAL JUMP PIPELINE register address and POP stack instruction. This instruction provides another technique for loop termination and stack maintenance. The example in Fig. 29 shows a loop being performed from address 55 back to address 51. The instructions at locations 52, 53, and 54 are all conditional JUMP and POP instructions. At address 52, if the TEST input is passed, a branch will be made to address 70 and the stack will be properly maintained via a POP. Should the test fail, the instruction at location 53 (the next sequential instruction) will be executed. Likewise, at address 53, either the instruction at 90 or 54 will be subsequently executed, depending on whether the test has been passed or failed. The instruction at 54 follows the same rules, going to either 80 or 55.

An instruction sequence as described here, using the CONDITIONAL JUMP PIPELINE and POP instruction, is very useful when several inputs are being tested and the microprogram is looping waiting for any of the inputs being tested to occur before proceeding to another sequence of instructions. This provides the powerful jump-table programming technique at the firmware level.
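One trip through the three tests of this example can be sketched as below. The branch targets are those given for Fig. 29; the assumption that address 51 is already on the stack (pushed by an earlier instruction), the use of a Python list for the stack, and the function names are illustrative.

```python
def cjpp(upc, d, passed, stack):
    """Instruction 11: on a pass, jump to D and pop; on a fail, continue."""
    if passed:
        stack.pop()  # discard the loop-top address
        return d
    return upc


def poll_once(tests, stack):
    """One trip through words 52 to 54 of the example.

    tests maps address -> True when that word's test passes.
    Returns the address reached after the three tests.
    """
    targets = {52: 70, 53: 90, 54: 80}
    for addr in (52, 53, 54):
        nxt = cjpp(addr + 1, targets[addr], tests.get(addr, False), stack)
        if nxt != addr + 1:
            return nxt
    return 55
```

If no test passes, control reaches 55 with the stack untouched, so the loop can return to 51; the first test that passes leaves the loop with the stack already cleaned up.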

**Instruction 12** is the LOAD COUNTER AND CONTINUE instruction, which simply enables the counter to be loaded with the value at its parallel inputs. These inputs are normally connected to the pipeline branch address field which (in the architecture being described here) serves to supply either a branch address or a counter value, depending upon the microinstruction being executed. There are altogether three ways of loading the counter: the explicit load by this instruction 12, the conditional load included as part of instruction 4, and the use of the RLD input along with any instruction. The use of RLD with any instruction overrides any counting or decrementation specified in the instruction, calling for a load instead. Its use provides additional microinstruction power, at the expense of one bit of microinstruction width. This instruction 12 is exactly equivalent to the combination of instruction 14 and RLD LOW. Its purpose is to provide a simple capability to load the register/counter in those implementations which do not provide microprogrammed control for RLD.

**Instruction 13** is the TEST END-OF-LOOP instruction, which provides the capability of conditionally exiting a loop at the bottom; that is, this is a conditional instruction that will cause the microprogram to loop, via the file, if the test is failed or else to continue to the next sequential instruction. The example in Fig. 29 shows the TEST END-OF-LOOP microinstruction at address 56. If the test fails, the microprogram will branch to address 52. Address 52 is on the stack because a PUSH instruction has been executed at address 51. If the test is passed at instruction 56, the loop is terminated and the next sequential microinstruction at address 57 is executed, which also causes the stack to be POPped, thus accomplishing the required stack maintenance.

**Instruction 14** is the CONTINUE instruction, which simply causes the microprogram counter to increment so that the next sequential microinstruction is executed. This is the simplest microinstruction of all and should be the default instruction which the firmware requests whenever there is nothing better to do.

**Instruction 15**, THREE-WAY BRANCH, is the most complex. It provides for testing of both a data-dependent condition and the counter during one microinstruction and provides for selecting among one of three microinstruction addresses as the next microinstruction to be performed. Like instruction 8, a previous instruction will have loaded a count into the register/counter while pushing a microbranch address onto the stack. Instruction 15 performs a decrement-and-branch-until-zero function similar to instruction 8. The next address is taken from the top of the stack until the count reaches zero; then the next address comes from the pipeline register. The above action continues as long as the test condition fails. If at any execution of instruction 15 the test condition is passed, no branch is taken; the microprogram counter register furnishes the next address. When the loop is ended, either because the count has become zero or because the conditional test has been passed, the stack is POPped by decrementing the stack pointer, since interest in the value contained at the top of the stack is then complete.

The application of instruction 15 can enhance performance of a variety of machine-level instructions, for instance: (1) a memory search instruction to be terminated either by finding a desired memory content or by reaching the search limit, (2) variable-field-length arithmetic terminated early upon finding that the content of the portion of the field still unprocessed is all zeros, (3) key search in a disc controller processing variable-length records, and (4) normalization of a floating-point number.

As one example, consider the case of a memory search instruction. As shown in Fig. 29, the instruction at microprogram address 63 can be instruction 4 (PUSH), which will push the value 64 onto the microprogram stack and load the number \( n \), which is one less than the number of memory locations to be searched before giving up. Location 64 contains a microinstruction which fetches the next operand from the memory area to be searched and compares it with the search key. Location 65 contains a microinstruction which tests the result of the comparison and also is a THREE-WAY BRANCH for microprogram control. If no match is found, the test fails and the microprogram goes back to location 64 for the next operand address. When the count becomes zero, the microprogram branches to location 72, which does whatever is necessary if no match is found. If a match occurs on any execution of the THREE-WAY BRANCH at location 65, control falls through to location 66, which handles this case. Whether the instruction ends by finding a match or not, the stack will have been POPped once, removing the value 64 from the top of the stack.
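The memory search flow can be followed in a short sketch that mirrors microprogram words 63 to 65 and reports which exit address is reached. It abstracts the memory as a Python list and the comparison as an equality test; the function name and return format are assumptions.

```python
def memory_search(memory, key, limit):
    """Trace the three-way-branch search; return (exit address, index or None).

    Exit address 66 means a match was found, 72 means the search limit
    was reached without a match.
    """
    if not 1 <= limit <= len(memory):
        raise ValueError("limit must be between 1 and len(memory)")
    stack = [64]           # word 63: PUSH saves the loop address 64 ...
    counter = limit - 1    # ... and loads the counter with n = limit - 1
    index = 0
    while True:
        # Word 64: fetch the next operand and compare it with the key.
        match = memory[index] == key
        # Word 65: THREE-WAY BRANCH.
        if match:              # test passed: fall through to 66
            stack.pop()
            return 66, index
        if counter == 0:       # count exhausted: branch to D (72)
            stack.pop()
            return 72, None
        counter -= 1           # otherwise loop via the top of the stack (64)
        index += 1
```

In both exits the stack is popped exactly once, and at most `limit` locations are examined, which is the \( n + 1 \) behaviour of the counter.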

A PDP-8 Implemented from AMD Bit-Sliced Microprocessors

Michael Tsao

An example of a microprogrammable system based on the Am2910 sequencer and the Am2901 ALU will illustrate design with bit slices. The target machine is the PDP-8 ISP (see Appendix 1 of Chap. 8). This register-transfer (RT) level design of the micromachine is thus optimized toward the basic PDP-8. However, the general principles involved in microprogramming bit slices are illustrated by this example. A major goal of this design is the clarity of implementation, rather than the economy of design.

Overview

The basic implementation is a one-stage pipeline as shown in Fig. 1 in Chap. 13. In this micromachine, the pipeline register stores the current microinstruction, which is being executed by the Am2910 Sequencer and the Am2901 ALU. The status information (zero, overflow, etc.) of the ALU operations is stored in the Status Register. In a one-stage pipeline design, conditional branches can be executed only by the microinstruction following the microcycle that has generated the branching status. The Am2910 sequencer is used instead of the Am2909 to simplify the design and to aid understandability. A more cost-effective design might actually result from using the Am2909 sequencer, since the number of microinstruction types used to emulate the PDP-8 is small. The Am2901 ALU is used because it more closely reflects the ISP of the PDP-8.

A timing diagram for a typical microcycle is shown in Fig. 1. The indicated delays are typical values, illustrating the timing requirements rather than actual component performances. On the rising edge of the system clock, the Pipeline Register latches the microinstruction to be executed during this microcycle. The output of the Pipeline Register is valid 15 ns later. After another 15-ns delay, the Condition Code input to the Am2910 is valid. The microsequencer generates the next microaddress based on the current microinstruction and the Condition Code input. When the microprogram memory output is valid (approximately 130 ns after the rising clock edge), the microcycle can be restarted. Concurrently with the sequencer operation and microword fetch, the Am2901 ALU executes the operations specified by the microword in the Pipeline Register. The output of the ALU is valid prior to the falling edge of the system clock. External registers, such as the Memory Address Register (MAR) and the Status Register, use the falling clock edge to latch results from the ALU output port. In this design, the duty cycle of the system clock does not need to be symmetrical at 50 percent.

**RT-Level Implementation and the Microword Format**

The RT-level implementation of the Am2900/PDP-8 is shown in Fig. 2 for the control part, and in Fig. 3 for the data part. The design can best be explained in conjunction with the microword format shown in Table 1. The ISPS description of the RT-level design is listed in Appendix 2. The following subsections discuss the meaning of each microword field and the associated RT-level components. For each microword field, there are three possible bit sizes: the number of bits *normally* required for the associated components, the *minimum* required for this PDP-8 application, and the *actual* field size used. The position of each field in the microword is defined in the ISPS description. The reason for inserting extra bits is to align the fields on octal boundaries, thus aiding the reading of the encoded microprogram.

**Sequencer Instruction and Address Field**

The Am2910 sequencer normally requires a 4-bit-wide instruction and a 12-bit-wide "next address" direct input. The microprogram occupies fewer than 128 words, requiring only 7 bits of address. Two extra instruction bits and two extra address bits are inserted as 0s in this design example for octal boundary alignment.

Out of sixteen Am2910 instructions, only 4 are used in this example: Conditional Jump Subroutine (CJS, #01), Conditional Jump (CJP, #03), Conditional Return from Subroutine (CRTN, #12), and Continue (CONT, #16). Therefore, it is theoretically possible to use only 2 bits of information to specify these four actions.

**Table 1 The Microword Format and Required Bits per Field**

<table>
<thead>
<tr>
<th>Bits per field</th>
<th>Normal</th>
<th>Minimum</th>
<th>Actual (ISP)</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Micro sequencer control</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Microinstruction</td>
<td>4</td>
<td>2</td>
<td>6</td>
</tr>
<tr>
<td>Next microaddress</td>
<td>12</td>
<td>7</td>
<td>9</td>
</tr>
<tr>
<td>Condition code select</td>
<td>(6)</td>
<td>6</td>
<td>6</td>
</tr>
<tr>
<td><strong>ALU control</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td><strong>ALU instruction</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>Source</td>
<td>3</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>Function</td>
<td>3</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>Destination</td>
<td>3</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>RAM A port select</td>
<td>4</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>RAM B port select</td>
<td>4</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>Direct input select</td>
<td>(2)</td>
<td>2</td>
<td>3</td>
</tr>
<tr>
<td>Constant mask select</td>
<td>(3)</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td><strong>Miscellaneous control signals</strong></td>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>

**Condition Code Input Selection**

There is only one condition code (CC) input for the Am2910. The status conditions have to be multiplexed into this input. The assignments for the multiplexer input lines can be found in the ISP description in Appendix 1 (ISPS procedure Condition Code). Five bits are used to select one out of 32 different input signals. The sixth bit in this field is used to select between the original signal and the complement of the signal. In this manner, the micromachine can branch when the signal is either high or low. When an unconditional microprogram branch is required, a logic 0 can be selected for the CC input.
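
The selection scheme can be sketched in a few lines. The function name `condition_code_input` is hypothetical, and the bit layout is an assumption: the low five bits of the field are taken as the multiplexer index and the sixth (most significant) bit as the complement control. The actual signal assignments are defined in the ISPS description and are not reproduced here.

```python
def condition_code_input(select, signals):
    """Return the bit presented to the single CC input.

    select  -- 6-bit Condition Code Select field from the microword
    signals -- sequence of 32 status bits (0 or 1) feeding the multiplexer
    Assumed layout: bits 0-4 pick the signal, bit 5 complements it.
    """
    if not 0 <= select < 64:
        raise ValueError("select must fit in 6 bits")
    if len(signals) != 32:
        raise ValueError("exactly 32 input signals are expected")
    index = select & 0o37          # five bits: one of 32 inputs
    invert = (select >> 5) & 1     # sixth bit: true or complemented signal
    return signals[index] ^ invert
```

With this arrangement every status signal can be tested in either polarity without spending a second multiplexer input on its complement.
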

Each bit from the Instruction Register (IR, 5 bits) or from the Memory Buffer Register (MBR, 12 bits) can be selected individually. This capability is used for the basic PDP-8 instruction decode, effective address calculation, and the Group 7 microinstruction decode. Random combinational logic is used to generate a single skip enable signal for the portion of the microprogram that decodes the PDP-8 skip conditions. Interrupt requests are also handled by using combinational logic in a similar manner.

**ALU Operations and the Link Bit**

Three Am2901 ALU chips are cascaded to form the PDP-8 ALU section. The ALU requires a 9-bit opcode: source, function, and destination. Six bits are used to encode the A port (3 bits) and B port (3 bits) select, since only a subset of the sixteen ALU RAM registers is used in this implementation.

The PDP-8 Link bit is constructed from random logic controlled by a set of signals. For economic reasons, random logic is used rather than adding another Am2901 chip. The Link bit does not correspond to any Am2901 function, and its control would have to be separately microprogrammed. Another alternative for the PDP-8 Link bit is to use one of the Am2901 RAM registers for storing the value. In this case, additional Link-handling microcode would have to be inserted after each PDP-8 ALU operation, increasing the target instruction execution time.

**Data Input to the ALU**

There is only one method of writing external data into the Am2901 ALU. It is through the Direct (D) input. In this PDP-8 design, three sources are connected to share the D input: data from the main memory (MBR), constants for ALU operations (the Mask ROM), and data in the switch register (SWITCHES). These three sources are connected by an input bus to the D input port on the ALU. The microword selects which one of the three will be the source during any given microcycle.

The use of a separate ROM to store the constants can be debated. An alternative is to store the constants in the microword. It is wasteful to dedicate a microword bit field to this purpose, since the width of this field must be the same as the ALU width and constants are used infrequently. If the microword fields are multiplexed, we violate the design goal of clarity. Hence, a constant ROM is a good compromise between the two conflicting objectives. One need only store the address of the constant in the microprogram.

**Miscellaneous Control Signals**

The data part of this design requires many miscellaneous control signals. For example, the Link bit uses seven different signals to control its operation. Analysis indicates that only one of these signals needs to be asserted during any given microcycle. The Miscellaneous Control Select field in the microword selects one and only one signal during each microcycle. The selection code is decoded and directed to the associated destinations. The assignments of the signals can be found in the ISPS description.

**The PDP-8 Primary Memory**

The primary memory (MP) for the PDP-8 target machine is assumed to be constructed from "static" semiconductor memory chips. In this type of memory, the output constantly displays the content of the location selected by the address input, unless a write operation is in progress. In this PDP-8 design, the ALU output is connected with the Memory Address Register (MAR) and with the data input port of the MP. When the write enable line of the MP is asserted, the content of the ALU output port is latched into the location selected by the MAR. The Memory Buffer Register (MBR), an ISP implementation pseudoregister, is constantly displaying the content of the location selected by the MAR. For the ISPS simulation, the memory access time is assumed to be less than one microcycle. One can read the value of MBR (containing data from MP) two microcycles after a "write" into the MAR.
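
A minimal behavioural sketch of this memory arrangement follows. The class name `StaticMemory` and its method names are hypothetical, the size of 4096 twelve-bit words is assumed from the basic PDP-8 configuration, and timing (the two-microcycle read delay) is deliberately not modelled.

```python
class StaticMemory:
    """Static primary memory with a MAR and a pseudoregister MBR (sketch)."""

    WORDS = 4096        # assumed: basic PDP-8 memory size
    MASK = 0o7777       # 12-bit words and 12-bit addresses

    def __init__(self):
        self.cells = [0] * self.WORDS
        self.mar = 0

    def latch_mar(self, alu_output):
        # The ALU output port drives the Memory Address Register.
        self.mar = alu_output & self.MASK

    def write(self, alu_output):
        # Write enable asserted: the ALU output is stored at MP[MAR].
        self.cells[self.mar] = alu_output & self.MASK

    @property
    def mbr(self):
        # MBR is not a real register; it simply shows MP[MAR].
        return self.cells[self.mar]
```

Because `mbr` is computed from `mar` on every read, changing the MAR immediately changes what the MBR shows, which is the behaviour the text attributes to static memory chips.
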

**The Microprogram**

The encoded microprogram that emulates the PDP-8 basic instruction set is listed in Appendix 2. This program listing is extracted from an ISPS simulator command file used to simulate this microprogrammable machine. The content of the constant ROM (Mask) is defined using the ISPS simulator "set" command, e.g., "set Mask[4] = #0177." The content of the microprogram store is also defined in this manner. As an example, the instruction fetch cycle is now described. (For readability, the encoded microword is broken into seven fields separated by dashes.)

```
set uMP[000] = #03-010-10-403-12-00-10

!RUN: MAR ← LastPC ← PC, IF PDP8.go = 0 goto HALT:
```

If the PDP-8.go bit is off (Condition code select 10), the microprogram jumps to Halt: (location 010). The content of PC (ALU RAM[1]) is pushed to the ALU output. The value is also latched into LastPC (ALU RAM[2]). Concurrently, the value is latched into the Memory Address Register (MAR) using the control code 10.

```
set uMP[001] = #16-000-00-503-11-21-00
```

The value #0001 is selected from the constant Mask ROM (21). The PC value is selected at the ALU A port, added to the constant, and then latched back into PC.

```
set uMP[002] = #03-040-41-703-05-10-15
```

The content of the Memory Buffer Register (obtained by the MP[MAR] operation) is latched into ALU.Mb (ALU RAM[5]). In this cycle, the MBR is also latched into the Instruction Register (IR) by the control signal 15. The microprogram jumps to the instruction execution section (location 040, Exec:) by forcing a pass-test condition (41) into the Am2910 sequencer Condition Code input.

```
set uMP[004] = #03-000-03-741-00-20-10
```

When the instruction execution is finished, the microprogram returns to this point. The MAR is set to zero in anticipation of interrupt servicing. The MAR will be reset to the correct PC value by microinstruction uMP[001] later on. If the interrupt request is not granted (condition code 03), the microprogram jumps back to RUN: (location 000). Otherwise, the program continues to location uMP[005] to handle the interrupt.
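
The dash-separated listings above can be decoded mechanically. The following is a sketch, not part of the ISPS command file: the function names and field names are hypothetical, and the interpretation of each group (sequencer instruction, next address, condition code select, ALU instruction, A/B port select, D input and mask select, miscellaneous control) is inferred from the four annotated examples. The digit order source-function-destination inside the ALU group is likewise an inference from those examples.

```python
# (field name, number of octal digits); names are illustrative assumptions.
FIELDS = (
    ("seq_instruction", 2),
    ("next_address", 3),
    ("cc_select", 2),
    ("alu_instruction", 3),
    ("ram_ab_select", 2),
    ("d_input_and_mask", 2),
    ("misc_control", 2),
)

# The four Am2910 instructions used in this design, with their octal codes.
SEQ_MNEMONICS = {0o01: "CJS", 0o03: "CJP", 0o12: "CRTN", 0o16: "CONT"}


def decode_microword(text):
    """Split a listing such as '#03-010-10-403-12-00-10' into named fields."""
    groups = text.strip().lstrip("#").strip().split("-")
    if len(groups) != len(FIELDS):
        raise ValueError("expected %d dash-separated groups" % len(FIELDS))
    word = {}
    for (name, ndigits), group in zip(FIELDS, groups):
        if len(group) != ndigits:
            raise ValueError("field %s needs %d octal digits" % (name, ndigits))
        word[name] = int(group, 8)
    return word


def split_alu(alu_instruction):
    """Return (source, function, destination), assuming that digit order."""
    return ((alu_instruction >> 6) & 7,
            (alu_instruction >> 3) & 7,
            alu_instruction & 7)


def microword_bits():
    """Total width implied by the listing: three bits per octal digit."""
    return sum(3 * ndigits for _, ndigits in FIELDS)
```

For uMP[000], `decode_microword("#03-010-10-403-12-00-10")` gives sequencer instruction 3 (CJP), next address 010, and condition code select 10, in agreement with the prose. `microword_bits()` evaluates to 48, which is consistent with the upper bound on the microword width quoted in Table 2.
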

Implementation and Simulation Results

The micromachine and the microcode were simulated and tested by the ISPS simulator. The results are presented here.

**Chip Count**

Since the micromachine was not actually built, the chip count is an estimate of the required hardware parts. The goal of this exercise is to identify the inefficient area in terms of the parts count, and to suggest alternative IC chip types that may reduce the parts count. (See Table 2.)

The parts count for this microprogrammed PDP-8 implementation is 35 chips. Of these IC parts, over two-thirds (25 chips) are SSI or MSI types. If custom-made IC parts are available for the Link bit, the Skip-condition generate, and the Pipeline Register, the design can be reduced to 22 chips.

**Target-Machine Instruction Execution Speed**

Two methods of comparing this microprogrammed PDP-8 and a basic PDP-8 are discussed here. By counting the average number of microinstructions executed for a target instruction, one can estimate the execution speed of the emulated PDP-8. Or one can compare the execution speed of the two ISPS simulators.

**Table 2 Chip Count for a Microprogrammed PDP-8**

<table>
<thead>
<tr>
<th>Chip count</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>6</td>
<td>Microstore. The microword width is between 39 bits and 48 bits (see Table 1). In using 8-bit-wide ROM or EPROM parts, six such chips are required. Since the microprogram is less than 128 words (7 address bits), many commercially available memory chips can be used here.</td>
</tr>
<tr>
<td></td>
<td>Pipeline Register (Pipe). Eight-bit-wide D flip-flops are assumed here. This register is very expensive in terms of chip count. An alternative would be having a special ROM type that can latch the data in the output buffer. Another alternative is to latch the microaddress instead of the microword. In this second design, the microword fetch and ALU-Sequencer operations are in series rather than in parallel as in the original design. This is a classical cost-performance tradeoff.</td>
</tr>
<tr>
<td></td>
<td>Am2910 microsequencer. The advantage of using the Am2910 instead of the Am2909 Sequencer is evident here. The Am2909 requires two chips instead of one Am2910 for this example.</td>
</tr>
<tr>
<td></td>
<td>Am2901 ALU bit slices. Three slices are used to provide the 12-bit-wide PDP-8 data path.</td>
</tr>
<tr>
<td></td>
<td>Link bit and associated hardware. The link bit in this design is constructed of a D flip-flop, some tristate drivers, and input multiplexers. SSI implementation of the Link bit requires 14 percent (5 out of 35) of the total chip count. An alternative is to use a custom-made MSI chip for the Link bit. A second alternative is to implement the Link bit in the ALU RAM registers. In this second design, additional microcode will have to be inserted to handle the special cases, degrading the overall performance.</td>
</tr>
<tr>
<td></td>
<td>Condition Code input multiplexer. Two 16-to-1 MUXs and two 2-to-1 MUXs.</td>
</tr>
<tr>
<td></td>
<td>PDP-8 Skip condition generate. The argument for a custom MSI chip can also be made here.</td>
</tr>
<tr>
<td></td>
<td>Constant Mask ROM and associated ALU D input selection control. The Constant Mask uses two ROM chips. The D input control uses one 2-to-4 decoder. The source registers for the ALU D input bus are assumed to have built-in tristate drivers.</td>
</tr>
<tr>
<td></td>
<td>Other miscellaneous parts.</td>
</tr>
</tbody>
</table>

For each target PDP-8 instruction, the microprogram must execute the following number of microinstructions (Table 3). On the average, 18 microwords (4 + 3 + 6 + 5 or 4 + 3 + 11) are needed to do one PDP-8 target instruction. At the manufacturer-recommended microcycle time of 150 ns, and not counting the PDP-8 Mp access time, the microprogram execution speed is 2.7 μs per target instruction (150 ns × 18). The Mp access time is usually quoted at 1.3 μs for PDP-8/E and /M [Bell, Mudge, and McNamara, 1978]. For an average instruction (i.e., indirect memory reference), three memory accesses are required: instruction fetch, pointer to data (one level of indirection), and the actual data fetch. When these are added to the 2.7-μs microprogram execution time, the projected maximum average instruction time is 6.6 μs.
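
The arithmetic behind these figures can be checked directly. The sketch below uses only the numbers quoted in the paragraph; the variable names are illustrative, and integer nanoseconds are used to avoid floating-point rounding.

```python
MICROCYCLE_NS = 150        # manufacturer-recommended microcycle time
MP_ACCESS_NS = 1300        # quoted PDP-8/E and /M memory access time (1.3 us)
MEMORY_ACCESSES = 3        # instruction fetch, pointer fetch, data fetch

# Microwords per phase for an average memory reference instruction.
memory_reference = {"fetch": 4, "decode": 3, "effective_address": 6, "execute": 5}
# Microwords per phase for an average OPR group instruction.
operate = {"fetch": 4, "decode": 3, "execute": 11}

words_memref = sum(memory_reference.values())   # 4 + 3 + 6 + 5
words_operate = sum(operate.values())           # 4 + 3 + 11

microprogram_ns = words_memref * MICROCYCLE_NS
total_ns = microprogram_ns + MEMORY_ACCESSES * MP_ACCESS_NS

print(words_memref, words_operate)              # microwords per target instruction
print(microprogram_ns / 1000, "us microprogram time")
print(total_ns / 1000, "us projected average instruction time")
```

Both instruction classes come to 18 microwords, which gives 2700 ns of microprogram time and 6600 ns once the three memory accesses are added.
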

Another method of comparison involves the ISPS simulator. Several PDP-8 benchmark and diagnostic programs were simulated. The CPU times used by each simulator were compared. The microcoded PDP-8 uses approximately 20 times the CPU time used by the basic PDP-8 ISP. Translated into simulation CPU time, the ISP simulator of the micromachine executes approximately 1.5 PDP-8 target instructions for every CPU second on a DEC KL-10 processor.

**Table 3 Average Number of Microinstructions Executed for a Target Instruction**

<table>
<thead>
<tr>
<th>Words</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>4</td>
<td>PDP-8 instruction fetch cycle. Check PDP-8.go, fetch target instruction, increment PC, check interrupt conditions.</td>
</tr>
<tr>
<td>3</td>
<td>Instruction decodes. A straightforward binary decision decode tree is implemented in microcode. An alternative is to use the Instruction Decode Mapping ROM capability of the Am2910. The advantage of this alternative is not clear in view of the simple PDP-8 ISP.</td>
</tr>
<tr>
<td>6</td>
<td>Effective Address Calculation. Depending on the addressing mode, there are five possibilities:</td>
</tr>
<tr>
<td></td>
<td>2 words PDP-8 Page 0 address</td>
</tr>
<tr>
<td></td>
<td>4 words current page</td>
</tr>
<tr>
<td></td>
<td>6 words indirect address, Page 0</td>
</tr>
<tr>
<td></td>
<td>8 words indirect address, current page</td>
</tr>
<tr>
<td></td>
<td>9 words auto index</td>
</tr>
<tr>
<td></td>
<td>On the average, approximately six microinstructions are needed to calculate the PDP-8 effective address (equivalent to the Page 0 indirect address).</td>
</tr>
<tr>
<td>5</td>
<td>Memory Reference Instructions. For each target instruction, the microcode fetches data from primary memory, executes the operation, and deposits the result in memory. Depending on the particular target instruction, anywhere between two microinstructions (JMP) and eight microinstructions (ISZ) are needed. On the average, five microinstructions are assumed.</td>
</tr>
<tr>
<td>(11)</td>
<td>PDP-8 OPR group microinstructions. The decoding and execution of the PDP-8 OPR instructions are highly sequential in nature. Therefore, 11 microinstructions executed is taken as the average.</td>
</tr>
</tbody>
</table>

Summary

In this chapter, the design of a microprogrammed PDP-8 was presented. The central component of this micromachine was the AMD bit-sliced microprocessor. Although the design was optimized toward the basic PDP-8 configuration, many issues common to all microprogramming and RT-level hardware designs were illustrated. In simulating the micromachine, the usefulness of the ISP descriptive language as a design tool was also demonstrated.

References

Bell, Mudge, and McNamara [1978]