General performance, bus layout and arithmetic logic unit (ALU) were identical to the 4004. The main improvement was to use a larger 24-pin dual inline package, giving it 8 more pins than the 16-pin 4004. Two of these were used to implement interrupts, which were lacking in the 4004 and considered a major oversight. Two more implemented a halt/stop system, which put the processor into a low-power mode and also allowed for single-step operation that made debugging much easier. Another pin was used to bank select a second read only memory (ROM), doubling the amount of ROM the processor could address compared to the 4004.
To make use of these new pins, the instruction set was expanded, increasing it to 60 instructions from the original 46. Additionally, the internal register file and pushdown stack were expanded to support rapid interrupt processing.
The 4004 was designed to be used in an electronic calculator and many of its design notes are related to this role. For instance, program code can only be read from an area dedicated to read-only memory, not RAM. Although one could implement the "ROM space" using RAM chips, there were no instructions able to write to that area of memory, and no instructions able to read program code from RAM space. The idea was that systems would supply the system program on ROM, and small amounts of RAM would be used only for data, if at all. In typical use, the internal "index registers" would be used for storage during calculations, with the 16 4-bit registers able to hold a single 8-digit binary-coded decimal value, the data format used by calculators. It also had four 12-bit registers for holding addresses: the top-most was the program counter and the next three operated as a push-down stack for subroutine calls.
To reduce pin count to only 16, the original 4004 had only four data pins, multiplexed for both data and addresses. Additional pins indicated whether the address was in ROM or one of several "banks" of RAM. As addresses were 12 bits long, expressing an address took three cycles in addition to the subsequent read or write, and external circuitry had to latch each 4-bit part of the address between cycles. Instructions were 8 bits, so loading the next instruction required many cycles, which led to slow performance in spite of what was a relatively fast cycle time for the era.
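The multiplexing described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a model of the real bus timing: the function names are hypothetical, and sending the low-order nibble first is an assumption made for the example.

```python
BUS_WIDTH = 4          # data pins on the multiplexed bus
ADDRESS_BITS = 12
INSTRUCTION_BITS = 8


def split_address(addr12):
    """Split a 12-bit address into three 4-bit nibbles, one per bus cycle.

    Assumption for this sketch: the low-order nibble goes out first.
    """
    if not 0 <= addr12 < 1 << ADDRESS_BITS:
        raise ValueError("address must fit in 12 bits")
    return [(addr12 >> shift) & 0xF for shift in (0, 4, 8)]


def latch_address(nibbles):
    """Model the external latch that reassembles the address from the nibbles."""
    addr = 0
    for cycle, nibble in enumerate(nibbles):
        addr |= (nibble & 0xF) << (BUS_WIDTH * cycle)
    return addr


def transfers_per_fetch():
    """Bus transfers needed to send one address and read one instruction."""
    return ADDRESS_BITS // BUS_WIDTH + INSTRUCTION_BITS // BUS_WIDTH
```

Under these assumptions a single instruction fetch needs five transfers over the four data pins: three for the address and two for the 8-bit instruction.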
4040
The 4040 was essentially an expansion of the 4004, with additional pins, more registers and new instructions to take advantage of both.
The 4004 had a single ROM pin, whereas the 4040 added another ROM pin to allow two banks of ROM. This effectively increased the ROM address from 12 to 13 bits, or 8 kB. Unlike later designs where the two lines could be binary encoded and thus support four banks, the ROM chips used with the 4004 and 4040 used these lines like chip select pins, and thus the two lines could only support two banks.
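The bank arithmetic can be illustrated with a short Python sketch. The function names are hypothetical and the model ignores how the bank is actually selected in hardware; it only shows why one select line per bank yields two banks where binary-encoded lines would yield four.

```python
ROM_BANK_SIZE = 1 << 12  # 4096 bytes reachable with a 12-bit address


def effective_rom_address(bank, addr12):
    """Combine a bank number (0 or 1) with a 12-bit address into a 13-bit one."""
    if bank not in (0, 1):
        raise ValueError("only two ROM banks are available")
    if not 0 <= addr12 < ROM_BANK_SIZE:
        raise ValueError("address must fit in 12 bits")
    return bank * ROM_BANK_SIZE + addr12


def banks_supported(select_lines, binary_encoded):
    """Banks reachable with a given number of select lines.

    Chip-select style uses one line per bank; binary encoding would
    allow 2 ** select_lines banks.
    """
    return 2 ** select_lines if binary_encoded else select_lines
```

With two banks of 4096 bytes each, the highest effective address is 0x1FFF, which corresponds to the 8 kB total.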
The 4004 lacked interrupts, a serious limitation. The 4040 added a new input pin for calling an interrupt, as well as an output pin to indicate the interrupt signal had been noticed and was being acted on. As interrupts also need to save a return address, the stack register file was expanded to seven entries, up from three.[Note 1]
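A rough Python sketch of why the deeper stack matters is given below. The class and function names are hypothetical, the return addresses are arbitrary placeholders, and the behaviour of real hardware on stack overflow is not modelled; the sketch only counts how many return addresses must be held at once.

```python
class ReturnStack:
    """Bounded stack of return addresses: depth 3 (4004) or 7 (4040)."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = []

    def push(self, return_address):
        if len(self.entries) >= self.depth:
            # Overflow behaviour of the real chips is not modelled here.
            raise OverflowError("return stack full")
        self.entries.append(return_address)

    def pop(self):
        return self.entries.pop()


def fits(depth, main_calls, handler_calls):
    """Check whether an interrupt taken main_calls deep can nest handler_calls more."""
    stack = ReturnStack(depth)
    try:
        for level in range(main_calls):
            stack.push(0x100 + level)   # subroutine calls in the main program
        stack.push(0x200)               # return address saved by the interrupt
        for level in range(handler_calls):
            stack.push(0x300 + level)   # subroutine calls inside the handler
    except OverflowError:
        return False
    return True
```

In this model a seven-entry stack accommodates an interrupt arriving three calls deep whose handler nests three further calls, whereas a three-entry stack already full of subroutine calls has no room for the interrupt's return address.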
Interrupt handler code normally starts by saving out values in the registers to allow the interrupt code to use them, and then at the end it copies the values back from memory so that the processor returns to its original pre-interrupt state. With the multi-cycle memory access of the design, this would have been extremely slow. To address this, eight additional registers were added in a new "bank 1", the original sixteen registers retroactively becoming "bank 0". The idea was that programmers would attempt to place their critical data in the first eight registers of bank 0. When an interrupt was received, the handler code would call an instruction to swap banks, which would cause bank 1's registers to override bank 0's registers 0 through 7. The handler code would then use these eight registers for any local data, leaving the original values untouched. When the handler completed, it simply swapped bank 0 back in. This reduced the switching time to a single instruction, greatly improving interrupt response times.
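The banking scheme can be modelled with a minimal Python sketch. The class and method names are hypothetical, and `select_bank` merely stands in for the single bank-swap instruction; it is not a description of the actual instruction encoding.

```python
class IndexRegisters:
    """Sixteen 4-bit registers in bank 0 plus eight extra registers in bank 1."""

    def __init__(self):
        self.bank0 = [0] * 16
        self.bank1 = [0] * 8
        self.active_bank = 0

    def select_bank(self, bank):
        """Stand-in for the single bank-swap instruction."""
        if bank not in (0, 1):
            raise ValueError("bank must be 0 or 1")
        self.active_bank = bank

    def _slot(self, index):
        if not 0 <= index < 16:
            raise IndexError("register index must be 0-15")
        # Only registers 0-7 are overridden by bank 1; 8-15 stay shared.
        if self.active_bank == 1 and index < 8:
            return self.bank1
        return self.bank0

    def read(self, index):
        return self._slot(index)[index]

    def write(self, index, value):
        self._slot(index)[index] = value & 0xF  # registers are 4 bits wide


# Usage: the handler's scratch value never disturbs the main program's data.
regs = IndexRegisters()
regs.write(3, 0x9)       # critical data held by the main program
regs.select_bank(1)      # interrupt handler entry
regs.write(3, 0x5)       # handler uses register 3 as local storage
regs.select_bank(0)      # handler exit
assert regs.read(3) == 0x9
```

No register contents are copied to or from memory in this model, which is the point of the design: entering and leaving the handler costs one bank switch each rather than a series of slow multi-cycle memory accesses.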
Another addition was the input stop pin, and the associated output stop acknowledge. These could be used to stop the processor while the system performed input/output or other non-CPU tasks, but were more widely used for debugging by allowing the processor to be single-stepped. When the processor was in stopped mode, most of the chip hardware was put into a low-drain, high-impedance condition, reducing power use. The machine cycle clocks were kept running for the benefit of external devices, including any interrupt controllers needed to wake the chip back up, which relied on these controllers staying in sync.
To take advantage of these new features, and to support the new logical AND and OR operators, the instruction set added 14 new instructions, bringing the total to 60.
Address bus: 12-bit for ROM (multiplexed onto data bus; addresses took three bus cycles to transmit, same as in the 4004), effectively 13-bit with use of bank-switching commands; effectively 10-bit or 8-bit for RAM (8-bit direct address plus one-of-four, i.e. 2-bit equivalent, bank select; the additional 256 "status" memory locations required use of I/O commands to read or write, from an overall 8-bit address space)
Voltage: −15 V DC
Operating frequency: 500 to 740 kHz main clock (2-phase, overlapping); 62500 to 92500 8-clock machine cycles per second, each instruction requiring either one or two machine cycles to read and execute, meaning a rough average of 62 kIPS at 740 kHz with an equal mix.[Note 2]
Performance: Claimed execution time of ~850 μs to add two 32-bit (8-digit BCD) numbers, or around 1175 such operations per second and about 10 machine cycles per digit pair.[Note 3]
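The figures in the operating frequency and performance entries can be cross-checked with a few lines of Python. The function names are hypothetical; the 1.5 cycles per instruction simply encodes the "equal mix" of one- and two-cycle instructions mentioned above.

```python
CLOCKS_PER_MACHINE_CYCLE = 8


def machine_cycles_per_second(clock_hz):
    """Machine cycles per second for a given main clock frequency."""
    return clock_hz / CLOCKS_PER_MACHINE_CYCLE


def average_ips(clock_hz, cycles_per_instruction=1.5):
    """Instructions per second; 1.5 assumes an equal mix of 1- and 2-cycle ones."""
    return machine_cycles_per_second(clock_hz) / cycles_per_instruction


def cycles_in(duration_s, clock_hz):
    """Machine cycles that elapse in duration_s seconds."""
    return duration_s * machine_cycles_per_second(clock_hz)


def operations_per_second(duration_s):
    """How many operations of the given duration fit into one second."""
    return 1.0 / duration_s
```

At 740 kHz this gives 92,500 machine cycles per second and roughly 61,700 instructions per second. An 850 μs addition then spans about 78.6 machine cycles, or close to 10 per digit pair for eight pairs, and about 1,176 such additions fit into one second.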
Designers
Federico Faggin proposed the project, formulated the architecture and led the design.
The detailed design was done by Tom Innes.
4201 – Clock Generator, 500 to 740 kHz, using 4.000 to 5.185 MHz crystals
4207 – General Purpose 8-bit Output port
4209 – General Purpose 8-bit Input port
4211 – General Purpose 8-bit I/O port
4265 – Programmable general-purpose I/O. It contains four 4-bit I/O ports and has 14 software-selectable modes that can be programmed to interface with these ports. It can interface not only with this processor but with the 8080A processor as well. Samples were available from stock in March 1976.[3][4]
4269 – Programmable keyboard/display. System software configures the chip through instructions that control how its ports are handled. It was available in sample quantities in March 1976.[3][5]
Keyboard features: It generates an interrupt when a key is pressed and buffers up to 8 characters in a FIFO before the CPU services it. It can be interfaced with sensors, panel switches and keyboards up to full teletypewriter size. Its functions include key input encoding, polling, character input buffer storage, and keyboard over-entry recovery.
Display features: It operates and refreshes displays or indicator arrays of up to 128 elements or lights. It also supports gas-discharge displays such as the Burroughs Self-Scan, driving up to 20 characters.
4289 – Standard Memory Interface (replaces 4008/4009)[Note 6]
4308 – 1K x 8-bit ROM plus 4 x 4-bit IO ports[Note 7]
The 4040 was first used in a microcomputer in Intel's Intellec4 Mod 40 development system, released in early 1975.[6] According to Byte magazine, the first third-party microcomputer designed around the Intel 4040 was the Micro 440, released by Comp-Sultants of Huntsville, Alabama, in late 1975.[7]
^By default only one level of interrupt was supported, as further servicing was disabled once an initial interrupt was acknowledged and could only be re-enabled on RTI. This essentially allowed three levels of subroutine stacking within an interrupt that itself occurred within a three-deep subroutine, although detection could be forced back on with a specific command.
^This may not be entirely accurate and is based on the 4004 characteristics; some of the added instructions may require additional cycles to execute, especially any external register stacking required for interrupt handling, but it is not made clear in the documentation as to whether this is the case; halt/step are special cases which naturally take potentially many millions of cycles to "execute", but as these essentially occur "outside" of the usual program flow, they don't count for speed calculation.
^Likewise, based on claimed 4004 performance; "850 μs" is a somewhat rounded-off figure claimed in an early Intel brochure and may not be particularly precise, but the total execution time works out to approximately 79 or 80 machine cycles, which fits neatly with either 10 cycles per digit pair, 9 cycles per pair plus 7 for setup and any necessary post-processing, or 8 per pair plus 15 extra (...etc). The actual instruction mix wasn't specified, so without both source code and a list of instruction execution times it's impossible to be sure. A more efficient addition routine might have been possible on the 4040 vs the 4004, but the extra instructions don't suggest any obvious method for achieving this and appear to be focussed on addressing the earlier chip's more obvious shortcomings, e.g. a lack of interrupts and haltability.
^Provides "high" current (~25mA) handling capacity for external I/O devices. Allegedly intended solely for use with the 4289, but design appears fairly generic and may be adaptable to a bare 4004/4040?
^i.e. combining the main storage of four 4002s into a single individually-selectable chip, but with neither the 4-bit I/O port nor additional 64 words of I/O-command-accessed "status" memory. As it has a dedicated, 8-bit address bus, and two separate 4-bit data input and output buses, the 4101 is intended only for use as a downstream peripheral of the 4289. This is further emphasised by the SRAM's claim to be "used for writeable Program Memory", something not achievable by a bare 4004 or 4040.
^Allows connection of "standard" memory (8-bit parallel data and/or up to 12-bit parallel addressing, the latter in the form of 8-bit address plus 4-bit binary chip select) and devices to the 4004 or 4040, as well as the use of RAM as Program Memory (the 4289 can be installed so as to appear like a regular 4001 ROM to the processor regardless of what's connected on the other side; furthering the illusion, it also provides the same four I/O lines that a 4001 would offer). A 4040 making use of two 4289s could offer various combinations of ROM and RAM in 2KB segments up to 8KB total with a relatively simplistic segregated addressing scheme and a small number of 4101s and 4308s, e.g. 3x 4308 and 8x 4101 for 6KB ROM and 1KB RAM.
^Directly replaces four separate 4001s with a single chip, including being available in four "metal" variants that respond to logical chip addresses 0-3, 4-7, 8-11 and 12-15, simulating the sixteen original versions of the 4001
^Intended for use downstream of a 4289, as it presents an 11-bit parallel address input bus and separate 8-bit parallel data output bus, as well as three binary-coded chip select lines (thus allowing provision of up to 16KB ROM with an addressing scheme incompatible with the 4004/4040 on several different levels - although only two 4316s, thus 4KB of ROM, or one 4316 plus RAM, can be directly addressed by a 4289 without additional hardware and/or special programming techniques)
^i.e. UV-erased rather than electronically, and is again intended for use with the 4289, as it has separate, 8-bit parallel, address input and data output buses