In Real, Visual Terms
“If a 6600 used paper tape instead of core memory, it would use up tape at about 30 miles/second.” — Grishman, Assembly Language Programming
As legend has it, Grace Hopper (whose work directly inspired COBOL) would hold up a length of wire and ask her first-year students how long it was. It was just under 1 ft. (30.48cm) long. Most of her students answered with the visual length that they saw, and their answers were factually correct, but useless for the purpose of her class. The answer she wanted was “one nanosecond,” that is, the time a signal takes to travel from one end of the wire to the other. On the lowest level, things happen in nanoseconds. (In silicon wafers, the design involves picosecond-timing considerations.)
But even in a nanosecond system, the physical design sometimes slows things down to microseconds or even milliseconds. The separate-storage (Harvard) paradigm could be like this, with the executable instructions stored on some tangible medium like paper tape, while the executable’s data resided in fast-access, electronic storage.
And now, the world mostly uses the shared (von Neumann) paradigm, where the executable and the data may be stored in the same silicon wafer. Memory management can provide an architectural barrier between them, but 1’s and 0’s ultimately are just 1’s and 0’s. (But take a look at the postscript below.)
Let’s look back on that. 1’s and 0’s, but on punched tape.
First of all, let’s assume a punched tape 8 bits wide (yes, it did exist). That simplifies our calculations a lot.
The next thing to consider is that the holes in a punched tape are always 1/10 of an inch apart. That’s true for individual (horizontal, discrete) bits, as well as symbols (rows of bits). An eight-bit-wide tape thus gives 10 bytes per inch in length. Yeah, let’s mix decimal and binary: an inch-long tape contains two and a half 32-bit words. Two inches of tape contain 5 words.
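Those densities take only a few lines of Python to verify, using the spacing and word size just described:

```python
# Punched-tape density: one 8-bit row every 0.1 inch.
BYTES_PER_INCH = 10        # 1 row / 0.1 inch, one byte per row
WORD_BYTES = 4             # one 32-bit word

words_per_inch = BYTES_PER_INCH / WORD_BYTES
print(words_per_inch)      # 2.5 words in one inch of tape
print(2 * words_per_inch)  # 5.0 words in two inches
```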
But let’s grow that to a bigger scale. One hundred yards is 300 feet, and 3,600 inches. (Kind of like how one hour is 3,600 seconds.) On punched tape, that would amount to 36,000 bytes, or 9,000 words.
Thus, one mile of punched tape contains 5280*12*10, or 633,600 bytes, which works out to 158,400 32-bit words.
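Recomputing those lengths from scratch keeps the units honest (10 bytes per inch gives bytes; dividing by 4 gives 32-bit words):

```python
BYTES_PER_INCH = 10
INCHES_PER_FOOT = 12
FEET_PER_MILE = 5280

inches_100yd = 300 * INCHES_PER_FOOT                # 100 yards = 3,600 inches
print(inches_100yd * BYTES_PER_INCH)                # 36000 bytes
print(inches_100yd * BYTES_PER_INCH // 4)           # 9000 words

bytes_per_mile = FEET_PER_MILE * INCHES_PER_FOOT * BYTES_PER_INCH
print(bytes_per_mile)                               # 633600 bytes per mile
print(bytes_per_mile // 4)                          # 158400 words per mile
```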
Running a program
What does that mean for executable code? Let’s look at a Raspberry Pi 2 model B running at 900MHz. Nine hundred million operations per second, at 4 bytes per operation, means 3.6 billion bytes of code per second (ignoring any CPU stalls). Unrolling a very tight loop yields an interesting number: about 5,682 miles of code on punched tape per second!
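To sanity-check that rate, here's the arithmetic in Python, recomputing the tape density from scratch rather than trusting any earlier figure:

```python
CLOCK_HZ = 900_000_000              # Raspberry Pi 2 model B clock
BYTES_PER_INSTRUCTION = 4           # one 32-bit ARM instruction
BYTES_PER_MILE = 5280 * 12 * 10     # 633,600 bytes of tape per mile

code_bytes_per_sec = CLOCK_HZ * BYTES_PER_INSTRUCTION     # 3.6 billion
miles_per_sec = code_bytes_per_sec / BYTES_PER_MILE
print(f"{miles_per_sec:,.0f} miles of tape per second")   # 5,682 miles of tape per second
```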
Okay, 99% of the programs don’t run that fast. So let’s turn things around, from consuming bits (of code) to producing bits (of entropy, that is, random bits). Linux on any Raspberry Pi has a /dev/hwrng, or at least it should. The HardWare Random Number Generator is built-in on the Raspberry Pi, and might be a “good enough” entropy source for most clients. So how much entropy per second can it generate on the punched tape?
The very first Raspberry Pi model B, using the Broadcom 2708/2835 CPU (with a default clock speed of 700MHz), can generate roughly 95KB/sec, or 97,280 bytes/second. On a punched tape, that’s 9,728 inches, or 810 feet, 8 inches of entropy.
The Raspberry Pi 2 model B generates entropy a little faster, at 112KB/sec, or 114,688 bytes/second, amounting to 11,469 inches, or 955 feet, 9 inches of entropy. Incidentally, this speed-up is not proportional to the clock: the clock runs 29% faster than the original Model B, but the entropy generator is only 18% faster.
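Both conversions follow the same pattern, so a tiny Python helper (using the rates quoted above, and taking 1KB = 1,024 bytes) covers the two models:

```python
BYTES_PER_INCH = 10  # punched-tape density, as before

def tape_length(bytes_per_sec):
    """Convert a byte rate into (feet, inches) of punched tape per second."""
    inches = round(bytes_per_sec / BYTES_PER_INCH)
    return divmod(inches, 12)

print(tape_length(95 * 1024))    # original Model B:  (810, 8)
print(tape_length(112 * 1024))   # Pi 2 model B:      (955, 9)
```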
The final word: it’s a concrete visual
There’s always another way to look at something. Sir Isaac Newton showed that white light is simply all the colors we can see, smashed together at once. Albert Einstein showed that distance and time are tied together by the speed of light, something Grace Hopper brought home to her new students with her one-foot-long, one-nanosecond wire.
Postscript: we can still benefit from the old ways
What I said earlier about the Harvard and von Neumann architectures is true enough. Most computers now use the von Neumann, shared-storage model, where code and data may reside on the same silicon wafer. But the Harvard architecture, with separate code and data, still has significance in modern computing.
Intel 80286: Harvard on a single wafer
The 80286 architecture enabled enforced separation of code and data on a PC. Earlier Intel CPUs (the 8086/8088 and the 80186) used the overlapping-segment model, but the Intel 80286’s protected mode made possible a true separation of code and data. Later 32-bit x86 implementations, from Intel, AMD, Cyrix, and Transmeta, follow this model.
Next, 64 bits and simpler memory management
The AMD64 extension (and Intel’s compatible Intel 64) acknowledges the greater benefit of the so-called “flat memory model.” The CPU and the OS conspire to enforce the separation of code and data. A lot of real-life feedback to AMD let them gain a lead over Intel regarding memory management: the software is the manager, the hardware is the subordinate. I’ve looked at the Linux kernel code, and it really does make things simpler.
Raspberry Pi: totally shared (but not really)
The original Raspberry Pi model B used an ARM1176JZF-S processor. The Raspberry Pi Zero uses the same CPU. The Raspberry Pi 2 model B has an ARM Cortex-A7 CPU that adds the NEON instruction set on top of VFP. But there’s an interesting sub-model of the Raspberry Pi.
The design of the RPi Compute Module has on-board eMMC flash storage for the boot code and the (presumably Linux) kernel code. That flash is the modern equivalent of the earlier punched tape, but with the extra ability to skip forward or backward to any position on the tape.
Yes, some little things still use the Harvard architecture, even if by implication. Our cell phones, our Raspberry Pi Compute Modules, and maybe even our old PC’s.