The Night All the Disk Drives Crashed

This is a story from very early in “a friend’s” career, concerning an over-zealous computer operator, failed disk drives, mainframe computing, conspiracy and a long-held secret.

In the early days of computing, computers were rooms full of huge racks, and disk drives were the size of washing machines. The disk packs themselves were stacks of aluminum platters that looked like wedding cakes in their smoked plastic covers and they weighed upwards of 20 lbs.

“Strings” of around 8 drives would be connected to a disk controller cabinet. A mainframe could have one or more controller cabinets. Each of these washing machines held a whopping 176 MBytes. Yes, that’s roughly 120 of those 3.5″ floppies (remember those?), or 1/1000th the storage of that SD card that you just threw away because it was too small to be useful.
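For anyone who wants to check that comparison, here is a minimal back-of-the-envelope sketch. It assumes decimal megabytes and a 1.44 MB high-density floppy; both are assumptions for illustration, not figures from the story, and the function names are made up for this example. The floppy count comes out a little over 100.

```python
# Back-of-the-envelope check of the capacity comparison.
# Assumptions (not from the story): decimal megabytes, and a
# 1.44 MB high-density 3.5" floppy.
DRIVE_MB = 176      # one washing-machine-sized drive, per the story
FLOPPY_MB = 1.44    # assumed floppy capacity


def floppies_per_drive(drive_mb=DRIVE_MB, floppy_mb=FLOPPY_MB):
    """Number of floppies needed to hold one drive's worth of data."""
    return drive_mb / floppy_mb


def card_gb_at_ratio(drive_mb=DRIVE_MB, ratio=1000):
    """Size in GB of a card holding `ratio` times one drive's capacity."""
    return drive_mb * ratio / 1000


print(f"{floppies_per_drive():.0f} floppies per drive")
print(f"{card_gb_at_ratio():.0f} GB card at 1000x one drive")
```

So the “too small to be useful” SD card works out to something in the neighborhood of 176 GB, under those assumptions.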

Yeah, stone knives and bearskins, indeed.

A typical mainframe installation would have rows and rows of washing machines and dedicated people called “operators” who would mount tapes, switch disk packs and run batch jobs according to run books. Run books were essentially programs followed by humans in order to make things happen.

“A Friend” was a student intern in the IT department at a factory that made computer terminals for a large mainframe company. There were two PDP-10 mainframes: the “small” (TEST) system, used for testing factory software, and the “big” one that ran the mystical, mysterious and oh-so-important PRODUCTION. The TEST machine had one controller with six RP06 drives, and the big PRODUCTION machine had three rows of eight RP06 drives, each row with its own controller. This becomes important later. They looked a lot like this, actually.

If the PRODUCTION machine wasn’t running, the entire factory stopped, leaving almost 2000 workers twiddling their thumbs at a hefty hourly rate. This was considered a Bad Thing and Never to Be Allowed Upon Pain of Pain.

It was common for different batch jobs to have different disk packs mounted. When you ran payroll, you put in the disk packs with the payroll data. When you ran the factory automation system, you used a different set of packs, and for the parts inventory true-up, a third set, and so on. Backups were done to rows and rows of tape drives, but that’s a topic for another story.

At night, on the 3rd shift, after the production jobs had all completed and the backups to tape were all done, there wasn’t a lot for the operators to do. On these slack evenings it was common, permitted and expected that the operators would put in the “GAMES” disk pack and play ADVENT, CHESS, or whatever mainframe game was on the most recent DECUS tape.

Ancient disk drives and packs were not sealed, and it was possible for some dust or even (GASP) a hair to fall into the drive “tub” when the pack was changed. Since the heads “flew” over the platter at a distance measured in microns, or 1/100 the thickness of a hair, any dust would cause a “head crash”, often sending the heads oscillating, skipping across the surface of the platter. So in a “head crash” both the drive heads and the platter were damaged. Here’s a diagram from that era showing all this.

When you changed disk packs, the platters would start to spin and the air filtration system would run. Only after about 60 seconds, after the air had been filtered, would the heads extend out onto the platters and begin to “fly” on a cushion of air.

Late one night after production had ended, the lead operator decided it was time to play some games. As was his privilege, he directed the junior operator to change out the disk packs on the “little” TEST mainframe and load the “GAMES” pack while he (the Senior) went to visit the little operators’ room and then stepped outside for a needed cigarette (and likely also a nip of tequila from his hip flask, it being Arizona).

While the lead operator was out, the junior dutifully swapped the GAMES pack into drive T05 (T for TEST). As it was spinning up, the washing machine emitted a set of beeps, displayed the “FAULT” light and spun back down.

Being a dutiful and very new operator, the junior wanted to make sure that the ever-so-important lead operator could play the newest games upon his return, so he moved the GAMES pack from the faulty disk drive T05 to the next in line, unit T04. Once again, during the spin-up phase, the drive FAULTed and spun down.

So he moved the GAMES pack to unit T03. Which promptly faulted.

The junior operator, being no slouch, realized that there was something wrong here and decided that there was a problem with the TEST mainframe’s single disk controller, because the idea of three drives failing at the same time was inconceivable. It had to be the disk controller!

So he mounted the GAMES pack into the disk drive labeled P12 on the PRODUCTION mainframe. Which also faulted. The same with P11.

How odd, he thought, another disk controller failure. So he tried the GAMES pack in P05, which, while still on the PRODUCTION mainframe, was on disk controller 0, not controller 1.

In all, the junior operator valiantly tried to mount the GAMES pack in six drives, across three disk controllers, on both the TEST and PRODUCTION mainframes. He knew that the lead operator loved his games, and he wanted to demonstrate his perseverance in following orders.

By the time the lead operator came back from his smoke/tequila break, the junior operator had destroyed the heads in six very expensive disk drives.

We later discovered that the original head crash had caused the heads to skitter into the platter, leaving a dent in the aluminum substrate.  When we examined that pack later, it looked like someone had stabbed the platter with a screwdriver, leaving a raised crater that was VISIBLE TO THE NAKED EYE!  So of course, each time he moved the pack to a new drive, the heads quickly crashed into the to-them Himalayan-sized mountain of aluminum, damaging another set of read/write heads and incidentally spraying oxide dust throughout the drive mechanism itself.

The lead operator had the presence of mind to call in the lead system administrator. As this was going to be a dirty job, they also called in the lowly student intern (“my friend”) so they would have TWO very junior someones to crawl under the almost 3-foot-deep raised floor and drag the heavy cables (often called “anaconda cables” due to their size) as the machines were reconfigured.

[Image credit: Tom94022, own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=38411260]

The four of “them” spent the late evening and early morning re-cabling drives between the two systems so that Production could run starting at 8am. The in-house Field Engineer (FE) was called the next day to change all the drive heads and clean all the drive air filters, a two-day job. He happily joined the conspiracy as it was immediately obvious to him what had happened. Because he had seen the exact same thing happen at a Major University the year before. They had lost 8(!) drives to a zealous student operator trying to load a disk pack full of ASCII porn pictures. The Senior FE conveniently had a junior FE who needed some extra practice on this incredibly tedious task, having annoyed said Senior FE by interrupting him while he was “explaining computers” to (snogging with) the cute new secretary, in his office late one afternoon.

The junior operator was sworn to secrecy and paid hefty bar tabs for all involved for several months, including a trip to a strip club across town. The intern was promised a good grade and evaluation, and the Junior FE served his multi-day penance never knowing the whole story.

The rash of crashed disk drives was chalked up to a faulty A/C filter in the first failed drive. Said A/C filter having been created by the Senior Field Engineer taking it outside into the Arizona desert and bashing it into a small bush. All the drive heads had been scheduled for replacement and alignment in three weeks anyway, so there was no actual loss to the company.

It’s been over 40 years since that long night crawling under floor tiles, and I still remember its lessons: “dust is bad”, “stop and think”, “know when to call for help”, the value of learning from the mistakes of others, and most importantly, how to keep a secret.
