Archive for category Computing History

The Night All the Disk Drives Crashed

This is a story from very early in “a friend’s” career, concerning an over-zealous computer operator, failed disk drives, mainframe computing, conspiracy, and a long-held secret.

In the early days of computing, computers were rooms full of huge racks, and disk drives were the size of washing machines. The disk packs themselves were stacks of aluminum platters that looked like wedding cakes in their smoked plastic covers and they weighed upwards of 20 lbs.

“Strings” of around 8 drives would be connected to a disk controller cabinet. A mainframe could have one or more controller cabinets. Each of these washing machines held a whopping 176 MBytes. Yes, that’s 100 3.5″ floppies (remember those?), or 1/1000th the storage of that SD card that you just threw away because it is too small to be useful.

Yeah, stone knives and bearskins, indeed.

A typical mainframe installation would have rows and rows of washing machines and dedicated people called “operators” who would mount tapes, switch disk packs and run batch jobs according to run books. Run books were essentially programs followed by humans in order to make things happen.

“A Friend” was a student intern in the IT department at a factory that made computer terminals for a large mainframe company. There were two PDP-10 mainframes, the “small” (TEST) system used for testing factory software things, and the “big” one that ran the mystical, mysterious and oh-so-important PRODUCTION. The TEST machine had one controller with six RP06 drives and the big PRODUCTION machine had three rows of eight RP06 drives, each row with its own controller. This becomes important later. They looked a lot like this, actually.

If the PRODUCTION machine wasn’t running, the entire factory stopped, leaving almost 2000 workers twiddling their thumbs at a hefty hourly rate. This was considered a Bad Thing and Never to Be Allowed Upon Pain of Pain.

It was common for different batch jobs to have different disk packs mounted. When you ran payroll, you put in the disk packs with the payroll data; when you ran the factory automation system, you mounted a different set of packs; and when doing parts inventory true-up, a third set. Backups were done to rows and rows of tape drives, but that’s a topic for another story.

At night, on the 3rd shift, after the production jobs had all completed and the backups to tape were all done, there wasn’t a lot for the operators to do. On these slack evenings it was common, permitted and expected that the operators would put in the “GAMES” disk pack and play ADVENT, CHESS, or whatever mainframe game was on the most recent DECUS tape.

Ancient disk drives and packs were not sealed, and it was possible for some dust or even (GASP) a hair to fall into the drive “tub” when the pack was changed. Since the heads “flew” over the platter at a distance measured in microns, or 1/100 the thickness of a hair, any dust would cause a “head crash”, often sending the heads oscillating, skipping across the surface of the platter. So in a “head crash” both the drive heads and the platter were damaged. Here’s a diagram from that era showing all this.

When you changed disk packs, the platters would start to spin and the air filtration system would run. Only after about 60 seconds, after the air had been filtered, would the heads extend out onto the platters and begin to “fly” on a cushion of air.

Late one night after production was ended, the lead operator decided it was time to play some games. As was his privilege, he directed the junior operator to change out the disk packs on the “little” TEST mainframe and load the “GAMES” pack while he (the Senior) went to visit the little operators’ room, and also step outside for a needed cigarette (and likely also a nip of tequila from his hip flask, it being Arizona).

While the lead operator was out, the junior dutifully swapped the GAMES pack into drive T (for test) 05. As it was spinning up, the washing machine emitted a set of beeps, displayed the “FAULT” light, and spun back down.

Being a dutiful, and very new operator, the junior wanted to make sure that the ever-so-important lead operator could play the newest games upon his return, so he moved the GAMES pack from the faulty disk drive T05, to the next in line, unit T04. Once again, during the spin up phase, the drive FAULTed and spun down.

So he moved the GAMES pack to unit T03. Which promptly faulted.

The junior operator, being no slouch, realized that there was something wrong here and decided that there was a problem with the TEST mainframe’s single disk controller. Because the odds of three drives failing at the same time were inconceivable. It had to be the disk controller!

So he mounted the GAMES pack into the disk drive labeled P12 on the PRODUCTION mainframe. Which also faulted. The same with P11.

How odd, he thought, another disk controller failure. So he tried the GAMES pack in P05, which while still on the PRODUCTION mainframe, was on disk controller 0, not controller 1.

In all, the junior operator valiantly tried to mount the GAMES pack in six drives, across three disk controllers, on both the TEST and PRODUCTION mainframes. He knew that the lead operator loved his games, and he wanted to demonstrate his perseverance in following orders.

By the time the lead operator came back from his smoke/tequila break, the junior operator had destroyed the heads in six very expensive disk drives.

We later discovered that the original head crash had caused the heads to skitter into the platter, leaving a dent in the aluminum substrate.  When we examined that pack later, it looked like someone had stabbed the platter with a screwdriver, leaving a raised crater that was VISIBLE TO THE NAKED EYE!  So of course, each time he moved the pack to a new drive, the heads quickly crashed into the to-them Himalayan-sized mountain of aluminum, damaging another set of read/write heads and incidentally spraying oxide dust throughout the drive mechanism itself.

The lead operator had the presence of mind to call in the lead system administrator. As this was going to be a dirty job, they also called in the lowly student intern (“my friend”) so they would have TWO very junior someones to crawl under the raised floor (almost 3 feet deep) and drag the heavy cables (often called “anaconda cables” due to their size) as the machines were reconfigured.

(Image credit: Tom94022, own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=38411260)

The four of “them” spent the late evening and early morning re-cabling drives between the two systems so that Production could run starting at 8am. The in-house Field Engineer (FE) was called the next day to change all the drive heads and clean all the drive air filters, a two-day job. He happily joined the conspiracy, as it was immediately obvious to him what had happened: he had seen the exact same thing at a Major University the year before, where they had lost 8(!) drives to a zealous student operator trying to load a disk pack full of ASCII porn pictures. The Senior FE conveniently had a junior FE who needed some extra practice on this incredibly tedious task, having annoyed said Senior FE by interrupting him while he was “explaining computers” to (snogging with) the cute new secretary in his office late one afternoon.

The junior operator was sworn to secrecy and paid hefty bar tabs for all involved for several months, including a trip to a strip club across town. The intern was promised a good grade and evaluation, and the Junior FE served his multi-day penance never knowing the whole story.

The rash of crashed disk drives was chalked up to a faulty A/C filter in the first failed drive. Said A/C filter having been created by the Senior Field Engineer taking it outside into the Arizona desert and bashing it into a small bush. All the drive heads had been scheduled for replacement and alignment in three weeks anyway, so there was no actual loss to the company.

It’s been over 40 years since that long night crawling under floor tiles and I still remember the lessons of that night. “Dust is bad”, “stop and think”, “know when to call for help,” the value of learning from the mistakes of others, and most importantly, how to keep a secret.


Scripting a fast Ubuntu install in Google Cloud Platform (GCP)

In this post I’ll show how to script GCP instance creation, Ubuntu installation and patching in order to support the customized SIMH installs that we’ll do later.

All of my GCP/SIMH installs are based on Ubuntu Linux, running on tiny or small GCP instances. Since one of my goals is quick iteration and making it fast and easy for other people to install the SIMH emulator and the guest OSes, I’ve scripted everything. I’ve been a fan of infrastructure-as-code for two decades, so how could I not apply that to my GCP estate?

For this we need four scripts:

  • create-instance – create an instance, install and patch Ubuntu
  • stop-instance – stop (pause) the instance, preserving the instance state (boot volume)
  • start-instance – (re)start the instance from the saved state
  • destroy-instance – destroy the instance (which deletes the associated boot volume)

All of the examples start with a common Linux base in GCP, so it made sense to script a fast Ubuntu install and update. While I could use a common SIMH install for almost all the guest operating systems, it makes sense to keep them separate so that people can install just the single OS they want to play with, instead of all of them.

These examples all assume that you have created a Google Cloud account, created at least one project, and enabled billing for that project. You may want to start with these tutorials.

You also need to set a few environment variables as described in this earlier post.

Everything below should be self-explanatory. Essentially, the main steps are to create the instance, then wait for the instance to be up and running. After that, another loop waits until the SSH daemon is running, so that some commands (apt-get update and apt-get upgrade) can be run.

#!/bin/bash

# given a GCP account and the gcloud SDK on the install-from host, build and install a new server

. ./set-cloud-configuration.sh

# If you don't use ssh-add to add your key to your active ssh-agent
# you're going to be typing your passphrase an awful lot

#
# create the instance
#
gcloud compute instances create ${INSTANCENAME} --machine-type=${MACHINETYPE} --image-family=${IMAGEFAMILY} --image-project=${IMAGEPROJECT}
gcloud compute instances get-serial-port-output ${INSTANCENAME}

# add the oslogin option so I don't need to manage SSH keys
gcloud compute instances add-metadata ${INSTANCENAME} --metadata enable-oslogin=TRUE

#
# instance creation can return before the instance is actually up, and booting
# can take a while, so wait for the reported status to be RUNNING
STATUS="dummy"
while [[ "RUNNING" != "${STATUS}" ]]; do
    STATUS=$(gcloud compute instances describe ${INSTANCENAME} | grep status: | awk '{print $2}')
    sleep 5
done
echo "instance running..."

#
# now wait until the SSH server is running (we get a response without a timeout)
SSHRETURN=255
while [[ ${SSHRETURN} -ne 0 ]]; do
    gcloud compute ssh ${CLOUD_USERNAME}@${INSTANCENAME} --project ${PROJ} --zone ${CLOUDSDK_COMPUTE_ZONE} -- hostname
    SSHRETURN=$?
    sleep 3
done
echo "SSH up and listening..."

# All we have is a "naked" Ubuntu, so it's always a good idea to update and upgrade immediately after installation
gcloud compute ssh ${CLOUD_USERNAME}@${INSTANCENAME} --project ${PROJ} --zone ${CLOUDSDK_COMPUTE_ZONE} -- sudo apt-get --yes update
gcloud compute ssh ${CLOUD_USERNAME}@${INSTANCENAME} --project ${PROJ} --zone ${CLOUDSDK_COMPUTE_ZONE} -- sudo apt-get --yes upgrade

exit

The start, stop and destroy shell scripts are much simpler.
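
For reference, here is roughly what mine look like. These are minimal sketches that assume the same set-cloud-configuration.sh file as above; the exact versions are in the repo linked below.

#!/bin/bash
# stop-instance -- pause the instance but keep its boot volume (saved state)
. ./set-cloud-configuration.sh
gcloud compute instances stop ${INSTANCENAME}

#!/bin/bash
# start-instance -- restart the instance from its saved boot volume
. ./set-cloud-configuration.sh
gcloud compute instances start ${INSTANCENAME}

#!/bin/bash
# destroy-instance -- delete the instance; its boot volume is deleted along with it
# (--quiet skips the confirmation prompt)
. ./set-cloud-configuration.sh
gcloud compute instances delete ${INSTANCENAME} --quiet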

All the code is available in my GitHub repo: https://github.com/tomperrine/create-simple-google-instance



Setting configuration variables for the SIMH instance in Google Compute

In this short installment, we’ll create a BASH script that will be re-used as we script the creation of the Linux instance, SIMH installation and guest OS installation.

This assumes that you’ve followed the prior posts in the series, and have a functioning Google Cloud account, with a project created, and billing enabled. You need billing enabled even if you’re using the “free tier” or your initial account credit.

There are (for now) three things we need to have set up: account information for logging in, a project name, and a description of the instance we want to run. The description includes the physical location (region/zone) and the operating system we want.

This simple script sets the variables we need, and it can be included in all the other scripts we’ll write later.

Save this as set-cloud-configuration.sh

#!/bin/bash
#
# set user-specific configuration info
# we're going to use "oslogin" so set a username
# THIS MUST MATCH your GCP account configuration
# see https://cloud.google.com/compute/docs/instances/managing-instance-access for details
export CLOUD_USERNAME="YOU_NEED_TO_SET_THIS_FOR_YOUR_ACCOUNT"

# Set project information - this project MUST already exist in GCP
# This project MUST have billing enabled, even if you plan to use the "free" tier
export PROJ=retro-simh
gcloud config set project ${PROJ}

# set configuration info for this instance
# pick a zone (the zone also determines the region)
export CLOUDSDK_COMPUTE_ZONE="us-central1-f"
# set information for the instance we will create
export INSTANCENAME="simh-ubuntu-instance"
export MACHINETYPE="f1-micro"
export IMAGEFAMILY="ubuntu-1804-lts"
export IMAGEPROJECT="ubuntu-os-cloud"

In order to continue with the series, you’ll need to make sure you have enabled billing AND configured “oslogin”.

You should also make sure you have ssh-agent running, unless you want to type your SSH key passphrase a lot.
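
If you haven’t set that up before, something like the following works in most shells. The key path shown is the one gcloud compute ssh generates by default; adjust it if your key lives somewhere else.

eval "$(ssh-agent -s)"                  # start an agent for this shell session
ssh-add ~/.ssh/google_compute_engine    # cache the passphrase for the gcloud-generated key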

In the next installment, we’ll create, stop, start and destroy GCP instances in order to prepare for compiling and running SIMH.



Retrocomputing – Multics


For the past few months, I’ve been using the dps8m fork of  SIMH to create and run Multics, one of the first operating systems I ever used, and one of my favorites. I’ve also built a completely automated process to install Multics in “the cloud”, so that others can play with this piece of Internet history. I’ll show how that works in some future posts.

Around 1973 I encountered my first computer (running GCOS, AKA GECOS), thanks to Honeywell and Explorer Post 414 in Phoenix. After “we” “discovered” quite a few security problems with GCOS Timesharing, Honeywell management and our Boy Scout leaders decided to move us all to Multics, as it was a much more secure platform.

Multics has an interesting place in computer science history. It wasn’t the first timesharing (interactive) system, it wasn’t the first to have virtual memory, it wasn’t the first to be written primarily in a higher-level language, and it wasn’t the first to be designed and developed with security as a primary goal. It wasn’t open source, although every system did ship with complete source code, something few other operating systems of the era could claim.

But it was the first operating system where all these things (and many more) came together.

It’s fair to say that without Multics there would have been no UNIX, and therefore no MINIX and no Linux.

A lot has been written about Multics by the people who created and ran it. For background about Multics, see the Multicians site at https://www.multicians.org/.


Using SIMH in Google Compute to retrace my (UNIX) OS journey

After being introduced to SIMH and getting Multics running, I thought about using SIMH to retrace the steps (and operating systems) that I’ve used in my career. For now, I’ll focus on the UNIX and UNIX-derived systems.

Before coming to UNIX, I had already used Honeywell GECOS, Multics, CP-V and CP-6, as well as DEC’s VMS and TOPS-10. My first UNIX experience was Programmer’s Workbench (PWB) UNIX, which was an interim version between versions 6 and 7.

But after that I used 4BSD, SunOS, UNICOS, HP-UX, DomainOS, SGI IRIX, and a host of other UNIX-flavored systems until finally coming to Linux. Along the way I helped to extend or create two security kernels, KSOS-11 and KSOS-32.

So my plan is to bring up as many of these operating systems as possible using SIMH, focusing on the UNIX family.

Here’s the dependency graph of what I have in mind to begin, and it’s a roadmap for the rest of this series. I have no idea how long it will take, or how far I’ll get.

To date, I’ve got Multics and V6 UNIX, so I’ll show the tooling for those first. Using this information, you should eventually be able to run any OS for which a SIMH emulator exists for the CPU, and for which you can find a bootable or installable image.



Preparing for SIMH – Setting up the Google Cloud account and installing the Google Cloud SDK

This installment shows how to set up a Google Cloud account in order to run the SIMH emulator in a GCP instance. This is NOT a complete training or tutorial on Google Cloud, but does explain the settings needed for this project.

In this prior post, I’ve shown how the guest OS will be running on the emulator, in a virtual instance in Google Cloud. This post talks about getting a Google Cloud account set up to make all this possible.

Google has written a huge amount of documentation. There are tutorials, quickstarts, API docs, examples, and labs. If you have trouble, Google Cloud Help has everything you need to get unstuck.

In order to prepare for the rest of this series and running SIMH in GCP, start with the Google Cloud console and go through this example. It uses the Google Cloud console to do a part of what we’ll do later with scripts.

Those examples show how to set up a project and enable billing. After that, a VM (instance) is created and Linux is installed. Once you have logged into the instance, and logged out, you can then delete the instance to clean up.

Follow the example, and your Google Cloud account will be ready for the rest of this series.

You’ll also want to set up SSH keys for use with “oslogin” – see the documentation here.
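
If you’d rather use the command line than the console, registering an existing public key with OS Login looks roughly like this (the key path is just an example; use whichever key you plan to log in with):

gcloud compute os-login ssh-keys add --key-file ~/.ssh/google_compute_engine.pub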

Keep the project open, as you’ll need it later to run the emulator instance.

Finally, we’re going to be using BASH scripts and the Google Cloud SDK (AKA gcloud) for all the future examples.

You’ll need to install the SDK, using these instructions for your particular operating system.
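
However you install it, the first-run setup is the same everywhere. A quick sketch of the commands I use to confirm the SDK is talking to the right account and project:

gcloud init          # authenticate and choose (or create) a default project
gcloud auth list     # show which accounts are authorized on this machine
gcloud config list   # confirm the active account, project, and default zone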

Next time we’ll begin the first bash script, to use gcloud to set some configuration variables we need to create and run the SIMH instance.



An overview of installing and using SIMH in the Google Cloud (GCP)

As I mentioned in this prior post, I’m running some legacy operating systems (Multics, UNIX v7) using SIMH in Google Cloud. In this post I’ll give an overview of the installation process, and how the legacy OS stacks on top of the emulated hardware (CPU, etc.).

The process of using Google Cloud to run SIMH for hosting a legacy operating system has these major steps, no matter which CPU you’ll be emulating, or which operating system you’ll be hosting.

  1. Configure your Google Cloud account. Since we’ll want to script all of this later, we’ll save some key values in a script that can be included in later steps.
  2. Configure the GCP instance. This involves selecting the zone, CPU (instance type), operating system, etc. Again, this all gets saved in a script for future (re)use.
  3. Create the GCP instance. This creates the instance (virtual host) of the proper instance (CPU) type, in the correct location, and does the initial operating system install. When this is done, you have a virtual instance running Linux (typically) in the cloud, albeit with an un-patched operating system.
  4. Patch the base operating system.
  5. Install the development tools that are needed to compile SIMH.
  6. Load the SIMH source code, configure and compile it. At this point you have an SIMH binary that will emulate the desired CPU(s) all running in GCP.
  7. Copy (and then customize) the files needed to run the desired guest OS on the emulated CPU to the running instance. This will include SIMH configuration files, disk image files, and other SIMH resources. This may vary considerably depending on the version of SIMH and the guest OS.
  8. Start SIMH, which will bootload the guest OS. If this is the first time the OS has been booted, you may need to then log into SIMH to issue commands, or even directly into the running guest OS for final configuration.
  9. After this, you can halt the guest OS and save the disk image. This saved state lets you reboot the system again (and again) with emulated persistent disk storage.

At this point, you’ve got a guest operating system, running on an emulated CPU, on top of a Linux system, running on a hypervisor, in the cloud.

It looks something like this:

For simplicity’s sake, we can combine some of the steps above into fewer steps, each of which can be a separate script.

  1. Capture configuration information – a single script to set environment variables for the Google Compute Account and the instance configuration.
  2. Create the GCP instance, install the operating system, and patch it.
  3. Install the development tools needed to build SIMH, load the SIMH source code, and configure and compile it (a rough build sketch follows this list). Copy the needed SIMH configuration files at the same time.
  4. Copy (and then customize) the files needed to run the desired guest OS on the emulated CPU to the running instance. This will be different for each operating system.
  5. Start SIMH, which will bootload the guest OS.
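
As a rough illustration of step 3, building stock SIMH from source on a fresh Ubuntu instance looks something like the sketch below. This is only a sketch: the Multics posts use the separate dps8m codebase, and the exact packages and make targets depend on which simulators you want.

sudo apt-get --yes install build-essential git
git clone https://github.com/simh/simh.git
cd simh
make vax pdp11    # build just the simulators you need; a bare "make" builds them all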

Next time, we’ll look a little bit more at the GCP account setup and capturing the account and instance configuration information.



Retrocomputing – using SIMH to run Multics on Google Cloud Platform (GCP)

Last Fall (Oct 2018) I started playing with SIMH, and using it to run some rather ancient operating systems in the Google Cloud (GCP). So far I’ve been able to run Multics, UNIX V6 (PDP-11), and 4.0BSD (VAX).

I started down this path by using the dps8m fork of SIMH to run Multics on a Raspberry Pi 3. This worked very well, and produced performance that, for a single user, matched the original mainframe hardware. Not bad for a US$35 pocket-sized computer emulating a US$10+ MILLION mainframe (of the 1980s). Of course, Multics supported hundreds of simultaneous users via timesharing, but at its heart, Multics CPUs (up to 8 per system) were about 1-2 MIPS each, and the system supported up to 8M 36-bit words (32 Mbytes) per memory controller, with up to 4 controllers per system for a grand total of 128 Mbytes. Yes, that’s Mbytes, not Gbytes.

For comparison, the $35 Pi 3 B+ runs at about 1000 MIPS and has 1 Gbyte of RAM. The Google Compute f1-micro uses 0.2 of a ~1 GHz CPU and has 0.60 Gbytes (600 Mbytes) of RAM, making it a reasonable fit.

I’ve been building tools to allow anyone to install SIMH and any of these operating systems in the cloud, so that they can be experienced, studied, and understood without having to use dedicated hardware or learn the details of GCP or SIMH.

In this series of posts, I’ll introduce how I’m using GCP (with scripting), a little about SIMH, a little bit about the hardware being emulated, and the historical operating systems and how to run them all in the GCP cloud, pretty much for free.

You should start by looking into Google Cloud Platform (GCP) and using some of their tutorials.

All of the SIMH examples I will show are running Ubuntu Linux on tiny or small GCP instances.

You can get started by reading about SIMH on Wikipedia, at the main SIMH web site, or at the Github repository for the software.



2018? Wait, what?

Wow, I’m behind. It was a busy year, and not a lot going on that I could really talk about publicly.

The recent Meltdown and Spectre bugs have brought back some memories from Orange Book days. I’ve also been spending a lot of time thinking about “IT transformation” and non-technical stuff. I’ve also been to the UK and Japan, twice each, which may become the “new normal”.

Let’s see what happens in the next 12 months.



IPv6 – CGN and Teredo Considered Harmful

There, I said it. The so-called “IPv6 transition strategies” are making it harder, more complicated and less secure to deploy IPv6 than just “doing the right thing”.

Carrier Grade NAT (CGN) and Teredo (among others) are the last gasps of an IPv4 world, and have no place in the modern Internet. While they may have short-term advantages to network operators, they will cause problems for their end users until they are finally phased out. Dual stack would be a better transition process, especially for customers.

CGN is, as much as anything else, a way for carriers with a large network or large installed base of end users to make the fewest (and hopefully least expensive) changes in their networks. They are betting that by introducing a small number of large-scale NAT devices on the border between their networks and the Internet, they can avoid making sweeping internal network changes or upgrading CPE (Customer Premise Equipment).

At best, even when working correctly, CGN breaks end-user accountability, geo-location, and the end-user experience. On top of that, it will slow IPv6 adoption, force “true IPv6” users to adopt a host of operational work-arounds, and complicate deployment of next-generation mobile and Internet applications.

CGN is inherently selfish on the part of the network operators that deploy it. They are saying “I want to spend less money, so I’m going to force everyone else to make changes or suffer in order to continue to talk to my customers.”

Or, as Owen Delong put it in his excellent look at the tradeoffs in CGN:

Almost all of the advantages of the second approach [transition to CGN and avoid investing in IPv6 deployment] are immediate and accrue to the benefit of the provider, while almost all of the immediate drawbacks impact the subscriber.

The next part of my rant has to do with Teredo, a “last resort transition technology”.

Like CGN, Teredo promises to allow end-user equipment to connect to the public IPv6 Internet over IPv4. It does this by “invisibly” tunneling your IPv6 traffic over the public IPv4 Internet to a “Teredo gateway”, which unwraps the tunneled traffic and passes it on to the desired IPv6 destination. Teredo is implemented transparently in some Microsoft operating systems and can, by default, provide an IPv4 tunnel to the outside world for your IPv6 traffic. It can also provide an “invisible” tunnel from the outside world back into the heart of your network. And of course, all your tunneled traffic could be intercepted at the Teredo gateway.

Teredo security has been a hot topic for years, with concerns being raised shortly after Teredo’s standardization in 2006 and RFC 6169 finally providing IETF consensus in 2011. Sadly, Teredo security must still be discussed, even though it accounts for only about 0.01% of network traffic to dual-stacked resources. Fortunately, there’s a move in the IETF to declare these transition technologies (including 6to4 and Teredo) “historic”. Teredo will complicate network security until it is gone.

I for one, cannot wait for both CGN and Teredo to be consigned to the dustbin of history.

