A few weeks ago the “anti-social” bookmarking site Pinboard (http://pinboard.in/) made the news in a big way. The site experienced hyper-growth due to the news of the possible demise of Del.icio.us. Concerns about the future of Del.icio.us led tens of thousands of people to look for a new place to store and share their millions of bookmarks.
And quite a few of these people chose Pinboard! During one 30-hour period around December 18th, Pinboard received over 7 million new bookmarks, more than had been added to the system during its entire life.
I was able to catch up with Maciej for an interview via email. I wanted to find out more about how Pinboard was operated, and how this huge spike in load had affected administration of the site. Large-scale system administration isn’t always about hundreds of systems; it can also be about tens or hundreds of thousands of users, unexpected load spikes, or simply how you plan for growth.
What’s your background? Especially in system administration? In programming?
I’m one of those people who fell into programming from the liberal arts. I started by making a website for myself in 1998, then got a job making websites for small businesses, got taken under the wing of a real programmer, and eventually ended up as a full-time developer.
All I know about systems administration comes from running my own dev servers, and then eventually production machines for various websites. In other words, I know just enough to be a danger to myself.
Pinboard started as your personal site. What prompted you to create it? Where did the code come from, and what are the basic technologies that you used? What kind of hardware did you use for your personal site?
I built Pinboard because I had long wanted a bookmarking site that would serve as a personal archive (meaning store bookmark content in case the original site went offline). I was also motivated by the Delicious redesign around the summer of 2009, which I found very unpleasant.
Pinboard is a plain PHP app running atop MySQL. Various backend tasks are taken care of by Perl scripts running in screen, and the occasional cron job. I wrote all the code myself. The site initially ran on a small Slicehost instance that also hosted my personal blog.
What prompted you to take the site public? Did you change the basic architecture of the site to go public? How did you change the way you host the site to go public?
It seemed like a propitious time for an experiment in social bookmarking, since delicious was moribund and had few competitors. I thought it might be fun to see if I could get a small set of users regularly bookmarking on the site along with me. Since I did not expect much traffic I did not make any substantive changes to the site architecture. I did rent a slightly larger virtual machine from Linode for hosting.
You mentioned on your about page that from your public launch in July 2009 until about October 2010, you had only about 6 hours of unplanned downtime. What caused that? What design or operational features let you have that kind of uptime?
Most of our downtime was due to disk issues. By autumn 2009 I had split the site across two servers, one of them an actual leased server and the other a virtual machine. The physical server developed some problem that would cause it to unexpectedly go into read-only mode, at which point it would require a reboot and fsck, followed by a second reboot. This brought down the site a couple of times until we moved the web-facing part back onto the Linode.
The problem was never diagnosed but resolved by a chassis replacement. My suspicion is that there was a problem with the data bus that caused irrecoverable errors, leading to the freeze. The disk itself reported no errors during numerous self-tests.
Our second major problem did not cause downtime, but made us decide to forgo virtual servers at all costs. On two occasions a RAID rebuild caused I/O performance to degrade drastically. This was not visible from within the VM, so we had to infer that it was happening and then try to convince our provider that it was not our fault. The lack of control and visibility into I/O convinced me to use dedicated hardware from then on.
What can you share about the hosting architecture? How many servers are there, and are they in colocation or a private data center? What kind of storage are you using, and what kind of network bandwidth do you have? You say that there is nothing interesting about the way the site is hosted, and that is a feature. Why do you say that?
We have five servers, which is overkill right now but gives us a little headroom. The main setup is:
64 GB master DB server, 6 CPUs
32 GB master DB server, 4 CPUs
16 GB web server + DB read slave
plus two 4 GB machines that we use for various offline tasks.
The three main machines are leased from Digital One, which is the only company I found that offers large memory servers at a sane price. The smaller servers are from ServerBeach and Alchemy.net respectively.
We use local IDE storage (about 3 TB) and back up to S3. Not sure what bandwidth we have but we don’t use much.
I say that boring hosting is a feature because in times of crisis you find yourself in a very comfortable and familiar setup. It’s also easy to bring people in and explain what you’re doing. And you get the benefit of a lot of documented online experience with similar tools, by people smarter than you are who write about it well.
It looks like Pinboard is a distributed application, with different servers (and functions) at three different hosting companies. Does having the application distributed in this way pose any challenges due to latency between the pieces of the application?
The main web server and database servers are all at the same Digital One hosting facility, which helps cut down latency. The tasks that get done at the other two hosts are asynchronous and not time-sensitive.
I see that you’ve had only modest hardware growth since launch. Do you think your new popularity will force more substantial upgrades?
I don’t think so. To put any kind of load on the 64GB DB server would take tens of thousands of new users. The main reason we would add hardware at this point would be to remove points of failure – for example, if the web server dies, the only recourse we have is to repoint DNS at the backup, which means several minutes of downtime.
The biggest changes we face are on the software and systems side – suddenly we need to automate and instrument everything much more carefully, and parallelize lots of tasks that used to work fine in a single process. For example, it used to be enough to have a single script parse bookmarks out of downloaded Twitter files. Now we need much more throughput, which means rewriting that code so that we can launch numerous worker processes to do it.
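That single-process-to-workers change can be sketched as follows – in Python rather than the Perl the site actually uses, and with a hypothetical one-bookmark-per-line file layout, purely for illustration:

```python
# Hedged sketch: replacing one serial parse script with a pool of
# worker processes. File format and function names are illustrative,
# not Pinboard's actual code.
from multiprocessing import Pool
from pathlib import Path

def parse_file(path):
    """Parse one downloaded file and return its bookmark count."""
    return sum(1 for line in Path(path).read_text().splitlines() if line.strip())

def parse_all(paths, workers=4):
    # A pool of worker processes chews through the backlog in parallel,
    # instead of a single script handling files one at a time.
    with Pool(processes=workers) as pool:
        return sum(pool.map(parse_file, paths))
```

The serial version is just `sum(parse_file(p) for p in paths)`; the pool version lets throughput scale with the number of workers until the database becomes the bottleneck.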
It looks like you went from collecting 4.5 million links in your first 15 months of operation to collecting 7 million new links in 30 hours. That must have been quite a surprise. Could you describe what was going on in your mind during those 30 hours? What were you thinking about? What were you worrying about?
The site was never optimized for write speed. Our typical traffic level was 200-300 new bookmarks per hour. Suddenly we were seeing hundreds of new signups per hour, and every one of them brought along a delicious export file with thousands of bookmarks in it. The database did something like 77 new bookmarks/second for two days, falling as far as eight hours behind on imports. And the slave databases were hopeless. We had not yet brought the really big DB server online, but that would not have helped – our problem was a misdesigned tags table that limited write throughput.
There was nothing to do about this except pray that nothing broke. At one point the backup master was over half a day behind the master, which is something you never want to see.
All of my energy was spent on handling email. People had trouble getting activation mails, misentered their password, that kind of thing. My biggest fear was that everyone who signed up would take a look at the site and then demand a refund the next day.
You picked up at least 7 million new bookmarks. How many new user accounts did you get?
More like 10 million new bookmarks through the end of the year. We got just short of 10K new users.
I found out about you from a Lifehacker post and then seeing your Twitter feed. How do you think most of your users found you?
Many of our early users came from mentions by Daring Fireball and Leo Laporte, both early fans of the site. After the Delicious news broke, we got written up in several lists of alternative sites and that raised our visibility. We were also lucky enough to enjoy good word of mouth from our existing users, who weren’t shy about getting on Twitter and recommending us. That was incredibly uplifting.
You have an interesting funding model, in that the cost of an account starts at a low level and increases as time passes. Where did you come up with that model? Where are account costs today, and where do you see them going? Would you recommend this model for other new sites?
I got the idea for a rising incremental cost from Joshua Schachter. The idea was that it would serve as a brake on growth, prevent spam, and also help pay up front for resources needed to grow the site. Since every user adds load to the system, it makes sense to ask them to offset it.
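As a purely illustrative example (the increment here is hypothetical, not Pinboard’s actual pricing), a signup fee that rises with the user count is just a one-line calculation:

```python
def signup_fee(current_users, increment_cents=0.1):
    """Hypothetical rising signup fee: a fraction of a cent per existing user.

    With an (assumed) 0.1-cent increment, the first users pay next to
    nothing, while user #10,000 pays about $10 - each new user offsets
    a bit of the load they add.
    """
    return round(current_users * increment_cents / 100, 2)
```

The appeal of the scheme is that revenue and load grow together: the fee acts as a brake on growth exactly when growth is most expensive.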
The funding model also turned out to have big PR value, which I did not anticipate. It was novel and people found it fascinating. I think they also liked the idea that there was a plan for how to fund the site other than “we’ll think of something or sell it”.
You call out some “boring” technologies like MySQL, PHP and the like. Is there any automation in the administration of the site? I’m looking more for information on puppet, cfengine, kickstart, or other system administration tools rather than any automation within the application itself.
We have various cron jobs for backup (database and otherwise) and a simple deployment script that checks out a clean source tree and rsyncs it to the servers. Too much of the site now runs in screen; I’m in the process of trying to corral the various long-running scripts so I can launch, restart and shut them down more easily.
How do you monitor the health of the site? Cacti, or Nagios or ???
I’ve finally been able to hire a sysadmin to install Cacti and Nagios. At the moment we use a homebrew script that collects various stats and generates a webpage every few minutes. It makes the sysadmin cry.
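A homebrew monitor of the kind described boils down to a small loop: gather a few numbers, render a static page, and rerun from cron every few minutes. Here is a sketch in Python with made-up stat names; the real script presumably also queries MySQL replication lag, import queue depth, and so on:

```python
# Hedged sketch of a homebrew status page generator. The stats chosen
# here (load average, free disk) are illustrative stand-ins.
import html
import os
import shutil
import time

def collect_stats():
    """Gather a handful of host-level numbers."""
    load1, _load5, _load15 = os.getloadavg()
    disk = shutil.disk_usage("/")
    return {
        "load_1min": round(load1, 2),
        "disk_free_gb": round(disk.free / 1e9, 1),
        "generated_at": time.strftime("%Y-%m-%d %H:%M:%S"),
    }

def render_page(stats):
    """Render the stats as a minimal static HTML table."""
    rows = "".join(
        f"<tr><td>{html.escape(k)}</td><td>{html.escape(str(v))}</td></tr>"
        for k, v in stats.items()
    )
    return f"<html><body><table>{rows}</table></body></html>"
```

A cron entry would then write `render_page(collect_stats())` to a file under the web root every few minutes – exactly the sort of thing Cacti and Nagios replace with graphing, history, and alerting.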
What do you see as the future of Pinboard? I see that there’s a technology roadmap, but over and above that, what do you see? User growth, becoming the next Billion dollar .COM? 🙂 Any novel and Internet-changing features coming out?
We’ve been thrown a curve ball with the demise of delicious. Suddenly there’s a big gap in online bookmarking and I’m trying to decide to what extent we want to try and fill it. At its heart, the site is going to remain a personal archive, so much of the work we put into it will be to enable that. But it’s also an opportunity to try some things that were in the original delicious roadmap and got killed by Yahoo – group bookmarking, more interesting ways to search and present our data, tools for organizing really huge collections.
I think user growth has been the siren song of many otherwise great projects. I want to make sure we grow organically and slowly enough that it does not become a distraction. I hope to work on Pinboard for many years to come, keep it independent, and enjoy being my own boss.
How are you dealing with your new fame? 🙂
By trying to convert it into riches!
What questions have I not asked? What things would you like your users and the interview readers to know?
I think people are often surprised by how small developer teams are, even inside huge companies. Many of our users write to us thinking their email will land in a help desk queue, not knowing it all goes straight to the developers. The remarkable amount of technical leverage available is allowing personal projects to compete on a world-class level. I have in mind single-developer apps like Instapaper and Minecraft. This ability for single developers to build an entire product is recent, and I think we’re going to see all kinds of exciting things come of it.