Archive for category cloud
In parts 1, 2 and 3 the focus was on getting the blog data out of the old system, cleaning it up, and converting it to a modern format that can be imported into a modern WordPress site. At this step, you can either spin up your own WordPress install, or just put it into hosted WordPress.
One of my goals was to never have to admin WordPress again. I’m tired of constantly having to patch it, or deal with security issues in plug-ins. So I’m putting everything into WordPress.com.
After part 3, we’ve got a WordPress WXR (WordPress eXtended RSS) export/import file. We just need a place to import it into.
Create a wordpress.com account and empty site
Load your WXR file
Log in to the control panel for your blog. Go to “Tools -> Import” to get to the Importer Screen. Select “WordPress” and follow the directions to upload your WXR file.
View your new blog
In the left menu panel, select “My Sites -> View Site” to see your new blog, with (hopefully) all your old content. Check the older entries, check embedded links. They *should* all be there. If they aren’t, you may have to go all the way back to Step 2, and re-do the editing, then Step 3 and Step 4! I got pretty lucky, or was thorough enough with my initial editing, that everything I needed was recovered completely.
Enjoy a Frosty Beverage to Celebrate
May I suggest a great California IPA?
At this point we’ve got a good MySQL dump of the compromised WordPress site. Now what?
To the cloud!
As I alluded to in the earlier parts, I’m going to load the MySQL dump from the ancient (compromised) site, then re-dump it out as WXR (WordPress backup) so that I can import the whole thing into WordPress.com.
I’ve got the database dump, now I need a WordPress instance to load it into.
In the olden days, I would have grabbed some hardware, loaded Linux, then MySQL, then Apache, then WordPress. I only need this for a few hours, so why spend a half day doing the basic installation? It turns out there’s a great alternative.
Bitnami has a pre-configured LAMP+WordPress image available from the Amazon Marketplace. I can use their image for only US$0.13/hour on a c1.medium AWS instance, or US$0.02/hour on a t1.tiny instance. I figure I need at least two to three hours of run time, and I don’t want to run into any size/space limitations of the t1.tiny instance. So I’ll gamble and use the c1.medium. That means I might spend a little over US$0.50 if I need 4 hours on the c1.medium, instead of only US$0.08 for 4 hours on the t1.tiny. I’ll take that gamble 🙂
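The cost comparison above is simple enough to check with a few lines of Python. This is only a sketch of the arithmetic: the hourly rates are the ones quoted in this post, and the `run_cost` helper is a name made up for the illustration. Real AWS pricing changes over time.

```python
# Hourly prices quoted in the post (US$). These are assumptions taken from
# the text above, not current AWS pricing.
HOURLY_RATE = {"c1.medium": 0.13, "t1.tiny": 0.02}


def run_cost(instance_type, hours):
    """Return the cost in US$ of running one instance for the given hours."""
    return round(HOURLY_RATE[instance_type] * hours, 2)


# Compare both instance types for a 4 hour run.
for name in HOURLY_RATE:
    print(name, run_cost(name, 4))
```

Even the worst case is pocket change, which is the whole point of renting the instance by the hour.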
1. Spin up a WordPress instance using the Bitnami image
This was pretty easy. Just start from the Bitnami pre-configured image in the Marketplace, and then proceed to the launch area. You’ll see that there’s an m1.small instance type already selected. This is where you can decide to use a c1.medium, or take the m1.small default. Just proceed and spin up the instance. Then proceed to the AWS Console to get the DNS hostname.
2. Configure WordPress on the instance
At the bottom of the AWS console you’ll see a section labelled “AWS Marketplace Usage Instructions”. This will lead you to the username and the password (which will be in the instance’s boot log file). From there you can log into the WordPress instance over SSH with the username “bitnami” and your AWS private key.
3. Load and check the database
Log into the WordPress instance and use the control panel to load your MySQL dump into WordPress. Switch to the site view, and start scrolling through the blog posts and other links.
In my case, I found about a dozen posts that were still broken. This sent me back to the raw database edit (see Part 2) to re-edit the database text file dump. I edited out the broken records, re-dumped the database, and started again at step 1 above.
Once you have a valid WordPress site in your AWS instance, it’s time to get that WXR file we need for the import into WordPress.com.
4. Export the valid WordPress blog
Jump into the WordPress control panel, and use “Tools -> Export” to create a WXR file and download it to your computer. Once you’ve done this, you can spin down the AWS instance using the AWS console. Use “Terminate” so the EBS volume will be released as well.
We’re almost done. Next time, creating and loading the site into WordPress.com.
Let’s get started recovering the site. See Part 1 for the background. Note that I actually did this recovery in February 2015, and some software may have changed since then.
1. Dump the DB of the infected site in the text SQL dump format. This creates a human readable (and editable) file on my laptop.
There are all kinds of tutorials out there on dumping a SQL DB using phpMyAdmin. They are all better than I could write. This one, for example.
2. Examine and edit the DB dump file to remove any obvious damage. Is it worthwhile to continue?
For this I used Emacs. Yes, Emacs. You can use any text editor that you understand well, that has a “repeat this edit” or a general “search and replace” function. It must handle long lines, as each DB record is on a single loooong line. It helps if the editor can deal with escape characters. To make a long story short, the damage was almost immediately obvious. I was able to find the suspect lines and ^K (kill) them pretty quickly. For large values of “quickly”. There were about 1500 damaged or bogus records. Using search/replace and making a “find pattern and kill line” macro worked wonders.
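If you would rather script the “find pattern and kill line” step than do it in an editor, here is a minimal Python sketch of the same idea. It is not what I actually ran (that was Emacs). It assumes, as described above, that each record sits on a single line, and the `SPAM-MARKER` pattern is a hypothetical placeholder: you have to work out the real patterns by looking at your own dump.

```python
import re


def drop_matching_lines(lines, patterns):
    """Split lines into (kept, dropped); a line is dropped if any pattern matches it."""
    compiled = [re.compile(p) for p in patterns]
    kept, dropped = [], []
    for line in lines:
        if any(rx.search(line) for rx in compiled):
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped


# Tiny demonstration with made-up records and a made-up damage pattern.
sample = [
    "INSERT INTO wp_posts VALUES (1,'hello world');\n",
    "INSERT INTO wp_posts VALUES (2,'SPAM-MARKER buy now');\n",
]
kept, dropped = drop_matching_lines(sample, [r"SPAM-MARKER"])
print(len(kept), "kept,", len(dropped), "dropped")
```

Keeping the dropped lines in their own list makes it easy to review what was removed before trusting the cleaned file.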
OK, after about 45 minutes of editing, I’ve got a clean database. All the records that I see are (probably) valid WordPress code/values or (probably) valid user records, or image pointers. It’s worthwhile to continue.
However, there’s still some cleanup, and this is a raw MySQL dump. I can’t import this into WordPress.com yet. For that I need a WXR format dump, and this WordPress version was so old that WXR isn’t even supported. I need a modern WordPress install somewhere that will accept the old MySQL dump and then allow a WXR export.
3. Install stand-alone WordPress somewhere (but how, and where?)
I’m going to use this new, sandboxed environment to examine the site, get a chance at some forensics, and more completely assess the damage. This will also be the bridge between the raw MySQL dump and the WXR file that I import into WordPress.com later.
I expected installing a new host and WordPress to take the most time of the entire process. In the olden days I would start with a physical host, do a full Linux install, add MySQL, Apache, etc. and eventually WordPress. I don’t want to take this much time.
What’s the fastest, easiest way to get a full-blown WordPress setup? Turns out, the cloud is a pretty good place to start.
Is IPv6 getting enough traction to be called mainstream, yet? Sort of. Lots of groups are tracking world wide IPv6 adoption through various means, often looking at the percentage of web sites that are IPv6 reachable. But is this the right metric?
World IPv6 Launch Day did “prove” that IPv6 is viable and that more people are using it. But, does it matter that Romania has 8.64% adoption or that the US has 1.77%, or that France has 4.61%? How does that relate to a “real user”? The answer of course is that it doesn’t. I don’t (often) visit Romanian or French web sites, and the experience of Internet users in those countries is affected by the use (or lack) of IPv6 elsewhere in the world. Facebook, Google, Twitter and others are all worldwide communities.
One way to see how (if) IPv6 adoption is affecting you is to look at which web sites that you visit every day are IPv6 capable.
I took the last 30 days of browser history from my laptop and looked at the IPv6 reachability for the sites that I actually use on a regular basis. Here are the results.
I started with the Firefox add-in “History Export 0.4” and exported my history for the past 30 days. This showed that I visited over 10,000 URLs in the recent past. This raw data was massaged by some Perl and Emacs macros to process, sort and extract unique domain names. Finally I used a bash script to do DNS lookups for A and AAAA records for all the unique hostnames that remained.
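For anyone who wants to repeat the experiment, the process above (extract unique host names, then look up A and AAAA records) can be sketched in Python. This is not the Perl/Emacs/bash pipeline I actually used. The function names are invented for the illustration, and `socket.getaddrinfo` uses the system resolver (which may also consult the local hosts file), so results can differ a little from a pure DNS query.

```python
import socket
from urllib.parse import urlsplit


def unique_hosts(urls):
    """Return the sorted, de-duplicated host names found in a list of URLs."""
    hosts = set()
    for url in urls:
        host = urlsplit(url).hostname  # None when the URL has no host part
        if host:
            hosts.add(host.lower())
    return sorted(hosts)


def has_records(host, family, resolver=socket.getaddrinfo):
    """True if the resolver returns at least one address of the given family."""
    try:
        return len(resolver(host, None, family)) > 0
    except socket.gaierror:
        return False


def classify(hosts, resolver=socket.getaddrinfo):
    """Return (hosts with IPv6 addresses, hosts that are IPv6-only)."""
    with_aaaa = [h for h in hosts if has_records(h, socket.AF_INET6, resolver)]
    v6_only = [h for h in with_aaaa if not has_records(h, socket.AF_INET, resolver)]
    return with_aaaa, v6_only
```

To run it against real history, read the exported URLs into a list, call `unique_hosts(urls)`, and pass the result to `classify(...)`. The `resolver` parameter exists only so the lookup can be swapped out for testing without touching the network.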
Here are the results:
- 10202 URLs in 30 day history
- 1310 unique host names (there were lots and lots and lots of Gmail URLs!)
- 125 of the unique host names had AAAA records indicating IPv6 reachability
- 2(!) hosts had AAAA records and no A records – IPv6 ONLY sites! W00t!
So about 9.5% of the sites that I visited in the past 30 days are IPv6 capable. That’s more than I had expected and more than the general Google IPv6 stats would suggest. Now, since I am doing IPv6 work I would expect to be an outlier, but am I an outlier for that reason?
Of the 125 IPv6 sites:
- 27 are Google properties (Google, Gmail, Blogger, Google Code, YouTube, Android Market and similar)
- 11 are IPv6 information or test sites
- 10 are US .gov (more on this later)
- 5 are notable open source projects (ISC Bind, ISC DHCP, Mozilla, Ubuntu, Fedora)
- 8 are larger .EDUs like Stanford, UCLA, UCSD
- 6 Internet governance and operations sites (IANA, IETF, Internet Society, ARIN, APNIC)
- 6 are blogs hosted at blogspot.com
- 5 of the sites are Facebook properties
- 2 Netflix sites
- 2 Wikipedia
- 2 Yahoo properties
- the rest (almost 40) are singleton sites – individually hosted blogs, news and aggregation sites (political, tech)
This means that 91% of the IPv6 sites I visit are probably typical for a “regular” Internet user. Most of the most popular properties are represented in IPv6-land: Google, Facebook, Wikipedia, Netflix and similar.
It also shows that while individual sites are making progress (those 37 singletons), many hosted sites will get upgraded through no action of their own when their hosting provider (or cloud provider) makes the switch. Between them, Blogger and Blogspot are now hosting thousands of personal blogs that are IPv6 capable.
Personally, I can’t wait for WordPress.com to make the switch.
I’ll be at the Gartner Datacenter conference in Las Vegas all this week. In my new role at work I’m no longer directly responsible for our US datacenters, but I will be helping to shape our world wide datacenter and networking strategies (among others). If the conference is anything like last year’s there will be a LOT of “cloud” in addition to the core topic. It will be interesting to see updates on the major initiatives that large scale operations like Bank of America, eBay and others talked about last year.
The usual Twitter hashtag for the conference is #gartnerdc. If you’re interested in datacenters, “devops”, “green IT”, “orchestration” or “cloud”, I recommend that you follow the tag.
The IPv6 series will continue as usual next week with posts on Tuesday and Thursday.
It’s not you, it’s me. You see, I care about my privacy and what I share with friends and the Internet at large. I also care about what I share with you and other companies.
I was hesitant to use your service, but I read your terms, and got the strong impression that you cared about my privacy and security.
So, I’ve deactivated all three of my Macs and my Droid, and deleted my Dropbox account. Fortunately, I didn’t use that password anywhere else, so I’m done.
Your service was convenient, so I’ll check out your competitors to see if they have a better security posture and more transparency. If so, I’ll likely end up paying for their service. Thanks for showing me how useful a sharing service like yours could be. Too bad I couldn’t stay with you.
This prediction was from January 2010, and of course predates the recent troubles at Amazon and other cloud providers. Also, 2011 saw some re-evaluation of “the cloud” as a panacea for all IT ills. And yes, some companies have made transitions to nearly 100% cloud operations.
Let’s take a closer look at this statement and dive a little deeper into some of the trends behind this prediction. The key trends are virtualization, “X as a Service” and employee desktop management.
Virtualization is the easy one. Everyone either has or is in the process of virtualizing wherever possible. Whether that is virtualizing legacy services, or taking advantage of virtualization features for reliability or redundancy, it is a well-established strategy that has definable benefits. One key idea from Gartner’s Datacenter and Cloud conference last year was internal virtualization as a required stepping stone to public or private cloud. Now that server, storage and networking virtualization are all solved problems, we’re seeing more interest in virtualizing the desktop and that will dovetail nicely with desktop management.
The next trend is “X as a Service”. Whether you’re talking Infrastructure, Platform or Software, all of these are making good progress. Let’s start with Software as a Service. If you are a startup or smaller business, you could arguably perform most of your back office functions using hosted solutions. Sales support, HR, payroll, ERP, email and other services are all available from “the cloud”. More mature and larger organizations are also making more use of these, although perhaps at a slower pace. Platform as a Service is now mainstream, with an ever-increasing list of offerings and companies making use of them. Infrastructure as a Service is clearly here to stay, and many companies like Netflix and Foursquare have “bet the ranch” on its viability.
All of the above trends were initially focused on servers and services. Virtual desktops have been around for a while, and coupled with a new trend they will further decrease the ownership of IT assets. The new trend is “employee owned desktops”. In this model, employees are given a stipend, coupons or other ways to buy their own laptops and/or home computers, which are then used as the employee’s primary interface to corporate resources. In some models IT still manages the entire machine; more commonly a standard “virtual desktop machine” is deployed and all company computing runs in the virtual machine. In all cases, the hardware is owned by the employee, who is responsible for loss, damage and hardware failure.
So what might all this mean for IT organizations in companies that do proceed down this path?
I believe that those businesses will have about the same amount of IT staff, but (fewer or) no datacenters, networks or servers of their own. Their IT staff will be managing virtual assets from Amazon, Rackspace, IBM, HP and other IaaS, PaaS and SaaS vendors. Their staff members will spend more time creating architectures, devising new solutions and creating new services, using services instead of hardware. They will spend less (or zero) time racking and repairing hardware and more time creating solutions in their own private clouds, built from other people’s hardware infrastructures.
There will always be local datacenters, especially for high-performance storage and internal-facing apps, and to host our most high-performance applications where control and provisioning of the network is critical. Security will remain an important reason to not put everything in the cloud, but this will be an increasingly less important driver for non-cloud systems. But we will all be increasingly integrating hosted solutions from vendors, designing our solutions to run on other people’s hardware in other people’s datacenters, and managing IT assets that we do not own or physically install.
[Sorry for the sporadic posting. I’ve had more travel in the past 7 weeks than the last 2 years. I should be back to a more regular schedule soon.]
The “cloud” is still a new and curious beast for a lot of us, especially people who grew up in a more traditional hosting model. We have several generations of IT workers who have learned everything about hosting on our own hardware and networks. The flexibility of the cloud is a game-changer, and I’m continually learning new places where “conventional wisdom” will lead you down a difficult path.
Netflix has been kind enough to post their five key lessons from their cloud experiences on their tech blog. While these lessons may look simple and perhaps obvious in retrospect, there are two that really hit home with me:
1. Prepare to unlearn everything you know about running applications in your own datacenters.
3. The best way to avoid failure is to fail constantly.
First, an entire generation (or maybe two or three) of system and network administrators learned all of what we know about scale and reliability by running our own applications on our own servers in our own datacenters using our own networks. There are thousands of person-centuries of experience that have created best (or at least “good”) practice on how to be successful in this model, but this has done very little to prepare us to be successful using cloud resources. In fact, it might even be working against us.
We’ve all got a lot to un-learn.
Second, in the olden days, uptime was king, and a high time between reboots (or crashes) was considered a mark of a capable system administrator. Failure was to be avoided at all costs, and testing failover (or disaster recovery) was done infrequently, if at all, due to the high impact and high costs. We did all get used to a more frequent reboot cycle, if only to be able to install all the needed security patches, but that was just a small change in focus, not a complete sea change.
In computing clouds, it is a given and an expectation that instances will fail at random, and the solution is to have an agile application, not to focus on high availability or increasing hardware reliability. Just as there is continuous development, testing and deployment, there needs to be continuous failover testing. Netflix created a tool (Chaos Monkey) specifically to force random failures in their production systems! That’s right, they are constantly creating failures, just to continuously test their failover methods, in the live production system.
That’s a) really hardcore, b) really scary and c) really cool.
That’s one way to put your reputation on the line. And it points out just how you need to do some very non-intuitive things, and unlearn decades of good practice to be successful in the cloud.
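To make the “fail constantly” idea above a little more concrete, here is a toy Python simulation. It is emphatically not Netflix’s tool or anything like its real implementation. The `Pool` class, the `chaos_round` function and the instance names are all invented for this sketch. It only shows the shape of the loop: kill something at random, check that the service can still answer, then let the pool heal itself.

```python
import random


class Pool:
    """A toy pool of interchangeable service instances."""

    def __init__(self, names, target_size):
        self.healthy = set(names)
        self.target_size = target_size
        self._next_id = len(names)

    def kill_random(self, rng):
        """Remove one healthy instance at random and return its name."""
        victim = rng.choice(sorted(self.healthy))
        self.healthy.remove(victim)
        return victim

    def heal(self):
        """Start replacement instances until the pool is back at target size."""
        while len(self.healthy) < self.target_size:
            self.healthy.add(f"instance-{self._next_id}")
            self._next_id += 1

    def can_serve(self):
        """The service is up as long as at least one instance is healthy."""
        return len(self.healthy) >= 1


def chaos_round(pool, rng):
    """Kill one instance, record whether the service survived, then heal."""
    victim = pool.kill_random(rng)
    served = pool.can_serve()
    pool.heal()
    return victim, served
```

The real lesson is in `can_serve`: the application has to be designed so that losing any single instance is a non-event, and the only way to know that is to keep checking.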
This interview was prompted by QRZ.com‘s recent move into “the cloud”. QRZ means “Who is calling me?” or “You are being called by ___”, which is very appropriate for what is widely considered to be the largest online community for amateur (“ham”) radio in the world. Moving this resource from traditional hosting into the cloud is an interesting comment on the readiness of the cloud to actually deliver for a community that has come to depend on this resource.
The computer and ham communities have a long history together. The original “hacker” community had quite a few ties to ham radio and computers, as all were involved with experimenting, especially with electronics. In fact, one possible origin for the term “hacker” is its use by the amateur radio community from the 1950s to mean “creative tinkering to improve performance”. This continuing curiosity and desire to build and improve is a hallmark of these communities.
I’ve encountered a few system and network administrators who are hams, and vice versa. QRZ’s founder and publisher, Fred Lloyd, is no exception. Fred spent much of his career on the cutting edge of Internet adoption, working for Sun and other companies in Silicon Valley and other locations. As it turns out, he’s been a ham radio operator about as long.
Fred was kind enough to do an email interview with me earlier this week to discuss system administration, QRZ, ham radio, the Internet and his experiences in moving to the cloud.