Archive for category best practice
The last two weeks at work have been some of the most fun in the past few years. A few months ago I moved from management back to my first love: deep technical work. In my new position I’m responsible (with a co-worker) for technical strategy, creating our Enterprise Architecture, and forward-looking technical projects. We’re also tasked with finding new ways to collaborate and take on projects, and with taking a hard look at whether IT is supporting the rest of the business.
For some of these, we act as facilitators for IT projects, even though we aren’t in the management chain.
IPv6 has been one of my “back burner” projects for almost a year. There is a business mandate that we must have IPv6 connectivity to one of the inter-corporate networks by 1 April. A select set of our internal users need to have IPv6 connectivity to business applications that will only be available over IPv6 via this network.
To prepare for this, we needed to ramp up our IPv6 knowledge from almost nothing to being ready to plan a limited IPv6 deployment next month.
We decided to try a new project methodology (loosely) based on agile concepts: we performed IPv6 testing and deployment preparation as a “sprint”. We got 12 of our most senior system and network admins together in a large conference room with a pile of hardware, a stack of OS install disks, a new IPv6 transit connection and said, “Go!”.
No distractions, no email, no phone calls. Just 12 people off in a completely different building, in a big room with a pile of gear and the mandate to “explore IPv6” and learn enough to be comfortable planning a limited IPv6 deployment at the end.
It was great seeing people from different IT departments who usually specialize in Linux, MS Windows, VMWare, networking, security, etc. all come together to explore IPv6 on all these platforms, bring up services, test, find vendor bugs and in general build a standalone IPv6 lab from scratch.
We truly did start from scratch; we started with an empty room, a bunch of tables and chairs, two pallets of PCs, assorted network kit, three boxes of ethernet cables and installation media.
Along the way, all of these people stepped out of their comfort zones, learned about each others’ specializations, and worked together for a common goal that we all created together.
At the end of the two weeks, we had a fully functioning dual-stack IPv4/IPv6 network:
- Routers and switches, firewall and IPv4/6 transit from a new provider
- Fully functioning Windows infrastructure: AD, DNS, DHCP, IIS, Exchange, etc.
- Linux infrastructure: DNS, DHCP, syslog, apache, Splunk, Puppet (mostly)
- Windows Server 2008 and 2008 R2, Windows 7 clients
- Linux CentOS 5 and 6 servers and desktops
- Mac OS X Snow Leopard and Lion clients
All the results and everything we learned are documented in a wiki full of IPv6 configurations, hints and tips, debugging info, links to IPv6 info, lessons learned and plans for the next steps toward IPv6 in production. I think we generated about 50-60 pages of new IPv6 documentation along the way, and about 6 pages of notes on the sprint experience itself.
The sprint wasn’t perfect, and we had a few stumbles along the way. But we learned a lot about how to run these kinds of sprints, and we’re pretty sure that we’ll have more of them in the future.
We also had two full weeks of face time with our colleagues from four sites in two states. In some cases we had never met each other in person, but had been exchanging email and tickets for years.
It was an incredibly productive two weeks. We learned a lot about IPv6 and about each other, and we found new ways to work together.
To recap, a useful system logging solution consists of four components: generation, transport, storage and analysis.
I will argue that if you already have any logs at all, your first step should be to build an analysis capability. This will let you begin to analyze the logs you already have, become familiar with your analysis tool on a smaller dataset, and use the analysis tool to help debug any problems that you encounter while building the rest of the system.
I’ve been a big Splunk fan for years. The Splunk folks understand system and network administration and that shows in the design and capabilities of the product. The free “home” license is a great contribution to the community, too.
There is a lot of good documentation out there on getting started with Splunk, so I’ll focus on what it allowed me to find instead of the details of using it. I encourage you to experiment and try different kinds of searches; you’ll be surprised at what you find.
After starting Splunk, I pointed it at my /var/log directory, which has all the usual system logs, and also all my Apache logs. Splunk indexed about 2 million log events in less than 8 minutes, on my low-power Atom CPU with only 2G RAM and a single 150G IDE laptop disk.
In the first 30 minutes or so of searching, I found (all on a single host, all in the last 30 days):
- 935 SSH root login attempts
- 838 attempts to exploit PHP bugs in my web server
- 20 attempts to buffer overflow my web server
- over 100K attempts to deliver SPAM or use my host as a mail relay
- 40 attempts to use MyAdmin scripts (which I don’t have)
So, less than 30 minutes to install Splunk and 30 minutes of playing with the search tool have already paid off.
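For a flavor of the searches involved, here are a couple of sketches along the lines of what I ran. These are illustrative only, not my exact searches; the sourcetype and field names (access_combined, clientip) are Splunk defaults that may not match your setup.

Failed root SSH logins, counted:

sshd "Failed password" root | stats count

Top clients probing for PHP bugs in the Apache logs:

sourcetype=access_combined uri=*.php* status=404 | top clientip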
Next steps: get the home router sending its logs to the log server, and set up some Splunk “canned” searches.
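Most home routers can only send syslog over UDP, so the collection side is a small rsyslog change on the log server. Here is a minimal sketch, assuming rsyslog; the router address and log file path are my placeholders:

# /etc/rsyslog.conf on the log server: accept UDP syslog from the LAN
$ModLoad imudp          # load the UDP input module
$UDPServerRun 514       # listen on the standard syslog port

# file everything from the router in its own log, then discard it so it
# doesn't also land in the general logs
:fromhost-ip, isequal, "192.168.1.1"    /var/log/router.log
& ~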
I am a huge system log junkie. Logs are my go-to first place to look when there is a problem of almost any kind. I think they are one of the most under-utilized collections of useful information that a system (or network) administrator can use. System logs can tell you what has happened (system outages, security incidents), what is happening (performance monitoring and debugging) and what may happen in the future (trending).
At one time in the deep past I “owned” one of the first large-scale system log collections: 10 years (1993-2003) of continuous logs gathered from over 500 hosts, including four major supercomputers. That was one of the first (if not the first) large-scale log repositories, and it provided a great data set for log analysis for SDSC.EDU and CAIDA.ORG administrators and researchers. The log repo was incredibly useful for security research and practical intrusion analysis.
The most important thing to remember is that system logs are created in real-time, and if not captured (and saved), are lost forever.
A useful system logging solution consists of four components: generation, transport, storage and analysis.
Fortunately, you don’t have to build an entire complex large-scale system before you start seeing some value. As soon as you begin to generate and analyze a few log sources, you begin getting a return on your time investment. Your syslog system can grow incrementally, as needed and as time (and budget) permit. You can start small and simple and get some value, and then every small improvement or every system (log source) added to the collection just adds more value.
You can build an entire log solution on a single host: logs are generated locally, transport is local sockets, storage is on local disk, and you analyze with grep (or even Splunk). In a solution like this, most of your incremental improvements will be in making sure that new software is logging as it is installed, and in improving your analysis methods.
I believe that any collection of more than about 3-5 hosts (or network devices) should have a central log repository. Being able to see everything that is going on in one place and correlate events across the network can be invaluable in troubleshooting problems and interactions between systems.
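On Linux hosts, feeding a central repository can be as little as one rsyslog line per client. A minimal sketch, assuming rsyslog on the clients; the log host name is a placeholder:

# /etc/rsyslog.conf on each client: keep logging locally as usual,
# and also forward a copy of everything to the central log server
*.*    @@loghost.example.com:514    # @@ = TCP; a single @ would use UDP

On the server side, load the matching TCP input module (imtcp) and the logs start arriving; from there they can be filed per-host and fed to your analysis tool.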
I’ll be fixing up the system log situation here at home over the next few weeks, to include gathering and processing logs from all the Linux, Windows, Mac and other devices on the home network. I wonder what I will find as I begin the analysis?
To recap, there are three parts to getting to “IPv6 DNS”:
- The first is to get AAAA (quad-A) records into your DNS system. At that point clients can ask for the AAAA records over IPv4 and everything will work just fine.
- The second is for you to actually serve your DNS zones over IPv6.
- The third is to get hooked into the global IPv6 DNS system, so that you (and others) can resolve your IPv6 addresses.
In this post, we will deal with the third part, ensuring that all the DNS servers needed to resolve my AAAA records are IPv6 capable. This step isn’t strictly necessary since, as I pointed out before, there’s nothing wrong with serving your AAAA records via IPv4.
First, let’s take a look at the symptoms of my problem:
$ dig -6 +short +trace ipv6.thuktun.org aaaa
;; connection timed out; no servers could be reached
What has happened here is that there is no authoritative DNS server that can be reached via IPv6. So, what’s the problem?
One thing that I’ve never mentioned is that “my” local DNS server is a hidden master. It holds all the zone files, but is not advertised. My advertised public DNS servers are elsewhere, and they pick up my zone data via AXFR whenever I make changes and they are sent a NOTIFY. So, while my local server has all the zone data, it will never be queried during a normal DNS lookup. The advertised DNS servers, the slaves, actually serve all the answers.
It turns out that there are two problems here:
- My external slave name servers aren’t IPv6 capable;
- My resolv.conf has no IPv6 name servers listed.
My external nameservers are run by a friend at his organization’s datacenter. They aren’t prepared to serve DNS over IPv6, and won’t be any time soon. The fastest way to fix this is to move my external DNS to a DNS hosting provider that is IPv6 capable. Fortunately, I can get IPv6 DNS from the same place that I get my IPv6 tunnel: Hurricane Electric.
Using their DNS slave server setup page, I can easily make Hurricane’s DNS servers my public slave DNS servers. I do have to ensure that their DNS servers can do zone transfers from my hidden master, which is mostly handled by the allow-transfer statement in my Bind config files; the details are left as an exercise for the student.
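For those who want a head start on that exercise, here is a minimal sketch of the master-side Bind configuration. The transfer address below is illustrative; use whatever source address Hurricane’s slave setup page tells you their servers transfer from:

// named.conf on the hidden master
zone "thuktun.org" {
    type master;
    file "zones/thuktun.org";
    // send NOTIFYs to, and allow AXFR from, the public slaves
    also-notify { 216.218.133.2; };
    allow-transfer { 216.218.133.2; };
};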
After setting up the slave servers at Hurricane, I can wait for them to zone transfer my data to their servers. This can take 5-10 minutes for the first transfer.
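A quick way to confirm the transfer has happened (my own sanity check, not an official procedure) is to ask one of their slaves for the zone’s SOA directly; once the slave has the zone, this returns the SOA with your current serial number:

$ dig @ns2.he.net thuktun.org soa +short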
At this point, there are three completely independent sets of name servers that have correct data for my domain:
- My home hidden master
- The current advertised DNS servers at my friend’s organization
- The as-yet unadvertised DNS servers at Hurricane Electric
All that remains is to remove the advertisements for the “old” servers and to add advertisements for the new servers. This is done at your domain registrar. Fortunately, my registrar is Register4Less, and their DNS system can take IPv6 addresses. In fact, it was easier than I expected. Unlike some other registrars I’ve used in the past, at Register4Less you enter the names of your DNS servers, not the IP addresses. Register4Less resolved the hostnames ns1-ns5.he.net and created NS records for both the IPv4 and IPv6 addresses.
There’s still one step left, and that’s to add some IPv6 name servers to my resolv.conf file. I could use the five DNS servers that Hurricane Electric has advertised, but I’ll go one step farther. I’ll use the anycast DNS server address that Hurricane gave me when I established my tunnel:
$ more /etc/resolv.conf
nameserver 126.96.36.199    ;; IPv4 nameservers from my ISP
nameserver 188.8.131.52
nameserver 184.108.40.206
nameserver 2001:470:20::2   ;; IPv6 anycast name server from HE.NET
And now here’s a full “trace” of IPv6 DNS resolution:
$ dig -6 +trace ipv6.thuktun.org aaaa

; <<>> DiG 9.7.3 <<>> -6 +trace ipv6.thuktun.org aaaa
;; global options: +cmd
.                 79981   IN   NS   g.root-servers.net.
.                 79981   IN   NS   j.root-servers.net.
.                 79981   IN   NS   a.root-servers.net.
.                 79981   IN   NS   i.root-servers.net.
.                 79981   IN   NS   h.root-servers.net.
.                 79981   IN   NS   l.root-servers.net.
.                 79981   IN   NS   c.root-servers.net.
.                 79981   IN   NS   b.root-servers.net.
.                 79981   IN   NS   d.root-servers.net.
.                 79981   IN   NS   m.root-servers.net.
.                 79981   IN   NS   f.root-servers.net.
.                 79981   IN   NS   e.root-servers.net.
.                 79981   IN   NS   k.root-servers.net.
;; Received 509 bytes from 2001:470:20::2#53(2001:470:20::2) in 31 ms

org.              172800  IN   NS   b0.org.afilias-nst.org.
org.              172800  IN   NS   a2.org.afilias-nst.info.
org.              172800  IN   NS   d0.org.afilias-nst.org.
org.              172800  IN   NS   c0.org.afilias-nst.info.
org.              172800  IN   NS   a0.org.afilias-nst.info.
org.              172800  IN   NS   b2.org.afilias-nst.org.
;; Received 436 bytes from 2001:7fe::53#53(i.root-servers.net) in 157 ms

thuktun.org.      86400   IN   NS   ns5.he.net.
thuktun.org.      86400   IN   NS   ns4.he.net.
thuktun.org.      86400   IN   NS   ns2.he.net.
thuktun.org.      86400   IN   NS   ns3.he.net.
;; Received 112 bytes from 2001:500:c::1#53(b0.org.afilias-nst.org) in 188 ms

ipv6.thuktun.org. 28800   IN   AAAA 2001:470:67:84::10
;; Received 62 bytes from 2001:470:500::2#53(ns5.he.net) in 28 ms
And I’m done with DNS. Now I can go on to some other services, like SSH, FTP, HTTP, etc.
Now that I have a functioning IPv6 network, I can actually “see” how much of the public Internet (or at least of the web sites I visit) is on IPv6. Before I had the home net on IPv6, I was limited to just using DNS queries for AAAA records (over IPv4).
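For the record, that DNS-only check is still useful: it works from any IPv4-connected host, and any IPv6-enabled site will return one or more IPv6 addresses.

$ dig +short ipv6.google.com aaaa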
Here are a few images showing which sites/pages are loaded via IPv6, IPv4, or both.
This first one is interesting: ipv6.google.com. As you can see from the image, the main page (URL) is IPv6 (big green “6”), but other parts of the page loaded via IPv4 (little red “4”). Clicking on the 6/4 image in the URL bar shows you which parts loaded which way. Note that plus.google.com loads over IPv4.
This next one is ipv6-test.com. Again the main page loads via IPv6, but the other content on the page is loaded from a combination of other sites running IPv4 and IPv6.
Here’s another IPv6 test site, test-ipv6.com. This one uses IPv4 for the main site, and then pulls elements over IPv6 and IPv4.
As one of the newest of Google’s Internet properties, it is not unexpected that plus.google.com loads over IPv6, at least in this example. Go back and look at the first example, however, where it loaded over IPv4. Strange…. However, the “+1” system is still IPv4:
As I do my daily browsing, it’s interesting which sites come up over IPv6, and which don’t. I’m seeing more media and social sites on IPv6, and very few vendor sites. I had expected to see much more IPv6 from the big network kit vendors, but they are noticeably missing. Some of them “do” IPv6 on a separate host (ipv6.google.com, for example).
Not surprisingly, the main DREN web site is 100% IPv6.
I wonder if the social media sites will lead the charge, or the vendors? Right now, I’m not seeing a lot of commitment from companies that I would hope have a lot more IPv6 experience.
They are going to want my company’s money for new network gear in the coming year, and I’m going to be asking hard questions about why they don’t have their own main sites running IPv6.
With the clients now all speaking IPv6 (with IP addresses from stateless auto-config), and the server now having a global-scope static IPv6 address, it’s time to make this much more useful.
With IPv6 addresses being 128 bits (32 hex characters), it’s just not practical to expect anyone to remember IP addresses. DNS becomes much more important, not only for servers (with static addresses) but for clients. Clients will in general get their “real” IPv6 address via DHCPv6 and do dynamic DNS updates. (There’s a special “stateless” DHCPv6 that just listens for the auto-config’ed IP addresses and puts them into DNS.)
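As an aside, here is roughly what this looks like on the router-advertisement side, assuming radvd on a Linux router. The interface name is a placeholder and the prefix is the 2001:db8:: documentation space, not my real allocation:

# /etc/radvd.conf: advertise the prefix for stateless auto-config, and
# set the "Other config" flag so clients also ask DHCPv6 for extras
interface eth0
{
    AdvSendAdvert on;
    AdvOtherConfigFlag on;      # O flag: fetch other config via DHCPv6
    prefix 2001:db8:1234::/64
    {
        AdvOnLink on;
        AdvAutonomous on;       # clients auto-config addresses from this
    };
};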
There are three parts to getting to “IPv6 DNS”:
- The first is to get AAAA (quad-A) records into your DNS system. At that point clients can ask for the AAAA records over IPv4 and everything will work just fine.
- The second is for you to actually serve your DNS zones over IPv6.
- The third is to get hooked into the global IPv6 DNS system, so that others can resolve your IPv6 addresses.
In this installment, we’ll just do Step 1.
Let’s do the AAAA records and test some queries. If you’re this far along, editing Bind zone files and using “dig” should be second nature for you, so I’m only going to show snippets from the zone files:
;;; services
www    a     220.127.116.11       ;; original IPv4 address
www    aaaa  2001:470:67:88::10   ;; NEW IPv6 address, same name
ipv6   aaaa  2001:470:67:88::10   ;; NEW IPv6 address, new name for ease in testing
I’ve added two new records, a second “www” entry and a completely new “ipv6” entry. The “ipv6” entry is so that I have a hostname that has only an IPv6 address, and no IPv4 addresses. Let’s see what I can get (after I reload the zone)…
$ dig +short ipv6.thuktun.org           # 1 asking for the "A" record for "ipv6" - no "A" record exists (only AAAA)

$ dig +short ipv6.thuktun.org aaaa      # 2 asking for the AAAA record - SUCCESS
2001:470:67:88::10

$ dig +short www.thuktun.org            # 3 asking for the "A" record for "www" - SUCCESS
18.104.22.168

$ dig +short www.thuktun.org aaaa       # 4 ...and the AAAA record - SUCCESS
2001:470:67:88::10

$ dig -4 +short www.thuktun.org aaaa    # 5 force IPv4 query (which is actually the default) - SUCCESS
2001:470:67:88::10

$ dig -6 +short www.thuktun.org aaaa    # 6 force query over IPv6 transport - NO RESPONSE
^C                                      # hangs
- By default “dig” queries for “A” records if no other record type is given.
- By default “dig” queries over IPv4.
This explains why query #1 returns no data and why query #3 returns the “A” record (only). To get the “AAAA” records, you have to explicitly ask for them with a record type. Finally, query #6 attempts to force the DNS queries to use IPv6 for transport, which hangs since there are no known IPv6 DNS resolvers configured on the system.
At this point we’ve achieved step 1: we have AAAA records in our DNS, and we can retrieve them via IPv4.
Next step: having our own DNS server answer queries over IPv6 transport.
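As a preview, that next step is mostly a one-line change, assuming Bind (and that the server already has its static IPv6 address):

// named.conf options: also accept queries over IPv6 transport
options {
    listen-on-v6 { any; };
};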
One of this morning’s keynotes at Gartner Datacenter Conference (#gartnerdc) was an on-stage interview with Scott Dillon, EVP, Head of Technology Infrastructure for Wells Fargo. He was interviewed about the Wells/Wachovia merger, and the challenges faced by the organization.
While the talk was full of sound bites about scale, merger strategies and budgets, the discussion came back to culture over and over.
On the scale and technology side, there were tidbits like these:
- Wells Fargo employs 1 in 500 workers in the US;
- IT had 10,000 change events per month, before the merger;
- They have a physical presence within 2 miles of 50% of the US population.
But it was on the management side that I found the most interesting information.
Before the merger, there were clear guidelines, such as “if we have two systems, A and B that are doing the same thing, we will pick the best, either A or B. No C options.” This was a merger of equals, at least in terms of the technology. They chose the best of the two orgs, then committed to making that the One True New System for everyone. They ended up with an almost 50/50 split of technology from the two companies.
But, no matter where the talk went in management and technology, it just kept coming back to culture. Building one culture from the best of both was a top management priority for the entire company. Just as they (IT) selected the best tech from each, they (Executive management) worked to take the best of the culture from both, to be the foundation moving forward. They had a great advantage, as both companies share almost all of their core values, so this was a little easier than merging the technology. But there was an explicit decision to do this, it wasn’t left to chance.
Management made “culture” a number one priority. They focused on merging the culture as much as they focused on merging the technology. They made building communications between the employees an early priority. Very early on, they even created a “culture group” to look at the two cultures and make specific decisions about how to foster the culture merger.
Part of their culture involves employee value. Every company does “exit interviews” when employees leave. Wells does “stay interviews”, where they engage with employees to gather their concerns and let them know how much the company values and appreciates them. Isn’t that better, to find any issues before key people leave? To constantly work to make the work environment better, instead of waiting until it’s too late?
In IT we often get too focused on the technology, and we can claim that “the business” is too focused on profits, or stock price, or some other “business” area.
When was the last time you heard a business, a bank, even, put their culture as one of their highest priorities?
More importantly, as IT, when was the last time we put “culture” high on our priority list?
At this point, there is an IPv6 tunnel from Hurricane Electric to my home Linux server. That’s OK, but not what I need for “production”. As I mentioned in the requirements, I want to use a commercial home router solution so that it can be easily replicated by some of our non-technical staff.
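For reference, the Linux end of such a tunnel is only a few commands. A minimal 6in4 sketch with placeholder addresses throughout; the real endpoint addresses and routed /64 come from your tunnelbroker.net tunnel details page:

# bring up a 6in4 tunnel to HE (all addresses are placeholders)
ip tunnel add he-ipv6 mode sit remote 203.0.113.1 local 192.0.2.10 ttl 255
ip link set he-ipv6 up
ip addr add 2001:db8:1:2::2/64 dev he-ipv6    # our end of the tunnel
ip route add ::/0 dev he-ipv6                 # default IPv6 route via the tunnel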
Based on recommendations from some Navy DREN folks, I selected an Apple Airport Extreme. Some of them have been doing IPv6 testing (and home IPv6 tunnels) for upwards of six or seven years, and in their experience the Airport has quite good IPv6 functionality. All that is needed is to load the tunnel configuration information from the tunnelbroker.net web page into the router, using the Airport Utility. You can see the instructions for this here.
The only drawback to the Airport is that every, and I mean every, configuration change forces a reboot, which interrupts connectivity for about 30-50 seconds. Other than that, it’s rock solid.
I’ll be at the Gartner Datacenter conference in Las Vegas all this week. In my new role at work I’m no longer directly responsible for our US datacenters, but I will be helping to shape our worldwide datacenter and networking strategies (among others). If the conference is anything like last year’s, there will be a LOT of “cloud” in addition to the core topics. It will be interesting to see updates on the major initiatives that large-scale operations like Bank of America, eBay and others talked about last year.
The usual Twitter hashtag for the conference is #gartnerdc. If you’re interested in datacenters, “devops”, “green IT”, “orchestration” or “cloud”, I recommend that you follow the tag.
The IPv6 series will continue as usual next week with posts on Tuesday and Thursday.
This evening I finished up the initial phase of the home IPv6 project. At this point, any client on my home network that fully supports IPv6 can connect to any IPv6 resources on the public Internet, such as http://test-ipv6.com/ and http://www.kame.net
(More on that “fully supports” thing, later.)
This should have taken just a few hours, but I ran into a lot of incorrect documentation and obsolete software that led me down a few ratholes. More on that later, too.
For now, it’s (dancing, IPv6) turtles, all the way down!