Archive for January, 2012
If you’re a system (or network, or storage, or database, or security…) administrator you need to be a continuous learner. And you need to read widely, so that you get perspectives from people in different situations. Your peers are out there doing interesting things, and you should take advantage of their experiences.
Head on over to the blogs by the members of LOPSA, the system administrators’ professional society:
The RSS feed leads to 999 blogs about all aspects of system and network administration. It’s the best collection of “what’s actually happening now” in sysadmin that I’ve seen. If you’re serious about your craft, you should seriously consider spending time each week reading some of these.
The last two weeks at work have been some of the most fun in the past few years. A few months ago I moved from management back to my first love: deep technical work. In my new position I’m responsible (with a co-worker) for technical strategy, creating our Enterprise Architecture, and forward-looking technical projects. We’re also tasked with finding new ways to collaborate and take on projects as well as take a hard look to ensure that IT is supporting the rest of the business.
For some of these, we act as facilitators for IT projects, even though we aren’t in the management chain.
IPv6 has been one of my “back burner” projects for almost a year. There is a business mandate that we must have IPv6 connectivity to one of the inter-corporate networks by 1 April. A select set of our internal users need to have IPv6 connectivity to business applications that will only be available over IPv6 via this network.
To prepare for this, we needed to ramp up our IPv6 knowledge from almost nothing to being ready to plan a limited IPv6 deployment next month.
We decided to try a new project methodology (loosely) based on agile concepts: we performed IPv6 testing and deployment preparation as a “sprint”. We got 12 of our most senior system and network admins together in a large conference room with a pile of hardware, a stack of OS install disks, a new IPv6 transit connection and said, “Go!”.
No distractions, no email, no phone calls. Just 12 people off in a completely different building, in a big room with a pile of gear and the mandate to “explore IPv6” and learn enough to be comfortable planning a limited IPv6 deployment at the end.
It was great seeing people from different IT departments who usually specialize in Linux, MS Windows, VMware, networking, security, etc. all come together to explore IPv6 on all these platforms, bring up services, test, find vendor bugs 🙂 and in general build a standalone IPv6 lab from scratch.
We truly did start from scratch; we started with an empty room, a bunch of tables and chairs, two pallets of PCs, assorted network kit, three boxes of ethernet cables and installation media.
Along the way, all of these people stepped out of their comfort zones, learned about each other’s specializations, and worked together for a common goal that we all created together.
At the end of the 2 weeks, we had a fully functioning dual-stack IPv4/IPv6 network:
- Routers and switches, firewall and IPv4/6 transit from a new provider
- Fully functioning Windows infrastructure: AD, DNS, DHCP, IIS, Exchange, etc.
- Linux infrastructure: DNS, DHCP, syslog, Apache, Splunk, Puppet (mostly)
- Windows Server 2008 and 2008 R2, Windows 7 clients
- Linux CentOS 5 and 6 servers and desktops
- MacOS Snow Leopard and Lion clients
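One quick way to sanity-check a dual-stack setup like this is to confirm that a service name resolves to both an IPv4 and an IPv6 address. The following is only a minimal sketch, not what we ran during the sprint; the function names are hypothetical and it uses only the Python standard library.

```python
import ipaddress
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated addresses the resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    # info[4] is the sockaddr tuple; its first element is the address string.
    # Strip any "%scope" suffix that can appear on link-local IPv6 addresses.
    return sorted({info[4][0].split("%")[0] for info in infos})


def ip_versions(addresses):
    """Return the set of IP versions (4 and/or 6) found in a list of address strings."""
    return {ipaddress.ip_address(addr).version for addr in addresses}


def is_dual_stack(addresses):
    """True only if the list contains at least one IPv4 and one IPv6 address."""
    return ip_versions(addresses) == {4, 6}
```

For example, `is_dual_stack(resolve("www.example.com"))` would report whether that (placeholder) name has both A and AAAA records visible from the host running the check. It says nothing about whether the service actually answers on both protocols.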
All the results and everything we learned are documented in a wiki full of IPv6 configurations, hints and tips, debugging info, links to IPv6 info, lessons learned and plans for IPv6 next steps to production. I think we generated about 50-60 pages of new documentation along the way on IPv6, and about 6 pages of notes on the sprint experience itself.
The sprint wasn’t perfect, and we had a few stumbles along the way. But we learned a lot about how to run these kinds of sprints, and we’re pretty sure that we’ll have more of them in the future.
We also had two full weeks of face time with our colleagues from four sites in two states. In some cases we had never met each other in person, but had been exchanging email and tickets for years.
It was an incredibly productive two weeks. We learned a lot about IPv6 and each other, and found new ways to work together.
Last year brought us World IPv6 (test) day on June 8. Dozens of content providers, network backbones and other technical groups came together to do a live test of IPv6 in production. Results were very good, and provided enough evidence that planning for a real, permanent cutover to full “dual stack” was practical.
However, there were enough issues that many of the participants took down their IPv6 sites after the experiment.
But this year, it’s gonna be real. June 6 2012 is World IPv6 Launch Day. The same big names and many others are participating. More importantly, some of the major providers of CPE (customer premises equipment) AKA “home routers” are committed as well.
Cisco and D-Link are committed to shipping “home equipment” with compliant IPv6 stacks and IPv6 enabled by default by this date. Facebook, Google, Bing and Yahoo! will all permanently enable IPv6 for their main sites. In the US, AT&T, Comcast and Time-Warner will activate IPv6 for “significant” portions of their home wireline customers.
And this time, it’s permanent. Unlike the 24 hour experiment last year, this is a permanent change. I expect that all the participants will have to shake out configuration issues and software bugs after the launch, but at least now they are committed to making IPv6 work for everyone, from now on.
The only thing that might make this better would be commitments from the operating system vendors. Apple, Microsoft and the Linux community already have known issues that will need to be addressed. Having the home router providers commit to some level of IPv6 support (firmware upgrades) for at least some currently shipping products would also be good, but I suspect they would rather sell new gear.
- World IPv6 Launch on June 6, 2012, To Bring Permanent IPv6 Deployment (internetsociety.org)
To recap, a useful system logging solution consists of four components: generation, transport, storage and analysis.
I will argue that if you already have any logs at all, your first step should be to build an analysis capability. This will let you begin to analyze the logs you already have, become familiar with your analysis tool on a smaller dataset and use the analysis tool to help debug any problems that you encounter while building the rest of the system.
I’ve been a big Splunk fan for years. The Splunk folks understand system and network administration and that shows in the design and capabilities of the product. The free “home” license is a great contribution to the community, too.
There is a lot of good documentation out there on getting started with Splunk, so I’ll focus on what it allowed me to find instead of the details of using it. I encourage you to experiment and try different kinds of searches; you’ll be surprised at what you find.
After starting Splunk, I pointed it at my /var/log directory, which has all the usual system logs, and also all my Apache logs. Splunk indexed about 2 million log events in less than 8 minutes, on my low-power Atom CPU with only 2G RAM and a single 150G IDE laptop disk.
In the first 30 minutes or so, I found (all on a single host, all in the last 30 days):
- 935 SSH root login attempts
- 838 attempts to exploit PHP bugs in my web server
- 20 attempts to buffer overflow my web server
- over 100K attempts to deliver spam or use my hosts as a mail relay
- 40 attempts to use MyAdmin scripts (which I don’t have)
So, less than 30 minutes to install Splunk and 30 minutes of playing with the search tool has already paid off 🙂
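If you don't have Splunk handy, the same kind of counting can be roughed out in a few lines of Python. This is a minimal sketch, not how Splunk does it: the two patterns and the sample lines below are made up for illustration, and real sshd and web server log formats vary, so treat the regular expressions as assumptions to adjust for your own logs.

```python
import re
from collections import Counter

# Hypothetical patterns: adjust them to match your own log formats.
PATTERNS = {
    # Typical OpenSSH message for a failed password login as root.
    "ssh_root_failed": re.compile(r"sshd\[\d+\]: Failed password for root from \S+"),
    # Any web request whose path mentions "myadmin" (I don't run it, so these are probes).
    "myadmin_probe": re.compile(r'"(?:GET|POST) \S*myadmin', re.IGNORECASE),
}


def count_events(lines):
    """Count how many lines match each named pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Made-up sample lines using documentation IP ranges, not real log data.
SAMPLE = [
    "Jan 12 03:14:07 host sshd[2211]: Failed password for root from 203.0.113.5 port 52144 ssh2",
    "Jan 12 03:14:09 host sshd[2211]: Failed password for root from 203.0.113.5 port 52150 ssh2",
    '198.51.100.7 - - [12/Jan/2012:03:20:01 -0800] "GET /phpMyAdmin/scripts/setup.php HTTP/1.1" 404 209',
    "Jan 12 03:21:00 host sshd[2300]: Accepted publickey for alice from 192.0.2.4 port 40000 ssh2",
]

print(count_events(SAMPLE))
```

To run it against real files, pass an open file object instead of `SAMPLE`, since `count_events` only needs something it can iterate over line by line.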
Next steps: get the home router sending its logs to the log server and set up some Splunk “canned” searches.
I am a huge system log junkie. Logs are my go-to first place to look when there is a problem of almost any kind. I think they are one of the most under-utilized collections of useful information that a system (or network) administrator can use. System logs can tell you what has happened (system outages, security incidents), what is happening (performance monitoring and debugging) and what may happen in the future (trending).
At one time in the deep past I “owned” the first large-scale system log collection: 10 years (1993-2003) of continuous logs gathered from over 500 hosts, including four major supercomputers. That was one of the first (if not the first) large-scale log repositories, and it provided a great data set for log analysis for SDSC.EDU and CAIDA.ORG administrators and researchers. The log repo was incredibly useful for security research and practical intrusion analysis.
The most important thing to remember is that system logs are created in real-time, and if not captured (and saved), are lost forever.
A useful system logging solution consists of four components: generation, transport, storage and analysis.
Fortunately, you don’t have to build an entire complex large-scale system before you start seeing some value. As soon as you begin to generate and analyze a few log sources, you begin getting a return on your time investment. Your syslog system can grow incrementally, as needed and as time (and budget) permit. You can start small and simple and get some value, and then every small improvement or every system (log source) added to the collection just adds more value.
On a single host you can build an entire log solution: logs are generated locally, transport is local sockets, storage is on local disk and you analyze with grep (or even Splunk). In a solution like this, most of your incremental improvements will be in making sure that new software is logging as it is installed, and in improving your analysis methods.
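To make the single-host case concrete, here is a minimal sketch of three of the four components in one small Python script: generation (the standard `logging` module), storage (a local file) and analysis (a grep-like filter). Transport is skipped because everything stays in one process. The function names, logger name and log format are all hypothetical choices for illustration.

```python
import logging
import re


def open_local_log(path, name="demo-app"):
    """Generation + storage: return a logger that appends formatted events to a local file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


def grep(path, pattern):
    """Analysis: return the lines of the file at path that match a regular expression."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8", errors="replace") as log_file:
        return [line.rstrip("\n") for line in log_file if regex.search(line)]
```

Something like `log = open_local_log("/tmp/demo.log")`, a few `log.warning(...)` calls, and then `grep("/tmp/demo.log", "WARNING")` is enough to see the whole loop working; the path here is just a placeholder.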
I believe that any collection of more than about 3-5 hosts (or network devices) should have a central log repository. Being able to see everything that is going on in one place and correlate events across the network can be invaluable in troubleshooting problems and interactions between the systems.
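For applications you write yourself, getting events to a central repository can be as simple as pointing a syslog handler at the log server. The sketch below uses `logging.handlers.SysLogHandler` from the Python standard library, which sends over UDP by default. The function name, logger name and message format are assumptions for illustration, and it assumes the central server is already listening for syslog on the given port.

```python
import logging
import logging.handlers


def make_remote_logger(host, port=514, name="demo-app"):
    """Return a logger that forwards each event to a syslog server over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # With an (address, port) tuple and no socktype, SysLogHandler uses UDP datagrams.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    # Prefix each message with the logger name so the receiver can tell sources apart.
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `make_remote_logger("loghost.example.com").info("disk nearly full")` would send one event to that (placeholder) host. Keep in mind that UDP gives no delivery guarantee, so an unreachable log server means silently lost events, which is exactly the "lost forever" problem described above.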
I’ll be fixing up the system log situation here at home over the next few weeks, to include gathering and processing logs from all the Linux, Windows, Mac and other devices on the home network. I wonder what I will find as I begin the analysis?