I am a huge system log junkie. Logs are the first place I look when there is a problem of almost any kind. I think they are one of the most underutilized collections of useful information available to a system (or network) administrator. System logs can tell you what has happened (system outages, security incidents), what is happening (performance monitoring and debugging) and what may happen in the future (trending).
At one time in the deep past I “owned” the first large-scale system log collection: 10 years (1993-2003) of continuous logs gathered from over 500 hosts, including four major supercomputers. That was one of the first (if not the first) large-scale log repositories, and it provided a great data set for log analysis for SDSC.EDU and CAIDA.ORG administrators and researchers. The log repo was incredibly useful for security research and practical intrusion analysis.
The most important thing to remember is that system logs are created in real time, and if they are not captured (and saved), they are lost forever.
A useful system logging solution consists of four components: generation, transport, storage and analysis.
Fortunately, you don’t have to build an entire complex large-scale system before you start seeing some value. As soon as you begin to generate and analyze a few log sources, you begin getting a return on your time investment. Your syslog system can grow incrementally, as needed and as time (and budget) permit. You can start small and simple and get some value, and then every small improvement or every system (log source) added to the collection just adds more value.
On a single host you can build the entire log solution locally: logs are generated locally, transport is over local sockets, storage is on local disk, and you analyze with grep (or even Splunk). In a solution like this, most of your incremental improvements will be in making sure that new software is logging as it is installed, and in improving your analysis methods.
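To make the four components concrete, here is a minimal single-host sketch in Python. It is only an illustration, not a recommended setup: the logger name `demo_app`, the helper names `build_logger` and `grep`, and the log messages are all made up for this example. On a real host the syslog daemon and a local socket would handle transport; here the handler writes straight to a local file as a stand-in.

```python
import logging
import os
import re
import tempfile


def build_logger(path):
    """Generation, transport and storage: a logger that appends to a local file."""
    logger = logging.getLogger("demo_app")  # hypothetical application name
    logger.setLevel(logging.INFO)
    logger.handlers.clear()                 # avoid duplicate handlers on reuse
    handler = logging.FileHandler(path)     # stand-in for the local syslog socket
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False                # keep the demo output out of the root logger
    return logger


def grep(pattern, path):
    """Analysis: return the lines of the file that match a regular expression."""
    rx = re.compile(pattern)
    with open(path, encoding="utf-8") as fh:
        return [line.rstrip("\n") for line in fh if rx.search(line)]


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "demo.log")
    log = build_logger(log_path)
    log.info("service started")
    log.warning("disk usage high on /var")
    log.error("login failed for user alice")
    for line in grep(r"WARNING|ERROR", log_path):
        print(line)
```

Even something this small gives you a return: every new program that logs through the same handler shows up in the same file, and the analysis step can be improved without touching the rest.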
I believe that any collection of more than about 3-5 hosts (or network devices) should have a central log repository. Being able to see everything that is going on in one place and correlate events across the network can be invaluable in troubleshooting problems and interactions between the systems.
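As a rough illustration of what correlating events across the network can look like once the logs are in one place, the sketch below merges per-host logs into a single timeline and pulls out everything that happened within a few seconds of an event of interest. The line format (ISO timestamp, host, message), the host names and the helper names `parse`, `merge_logs` and `events_near` are assumptions made for this example; real syslog lines would need a different parser.

```python
import heapq
from datetime import datetime, timedelta


def parse(line):
    """Split 'ISO-timestamp host message' into (datetime, host, message)."""
    stamp, host, message = line.split(" ", 2)
    return datetime.fromisoformat(stamp), host, message


def merge_logs(*streams):
    """Merge per-host streams (each already in time order) into one timeline."""
    parsed = (map(parse, stream) for stream in streams)
    return list(heapq.merge(*parsed, key=lambda event: event[0]))


def events_near(timeline, when, seconds=5):
    """Return every event, from any host, within `seconds` of `when`."""
    window = timedelta(seconds=seconds)
    return [event for event in timeline if abs(event[0] - when) <= window]


if __name__ == "__main__":
    # Invented sample lines from two hypothetical hosts.
    firewall = [
        "2024-03-03T10:15:00 fw1 accepted tcp 192.0.2.10 -> web1:22",
        "2024-03-03T10:20:00 fw1 accepted tcp 192.0.2.11 -> web1:443",
    ]
    web = [
        "2024-03-03T10:15:02 web1 sshd failed password for root",
        "2024-03-03T10:30:00 web1 cron job finished",
    ]
    timeline = merge_logs(firewall, web)
    failed_login_time = timeline[1][0]
    for when, host, message in events_near(timeline, failed_login_time):
        print(when.isoformat(), host, message)
```

The merge assumes the clocks on the hosts are reasonably in sync; if they are not, the window in `events_near` has to be widened to compensate.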
I’ll be fixing up the system log situation here at home over the next few weeks, including gathering and processing logs from all the Linux, Windows, Mac and other devices on the home network. I wonder what I will find as I begin the analysis?