Search tools have simplified our lives in many ways, so why not network management? So reason the founders of Splunk, a start-up that has released a search product to make sense of logs and other types of event information generated by systems as they go about their business.
Troubleshooting individual boxes is not difficult, according to Michael Baum, founder and chief executive splunker (yes, it says that on his card). The fun starts when you assemble multiple components into a system. No single vendor, developer, architect or administrator owns the problems that crop up, which usually stem from operator error, configuration errors, or integration and dependency problems.
“So customers approach it the old-fashioned way,” he says, “with picks and shovels.” To find out which of the many things that could go wrong did go wrong, you start digging.
One alternative is the autonomic self-healing approach advocated by IBM. Baum argues that although this approach may be feasible with stand-alone boxes, it’s impossible at the level of complex systems. “Automation is great, but it adds complexity,” he says. “Are you really increasing mean time between failures enough to cover the mean time to recover after a failure in these complex environments?”
Splunk sides with the experts exploring recovery-oriented computing. They assume systems are complex and failures are inevitable, so what matters is how fast you can recover.
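Baum's question about mean time between failures and mean time to recover can be made concrete with the standard steady-state availability formula, MTBF / (MTBF + MTTR). The sketch below is illustrative only: the hour figures are hypothetical, not numbers from Splunk or IBM, and are chosen to show how a gain in time between failures can be wiped out by slower recovery.

```python
def availability(mtbf_hours, mttr_hours):
    """Steady-state availability: the fraction of time a system is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Hypothetical figures, assumed purely for illustration.
# Baseline: fails every 1,000 hours on average, takes 4 hours to recover.
baseline = availability(mtbf_hours=1000, mttr_hours=4)

# With added automation: fails less often (every 1,200 hours), but the
# extra complexity doubles recovery time to 8 hours.
automated = availability(mtbf_hours=1200, mttr_hours=8)

print(f"baseline:  {baseline:.4%}")
print(f"automated: {automated:.4%}")
```

Under these assumed numbers the automated system is up less of the time than the baseline, even though it fails less often, which is the trade-off the recovery-oriented camp emphasizes.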
Baum says Splunk’s search tool is all about fast recovery. A typical application server, database or Web server can generate 100MB of event data per day. “And when something goes wrong, we ask people to make sense of it all,” he says.
With Splunk’s search product, every event is given a fingerprint based on its syntax and grammatical structure. The fingerprinted events are then organized into buckets, indexed by time and analyzed for relationships. That helps troubleshooters round up pertinent information from a range of resources and sift through the errors to find unique causal events. Log data changes every millisecond and all log data is different, so it’s not like searching documents or photos.
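The general idea of fingerprinting events by structure and indexing them by time can be sketched in a few lines of Python. This is not Splunk's algorithm, which the article does not describe in detail. It is a minimal illustration that assumes a fingerprint is simply the message with its numbers masked out and a bucket is one minute wide. The sample log lines, their format, and the names `fingerprint` and `index_events` are all hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime

# Hypothetical sample events; the line format is an assumption.
EVENTS = [
    "2005-11-07 10:15:02 ERROR db01 connection timeout after 30s",
    "2005-11-07 10:15:09 ERROR db02 connection timeout after 45s",
    "2005-11-07 10:15:40 INFO web01 request served in 120ms",
    "2005-11-07 10:16:03 ERROR db01 connection timeout after 30s",
]

TIMESTAMP_LEN = 19  # length of "YYYY-MM-DD HH:MM:SS"


def fingerprint(message):
    """Reduce a message to its shape by masking every run of digits."""
    return re.sub(r"\d+", "#", message)


def index_events(events):
    """Group raw lines by (one-minute bucket, message fingerprint)."""
    index = defaultdict(list)
    for line in events:
        ts = datetime.strptime(line[:TIMESTAMP_LEN], "%Y-%m-%d %H:%M:%S")
        bucket = ts.replace(second=0)  # truncate to the minute
        shape = fingerprint(line[TIMESTAMP_LEN + 1:])
        index[(bucket, shape)].append(line)
    return index


index = index_events(EVENTS)
for (bucket, shape), lines in sorted(index.items()):
    print(bucket.strftime("%H:%M"), len(lines), shape)
```

In this sketch the two database timeouts logged at 10:15 land in the same bucket with the same fingerprint even though the host names and durations differ, so a troubleshooter would see one recurring kind of event rather than two unrelated lines.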
For now, the tool is intended for Java 2 Platform Enterprise Edition and messaging environments, and to augment commercial systems management tools. A free version of the product, called the Splunk Server, can be downloaded from splunk.com and used to index up to 500MB per day. Splunk Professional, which can be scheduled to run at set intervals, supports multiple user accounts and includes other features.