Tools used to monitor Unix-based servers have been around for more than a decade. During this time, we have seen many products come and go, with successful tools adapting and maturing as their customer base drove new features and functionality. Given the current economic climate and the increasing interest in improving IT operations and reducing costs, many organizations are re-evaluating their existing tool investments and considering what was once thought improbable: replacing entrenched tools with newer tools and technologies, usually from different vendors. This raises the question, “What evaluation criteria should be used?”
META Trend: Through 2005, users will seek to minimize the number of infrastructure and application management vendors for similar technology (e.g., one monitoring vendor). Through 2006, new infrastructure and service pricing models (e.g., subscription, usage, term), continuing capacity- and version-driven cost increases, and expiring enterprise agreements will cause buyers to reassess procurement efforts. The focus will be on more function-specific, loosely coupled, “good enough” suites built around Web services protocols.
With a technology base that may be a decade old, many Unix monitoring tools have not aged gracefully enough to fit comfortably into the stable of tools that IT organizations (ITOs) rely on for day-to-day monitoring. Many vendors have continued to invest R&D dollars in these technologies to stave off obsolescence, while others have moved their older products into maintenance mode and sought to develop or acquire newer technology better suited to managing a complex modern data center. The end result for the consumer is a highly uneven monitoring tool landscape that must be navigated carefully.
In today’s more complex environments, where technologies are increasingly interconnected (e.g., Web servers connected to application servers connected to database servers), the common assumption of “follow the market leader and you cannot go wrong” no longer holds as well. ITOs have learned that viewing infrastructure technology in segmented silos is somewhat helpful, but mapping applications or business services across those silos raises the tools’ utility to a whole new level. This is not to say that component-level (silo) management is no longer needed or that everything should be viewed from an application topology perspective. Rather, silos need an added layer of intelligence on top of them to remain relevant. Given the state of the market and consumers’ increasing demands on tools, we expect that, during the next 18 months, features such as automatic discovery and mapping of technology relationships, more intelligent business views of monitoring data, and business impact assessments of identified problems will become the norm. There are four key areas to consider when evaluating Unix monitoring technologies.
Agent Versus Agentless Versus a “Good” Agent. Much discussion has taken place about the best way to collect data from Unix servers. The typical dilemma has been whether to install an agent on the server or to use a polling mechanism that collects data across the network. Although it is commonly agreed that an agentless approach is preferable only when simpler data is needed, and that agents are preferred when more detailed data (typically application level) or local action is required, there is a third dimension to consider: whether or not the agent is a good agent.
A good agent can be described as having two main characteristics above and beyond what is expected of any agent:
– A good agent is easy to implement: It can be installed on a server remotely, with minimal system administrator involvement.
– A good agent is a good technology citizen: It is self-contained, does not compromise the integrity or security of the OS or applications, is resilient to problems it encounters, and never creates difficulties on the server itself (e.g., by dumping core).
For most organizations, agentless technology will be preferred, based in part on past experience with poorly performing agents. Although previous experience is an important consideration, the quality and maturity of some agents have improved during the past five years, so implementing a good agent should not be ruled out immediately. However, if data requirements for certain Unix systems are light, and no connectivity or security concerns are raised, an agentless approach will be the smart choice.
Modern Monitoring-Agent Features. As with any maturing product, we expect features to be added over time to differentiate one technology from another and to offer better value to the user. The Unix monitoring agent is no different. Previously, the value of a Unix agent was tied to the level of detailed data it could produce from a server and its applications. This is no longer the case. Today, server and application data have become increasingly commoditized, and the agent’s feature set has become the real differentiator. The following five features should be expected of modern Unix monitoring agents:
– Agents provide relevant data correlated within the server: Agents can dig very deeply into an application stack but can also tie problems back to the OS or hardware within a given server (i.e., local-level correlation).
– Agents provide automatic baselining of alert thresholds: One of the most significant reasons for failed Unix monitoring tool implementations is improperly set alert thresholds causing erroneous alerts, leading system administrators to doubt the validity of the warnings they receive. Current tools should be mature enough to study normal system activity and automatically set proper general-threshold values.
– Agents provide dynamic alerting based on common known criteria: Not only should current tools be able to set appropriate baseline thresholds, but they should also be advanced enough to allow for maintenance windows and provide a means to dynamically deal with cascading fault effects (e.g., when a server experiences a problem due to CPU failure, the OS and the applications on that server that must cope with the consequences are not misidentified as separate problems).
– Agents provide autonomous operational characteristics: When agents are used to monitor Unix servers, the agent is expected to perform all monitoring and alerting on its own, without requiring orchestration from other sources unless that is desired.
– Agents provide correlation of local-agent data: The agent should not simply trigger alerts based on a single metric it collects. Rather, it should be able to correlate among the multiple metrics it collects, using a map or model of how those metrics relate to and affect one another. When one metric exceeds its threshold but none of its related metrics is high, the problem is less significant.
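The baselining, maintenance-window, and local-correlation behaviors described above can be illustrated with a minimal Python sketch. It assumes a learned threshold of the historical mean plus three population standard deviations and a hand-written map of related metrics; the names RELATED, baseline_threshold, in_maintenance, and classify are hypothetical and do not reflect any vendor’s implementation.

```python
import statistics
from datetime import datetime, time

# Hypothetical local model: which metrics on this server corroborate each other.
RELATED = {
    "cpu_util": ["run_queue", "load_avg"],
    "run_queue": ["cpu_util", "load_avg"],
    "load_avg": ["cpu_util", "run_queue"],
}


def baseline_threshold(samples, k=3.0):
    """Learn a threshold from normal activity: mean plus k population std devs."""
    return statistics.fmean(samples) + k * statistics.pstdev(samples)


def in_maintenance(now: datetime, start: time, end: time) -> bool:
    """True if now's time of day falls in [start, end); windows may cross midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def classify(current, thresholds, related=RELATED, maintenance=False):
    """Map each breached metric to a severity.

    A breach corroborated by at least one related breached metric is
    'critical'; an isolated breach is downgraded to 'warning'. Nothing is
    reported while a maintenance window is active.
    """
    if maintenance:
        return {}
    breached = {m for m, v in current.items() if v > thresholds[m]}
    return {
        m: "critical" if any(r in breached for r in related.get(m, [])) else "warning"
        for m in sorted(breached)
    }
```

With thresholds learned this way, a CPU spike on its own is reported only as a warning, while a CPU spike accompanied by a high run queue is reported as critical. A production agent would presumably derive the metric relationships from its own model rather than from a hard-coded table.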
Do You See What I See? Unix server and application data has commonly been represented through simple component-level references: an icon represents a server, and drilling down into that server shows all the system and application components that make up that single device. In a simple world, this would be adequate. In today’s more complex infrastructures, it is not. Modern Unix monitoring tools must be able to represent not just individual servers and their components, but also the relationships between those servers and applications and other servers and applications in the environment. Providing this type of interface often requires building at least a basic technology relationship map that lets monitoring operators view the environment graphically and quickly discern where problems have occurred. This mapping also gives operators an intuitive view of the infrastructure, so they can see where a single problem may be causing additional outages. Without this type of view, operators are stuck with component-level details of limited use.
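A technology relationship map of the kind described above is essentially a dependency graph. The following minimal Python sketch assumes a hand-built map in which each component lists the components it depends on; the component names and the helpers dependents_of, impact, and root_causes are hypothetical. It shows how such a map lets a tool list everything downstream of one failure and separate a probable root cause from the symptoms it causes.

```python
from collections import deque

# Hypothetical relationship map: component -> components it depends on.
DEPENDS_ON = {
    "web01": ["app01"],
    "web02": ["app01"],
    "app01": ["db01"],
    "db01": ["host-db01"],
    "host-db01": [],
}


def dependents_of(graph):
    """Invert the map: component -> components that depend on it."""
    inverse = {node: [] for node in graph}
    for node, deps in graph.items():
        for dep in deps:
            inverse.setdefault(dep, []).append(node)
    return inverse


def impact(graph, failed):
    """Every component affected, directly or transitively, by `failed` (breadth-first)."""
    inverse = dependents_of(graph)
    seen, queue = set(), deque([failed])
    while queue:
        node = queue.popleft()
        for dependent in inverse.get(node, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


def root_causes(graph, alarming):
    """Alarming components that are not downstream of another alarming component."""
    symptoms = set()
    for component in alarming:
        symptoms |= impact(graph, component)
    return sorted(set(alarming) - symptoms)
```

If web01, app01, and db01 all raise alarms, root_causes reports only db01, and impact(DEPENDS_ON, "db01") lists the Web and application servers it is taking down with it. root_causes assumes the map is acyclic, which an automatic discovery tool could not take for granted.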
It Is Not the Data But What to Do With It. Given the maturity of current tools, it is not surprising that most Unix monitoring agents can collect roughly the same type and amount of data from servers and applications. The data in and of itself is no longer a secret sauce and a source of added value, but rather more of a commoditized condiment. The real value of the data collected by Unix monitoring agents now lies in what can be done with it. Keeping performance-monitoring data in a meaningful format for analysis, after its preliminary comparison against alert thresholds, has become paramount. Maturing ITOs want to perform more capable capacity planning and trend analysis without having to collect the same data multiple times in multiple places. It is therefore critical that data collected by Unix agents be reusable for other reporting purposes, and that the structure and portability of this data be clearly addressed. Unix monitoring tools that do not provide collected performance data in a reusable fashion should be viewed as tactical solutions only; for the longer term, strategic products should be sought.
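To illustrate reusing collected data rather than collecting it again, the following minimal Python sketch writes each sample once in a flat, self-describing CSV layout (timestamp, host, metric, value) and then reads the same file back for a simple capacity-planning projection. The field names, the metric name disk_used_pct, and the use of an ordinary least-squares linear trend are assumptions for illustration, not a format or method used by any particular tool.

```python
import csv
import io

# Hypothetical portable layout: one row per sample, readable by any reporting tool.
FIELDS = ["timestamp", "host", "metric", "value"]


def write_samples(rows, fh):
    """Store samples once, with a header row describing each column."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_series(fh, host, metric):
    """Reload one host/metric series as [(timestamp, value)] floats."""
    return [
        (float(r["timestamp"]), float(r["value"]))
        for r in csv.DictReader(fh)
        if r["host"] == host and r["metric"] == metric
    ]


def linear_trend(series):
    """Ordinary least-squares fit of value = slope * timestamp + intercept."""
    n = len(series)
    mean_t = sum(t for t, _ in series) / n
    mean_v = sum(v for _, v in series) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in series)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in series)
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_reach(series, limit):
    """Timestamp at which the fitted trend reaches `limit`, or None if not growing."""
    slope, intercept = linear_trend(series)
    if slope <= 0:
        return None
    return (limit - intercept) / slope
```

For example, daily disk-usage samples of 50, 52, 54, and 56 percent fit a slope of 2 points per day and project reaching 90 percent at day 20. A linear fit is the simplest possible trend model; real capacity planning would typically also account for seasonality and nonlinear growth.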
Business Impact: Implementing modern Unix monitoring tools will help ITOs better align their activities to those of the business.
Bottom Line: ITOs considering evaluating Unix monitoring tools should not settle for aging technology feature sets. Unix tools that map to the business, provide modern feature sets, and offer meaningful performance data output should be expected as the norm.