Compaq’s cluster-in-a-box ensures uptime

If your branch offices are calling for assured server uptime, you might want to take a look at Compaq Computer Corp.’s cluster-in-a-box server.

The CL1850 is positioned as a midlevel, high-availability server for remote sites and branch offices where containing downtime is critical but expensive horsepower and disk capacity aren’t warranted. This product is a cluster-in-a-box, the first in this class of servers that we have evaluated. It ships with two redundant servers, two RAID controllers, a built-in keyboard, a KVM switch and a shared drive array, all residing in a single 10U rack enclosure.

Compaq supported only Windows NT 4.0 on the CL1850 at the time of our testing, so we limited our tests to that operating system. Also installed on the server was Microsoft Corp.’s Cluster Service (MSCS). The installation was not easy and required intervention by Compaq support staff, but once everything was in place, the server ran without any noticeable problems.

The CL1850 has two redundant servers – which Compaq calls nodes – in the master enclosure. A node can be removed easily: after disconnecting the Ethernet, KVM, drive and power connections at the rear of the unit, you push two metal retaining tabs and slide the unit out from the front. A nice improvement would be making these connections internally, so that sliding out a node would disconnect it as well. The unit comes in either a stand-alone tower configuration or a 19-inch rack configuration; we tested the stand-alone version.

Our unit came with one gigabyte of RAM, two 550MHz Pentium III processors with 512KB of L2 cache, three Ethernet network interface cards (NIC), two RAID controllers and 16 9.1-gigabyte drives. Eight of the drives were loaded in an external Compaq 4214T array enclosure in a tower configuration.

Two of the Ethernet NICs were used for network connections outside the cluster, and the third was used for the connection between the two nodes. This connection provides each node with information on the state of the other. When the primary node is no longer operational, the backup node detects the change and becomes the primary. The CL1850 is a failover cluster, not a load-balancing cluster.
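The heartbeat-and-failover behavior the dedicated NIC enables can be sketched roughly as follows. This is a simplified illustration only, not Compaq’s hardware logic or MSCS’s actual implementation; the node names, miss limit and interval are our assumptions.

```python
HEARTBEAT_INTERVAL = 1.0   # seconds between checks (assumed value)
MISSED_LIMIT = 3           # missed heartbeats before failover (assumed value)

class Node:
    def __init__(self, name):
        self.name = name
        self.alive = True
        self.role = "backup"

def heartbeat_ok(peer):
    # In the real cluster this would be a probe over the dedicated NIC;
    # here we just read a flag to keep the sketch self-contained.
    return peer.alive

def monitor(backup, primary, max_checks=10):
    """Promote the backup once the primary misses MISSED_LIMIT beats in a row."""
    missed = 0
    for _ in range(max_checks):
        if heartbeat_ok(primary):
            missed = 0
        else:
            missed += 1
            if missed >= MISSED_LIMIT:
                backup.role = "primary"   # failover: backup takes over
                return True
        # A real monitor would sleep HEARTBEAT_INTERVAL between checks.
    return False
```

In this failover model only one node serves clients at a time, which is why the CL1850 gains availability but not the combined throughput a load-balancing cluster would offer.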

Our hard drive configuration was complicated: the system had to be partitioned to optimize performance for our tests, and the cluster itself requires partitions for its own operations. One drive in each node is used exclusively for the operating system and is controlled by an UltraWide SCSI controller built into that node. The remaining 14 drives are split across two drive bays – six in the internal shared cluster drive bay and eight in the external array enclosure – and are managed via the shared SCSI controllers. Ten of these drives are striped with RAID-0 into two partitions, one for the file data set and one for the SQL data set; they are managed by one of the RAID controllers that ship with the server. The other RAID controller handles the remaining four drives: two are striped with RAID-0 into two partitions for the SQL executables and the cluster quorum (the quorum partition is used for cluster system housekeeping), and the last two are striped with RAID-0 into one partition for the SQL log files.
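To summarize, the shared-storage layout we used breaks down as follows. This is a descriptive sketch of our test configuration only; the partition labels are ours, not Compaq’s.

```python
# Shared-storage layout: 14 drives split across the two RAID controllers.
# (The two operating-system drives, one per node, sit on each node's
# built-in UltraWide SCSI controller and are not part of this layout.)
layout = {
    "controller_1": [
        {"drives": 10, "raid": 0, "partitions": ["file data set", "SQL data set"]},
    ],
    "controller_2": [
        {"drives": 2, "raid": 0, "partitions": ["SQL executables", "cluster quorum"]},
        {"drives": 2, "raid": 0, "partitions": ["SQL log files"]},
    ],
}

# Sanity check: every shared drive is accounted for.
total_shared = sum(g["drives"] for groups in layout.values() for g in groups)
```

Note that every stripe set here is RAID-0, which maximizes throughput but provides no drive-level redundancy; the cluster’s fault tolerance comes from the duplicated nodes, not the disks.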

In this configuration, the RAID controllers are not redundant. We configured the controllers without failover so that both could be used simultaneously to improve performance.

The CL1850 offers a new twist in serviceability: server hot swap. Microsoft’s cluster management software, Cluster Manager, resides on either node and lets you take a node out of the cluster. The node can then be powered down and swapped. This process takes about five minutes and has no bearing on server performance or uptime. The fit and finish of the components could be polished to make swapping nodes easier, but once a node is out of the enclosure, it is easy to work on, with plenty of room to service all the components.

We used Benchmark Factory test software to determine how long it took for the server to fail over from one node to the other. We found that clients need about 160 seconds to recover from the server switching active nodes.
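Measuring failover recovery this way amounts to timing how long client requests go unanswered during the node switch. A rough sketch of the idea (our own illustration of the technique, not Benchmark Factory’s actual method):

```python
import time

def probe(server_responding):
    """Stand-in for one client request; True if the server answered."""
    return server_responding()

def recovery_time(server_responding, poll_interval=0.0):
    """Seconds from the first failed request until requests succeed again."""
    # Wait for the outage to begin (the active node goes away).
    while probe(server_responding):
        pass
    start = time.monotonic()
    # Poll until the backup node starts answering.
    while not probe(server_responding):
        time.sleep(poll_interval)
    return time.monotonic() - start
```

By this measure, the roughly 160 seconds we observed covers the whole client-visible gap: heartbeat detection, resource failover under MSCS and client reconnection combined.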

The Cluster Manager software also lets you move an application from one node to another, configure the nodes and monitor the cluster. The application is fairly intuitive, but the MSCS installation is cumbersome, and the cluster does not always operate as expected.

But what about performance?

The CL1850 earned an 8.0 in overall performance. The server did best in file performance, scoring a respectable 8.3, due in part to the number of disk drives in the RAID-0 stripe set and the good performance of the RAID controllers. It earned an 8.2 for CPU performance, about what we expect from dual 550MHz Pentium III processors, and a 7.5 in network performance, due in part to possible network overhead from the cluster stack.

The CL1850 scored high with a 9.2 in features and flexibility because of its clustering capabilities, 66MHz PCI and drive capacity.

The CL1850 earned a 7.8 in manageability and a 10 for serviceability. Manageability was lacking due to the bugs and usability problems with MSCS. Serviceability was a breeze, allowing an administrator to completely replace the processors and system memory one “server” at a time without any downtime.

The Compaq CL1850 is a great server if uptime is of utmost importance. Once the cluster is up and running, it is a breeze to maintain and service.

Configuration is another story; this server set is not for the faint of heart. Installing NT 4.0 and MSCS on the cluster, and then installing applications over the cluster, can be frustrating. As Microsoft cleans up the usability of its clustering software, these hardware cluster solutions have the potential to become prevalent in the mission-critical small to midsize server market.