Speedy returns are Google’s goal

Network World (U.S.)

While few Web sites can handle an average of 200 million queries and a billion HTTP requests per day, Google Inc. has done it for years. But the search engine leader wanted to do it better.

Known for its speedy return of relevant search topics, Google recently decided to give its site a boost with a Web server load-balancing upgrade. But the firm wanted a product that did more than just keep traffic flowing smoothly among its thousands of servers, says Urs Holzle, a Google fellow, who heads infrastructure strategy at the firm.

“What we were mostly looking for was performance and stability” in a load-balancing device, Holzle says. Stability includes not only uptime, he adds, but also “the ability to handle denial-of-service (attacks), viruses and ping floods. We’re now using the NetScaler (Inc.) switches to do that after deploying them about three months ago.”

Since its rollout began in January, Google has installed “dozens” – Holzle would not say specifically how many – of NetScaler 9800 Secure Application Switches. The boxes sit at the front end of Google’s Image Search Web presence, where they balance traffic among Web servers and search engine boxes.

“Our image search is one of our most performance-intensive apps,” Holzle says. “Each search result has at least 25 with images. And we receive thousands of requests per second. The NetScaler boxes do well with that.”

While the NetScaler boxes have Gigabit interfaces and can handle several hundred megabits per second of Layer 4 to Layer 7 traffic processing, Holzle says, “we’re not exploiting that.” Most of Google’s traffic arrives in short bursts rather than long, sustained packet flows. “Bandwidth is not a limitation for our applications,” he says.

Google’s data centres number “in the dozens” and operate in the U.S. and around the world in undisclosed locations. In the data centres, the firm operates hundreds of Web servers, running both Apache and a home-grown Web server application developed by Google.

These servers, typically Intel-based machines running Linux, sit in front of thousands (Google declined to give a number) of similarly configured search engine PCs, which run the proprietary applications that crawl the Web and rank pages according to search criteria – Google’s claim to fame.

“We carefully optimized our site for speed before and after the NetScaler installation,” Holzle says. But he says the NetScaler boxes add a degree of speed and security to the site.

One of Google’s imperatives is to prevent its front-end Web servers from being pecked to death by floods of ping and TCP requests, whether benign or malicious. Google has deployed several features on the NetScaler boxes to do this.

“Attacks and malicious traffic are something that’s always happening here,” Holzle says. “As a large Internet site, we are always a potential target. We’re not going to wait until something happens.”

One way the NetScaler boxes provide security is by acting as a second line of defence, behind the company’s dedicated firewall boxes.

“All the load balancers are designed to accept IP traffic on Port 80,” Holzle says. “That’s what they forward. If incoming traffic does not designate a port, or is not IP, all packets automatically get dropped.”
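The rule Holzle describes can be sketched in a few lines of Python. This is only an illustration of the stated policy, not NetScaler configuration: the `Packet` class and the `should_forward` function are hypothetical names, and the packet model is deliberately simplified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    # Hypothetical, simplified packet model used only for this illustration.
    is_ip: bool
    dst_port: Optional[int]  # None when the packet designates no port


ALLOWED_PORT = 80  # the only port the load balancers forward, per the article


def should_forward(pkt: Packet) -> bool:
    """Forward IP packets addressed to port 80; drop everything else."""
    return pkt.is_ip and pkt.dst_port == ALLOWED_PORT
```

Under this sketch, a packet with no port, a non-IP packet, or a packet for any other port is dropped before it reaches a Web server.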

Other features on the box allow rate limiting for such diagnostic traffic as ping and other Internet Control Message Protocol (ICMP) messages, which can be used to swamp a Web server in a DoS attack. “That reduces the time our (Web server) CPUs have to spend on that,” he says.
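The article does not say how the NetScaler boxes implement rate limiting. One common approach is a token bucket, sketched below in Python under that assumption. The class name `TokenBucket` and its parameters are hypothetical, and the caller passes in the current time so the behaviour is easy to follow.

```python
class TokenBucket:
    """Token-bucket limiter: an assumed technique, not NetScaler's documented one."""

    def __init__(self, rate_per_sec: float, burst: float) -> None:
        self.rate = rate_per_sec   # tokens added per second
        self.capacity = burst      # maximum tokens the bucket can hold
        self.tokens = burst        # start with a full bucket
        self.last = 0.0            # time of the most recent refill

    def allow(self, now: float) -> bool:
        """Return True if a packet arriving at time `now` may pass."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0     # spend one token per accepted packet
            return True
        return False               # bucket empty: drop the packet
```

A limiter created with `TokenBucket(rate_per_sec=2.0, burst=2.0)` would pass two pings that arrive together, drop a third, and pass another one half a second later. Dropped packets never reach the Web servers, which is the CPU saving Holzle describes.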

Another NetScaler feature in use is “SYN cookies,” which let the NetScaler boxes process and respond to SYN messages, the packets sent to initiate a TCP/IP connection. SYN packets can be used as a tool in “SYN flood” DoS attacks, in which hundreds or thousands of hacked computers are coerced into sending bogus SYN packets to a Web server to overwhelm the box.

“SYN cookies allow you to respond to SYNs without congesting space on the box,” Holzle says. “That lets you sustain a high rate of SYNs without consuming CPU usage.”
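The idea behind SYN cookies can be shown with a simplified Python sketch. The device derives its reply sequence number from the connection details and a secret, so it stores nothing until a valid ACK comes back. This is not NetScaler's code. Real implementations also encode details such as the maximum segment size, and the secret, slot length and function names here are assumptions made for illustration.

```python
import hashlib
import hmac
import struct

SECRET = b"example-secret-key"  # hypothetical; a real device would use a random secret
SLOT_SECONDS = 64               # assumed lifetime of one time slot


def _cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int, slot: int) -> int:
    """Derive a 32-bit value from the connection 4-tuple and a time slot."""
    msg = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{slot}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).digest()
    return struct.unpack("!I", digest[:4])[0]


def make_syn_ack_seq(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                     now: float) -> int:
    """Sequence number to send in the SYN-ACK; no per-connection state is kept."""
    return _cookie(src_ip, src_port, dst_ip, dst_port, int(now) // SLOT_SECONDS)


def ack_is_valid(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 ack_number: int, now: float) -> bool:
    """Check the client's ACK by recomputing the cookie for recent time slots."""
    cookie = (ack_number - 1) % 2**32  # a genuine client acknowledges seq + 1
    slot = int(now) // SLOT_SECONDS
    return any(cookie == _cookie(src_ip, src_port, dst_ip, dst_port, s)
               for s in (slot, slot - 1))
```

A bogus SYN from a flooding host costs the box one hash computation and one reply. Memory is allocated only when a client completes the handshake with an ACK that passes `ack_is_valid`.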

While the NetScaler boxes offer Layer 7 deep packet inspection features, which some users turn on for such applications as HTTP, cookie and XML switching and traffic acceleration, Holzle says Google is not using these features.

“We’re using Layer 4 only,” Holzle says. “That’s because we have an existing Layer 7 solution which runs on Web servers.”
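As a rough illustration of what “Layer 4 only” means, the Python sketch below picks a server using nothing but addresses and ports, without reading the HTTP request. The article does not say which balancing algorithm Google uses, so the hash-based choice, the `pick_backend` name and the server names are all assumptions.

```python
import hashlib
from typing import Sequence


def pick_backend(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                 backends: Sequence[str]) -> str:
    """Choose a backend from the connection 4-tuple alone (no payload inspection)."""
    if not backends:
        raise ValueError("no backends available")
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    # Hash the 4-tuple so every packet of one connection maps to the same server.
    index = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % len(backends)
    return backends[index]
```

Calling `pick_backend("198.51.100.7", 40000, "203.0.113.1", 80, ["web-1", "web-2", "web-3"])` returns the same hypothetical server name every time for that connection. Decisions that depend on URLs or cookies would need Layer 7 inspection, which Holzle says Google handles on its Web servers instead.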

Load-balancing technology is not new to Google, as the firm had another vendor’s Layer 4 to Layer 7 gear installed for several years before the NetScaler rollout. (Holzle would not say which company.) The decision came down to “three major players,” he says, adding that NetScaler won on its performance, scalability and security features.

While Google got a good price on the NetScaler gear (Holzle wouldn’t give that away, either), he adds that cost was not a deciding factor in the deal. “We were really looking for the best performance out there,” he says.