On Sept. 11, 2001, many New York businesses disappeared from the Internet because their DNS services were fragile. Fragility is the opposite of resilience, which is the ability to continue operating despite damage to individual elements.
User-reported difficulties indicate insufficient resilience. The first reports of infrastructure problems should come from internal monitoring systems, not a flurry of phone calls from users.
DNS translates domain names into IP addresses. The most publicized concerns with DNS involve root name servers, which are beyond the control of typical Internet users. Less publicized are issues involving the organization and provisioning of the name servers for enterprise domains, which are within a company’s control and are often neglected.
A misconception is that a company’s ISP is responsible for providing servers to answer queries for the company’s domains. While most ISPs provide DNS services for their customers, the details vary greatly. Some ISPs will act as authoritative secondary name servers, downloading the actual DNS zones from user-maintained DNS servers; some will not. Beware: DNS failure is e-commerce death.
In the end, DNS resilience is determined by the steps a company takes to ensure that its domain data remains available to the Internet.
The most rudimentary step to ensure resilience of your Internet presence is to always have, at a minimum, primary and secondary DNS servers for the domain. These servers should be distinct systems in different locations.
Single points of failure must be avoided. Achieving geographic dispersion is neither difficult nor expensive. Resorting to a hosting service or ISP is often unnecessary, although it is an option. A field office or sister organization can easily provide the few cubic feet of space and the kilobytes per hour (yes, per hour) of bandwidth required to house an alternate DNS server. The system can even be managed remotely.
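The checks described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive audit: the function name audit_name_servers is hypothetical, the caller supplies the name-server list by hand, and a shared /24 (IPv4) or /48 (IPv6) prefix is used only as a rough proxy for "same location," since an address alone cannot prove where a machine sits.

```python
import ipaddress


def audit_name_servers(servers: dict[str, list[str]]) -> list[str]:
    """Return a list of resilience problems found in a name-server set.

    `servers` maps each name-server host name to its IP addresses.
    (Hypothetical helper; the data is supplied by the caller.)
    """
    problems = []
    if len(servers) < 2:
        problems.append("fewer than two name servers")

    # Two names that point at one address are still a single system.
    seen = {}
    for name, addresses in servers.items():
        for address in addresses:
            ip = ipaddress.ip_address(address)
            if ip in seen and seen[ip] != name:
                problems.append(f"{name} and {seen[ip]} share address {ip}")
            seen.setdefault(ip, name)

    # Assumption: a shared /24 (IPv4) or /48 (IPv6) hints at a shared site.
    prefixes = set()
    for ip in seen:
        length = 24 if ip.version == 4 else 48
        prefixes.add(ipaddress.ip_network(f"{ip}/{length}", strict=False))
    if len(seen) > 1 and len(prefixes) < 2:
        problems.append("all name servers sit in one network prefix")

    return problems


if __name__ == "__main__":
    # Addresses come from the ranges reserved for documentation.
    example = {
        "ns1.example.com": ["192.0.2.10"],
        "ns2.example.com": ["192.0.2.20"],
    }
    for problem in audit_name_servers(example):
        print("WARNING:", problem)
```

An empty result means only that none of these simple checks failed. Carrier and routing diversity still have to be confirmed with the providers involved.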
A production site with many concurrent users justifies extensive monitoring. Each link in the chain connecting customers to the site should be monitored often enough to alert the organization to problems promptly. With DNS servers, regular verification that the name servers are online and responding properly is prudent.
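One way to perform that regular verification is sketched below in Python, using only the standard library. It sends a start-of-authority (SOA) query directly to a name server over UDP and treats the server as healthy only if it replies authoritatively, without error and with at least one answer. The function names, the fixed query ID, the three-second timeout and the choice of an SOA query are all assumptions made for illustration; a production monitor would also randomize the ID, retry, and raise alerts.

```python
import socket
import struct


def build_query(name: str, query_id: int = 0x1234, qtype: int = 6) -> bytes:
    """Build a minimal DNS query packet (qtype 6 = SOA, class IN)."""
    # Header: ID, flags (0 = standard query, no recursion), one question.
    header = struct.pack("!HHHHHH", query_id, 0, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!HH", qtype, 1)


def check_response(packet: bytes, query_id: int = 0x1234) -> dict:
    """Summarize the 12-byte header of a DNS response."""
    if len(packet) < 12:
        raise ValueError("packet too short to be a DNS message")
    rid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "id_matches": rid == query_id,
        "is_response": bool(flags & 0x8000),    # QR bit
        "authoritative": bool(flags & 0x0400),  # AA bit
        "rcode": flags & 0x000F,                # 0 means no error
        "answers": ancount,
    }


def server_is_healthy(server_ip: str, domain: str, timeout: float = 3.0) -> bool:
    """Return True if the server answers authoritatively for the domain."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(domain), (server_ip, 53))
            packet, _ = sock.recvfrom(512)
        except OSError:  # covers timeouts and unreachable hosts
            return False
    try:
        summary = check_response(packet)
    except ValueError:
        return False
    return (
        summary["id_matches"]
        and summary["is_response"]
        and summary["authoritative"]
        and summary["rcode"] == 0
        and summary["answers"] > 0
    )


if __name__ == "__main__":
    # Placeholder address from a documentation range; substitute real servers.
    for ip in ["192.0.2.10"]:
        state = "ok" if server_is_healthy(ip, "example.com") else "FAILED"
        print(ip, state)
```

Run on a schedule from a machine outside the network that hosts the name servers, a check like this reports a failure before the phone calls start.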
Diversity of carriers, geographic location and routing helps ensure that single-source errors do not disrupt your DNS services and impair your Internet presence.
In a piece of fabric, an individual thread or moderate number of threads may break without compromising the function of the whole. Analogously, failures that do not result in service disruption will never lead to customer dissatisfaction. Dispersion of functionality is far less expensive and far more resilient than attempts to harden facilities beyond the possibility of damage.
Gezelter is a network security consultant and a contributor to The Computer Security Handbook, 4th Edition. He can be reached at email@example.com.