Now that you have a fairly good understanding of scalability and availability, the next step is to familiarize yourself with the techniques you can use to achieve scalable and highly available Web sites.
This section describes the following topics:
Clustering is a technique in which two or more Web servers supporting one or more domains (for example, www.yourcompany.com) are grouped together as a cluster of servers to collectively accommodate increases in load and provide system redundancy.
The following figure shows an example of a server cluster for a sample Web site:
Clustering for scalability works by distributing load among each server in the cluster (load balancing) using either an unintelligent-but-regular distribution sequence (round-robin DNS and routers) or a predefined threshold or algorithm that you specify and can adjust for each server in the cluster (specialized clustering software).
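The two distribution styles described above can be contrasted in a minimal Python sketch. It is only an illustration: the server names, the load figures, and the function names `round_robin` and `pick_by_threshold` are hypothetical and do not come from any particular router or clustering product.

```python
from itertools import cycle


def round_robin(servers):
    """Return an endless iterator that hands out servers in a fixed order.

    The sequence is regular but "unintelligent": it ignores server load.
    """
    return cycle(servers)


def pick_by_threshold(loads, thresholds):
    """Return the server with the most headroom below its own threshold.

    loads and thresholds map server name -> number (assumed units, such as
    percent busy). Servers at or above their threshold are skipped.
    Returns None when every server is at or above its threshold.
    """
    eligible = [name for name in loads if loads[name] < thresholds[name]]
    if not eligible:
        return None
    # Compare load relative to each server's individually configured limit.
    return min(eligible, key=lambda name: loads[name] / thresholds[name])


rotation = round_robin(["web1", "web2", "web3"])
print([next(rotation) for _ in range(4)])  # ['web1', 'web2', 'web3', 'web1']

loads = {"web1": 80, "web2": 40, "web3": 95}
thresholds = {"web1": 90, "web2": 50, "web3": 90}
print(pick_by_threshold(loads, thresholds))  # web2
```

In this sketch the round-robin iterator would keep sending every third request to web3 even though it is over its limit, while the threshold-based picker leaves it out of consideration.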
Clustering for failover relies on redundant servers to ensure that business-critical applications remain available if one of the servers in a cluster fails. Intelligent software-based failover solutions can detect when a server has failed and automatically redirect new incoming HTTP requests to the cluster members that are available. Some hardware-based failover devices that have less built-in intelligence require an administrator's intervention once the failure is detected.
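As a rough illustration of software-based failover, the following Python sketch keeps a set of available cluster members and routes new requests only to them. The class name `FailoverCluster` and the `probe` callable are assumptions made for this example; a real product would use its own failure-detection mechanism.

```python
class FailoverCluster:
    """Route new requests only to cluster members that pass a health probe."""

    def __init__(self, servers, probe):
        self.servers = list(servers)
        self.probe = probe  # hypothetical callable: probe(server) -> bool
        self.available = set(self.servers)
        self._next = 0

    def check_health(self):
        """Probe every member and rebuild the set of available servers."""
        self.available = {s for s in self.servers if self.probe(s)}

    def route(self):
        """Return the next available server in rotation, or None if all failed."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.available:
                return server
        return None


# Simulate a failure of web2; the probe here is a stand-in for a real check.
failed = {"web2"}
cluster = FailoverCluster(["web1", "web2", "web3"], lambda s: s not in failed)
cluster.check_health()
print([cluster.route() for _ in range(4)])  # ['web1', 'web3', 'web1', 'web3']
```

Calling `check_health` again after the failed server recovers returns it to the rotation, so no administrator intervention is needed in this model.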
Clustering can be accomplished with a software-based solution (such as round-robin DNS, by itself or together with a third-party package), a hardware-based solution (such as a packet router), or a combination of the two.
The most common and reliable hardware-based clustering solution is a device known as a packet router. One of the most popular routers on the market is Cisco Systems' LocalDirector. A router sits in front of a cluster of Web servers and directs incoming HTTP requests to the available Web servers that form the cluster. A router works by assessing the speed and volume of IP packet flow to and from the Web servers and then selecting the best server to accommodate the traffic. This process is fast and efficient. Together, the router and the clustered Web servers make up what is known as a virtual server.
Routers are considered semi-intelligent devices because they can detect a server failure and redirect requests to other servers. If a Web server fails or stops responding, the router stops sending packets to the unresponsive server. Routers are not considered fully intelligent because while they can redirect requests upon discovering a failure, they do not allow you to configure redirection thresholds for individual servers. They also do not provide for application-aware load balancing.
The following figure shows a router distributing requests in round-robin fashion to the available servers in a Web server cluster:
A hardware-based clustering solution, such as a router, is an attractive solution for the following reasons:
Routers can load balance in a round-robin fashion, detect failures, redirect traffic and remove failed servers from a cluster.
Note: Not all load-balancing devices have the same features or offer the same capabilities.
Carefully evaluate the following issues against a router's attributes:
Hardware devices can be expensive relative to some software solutions, even without yearly licensing fees.
If a problem develops on the load-balancing device itself and it fails, your load balancing and failover strategies are no longer working. Although some load-balancing devices come with secondary systems for just this reason, this additional equipment is often what inflates the overall price of a hardware solution.
The device cannot be tuned for particular types of Web applications (static vs. dynamic sites) or for the development tools used to build them (scriptlets vs. JSP vs. CGI vs. ASP and so on). Consequently, a router cannot measure the performance of a Web application server.
The device does not allow you to configure individual load and redirection thresholds for each server in a cluster, and therefore, it is unable to effectively manage load to prevent failures.
There are several flavors of software-based clustering solutions on the market. Just like hardware-based clustering solutions, there are strengths and weaknesses associated with each. These software solutions include:
Round-robin DNS: A very popular choice because of its relative simplicity and low implementation cost, but it does not contain any intelligence for load balancing or failover.
Backup clustering: Two cloned systems provide redundancy for one another. This type of clustering does not provide any parallel server load balancing.
Third-party clustering software: Combines the advantages of round-robin DNS and backup clustering to provide simplicity with intelligence and redundancy.
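To see why round-robin DNS is simple but carries no intelligence, consider this small Python simulation. It is a sketch under stated assumptions, not a DNS server: the class `RoundRobinDNS` is hypothetical, and the addresses are taken from the 192.0.2.0/24 range reserved for documentation.

```python
from collections import deque


class RoundRobinDNS:
    """Simulate a name server that rotates its address list on each query."""

    def __init__(self, name, addresses):
        self.name = name
        self._records = deque(addresses)

    def resolve(self, name):
        """Return all addresses for name, then rotate the order by one."""
        if name != self.name:
            raise KeyError(name)
        answer = list(self._records)
        self._records.rotate(-1)  # the next query sees a different first address
        return answer


dns = RoundRobinDNS(
    "www.yourcompany.com",
    ["192.0.2.10", "192.0.2.11", "192.0.2.12"],
)
# Clients typically try the first address in the answer.
for _ in range(4):
    print(dns.resolve("www.yourcompany.com")[0])
```

Nothing in this rotation checks whether a server is busy or even running, so a failed server's address keeps being handed out. That gap is what backup clustering or third-party clustering software is meant to cover.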
ClusterCATS, Allaire's software clustering solution for load balancing and high availability, allows you to easily create, optimize, and maintain "smart" clusters to support your Web applications. ClusterCATS runs on NT, Solaris, and Linux platforms and works with leading mission-critical Web servers, including Microsoft IIS, Netscape Enterprise Server, and Apache. It is easily administered from remote locations and provides robust features, including:
The following benefits make a software-based clustering solution attractive:
Compared to the cost of hardware devices, such as routers or switches, software-based clustering solutions are relatively inexpensive. In fact, you can cheaply implement Internet DNS on UNIX and Windows platforms for initial load balancing needs and augment it with third-party clustering software.
Some clustering software can augment existing hardware devices, thereby providing a more robust load balancing and failover solution. Additionally, by integrating hardware with software, you diminish, if not eliminate, losses on capital expenditures that your organization has already made. See "Combining hardware and software clustering solutions" and "Load-Balancing Devices" for more information about how hardware and software solutions can be integrated.
Some software solutions provide a level of intelligence that enables preventive load-balancing measures, which minimize the chance of servers becoming unavailable. If a server does become overloaded or fails, some software can automatically detect the problem and reroute HTTP requests to available servers in the cluster.
By distributing the load balancing and failover capabilities among multiple servers in a cluster or multiple clusters, as opposed to relying on only a single device, no individual server failure can disable your application.
Consider the following issues when evaluating software-based solutions for your environment:
Not all software-based clustering solutions are the same in terms of capabilities and features. For instance, some have no automatic failure detection, notification, or IP address assumption, and others have significantly delayed detection. Some let you configure load thresholds to enable preventive measures, some don't. Determine your scalability and failover needs in advance and pick your solution accordingly.
Determine whether the software solution you are considering is available on your platform and operates with your preferred Web server. When reviewing data sheets and other marketing collateral from vendors, make sure that the robust features you want are available on the platform you need.
Some software-based clustering solutions have relatively low complexity. Others introduce a higher level of complexity because of the features offered, the amount of initial configuration and subsequent administration, or the amount of integration that needs to occur between other systems and devices.
Instead of choosing either a hardware solution or a software solution, you can combine the two. A combined hardware and software solution can provide the greatest scalability and availability capabilities for your site. It is also an attractive option if your organization has already invested in one type of solution but is looking for more comprehensive coverage. Having the flexibility to integrate hardware with software means that your organization won't necessarily have to absorb a capital loss on a previous technology investment if you decide to purchase additional clustering technology.
However, as already discussed, not all hardware or software solutions are equal. Many have different features and capabilities, and not all hardware and software integrate well together. Be sure to investigate thoroughly when purchasing additional technology to augment your current solution.
For a visual representation of hardware and software clustering solutions working together, see "Hardware-based clustering solutions".