What is Web Site Availability?

As you learned in the previous section, it's critical to design, develop, test, and deploy your Web applications so that they scale well under heavy and ever-increasing load. However, in spite of the best-laid plans and preparations, servers can fail for seemingly unknown reasons, making your site unavailable. If a server fails or becomes overloaded, whatever the cause, you want to ensure that it doesn't hurt your business by preventing customers from accessing and using your Web application. If it does, you risk your bottom line through lost sales and disgruntled customers who turn to your competitors for goods and services.

This section defines and describes Web site availability and failover. It covers availability and reliability, common failures, a Web site availability scenario, and failover considerations.

Availability and reliability

In the simplest terms, availability and reliability mean that you can reach your Web site whenever you request it by entering the site's URL in your browser, and that all of its features work as intended. Availability and reliability therefore refer to the uptime of a Web site, which is often directly tied to the uptime of the Web server and other dependent servers, such as a database server, an application server, or a file server. All of the servers that provide your site's functionality must be working for the site to be considered available.

For ColdFusion Web applications, it is particularly important that the ColdFusion servers remain as available and responsive as the Web server and other dependent servers. ColdFusion processes requests sent to it by the Web server. After it successfully processes the application logic, ColdFusion returns the results to the Web server, which in turn sends an HTML response to the browser.

Availability and reliability are concerned with keeping the servers that provide services to your Web application available at all times. However, if a server on which your site depends does become unavailable, a sound redundancy scheme is critical to keeping your site available. As your organization moves into an e-business paradigm, you must plan, design, and implement load balancing and failover strategies that keep your servers operational and serving your customers.

If servers employ a good load balancing and failover strategy, there is no reason they should not provide high availability and reliability to their users. In fact, Internet Service Providers (ISPs) that host commercial Web sites and offer 24x7 technical support as a competitive differentiator typically specify, in written service-level agreements (SLAs), the percentage of time they guarantee a Web site will be available. If the ISP has a sound scalability and failover strategy in place, this figure is usually 99% or better.
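To make an SLA percentage concrete, the following Python sketch converts an availability target into the downtime it permits. It assumes a 365-day year and ignores the scheduled-maintenance windows that many SLAs exclude; the function name is illustrative only.

```python
# Convert an SLA availability percentage into the maximum downtime it allows.
# Assumes a 365-day year; leap years and maintenance exclusions are ignored.

HOURS_PER_YEAR = 365 * 24  # 8760 hours


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime an availability target permits over a period."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_hours * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99):
        per_year = allowed_downtime_hours(target)
        per_month = allowed_downtime_hours(target, period_hours=30 * 24)
        print(f"{target}% -> {per_year:.2f} h/year, {per_month * 60:.1f} min per 30-day month")
```

At 99%, for example, a site may be down about 87.6 hours per year, or 7.2 hours in a 30-day month; at 99.9%, the yearly allowance drops to about 8.76 hours.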

Common failures

Following are typical types of failures that can negatively impact your Web application's availability and reliability:

Hardware failures
While less common than software failures, hardware failures do occur and may include crashed hard drives, blown processors, and corrupted network cards. Diagnosing and fixing these problems can take a long time because parts must be procured and installed. If your Web application is mission-critical, put a sound hardware redundancy strategy in place to avoid costly downtime: a minimum of two Web servers, and preferably three.
Software failures
The software failures most likely to affect a Web application involve the Web server's operating system, the Web server software itself, or the Web application software. If the operating system crashes or becomes corrupt, the Web server cannot function properly (or perhaps at all), compromising your Web application's availability, reliability, and performance. Similarly, if the Web server software crashes or behaves erratically, the Web server may stop responding unexpectedly. Software failures are hard to anticipate, but mirrored secondary hardware systems that can take over when a server fails will minimize your Web application's downtime.
Server failures
In addition to the Web server, other servers on which your Web application depends can fail, causing either downtime or diminished capabilities on your site. For example, in a distributed application a proxy server may go down, leaving requests for your Web application's services unanswered. The database server can crash, making it impossible for users to submit or retrieve information. A mail server can go down, preventing users from sending mail to you. Ensure that your organization's IT architecture includes network monitoring and notification software that can quickly report on the general health of your network and alert you to any failed servers.

A Web site availability scenario

Imagine that you've just built a robust, interactive e-commerce Web site on which you plan to sell the most sought-after books and music in the world. You've used Java scriptlets to build the application, so of course you've taken advantage of the platform's many built-in features, including secure database access, multi-threading, and integrated session management.

After finishing development and quality assurance testing, you deploy the Web site onto a single production Web server hosted within your IT department. The IT department tells you it can use its existing Internet connection to make your site "live," avoiding the additional hosting support costs of going to an outside vendor. The site goes live the following day and is an instant success. Orders start pouring in on the very first day, and huge numbers of people log on to browse and buy. Everything seems perfect until the second day of business, when the load hitting the site is so high that the Web server's performance slows to a crawl and the server eventually becomes unavailable. Suddenly, your tech support lines are ringing off the hook with complaints that users cannot access your site, and you are missing out on a large number of sales.

Although the application contained many useful features and capabilities, customers could not use them for very long because the site's performance degraded until the site became unavailable. Because the site was deployed on a single server, there was no way to balance the incoming load across machines, and with no redundant servers in place, there was nowhere to redirect traffic when that server failed (no failover).

This simple scenario illustrates that a critical part of any successful Web development effort must include adequate scalability, performance, and failover planning. Servers can become overloaded or fail at any time for many reasons, so make sure that your design, development, testing, and deployment strategies are sound, promote good communication between necessary departments, and include adequate disaster recovery capabilities.

Failover considerations

The ability to fail over from an unavailable server to a redundant one is a cornerstone of any mission-critical application, because it ensures the application's continuous and reliable operation. Such disaster planning and recovery can be broken down into three areas: hardware planning, systems monitoring, and corrective actions.

Review the following considerations to ensure that you have a sound failover strategy in place, one that guarantees your Web site's availability.

Hardware planning

As the availability scenario above illustrates, it's important to acquire and configure all of the necessary hardware before you deploy the application. Every Web site has different requirements, feature sets, purposes, audiences, and budgets, so hardware needs must be determined site by site. However, if your site is a business-critical system that affects your company's bottom line, you must have two or more redundant systems in place. In fact, Allaire recommends a minimum of three servers for any critical Web site, so that you can take one server offline for updates and maintenance while keeping at least two servers in production at all times. This scheme provides administrative flexibility while protecting your site from hardware or software failures.

The two predominant redundancy models used today are:

Primary/Backup Servers
This model suits an important Web application that receives relatively little traffic, such as a corporate intranet. Typically, it uses an expensive, high-capacity server as the primary and an inexpensive, lower-quality server as the backup in case the primary fails.
Parallel Servers
This model is known as a classic load balancing/redundancy model and is used most often for business-critical applications. Unlike the primary/backup scheme described above, the servers in a parallel scheme are peers, grouped together as a single entity to support one or more applications.
You can use identical cloned hardware for creating your server clusters, or you can mix hardware sizes and models. Cloned, higher capacity, higher-end hardware may have greater up-front hardware costs but will help minimize administration costs down the line. Conversely, mixing hardware models and capacities may be less expensive up-front but can add administrative costs later on.
If you plan to use a parallel model, Allaire recommends that you use many middle range servers rather than fewer high-end ones or lots of inexpensive ones. Servers that provide adequate capacity and are moderately priced can generally accommodate all your needs just as well as expensive ones at a fraction of the cost.
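The benefit of each redundancy model can be estimated with a simple probability model. The Python sketch below assumes each server is up independently with availability a (shared causes such as a common network link or power feed would make real results worse) and that failover to surviving servers is instant. A primary/backup pair is down only when both servers are down; a parallel pool of n peers is up when at least k of them are running, where k is the number of servers needed to carry the load. The function names and numbers are illustrative.

```python
from math import comb


def k_of_n_availability(a: float, n: int, k: int) -> float:
    """Probability that at least k of n identical, independent servers are up.

    Assumes each server is up with probability a, independently of the others,
    and that traffic fails over instantly to the surviving servers.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("a must be between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    # Sum the binomial probabilities of exactly i servers being up, for i = k..n.
    return sum(comb(n, i) * a**i * (1.0 - a) ** (n - i) for i in range(k, n + 1))


def primary_backup_availability(a_primary: float, a_backup: float) -> float:
    """The site is down only if the primary and the backup are down at the same time."""
    return 1.0 - (1.0 - a_primary) * (1.0 - a_backup)


if __name__ == "__main__":
    a = 0.99  # hypothetical per-server availability
    print(f"single server:             {a:.6f}")
    print(f"3 parallel, 2 needed:      {k_of_n_availability(a, 3, 2):.6f}")
    print(f"primary .995 / backup .95: {primary_backup_availability(0.995, 0.95):.6f}")
```

With the hypothetical a = 0.99, a three-server pool that needs two servers to carry its load is up with probability 0.999702 (about 99.97%), compared with 0.99 for a single server. This mirrors the three-server recommendation: one server can be offline while the other two keep the site running.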

Systems monitoring

In addition to redundant hardware, you should ensure that your network and the mission-critical sites that reside on its servers are supported by systems monitoring software. This type of software actively and continuously monitors an application's availability and its service levels. These monitoring programs must not only be able to detect problems, but they must also be able to route alerts to the correct administrators for immediate notification of problems.
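Monitoring tools commonly probe each server at a fixed interval and raise an alert only after several consecutive failures, so that a single slow response does not page an administrator. The Python sketch below shows that pattern. The HTTP probe, the default threshold of three failures, and the class and function names are assumptions for illustration, not the behavior of any particular monitoring product.

```python
import urllib.request
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError, HTTPError (4xx/5xx), refused connections, and timeouts
        # are all subclasses of OSError.
        return False


@dataclass
class HealthMonitor:
    """Tracks consecutive probe failures per server and reports new outages."""
    probe: Callable[[str], bool]
    failure_threshold: int = 3  # consecutive failures before a server counts as down
    failures: Dict[str, int] = field(default_factory=dict)
    down: Set[str] = field(default_factory=set)

    def check(self, servers: List[str]) -> List[str]:
        """Probe each server once; return only servers that just became down."""
        newly_down = []
        for server in servers:
            if self.probe(server):
                # A successful probe resets the counter and clears any outage.
                self.failures[server] = 0
                self.down.discard(server)
            else:
                self.failures[server] = self.failures.get(server, 0) + 1
                if (self.failures[server] >= self.failure_threshold
                        and server not in self.down):
                    self.down.add(server)
                    newly_down.append(server)
        return newly_down
```

In practice, check() would run on a timer with probe=http_probe, and each server it returns would trigger an alert to the responsible administrator. Reporting only the transition to "down" keeps a failed server from generating a fresh alert on every polling cycle.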

Corrective actions

The third major failover consideration is corrective action: what must happen when a failure makes a server unavailable. Generally, if a server goes down and takes your site with it, some human intervention is required to diagnose and correct the problem.

However, before analysis and repair can begin, the administrator must be notified. Whatever failover system you put in place should include an automated notification system that routes alerts through your telecommunications infrastructure (e-mail, pagers, real-time Web-based alerts, and so on) to the appropriate administrator for prompt attention.
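Routing an alert to "the appropriate administrator" can be as simple as a lookup from the failed server's role to the people who own it. In the Python sketch below, the on-call table, the example.com addresses, and the SMTP relay host are hypothetical. It uses the standard smtplib and email modules for the e-mail channel; any other channel, such as a pager gateway or a Web dashboard, could be added as another function that accepts the message.

```python
import smtplib
from email.message import EmailMessage
from typing import Callable, Dict, List

# Hypothetical on-call table mapping server roles to responsible administrators.
ON_CALL: Dict[str, List[str]] = {
    "web": ["webadmin@example.com"],
    "coldfusion": ["cfadmin@example.com"],
    "database": ["dba@example.com"],
}
FALLBACK: List[str] = ["ops@example.com"]  # hypothetical catch-all recipient


def recipients_for(role: str) -> List[str]:
    """Return the administrators who own a server role, or the fallback list."""
    return ON_CALL.get(role, FALLBACK)


def build_alert(server: str, role: str, detail: str,
                sender: str = "monitor@example.com") -> EmailMessage:
    """Compose an alert addressed to the owners of the failed server."""
    msg = EmailMessage()
    msg["Subject"] = f"[DOWN] {role} server {server}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients_for(role))
    msg.set_content(f"{server} ({role}) failed its health check: {detail}")
    return msg


def send_via_smtp(msg: EmailMessage, host: str = "localhost") -> None:
    """E-mail channel: deliver the alert through an SMTP relay (host is an assumption)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)


def notify(server: str, role: str, detail: str,
           channels: List[Callable[[EmailMessage], None]]) -> int:
    """Send the alert on every channel and return how many deliveries succeeded.

    A failing channel (for example, an unreachable mail relay raising OSError)
    is skipped so that the remaining channels can still reach the administrator.
    """
    msg = build_alert(server, role, detail)
    delivered = 0
    for send in channels:
        try:
            send(msg)
            delivered += 1
        except OSError:
            continue
    return delivered
```

A call such as notify("cf2", "coldfusion", "no response within 5 s", channels=[send_via_smtp]) would e-mail the ColdFusion administrator. Sending on several independent channels matters because the failure that took a server down may also affect one of the notification paths.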

Besides notifying the administrator that a problem has occurred, you also want your failover solution to automatically redirect traffic intended for the unavailable server to other available servers until the unavailable server is fixed. This crucial corrective action is what keeps your Web site up and available to your users even if one of the servers supporting it is experiencing problems.
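Automatic redirection is normally handled by a hardware load balancer or clustering software, but the underlying logic can be sketched in a few lines of Python: distribute requests round-robin and skip any server that monitoring has marked down until it recovers. The class and the server names cf1 through cf3 are hypothetical.

```python
from typing import List, Set


class FailoverBalancer:
    """Round-robin request distribution that skips servers marked unavailable."""

    def __init__(self, servers: List[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self.servers = list(servers)
        self.unavailable: Set[str] = set()
        self._next = 0  # index of the next server to try

    def mark_down(self, server: str) -> None:
        """Stop sending traffic to a failed server."""
        self.unavailable.add(server)

    def mark_up(self, server: str) -> None:
        """Return a repaired server to the rotation."""
        self.unavailable.discard(server)

    def pick(self) -> str:
        """Return the next available server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server not in self.unavailable:
                return server
        raise RuntimeError("no servers available")


if __name__ == "__main__":
    lb = FailoverBalancer(["cf1", "cf2", "cf3"])
    lb.mark_down("cf2")  # traffic now alternates between cf1 and cf3
    print([lb.pick() for _ in range(4)])
```

While cf2 is marked down, its share of requests goes to cf1 and cf3; once it is repaired and marked up, it rejoins the rotation without any change visible to users.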
