As an administrator, you likely hear often about the importance of Web servers that scale well, but what exactly is scalability? Simply put, scalability is a Web server's ability to maintain a site's availability, reliability, and performance as the amount of simultaneous Web traffic, or load, that it handles increases.
The major issues that affect Web site scalability are performance and load management, each discussed in the sections that follow.
Performance refers to how efficiently a site responds to browser requests according to defined benchmarks. Application performance can be designed, tuned, and measured. It can also be affected by many complex factors, including application design and construction, database connectivity, network capacity and bandwidth, back office services (such as mail, proxy, and security services), and hardware server resources.
Web application architects and developers must design and code an application with performance in mind. Once the application is built, various administrators can tune performance by setting specific flags and options on the database, the operating system, and often the application itself to achieve peak performance. Following the construction and tuning efforts, quality assurance testers should test and measure an application's performance prior to deployment to establish acceptable quality benchmarks. If all of these efforts are performed well, you can then better diagnose whether the Web site is operating within its established parameters when you review the statistics generated by Web server monitoring and logging programs.
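As a sketch of that last step, the check below compares logged response times against an established benchmark. The 500 ms threshold, the 95% target, and the sample values are hypothetical assumptions for illustration, not figures from the text.

```python
def within_benchmark(response_times_ms, benchmark_ms=500, pct=0.95):
    """Return True if at least `pct` of logged responses met the
    response-time benchmark (thresholds here are hypothetical)."""
    met = sum(1 for t in response_times_ms if t <= benchmark_ms)
    return met / len(response_times_ms) >= pct

# Example: response times (in ms) pulled from a Web server log.
samples = [120, 340, 480, 510, 200, 450, 390, 610, 330, 290]
print(within_benchmark(samples))  # -> False (only 80% met the 500 ms benchmark)
```

A check like this turns the monitoring statistics into a simple pass/fail signal against the benchmarks established during testing.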
Depending on the size and complexity of your Web application, you may be able to handle anywhere from ten to thousands of concurrent users. The number of concurrent connections to your Web server(s) will ultimately have a direct impact on your site's performance. Therefore, your performance objectives must include two dimensions: the response times you want to deliver and the number of concurrent users you want to support at those response times.
Thus, you must establish desired response benchmarks for your site and then achieve the highest number of concurrent users connected to your site at the desired response rates. By doing so, you will be able to determine a rough number of concurrent users for each Web server and then scale your Web site by adding additional servers.
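The scaling rule described above reduces to a small calculation. The per-server capacity and peak-traffic figures below are hypothetical assumptions for illustration.

```python
import math

def servers_needed(peak_concurrent_users, users_per_server):
    """Estimate how many Web servers are needed to hold response
    times at the benchmark, given the measured per-server capacity."""
    return math.ceil(peak_concurrent_users / users_per_server)

# Hypothetical figures: load testing showed one server sustains 250
# concurrent users at the desired response rate, and peak traffic is
# expected to reach 2,000 concurrent users.
print(servers_needed(2000, 250))  # -> 8
```

Rounding up matters: at 2,001 expected users the estimate becomes 9 servers, since a fraction of a server cannot absorb the overflow.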
Once your site runs on multiple Web servers, you will need to monitor and manage the traffic and load across the group of servers. See "Hardware planning" and "Techniques for Creating Scalable and Highly Available Sites" to learn about the ways you can do this.
Perfect scalability (excluding cache initializations) is linear. Linear scalability, relative to load, means that with fixed resources, performance decreases at a constant rate relative to load increases. Linear scalability, relative to resources, means that with a constant load, performance improves at a constant rate relative to additional resources.
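Both definitions can be expressed as a toy linear model; the base figures below are illustrative assumptions, not measurements.

```python
def response_time(base_ms, base_load, load):
    """Fixed resources: response time grows at a constant rate
    relative to load (toy linear model)."""
    return base_ms * (load / base_load)

def capacity(base_users, base_servers, servers):
    """Constant load profile: capacity grows at a constant rate
    relative to added resources (toy linear model)."""
    return base_users * (servers / base_servers)

print(response_time(200, 100, 300))  # -> 600.0  (3x the load, 3x the response time)
print(capacity(250, 1, 4))           # -> 1000.0 (4x the servers, 4x the capacity)
```

Real servers fall short of these straight lines, for the caching and resource management reasons discussed next.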
Caching and resource management overhead affect an application server's ability to approach linear scalability. Caching allows processing and resources to be reused, alleviating the need to reprocess pages or reallocate resources. Disregarding other influences, efficient caching can bring an application server close to linear scalability.
Resource management becomes more complicated as the quantity of resources increases. The extra overhead for resource management, including resource reuse mechanisms, reduces the ability of application servers to scale linearly relative to constraining resources. For example, when an extra processor is added to a single processor server, the operating system incurs extra overhead in synchronizing threads and resources across processors to provide Symmetric Multi-Processing. Part of the additional processing power that the second processor provides is used by the operating system to manage the additional processor and is not available to help scale the application servers.
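The SMP example can be sketched as a toy model in which each processor beyond the first loses a fixed fraction of its capacity to management overhead. The 10% figure is an assumption for illustration, not a measured value.

```python
def effective_processors(n, mgmt_overhead=0.10):
    """Toy model: each processor beyond the first contributes its
    unit of capacity minus the fraction the OS spends synchronizing
    threads and resources across processors (10% is an illustrative
    assumption, not a measurement)."""
    return 1 + (n - 1) * (1 - mgmt_overhead)

print(effective_processors(1))  # -> 1.0
print(effective_processors(2))  # -> 1.9  (not the ideal 2.0)
```

The gap between the ideal and effective figures is the capacity the operating system consumes managing the extra processor, which is therefore unavailable to the application servers.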
It is important to note that application servers can only hope to scale relative to resources when the resource changes affect the constraining resources. For example, adding processor resources to an application server that is constrained by network bandwidth will provide, at best, minor performance improvements. When discussing linear scalability relative to server resources, it is implied that it is relative to the constraining server resources.
Understanding linear scalability in relation to your site's performance is important because it affects not only your application design and construction but also indirectly related concerns, such as capital equipment budgets.
Load management refers to the method by which simultaneous user requests are distributed and balanced among multiple servers (Web, ColdFusion, DBMS, file, and search servers). Effectively balancing load across your servers ensures that they do not become overloaded and eventually unavailable.
There are several different methods that you can use to achieve load management, from hardware-based traffic distribution to software-based clustering. Each option has its own distinct merits.
Most load balancing solutions today manage traffic based on IP packet flow. This approach effectively handles sites that are not application-centric. To manage ColdFusion Web application traffic effectively, however, you need a mechanism that monitors and balances traffic based on the actual load that the ColdFusion application generates. ColdFusion relies on a leading software-based clustering technology, ClusterCATS, to ensure that the ColdFusion server, the Web server, and the other servers on which your ColdFusion Web applications depend remain highly available.
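To contrast packet-flow balancing with application-load-aware balancing, the sketch below routes each request to whichever server reports the lowest application load. It illustrates the general idea only; it is not the ClusterCATS implementation, and the load figures are hypothetical.

```python
def pick_server(reported_loads):
    """Route to the server reporting the lowest application load.
    This sketches load-aware balancing in general, not ClusterCATS."""
    return min(reported_loads, key=reported_loads.get)

# Hypothetical load reports (fraction of capacity in use).
print(pick_server({"web1": 0.82, "web2": 0.35, "web3": 0.60}))  # -> web2
```

Unlike round-robin, this approach steers traffic away from a busy server as soon as its reported load rises, which is what makes application-level monitoring valuable for ColdFusion traffic.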
To learn more about different hardware and software load management solutions, see "Techniques for Creating Scalable and Highly Available Sites".