It’s easy for cloud customers to get confused about the roles and responsibilities of their internal team and their cloud vendor. That confusion is especially evident when it comes to application availability and business continuity planning. How does disaster recovery differ from high availability? Does my cloud provider automatically load balance my application servers? The answers to these questions are critical, but sometimes overlooked until a crisis occurs. In this post, we’ll talk about load balancing, high availability, and disaster recovery in the cloud, and what the Tier 3’s cloud infrastructure has to offer.
What is it?
Wikipedia describes load balancing as:
Load balancing is a computer networking method to distribute workload across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources, to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy.
You commonly see this technique employed in web applications where multiple web servers work together to handle inbound traffic. There are at least two reasons why load balancing is employed:
- The required capacity is too large for a single machine. When running processes that consume a large amount of system resources (e.g. CPU and memory), it often makes sense to employ multiple servers to distribute the work instead of constantly adding capacity to a single server. In plenty of cases, it’s not even possible to allocate enough memory or CPU to a single machine to handle all of the work! Load balancing across multiple servers makes it possible to host high traffic websites or run complex data processing jobs that demand more resources than a single server can deliver.
- Looking for more reliability and flexibility in a solution deployment. Even if you *could* run an entire server application on a single server, it may not be a good idea. Load balancing can increase reliability by providing many servers able to do the same job. If one server becomes unavailable, the others can simply pick up the additional work until a new server comes online. Software updates become easier since a server can simply be taken out of the load balancing pool when a patch or reboot is necessary. Load balancing gives system administrators more flexibility in maintaining servers without negatively impacting the application as a whole.
Load balancing can be accomplished using either a “push” or a “pull” model. For web applications or database clusters that sit behind a load balancer, inbound requests are pushed to the pool of servers based on an algorithm such as round-robin. In this scenario, servers await traffic sent to them by the load balancer. It’s also possible to use a “pull” model where work requests are added to a centralized “queue” and a collection of servers retrieve those requests from that queue when they are available. For instance, consider big data processing scenarios where many servers work to analyze data and return results. Each server takes a chunk of work and the overall processing load is distributed across many machines.
How can Tier 3 help?
Tier 3 offers multiple load balancing options to our customers. All customers have access to a free, shared load balancer. This load balancer service – based on the powerful Citrix Netscaler product – provides a range of capabilities including SSL offloading for higher performance, session persistence (known as “sticky sessions”), and routing of TCP, HTTP and HTTPS traffic for up to three servers. To use this service today, send a request to firstname.lastname@example.org. We plan to launch a self-service version of this capability in the very near future.
If you’re looking for more control over the load balancing configuration or have higher bandwidth needs, you can deploy a dedicated load balancer (virtual appliance) into the Tier 3 cloud. This “bring your own load balancer” option leverage internal expertise you may have with a particular vendor. It also gives you complete control over the load balancer setup so that you can modify the routing algorithm or enable/disable features that matter to your business.
What is it?
Returning to Wikipedia, high availability is defined as:
High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
High availability is described through service level agreements and achieved through an architecture that focuses on constant availability even in the face of failures at any level of the system. While load balancing introduces redundancy, it’s not a strategy that alone can provide high availability. Servers sitting behind a load balancer may be running, but that doesn’t mean that they are available!
Availability addresses the ability to withstand failure from all angles including the network, storage, and even the data center itself. Enterprise cloud services like those from Tier 3 are built on a highly available architecture that uses redundancy at all levels to ensure that no single component failure in a data center impacts overall system availability. This includes “passive” redundancy built into data centers to overcome power or internet provider failures, as well as “active” redundancy that leverages sophisticated monitoring to detect issues and initiate failover procedures.
All of our customers get platform-level high availability when they use the Tier 3 cloud “out of the box.” That means that you can rely on us for your workloads knowing that our architecture is well-designed and highly redundant. However – back to the introductory paragraph – it’s the customer’s responsibility to design a highly-available application architecture. Simply deploying an application to our cloud doesn’t make it highly available. For example, if you deploy a single Microsoft SQL Server instance in the Tier 3 cloud, you do not have a highly available database. If that database server goes offline or network access is interrupted, your application’s availability will be impacted. To design a highly available Microsoft SQL Server solution, you have multiple options. One choice is to create a cluster of database servers (where all nodes are active at the same time, or, nodes sit passively by waiting to be engaged) that access data from a shared disk. When a failure in the active node is detected, the alternate node is automatically called into action.
How can Tier 3 help?
Designing highly available systems is complex. Unfortunately, no cloud provider can offer a checkbox labeled “Make this application highly available!” in their cloud management portal. Crafting a highly available system involves a methodical approach that navigates through every single layer of the system and identifies single points of failure that should be made redundant. For components that cannot be made redundant, it’s important to make sure that the application can continue to run even if that component becomes unavailable.
The Tier 3 professional services team consists of skilled, experienced architects who have designed and built cloud-scale solutions for customers. They can sit with your team and make sure that you’ve taken advantage of every relevant feature that Tier 3 has to offer, while helping you make sure that your system landscape is constructed in a way that will ensure continual availability.
Don’t forget to regularly test your high availability design in order to uncover weak points or ensure that configurations remain valid.
What is it?
Once more we turn to Wikipedia which defines disaster recovery as:
Disaster recovery (DR) is the process, policies and procedures that are related to preparing for recovery or continuation of technology infrastructure which are vital to an organization after a natural or human-induced disaster. Disaster recovery is a subset of business continuity. While business continuity involves planning for keeping all aspects of a business functioning in the midst of disruptive events, disaster recovery focuses on the IT or technology systems that support business functions.
DR is all about how you handle unexpected events. Typically, your cloud provider has to declare a disaster before explicitly initiating DR procedures. A brief network outage or storage failure in a data center is usually not enough to trigger a disaster response. There are two phrases that you often hear when defining a DR plan. A recovery point objective (RPO) describes the maximum window of data that can be lost because of a disaster. For example, an RPO of 12 hours means that it is possible that when you get back online after a disaster, you may have lost the most recent 12 hours of data collected by your systems. A recovery time objective (RTO) identifies how long the IT systems (and processes) can be offline before being restored. For example, an RTO of 48 hours means that it may take two days before the systems lost in the disaster are brought back online and becoming usable again.
How can Tier 3 help?
Tier 3 customers have disaster protection natively in the platform. We offer two classes of storage: standard and premium. The major difference is that standard storage get five days of rolling backups within a given data center, while premium storage users get fourteen days of rolling backups including replication to an in-country data center. Tier 3 is powered by global data centers in multiple countries and we use storage replication to enable you to get back online within 8 hours (RTO) and with a maximum RPO of 24 hours.
While this provides assurances against losing all of your data in the event of a disaster, it still may not provide the level of business continuity that you need. If your business cannot tolerate more than a few moments of downtime, even in the event of a disaster, then it’s critical to architect a solution that can withstand the loss of an entire data center. Returning to our earlier Microsoft SQL Server example, consider the ways to construct a highly available database that remains online with minimal data loss, even during a disaster. SQL Server offers replication technologies like database mirroring and AlwaysOn that make it possible to do near-real time replication across geographies.
The experts in the Tier 3 services team can help you identify all the DNS, networking, compute and storage considerations for building systems that are not only highly available within a data center, but across data centers.
It’s often the case that load balancing, high availability and disaster recovery lapses don’t surface until it’s too late. While Tier 3 does everything we can to architect our platform for maximum availability and resiliency, our customers still retain responsibility for deploying their systems in a manner that meets their performance and business continuity needs. We are eager to talk to you about how to validate your existing cloud applications or design new solutions that can function at cloud scale. Contact our services team today!