It’s easy for cloud customers to get confused about the roles and responsibilities of their internal team and their cloud vendor. That confusion is especially evident when it comes to application availability and business continuity planning. How does disaster recovery differ from high availability? Does my cloud provider automatically load balance my application servers? The answers to these questions are critical, but sometimes overlooked until a crisis occurs. In this post, we’ll talk about load balancing, high availability, and disaster recovery in the cloud, and what Tier 3’s cloud infrastructure has to offer.
Load Balancing

What is it?
Wikipedia describes load balancing as:
Load balancing is a computer networking method to distribute workload across multiple computers or a computer cluster, network links, central processing units, disk drives, or other resources, to achieve optimal resource utilization, maximize throughput, minimize response time, and avoid overload. Using multiple components with load balancing, instead of a single component, may increase reliability through redundancy.
You commonly see this technique employed in web applications where multiple web servers work together to handle inbound traffic. There are at least two reasons why load balancing is employed:
- The required capacity is too large for a single machine. When running processes that consume a large amount of system resources (e.g. CPU and memory), it often makes sense to employ multiple servers to distribute the work instead of constantly adding capacity to a single server. In plenty of cases, it’s not even possible to allocate enough memory or CPU to a single machine to handle all of the work! Load balancing across multiple servers makes it possible to host high traffic websites or run complex data processing jobs that demand more resources than a single server can deliver.
- The solution needs more reliability and flexibility. Even if you *could* run an entire application on a single server, it may not be a good idea. Load balancing can increase reliability by providing multiple servers capable of doing the same job. If one server becomes unavailable, the others simply pick up the additional work until a new server comes online. Software updates become easier, since a server can be taken out of the load balancing pool when a patch or reboot is necessary. Load balancing gives system administrators more flexibility to maintain servers without negatively impacting the application as a whole.
Load balancing can be accomplished using either a “push” or a “pull” model. For web applications or database clusters that sit behind a load balancer, inbound requests are pushed to the pool of servers based on an algorithm such as round-robin. In this scenario, servers await traffic sent to them by the load balancer. It’s also possible to use a “pull” model, where work requests are added to a centralized queue and servers retrieve requests from that queue as they become available. For instance, consider big data processing scenarios where many servers work to analyze data and return results. Each server takes a chunk of work, and the overall processing load is distributed across many machines.
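To make the two models concrete, here is a minimal Python sketch, purely illustrative (the class, server names, and worker counts are invented, not any particular load balancer’s API): a round-robin dispatcher “pushes” requests to a pool, while queue workers “pull” chunks of work as they become free.

```python
import itertools
import queue
import threading

# --- "Push" model: a round-robin dispatcher assigns each request to the next server ---
class RoundRobinBalancer:
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)   # endless rotation over the pool

    def dispatch(self, request):
        server = next(self._cycle)               # pick the next server in turn
        print(f"pushing {request!r} to {server}")
        return server

balancer = RoundRobinBalancer(["web01", "web02", "web03"])
for req in ["GET /", "GET /about", "GET /pricing", "GET /contact"]:
    balancer.dispatch(req)

# --- "Pull" model: workers take chunks of work from a shared queue when they are free ---
work_queue = queue.Queue()
for chunk in range(8):
    work_queue.put(f"data-chunk-{chunk}")

def worker(name):
    while True:
        try:
            chunk = work_queue.get_nowait()      # pull the next available chunk
        except queue.Empty:
            return                               # no work left; worker exits
        print(f"{name} processing {chunk}")
        work_queue.task_done()

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```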
How can Tier 3 help?
Tier 3 offers multiple load balancing options to our customers. All customers have access to a free, shared load balancer. This load balancer service – based on the powerful Citrix Netscaler product – provides a range of capabilities including SSL offloading for higher performance, session persistence (known as “sticky sessions”), and routing of TCP, HTTP and HTTPS traffic for up to three servers. To use this service today, send a request to firstname.lastname@example.org. We plan to launch a self-service version of this capability in the very near future.
If you’re looking for more control over the load balancing configuration or have higher bandwidth needs, you can deploy a dedicated load balancer (virtual appliance) into the Tier 3 cloud. This “bring your own load balancer” option leverages any internal expertise you may have with a particular vendor. It also gives you complete control over the load balancer setup, so you can modify the routing algorithm or enable/disable the features that matter to your business.
High Availability

What is it?
Returning to Wikipedia, high availability is defined as:
High availability is a system design approach and associated service implementation that ensures a prearranged level of operational performance will be met during a contractual measurement period.
High availability is described through service level agreements and achieved through an architecture that focuses on constant availability even in the face of failures at any level of the system. While load balancing introduces redundancy, it’s not a strategy that alone can provide high availability. Servers sitting behind a load balancer may be running, but that doesn’t mean that they are available!
Availability addresses the ability to withstand failure from all angles including the network, storage, and even the data center itself. Enterprise cloud services like those from Tier 3 are built on a highly available architecture that uses redundancy at all levels to ensure that no single component failure in a data center impacts overall system availability. This includes “passive” redundancy built into data centers to overcome power or internet provider failures, as well as “active” redundancy that leverages sophisticated monitoring to detect issues and initiate failover procedures.
All of our customers get platform-level high availability when they use the Tier 3 cloud “out of the box.” That means that you can rely on us for your workloads knowing that our architecture is well-designed and highly redundant. However – back to the introductory paragraph – it’s the customer’s responsibility to design a highly available application architecture. Simply deploying an application to our cloud doesn’t make it highly available. For example, if you deploy a single Microsoft SQL Server instance in the Tier 3 cloud, you do not have a highly available database. If that database server goes offline or network access is interrupted, your application’s availability will be impacted. To design a highly available Microsoft SQL Server solution, you have multiple options. One choice is to create a cluster of database servers (where all nodes are active at the same time, or passive nodes sit by, waiting to be engaged) that access data from a shared disk. When a failure in the active node is detected, the alternate node is automatically called into action.
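On the application side, it also helps to tolerate the brief interruption while failover completes. The sketch below is a minimal, generic retry wrapper in Python, offered as an illustration only (it is not a Tier 3 feature or a SQL Server-specific API; the commented-out connect_to_database call is a placeholder for whatever database driver you actually use).

```python
import time

def call_with_retry(operation, attempts=4, base_delay=1.0):
    """Run `operation`; on failure, wait and retry while a cluster failover completes.

    `operation` is any callable that raises an exception on a dropped connection.
    Delays back off exponentially: 1s, 2s, 4s, ...
    """
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:          # in real code, catch your driver's specific error type
            if attempt == attempts - 1:
                raise                     # give up after the last attempt
            delay = base_delay * (2 ** attempt)
            print(f"attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            time.sleep(delay)

# Hypothetical usage: re-run a query after the passive node takes over.
# def fetch_orders():
#     with connect_to_database("cluster-virtual-name") as conn:   # stand-in for your driver
#         return conn.execute("SELECT * FROM orders").fetchall()
#
# orders = call_with_retry(fetch_orders)
```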
How can Tier 3 help?
Designing highly available systems is complex. Unfortunately, no cloud provider can offer a checkbox labeled “Make this application highly available!” in their cloud management portal. Crafting a highly available system involves a methodical approach that navigates through every single layer of the system and identifies single points of failure that should be made redundant. For components that cannot be made redundant, it’s important to make sure that the application can continue to run even if that component becomes unavailable.
The Tier 3 professional services team consists of skilled, experienced architects who have designed and built cloud-scale solutions for customers. They can sit with your team and make sure that you’ve taken advantage of every relevant feature that Tier 3 has to offer, while helping you make sure that your system landscape is constructed in a way that will ensure continual availability.
Don’t forget to regularly test your high availability design in order to uncover weak points and ensure that configurations remain valid.
Disaster Recovery

What is it?
Once more we turn to Wikipedia which defines disaster recovery as:
Disaster recovery (DR) is the process, policies and procedures that are related to preparing for recovery or continuation of technology infrastructure which are vital to an organization after a natural or human-induced disaster. Disaster recovery is a subset of business continuity. While business continuity involves planning for keeping all aspects of a business functioning in the midst of disruptive events, disaster recovery focuses on the IT or technology systems that support business functions.
DR is all about how you handle unexpected events. Typically, your cloud provider has to declare a disaster before explicitly initiating DR procedures. A brief network outage or storage failure in a data center is usually not enough to trigger a disaster response. There are two terms that you often hear when defining a DR plan. A recovery point objective (RPO) describes the maximum window of data that can be lost because of a disaster. For example, an RPO of 12 hours means that when you get back online after a disaster, you may have lost the most recent 12 hours of data collected by your systems. A recovery time objective (RTO) identifies how long the IT systems (and processes) can be offline before being restored. For example, an RTO of 48 hours means that it may take two days before the systems lost in the disaster are brought back online and become usable again.
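One simple way to sanity-check a recovery plan against these two numbers (the figures below are assumptions for illustration, not commitments from any provider):

```python
from datetime import timedelta

# Assumed plan figures -- replace with your own.
backup_interval = timedelta(hours=6)      # how often data is replicated or backed up
worst_restore_time = timedelta(hours=30)  # provisioning + restore + validation, end to end

rpo = timedelta(hours=12)                 # agreed maximum data-loss window
rto = timedelta(hours=48)                 # agreed maximum time offline

# The data you can lose is at most one backup interval, so it must fit inside the RPO.
assert backup_interval <= rpo, "backups are too infrequent to meet the RPO"

# Everything it takes to get back online must fit inside the RTO.
assert worst_restore_time <= rto, "recovery runbook is too slow to meet the RTO"

print("Plan meets RPO of", rpo, "and RTO of", rto)
```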
How can Tier 3 help?
Tier 3 customers have disaster protection built natively into the platform. We offer two classes of storage: standard and premium. The major difference is that standard storage gets five days of rolling backups within a given data center, while premium storage users get fourteen days of rolling backups, including replication to an in-country data center. Tier 3 is powered by global data centers in multiple countries, and we use storage replication to enable you to get back online within 8 hours (RTO) and with a maximum RPO of 24 hours.
While this provides assurances against losing all of your data in the event of a disaster, it still may not provide the level of business continuity that you need. If your business cannot tolerate more than a few moments of downtime, even in the event of a disaster, then it’s critical to architect a solution that can withstand the loss of an entire data center. Returning to our earlier Microsoft SQL Server example, consider the ways to construct a highly available database that remains online with minimal data loss, even during a disaster. SQL Server offers replication technologies like database mirroring and AlwaysOn that make it possible to do near-real time replication across geographies.
The experts in the Tier 3 services team can help you identify all the DNS, networking, compute and storage considerations for building systems that are not only highly available within a data center, but across data centers.
It’s often the case that load balancing, high availability and disaster recovery lapses don’t surface until it’s too late. While Tier 3 does everything we can to architect our platform for maximum availability and resiliency, our customers still retain responsibility for deploying their systems in a manner that meets their performance and business continuity needs. We are eager to talk to you about how to validate your existing cloud applications or design new solutions that can function at cloud scale. Contact our services team today!
If you’ve ever looked at cloud server prices, or deployed a cloud server instance, you’ve likely noticed that most providers have a selection of “templates” to choose from. Users browse and select from a library of pre-baked server templates that contain combinations of compute, storage, operating systems, database technology, web servers, and commercial software. This isn’t the approach we take at Tier 3, however.
We see at least two challenges with templates.
- Impossible for providers to match every need, and difficult for customers to maintain custom templates. The number of templates offered by leading cloud providers ranges from dozens to thousands. With templates, the provider aims to offer as many useful combinations of OS + software as possible. However, this requires providers to engage in an endless quest to assemble server images that are useful to customers.
What if the customer doesn’t see anything they like? Sure, you can upload custom templates, but that shifts the maintenance responsibility to the customer. The provider may have automation tools available for updating and patching images, but enterprise IT departments may not have the necessary capabilities to do the care and feeding of a custom template library.
- Not a complete replacement for the way enterprise IT builds servers today. IT organizations don’t typically rely on a library of server templates when they build new machines. Instead, they follow a more assembly-line approach to stand up a server for a particular system. This includes selecting the operating system, joining the server to a domain, adding storage, and installing the relevant software. Advanced organizations have software catalogs that help with automated installation, but many companies still rely on physical media or installation files residing on shared network drives. So what’s the problem? We find that a template-driven model can give a misleading sense of deployment speed: the server is *available* quickly, but still requires a significant number of follow-on tasks before it is actually enterprise-ready.
So what is Tier 3’s model?
The Better Way
Instead of relying on templates, Tier 3 offers a reliable orchestration engine (called Blueprints) that lets you choose what software and script commands to run when creating a new server.
There are three things that our customers like about this.
- Match unique needs through just-in-time software combinations. It’s impossible to pre-build server templates that match the individual needs of each customer. While a good template can serve as a foundation for subsequent manual activities, we went a step further: we offer a diverse set of base operating system templates, plus a catalog of enterprise software products that can be layered on after the server is built.
- The logical – and automated – extension to how IT builds servers today. Building a server isn’t just about installing an operating system and some software. System administrators go through a series of activities to provision storage, join network domains, acquire IP addresses, disable unnecessary services, and much more. Besides just offering a software catalog, we also provide a series of tasks and scripts that you can run against a new server. Tasks include activities such as adding a (public) IP address or taking a snapshot of the new server.
Scripts are commonly used to configure the server (and its corresponding software). Tier 3’s build process lets you run a variety of scripts to get your server into a finished state.
- Extensible to meet enterprise standards. We won’t claim to have all the software and scripts that you need to meet your enterprise security and software standards. That’s why we fully encourage you to upload your own software and scripts into a private library just for your organization. Anything in your library can be applied to your new or existing servers.
Do you have a unique script command to run just for a single server build process? The Tier 3 Blueprints engine supports custom PowerShell, Command, and SSH script statements that get executed after the server is built (see the sketch after this list for the kind of work such a post-build script might do).
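As promised above, here is a small, hypothetical example of the kind of post-build script a Blueprint might execute, written in Python for illustration. The directory paths, package names, and services are placeholders, and this is not Tier 3’s script format; as noted, Blueprints accept PowerShell, Command, and SSH statements.

```python
#!/usr/bin/env python3
"""Hypothetical post-build step: bring a freshly built Linux server up to internal standards."""
import pathlib
import subprocess

# Placeholder values -- substitute your organization's own standards.
REQUIRED_DIRS = ["/opt/acme/app", "/var/log/acme"]
PACKAGES = ["nginx", "ntp"]
SERVICES_TO_DISABLE = ["bluetooth"]

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)          # fail the build step loudly if a command fails

for d in REQUIRED_DIRS:
    pathlib.Path(d).mkdir(parents=True, exist_ok=True)

run(["apt-get", "update"])
run(["apt-get", "install", "-y", *PACKAGES])

for svc in SERVICES_TO_DISABLE:
    run(["systemctl", "disable", "--now", svc])
```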
Customers are free to create and maintain server templates in the Tier 3 environment, and some do. But we’re seeing more and more customers opt for the orchestration engine approach. This way, customers can build servers exactly how they want them, every time! Check out our tier3.com Servers and Blueprints pages to learn more about how we help you automate the server build process.
Cloud adoption is growing significantly as more enterprises see the business value of having a scalable, elastic pool of computing resources at their fingertips. However, enterprise CIOs are concerned about building application silos in the cloud that don’t integrate with the rest of their systems, data, and infrastructure. One survey asked respondents to rank their areas of satisfaction with a set of SaaS applications and found that integration with on-premises systems was the area with the most frustration. Another survey found that 67% of CIOs reported problems integrating data between cloud applications. The long-term competitive advantage you gain from the cloud will likely depend – in part – on how well you can connect your assets, regardless of location. There are unique considerations for integrating with the cloud, but the core business needs remain the same. We at Tier 3 see four areas that require focus from both the cloud provider and the customer.
Application Integration

Each application – whether packaged or custom built – serves a unique functional purpose. Frequently, information from another application is required to meet this purpose. For example, a CRM system may submit a query to an accounting system so that a call center agent can get a full picture of the customer’s billing history with a company. Or, an application that validates employee security badges may rely on a real-time feed of data from an ERP system that stores employee status information. Application integration is about connecting business applications at a functional level. It’s not simply data sharing; rather, it involves triggering some activity in another application by issuing requests or sending “live” business events.
So how does this affect applications in the cloud? Architects are wary of attempting synchronous remote procedure calls across the Internet. Latency is a big factor, and synchronous actions don’t scale particularly well. One alternative approach is “callbacks,” where the application request is issued asynchronously and the reply is sent to a pre-determined location that is monitored by the calling application. Or, embrace the more scalable asynchronous messaging strategy, where business data is sent between systems using a fire-and-forget technique. Whether synchronous or asynchronous, application integration with cloud endpoints involves a high likelihood of encountering REST (vs. traditional SOAP) web service endpoints, so choose your tools accordingly!
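Here is a minimal sketch of the callback pattern over REST, with invented URLs and payload shape: the caller posts its request along with a callback address, and a small endpoint on the caller’s side receives the eventual reply.

```python
import requests                      # pip install requests flask
from flask import Flask, request

# 1) Issue the request asynchronously: include where the reply should be delivered.
payload = {
    "customer_id": "C-1001",
    "callback_url": "https://integration.example.com/callbacks/billing-history",  # hypothetical
}
resp = requests.post("https://crm.example.com/api/billing-queries", json=payload, timeout=5)
resp.raise_for_status()              # the remote system only acknowledges receipt here

# 2) Elsewhere, the calling application listens for the reply at the callback URL.
app = Flask(__name__)

@app.route("/callbacks/billing-history", methods=["POST"])
def billing_history_reply():
    reply = request.get_json()
    print("billing history received for", reply.get("customer_id"))
    return "", 204                   # acknowledge the callback

if __name__ == "__main__":
    app.run(port=8080)
```

The design point is that neither side blocks waiting on the other, so a slow or distant endpoint doesn’t tie up the calling application.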
To this end, you’ll come across two types of application integration products: traditional platforms that have been extended to work with the cloud, as well as entirely new platforms that are built and hosted in cloud platforms. Because each Tier 3 customer gets their own VLAN(s) that can connect to the corporate network (see Network Integration below), it’s relatively straightforward to use existing on-premises integration servers (e.g. Microsoft BizTalk Server, TIBCO ActiveMatrix Service Bus, IBM WebSphere MQ) to link to applications running in the Tier 3 cloud.
If you’re looking to do application integration between SaaS applications and servers in the Tier 3 cloud, you can either use on-premises integration servers or one of the newer cloud-based tools. For one-way messaging that requires durability but not the weight of an integration server, consider cloud-based queues such as Amazon SQS. Note that Tier 3 servers don’t receive a public IP address by default, so any integration tool that requires a “push” from the public internet to a Tier 3 server will require you to add a public IP address to the target server. If you need a full-fledged messaging engine that runs in the cloud and has adapters for cloud endpoints, consider something like CloudHub from MuleSoft.
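For the queue option, here is a minimal sketch using Amazon SQS through boto3 (the queue name and message contents are invented, and AWS credentials are assumed to be configured in the environment): one side sends fire-and-forget messages, and the other polls the queue and deletes each message after processing it.

```python
import json
import boto3                                   # pip install boto3; AWS credentials assumed configured

sqs = boto3.client("sqs", region_name="us-east-1")
queue_url = sqs.get_queue_url(QueueName="orders-integration")["QueueUrl"]  # hypothetical queue

# Producer: durable, one-way hand-off -- the sender does not wait for the consumer.
sqs.send_message(QueueUrl=queue_url,
                 MessageBody=json.dumps({"order_id": 42, "status": "shipped"}))

# Consumer (e.g. on a Tier 3 server): poll, process, then delete so the message isn't redelivered.
messages = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=10)
for msg in messages.get("Messages", []):
    order = json.loads(msg["Body"])
    print("processing order", order["order_id"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```

Because the consumer polls the queue over an outbound connection, this pattern also works without assigning the Tier 3 server a public IP address.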
Data Integration

Data integration refers to the synchronization, transformation, quality processing, and transportation of large amounts of data between repositories. Unlike application integration, data integration is typically batch-oriented and works against data that has already been processed by transactional systems. You’ll often find the need for data integration when building master data management (MDM) solutions, importing dirty data from a variety of sources, or loading data warehouses for in-depth analysis.
Doing extract-transform-load (ETL) processes in the cloud introduces a few new considerations. While latency may not be as big of a factor for batch processes, bandwidth will be. Moving petabytes of data over an Internet connection is still not a speedy endeavor. Where possible, consider a Cross Connect architecture to maximize bandwidth while minimizing latency. Data integration solutions frequently include staging databases where data is manipulated or standardized as part of the processing pipeline. Depending on where the data is coming from, you may choose to stage sensitive data on your internal network instead of storing it temporarily on public cloud-based servers. Also, keep in mind that data integration tools are oriented towards relational databases, but many cloud databases leverage NoSQL designs or highly distributed architectures that may be unfamiliar to enterprise staff that primarily works with Oracle, Microsoft, and IBM technologies.
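As a toy illustration of the staging idea (the table, records, and masking rule are all invented), this sketch extracts a few dirty rows, cleans and masks them in a local staging database, and only then “loads” the results toward the warehouse.

```python
import sqlite3

# Local staging database, kept on the internal network (here an in-memory stand-in).
staging = sqlite3.connect(":memory:")
staging.execute("CREATE TABLE stage_customers (id INTEGER, email TEXT, country TEXT)")

# Extract: pretend these rows came from an on-premises transactional system.
source_rows = [
    (1, "alice@example.com", " us "),
    (2, "BOB@EXAMPLE.COM", "US"),
    (2, "BOB@EXAMPLE.COM", "US"),          # duplicate dirty record
]

# Transform in staging: normalize, de-duplicate, and mask sensitive fields before they leave the LAN.
seen = set()
for cid, email, country in source_rows:
    if cid in seen:
        continue
    seen.add(cid)
    masked_email = email.split("@")[0][:2].lower() + "***"   # simple masking rule (illustrative)
    staging.execute("INSERT INTO stage_customers VALUES (?, ?, ?)",
                    (cid, masked_email, country.strip().upper()))

# Load: ship the cleaned rows to the cloud-side warehouse (printed here as a stand-in).
for row in staging.execute("SELECT * FROM stage_customers ORDER BY id"):
    print("loading to warehouse:", row)
```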
As with the application integration tools, the data integration tools market consists of existing players that may offer adapters to cloud endpoints, as well as entirely new providers that are oriented to cloud-based data repositories. Tools like Microsoft SQL Data Sync make it easy to synchronize Microsoft SQL Server databases running in Windows Azure to SQL Servers running on-premises or in public clouds like Tier 3. Traditional ETL provider Informatica has an innovative cloud service called Informatica Cloud which includes a growing set of adapters for connecting cloud databases to on-premises databases. Even the Amazon Web Services Data Pipeline service makes it simple to transfer data between AWS databases and on-premises databases. In each case, the ETL tool uses a locally-installed server agent that securely connects the data repositories to your network. This means that you do NOT need to have your internal databases exposed to the public internet in order to synchronize with cloud-based data repositories.
If you run your enterprise databases in the Tier 3 cloud, you can perform data integration using existing ETL tools or any of this new crop of cloud-friendly products.
Network Integration

Ideally, cloud servers are simply an extension of on-premises servers. To have a fully integrated enterprise landscape, servers in the corporate data center should be able to freely communicate with servers running off-premises. For example, Tier 3 customers use our cloud to run their enterprise collaboration environment, email infrastructure, line of business applications, and many other critical internal-facing systems. In order for these scenarios to work, the enterprise network must be extended to include the cloud network.
One choice is to set up simple client virtual private networks (VPNs) that connect an individual machine to the cloud network. In this case, an individual user would establish a VPN connection and access the application or database residing on the cloud server. However, this only works well for small businesses or for temporary access to applications. For a persistent connection between networks, consider working with the cloud provider on a point-to-point VPN tunnel. This provides a much better end-user experience. An even tighter integration is possible through Direct Connect: for enterprises that use one of our co-location partners for their data center hosting, Tier 3 can establish a cross-connect between the physical hardware. This ensures a high-performing connection that doesn’t travel over the public internet. If a cross-connect isn’t an option, then an MPLS network mesh with any number of major network carriers may be feasible. We can easily add a secure connection from your MPLS network to the Tier 3 cloud.
Identity Integration

Finally, security. It’s an important consideration when working with distributed systems, and identity management is an oft-overlooked area. We’ve all become accustomed to countless credentials for the variety of business systems (on-premises and off-premises) that we use every day. Whether accessing cloud systems, integrating with partner systems, or enabling a remote workforce, a strong identity management strategy is key. How can employees use a single set of credentials to access a diverse range of systems across the Internet? Is centralized role-based access control possible, or does each application have to maintain its own role hierarchy? These are among the many questions you should ask yourself when figuring out a long-term identity strategy.
Identity federation is an emerging area in the cloud. There are multiple standards that come into play, including SAML, XACML, and WS-Trust. Microsoft offers its Windows Active Directory Federation Services and Windows Azure Active Directory products. You’ll also find strong products from Ping Identity and CA. As enterprises face more and more demand by employees and partners to “bring your own identity”, there will be a greater need to invest in a complete identity management solution.
Tier 3 supports SAML for access to our Control Portal, so our customers can manage their cloud environment using their existing corporate credentials rather than maintaining a separate login. This is not only convenient, but also creates a more secure environment where there are fewer passwords to remember and access is controlled from a central location.
By planning for all four of these integration dimensions, enterprises can more fully achieve the benefit of cloud computing while getting maximum reuse out of existing assets. Neglecting any one of these can introduce barriers to adoption or lead to inefficient or insecure workarounds.
Want our help designing your solution for each of these integration dimensions? Contact us to set up a working session with our experienced services team.
One size definitely doesn’t fit all. Nearly every cloud infrastructure provider gives their customers a choice of virtual machine configurations. These configurations often take the form of pre-defined “buckets” of VM attributes, so fine-grained choice is still not really an option. But does this really matter?
At Tier 3, we think it does, and our customers do too. Instead of asking our customers to decide between a set of vendor-specified instance sizes, Tier 3 encourages customers to provision machines with any combination of processors, memory, and storage that best fits their needs.
There are at least three benefits we see to offering in-depth customization of virtual machine attributes.
- Meet the hardware requirements of pre-packaged software without over-provisioning. The cloud isn’t just for custom web applications. Many users want to run commercial-off-the-shelf software in a cloud environment and apply vendor-recommended hardware sizing guidelines. Whether you’re installing Microsoft Dynamics CRM 2011 (recommended hardware: 4+ cores and 8+ GB of RAM per web server), or the Adobe Creative Suite (recommended hardware: 16 cores, 16 GB of RAM), each application will have its own battle-tested preferences. One choice would be to fit the packaged software into the “best fit” instance size offered by a cloud infrastructure provider. That may very well work, but it’s possible that you will end up paying for processor or memory allocation that wasn’t needed. Tier 3 enables you to provision virtual machines with up to 16 virtual CPU cores, 128 GB of RAM, and multiple terabytes of storage. Choose whatever combination you want!
Every infrastructure cloud provider makes it easy to create servers. However, one of the most important characteristics of cloud computing is that it’s easy to delete servers. Go ahead and use a server for as long as you need to and then get rid of it, along with all associated costs. However, cloud cleanup is typically the sole responsibility of the customer. Some smart folks have come up with their own solutions to this problem (see Netflix and their open-sourced Janitor Monkey for AWS), but we prefer to give our customers the automation capabilities they need to easily get rid of servers that aren’t needed anymore.
Set a Server Time-to-Live
When building temporary cloud environments, you often know exactly when you’ll be done using a server or set of servers. However, after the requisite days/weeks/months have passed, who remembers to shut off those machines? During the provisioning process of a Tier 3 server, users have the option to select a server lifespan and decide whether the server should be archived or completely deleted when that lifespan expires.
Use Scheduled Tasks to Automatically Shut Down and Resume Environments
Many servers will not be for temporary use, but they do have a defined window of usefulness. For instance, consider a development environment that a project team uses to build a web application. There may be multiple web, database, and application servers that collectively stay online during the work week. However, such an environment may be completely idle during weekends and holidays. Instead of incurring the cost for an unused environment, Tier 3 customers can create a Scheduled Task to pause or stop an environment and stop accruing CPU and memory charges. Unlike many cloud infrastructure providers, Tier 3 uses persistent VMs and only charges for OS licensing (if any) and storage when a server is inactive.
Scheduled Tasks operate on Groups of Tier 3 servers. This makes it simple to archive/pause/power on/reboot/shutdown/snapshot all of the servers in a Group at once. In this example, imagine setting up a pair of Scheduled Tasks that pause the entire development environment every Friday evening, and bring it back online early Monday morning. For even a small environment consisting of a database plus a pair of web servers, this results in savings of over $1000 a year.
For example, you could configure a pair of Tasks: the “pause” Task runs every Friday evening at 6PM, and the “power on” Task fires every Monday morning at 7AM.
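As a rough back-of-the-envelope check on that savings figure (the hourly rate below is an assumption for illustration, not Tier 3 pricing), pausing three servers from Friday at 6PM to Monday at 7AM removes about 61 hours of CPU and memory charges each week:

```python
# Assumed figures -- substitute your actual server sizes and rates.
servers = 3                              # one database server + two web servers
cpu_memory_rate_per_hour = 0.15          # assumed combined CPU + memory rate per server, in USD
paused_hours_per_week = 61               # Friday 18:00 -> Monday 07:00

weekly_savings = servers * cpu_memory_rate_per_hour * paused_hours_per_week
yearly_savings = weekly_savings * 52
print(f"~${weekly_savings:.2f} per week, ~${yearly_savings:.0f} per year")
# With these assumptions: ~$27.45 per week, roughly $1,427 per year -- in line with "over $1000".
```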
Your cloud environment should be made up of a well-groomed collection of active servers that provide tangible value for each hour that they are running. Tier 3 offers multiple options to automatically delete servers and take idle environments offline. This saves our customers both time and money. Any other automation that we should introduce to make your life easier? Let us know!