On Premises

Fail-over overview

Architecture Designs

Introduction

We understand that 24/7 availability for your customers is very important. Your company cannot afford downtime in any of its communication channels, whether basic telephony, video conferencing or web chat. Being reachable means you can be of value to your customers.

Because there are different ways to remain available and limit downtime, both planned and unplanned, we created this document to explain several possibilities. In the world of redundancy we distinguish two terms:

  1. High Availability
  2. Disaster Recovery

High Availability and Disaster Recovery are not the same. Both are subsets of business continuity and they overlap in planning and solutions, but they serve different purposes. The purpose of High Availability is to provide resiliency within the primary node for planned downtime. The purpose of Disaster Recovery is to enable an organization to resume computer operations on a secondary node when a disaster at the primary node makes that part of the infrastructure unusable.

Anywhere365 High Availability and Disaster Recovery

Single Server setup

In its most basic form there will be a single node (site or datacenter) with a single Anywhere365 server in a standard Skype for Business pool, a single SharePoint server role for settings management, and a single SQL server role for storing dialogue intelligence. In this case, there is no High Availability for any of the server roles (Anywhere365, SharePoint and SQL) nor will there be a possibility for Disaster Recovery.

Single Server Architecture Setup

Configuration Cache

As shown above, Anywhere365 makes use of Skype for Business, SharePoint and SQL. Although it might be useful to make these server roles High Available as well, it is not required. Because Anywhere365 generates a cache every time the service starts up and every time a setting is changed in SharePoint, SharePoint does not have to be available all the time. If Anywhere365 cannot reach SharePoint, it uses its cache to keep running. The only downside is that it is (temporarily) not possible to make any changes to the Unified Contact Centers (UCCs). Similarly, if a single SQL server role is unavailable, Anywhere365 stores all its SQL actions in a Message Queue. Once the SQL database is up and running again, all the data is synchronized, so no data is lost.
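The two fallback behaviours described above (keep running on cached settings, and queue database writes until the database returns) can be illustrated with a minimal Python sketch. This is not Anywhere365 code: `load_settings`, `BufferedWriter` and `FakeStore` are hypothetical names, and an in-memory deque stands in for the Message Queue.

```python
from collections import deque


def load_settings(fetch, cache):
    """Return fresh settings if fetch() succeeds, otherwise the cached copy."""
    try:
        fresh = fetch()
    except ConnectionError:
        return dict(cache)      # settings source unreachable: keep running on cache
    cache.clear()
    cache.update(fresh)         # refresh the cache on every successful read
    return dict(cache)


class BufferedWriter:
    """Hypothetical write-behind buffer: queue writes while the store is down."""

    def __init__(self, store):
        self.store = store      # any object with .available and .write(record)
        self.pending = deque()  # stands in for the message queue

    def write(self, record):
        # Enqueue first so ordering is preserved, then try to drain the queue.
        self.pending.append(record)
        self.flush()

    def flush(self):
        # Replay queued records in order once the store is reachable again.
        while self.pending and self.store.available:
            self.store.write(self.pending.popleft())


class FakeStore:
    """In-memory stand-in for the SQL database (assumption for this sketch)."""

    def __init__(self):
        self.available = True
        self.rows = []

    def write(self, record):
        self.rows.append(record)
```

In this sketch, records written while `available` is false stay in `pending` and are delivered in their original order on the next `flush()` after the store comes back.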

When Skype for Business, SharePoint and SQL are not High Available, the only single point of failure is Skype for Business itself. As soon as the connection with the Skype for Business Front End server is broken, and no other Skype for Business Front End server is available to take over the required actions, Anywhere365 cannot continue its operations.

High Available Anywhere365 with Single Server Skype for Business Pool, SQL and SharePoint servers

To make only Anywhere365 High Available within a Skype for Business pool, it is necessary to have a multi-server trusted application pool consisting of two Anywhere365 servers. These two servers link to the same Skype for Business Front End Pool, SharePoint server and SQL database. If one of the two Anywhere365 servers is down (e.g. for maintenance), the second Anywhere365 server can start up and take over the activities (Active-Passive configuration).

With monitoring tools (e.g. Microsoft System Center Operations Manager, PRTG or intelligent load balancers), the passive Anywhere365 server can be activated automatically once the active Anywhere365 server is no longer available.
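The automatic activation described above could follow logic along these lines. The sketch is purely illustrative and does not use the API of any monitoring product: `FailoverMonitor`, the server names and the threshold of three consecutive failed probes are all assumptions.

```python
class FailoverMonitor:
    """Hypothetical active-passive switch driven by health-probe results."""

    def __init__(self, active, passive, threshold=3):
        self.active = active        # server currently handling the UCCs
        self.passive = passive      # standby server
        self.threshold = threshold  # consecutive failures before failover (assumed)
        self.failures = 0

    def record_probe(self, healthy):
        """Feed one probe result for the active server; return the active server."""
        if healthy:
            self.failures = 0       # any success resets the failure streak
            return self.active
        self.failures += 1
        if self.failures >= self.threshold:
            # Swap roles: the passive server takes over the activities.
            self.active, self.passive = self.passive, self.active
            self.failures = 0
        return self.active
```

Requiring several consecutive failures rather than one avoids failing over on a single dropped probe. The right threshold depends on how quickly the passive server can start.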

High Available Single Server

Note: It is theoretically possible to have multiple active Anywhere365 servers within the same trusted application pool. However, running the same UCC on both servers will not work. The real-time management information and web services (e.g. wallboard, presence, agent reservation, impersonation, etc.) of identical UCCs on multiple Anywhere365 servers are not consolidated into a pooled mechanism, so real-time management and monitoring of UCCs across multiple servers is not possible. Running unique UCCs on separate active servers within a pool is possible, but this configuration requires two active A365 Server Licenses instead of one active and one passive A365 Server License.

A Unified Contact Center, or UCC, is a queue of interactions (voice, email, IM, etc.) that are handled by Agents. Each UCC has its own settings, IVR menus and Agents. Agents can belong to one or several UCCs and can have multiple skills (competencies). A UCC can be visualized as a contact center "micro service". Customers can utilize one UCC (e.g. a global helpdesk), a few UCCs (e.g. for each department or regional office) or hundreds of UCCs (e.g. for each bed at a hospital). They are interconnected and can all be managed from one central location.
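The constraint in the note, that each UCC must be unique to one active server, lends itself to a simple pre-deployment check. The following Python sketch is only an illustration. The function name and the server and UCC names are hypothetical, and the assignment mapping is assumed to come from your own deployment records.

```python
from collections import defaultdict


def duplicated_uccs(assignments):
    """Return the UCCs hosted on more than one active server.

    assignments maps a server name to the UCC names it runs.
    """
    hosts = defaultdict(set)
    for server, uccs in assignments.items():
        for ucc in uccs:
            hosts[ucc].add(server)
    # Only UCCs with two or more hosting servers violate the constraint.
    return {ucc: sorted(servers) for ucc, servers in hosts.items() if len(servers) > 1}
```

An empty result means every UCC runs on exactly one active server, which is the configuration the note describes as workable.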

High Available Anywhere365 Skype for Business Enterprise Pool, SQL Cluster and SharePoint Farm

If required, all server roles can be made High Available by setting up a Skype for Business Enterprise Edition multi-server pool (instead of a Skype for Business Standard Edition single server), creating a multi-server SharePoint farm, and putting a multi-server SQL cluster (for example Always On) in place. This cluster could host both the SharePoint and Anywhere365 databases. With all server roles High Available, all operations, such as real-time communications, SharePoint changes to the Unified Contact Centers, and reports, remain available at all times.

Disaster Recovery

Once all server roles are High Available, the next step might be Disaster Recovery. When talking about Disaster Recovery, you might think of a second datacenter that takes over operations if the primary datacenter goes dark (this requires monitored intervention and/or manual failover). However, with today's virtualization, in which multiple virtual servers run on a single physical server, you can also think of Disaster Recovery within a single datacenter: for example, a secondary physical server that can be activated once the primary physical server fails.

Disaster Recovery within a single datacenter might be easier to realize because the machines are close to each other and there is a good network with enough bandwidth available.

Realizing Disaster Recovery is, in essence, duplicating your (High Available) environment on a different node (either within the same datacenter or not) and connecting the two. The connection between these two nodes can be realized by either Pool Pairing or Stretched Pool (a.k.a. Tier 2).

DC Pool Pairing

When using Pool Pairing, both nodes can be used stand-alone. Each node has its own Skype for Business Front End server(s), its own Anywhere365 server(s), its own SharePoint server(s), and its own SQL server(s). There is one active node (primary) and one passive node (secondary), and there is no connection between the two pools. Because the two pools are not aware of each other, synchronization scripts are necessary to keep both nodes up to date, so that a failover can be initiated.

First, the UCCs and all of their endpoints (telephone numbers and SIP addresses) should be synchronized from the active node to the passive node. Second, the SharePoint UCC configuration sites and all of their settings should be synchronized from the active node to the passive node. Third, once a failback has finished after a failover, the SQL data should be synchronized from the passive node to the active node, so that all data is available on the active node after the situation has returned to its original state.

The Session Initiation Protocol, or SIP, is a protocol for multimedia communication (audio, video and data communication). SIP is also used for Voice over IP (VoIP). SIP has interactions with other Internet protocols such as HTTP and SMTP.

Pool Pairing can also be used in an active-active situation. Both nodes are then active as well as passive, and in this situation the synchronization scripts should work in both directions.
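The synchronization directions described above can be summarized in a short Python sketch. It does not represent the actual synchronization scripts, whose contents this document does not specify. `SyncStep`, `sync_plan`, the phase names and the node names are hypothetical and only make the direction of each step explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncStep:
    what: str    # what is being synchronized
    source: str  # node the data is read from
    target: str  # node the data is written to


def sync_plan(phase, active="primary", passive="secondary", active_active=False):
    """List the sync steps for a phase: 'steady' (normal operation) or 'failback'."""
    if phase == "steady":
        steps = [
            SyncStep("UCCs and endpoints", active, passive),
            SyncStep("SharePoint UCC configuration sites", active, passive),
        ]
    elif phase == "failback":
        # After a failback, data written on the passive node flows back.
        steps = [SyncStep("SQL data", passive, active)]
    else:
        raise ValueError(f"unknown phase: {phase!r}")
    if active_active:
        # In active-active, every step also runs in the opposite direction.
        steps += [SyncStep(s.what, s.target, s.source) for s in steps]
    return steps
```

For example, `sync_plan("failback")` yields a single step that copies SQL data from the secondary node back to the primary node.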


Stretched Pool (Tier 2)

If there is enough bandwidth and low latency (<1 ms) between the two nodes, so that no delay is experienced during phone calls, it might also be possible to connect the two nodes in real time. This is called a Stretched Pool. In this case the two nodes function as one single node: the Skype for Business Front End Servers of both nodes are in a single pool, the Anywhere365 Servers of both nodes are in a single pool, the SharePoint Servers of both nodes are in a single farm, and the SQL Servers of both nodes are in a single cluster.

In this case no synchronization scripts are required because all servers are aware of each other and are acting as one.

Hybrid DC

Pool Pairing and Stretched Pools can be combined into a Hybrid model so that some of the server roles are acting independently of each other on both nodes and synchronized using scripts (Pool Pairing), while other server roles are acting as a single instance on both nodes (Stretched Pool).

Beware, though, that a stretched SharePoint farm also requires a low-latency (<1 ms) connection between the two nodes.
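As a rough planning aid, the choice between Pool Pairing and Stretched Pool per server role can be written down as a small Python sketch. The 1 ms limit is taken from the text above. Everything else, including the function names and the rule that a role falls back to "paired" when the link is too slow, is an assumption made for illustration.

```python
STRETCH_MAX_LATENCY_MS = 1.0  # the text asks for < 1 ms between the two nodes

ROLES = ("Skype for Business", "Anywhere365", "SharePoint", "SQL")


def plan_topology(latency_ms, preferred):
    """Map each server role to 'stretched' or 'paired'.

    preferred maps a role to the desired mode; unlisted roles default to
    'paired'. A stretched role is downgraded when the latency is too high.
    """
    plan = {}
    for role in ROLES:
        mode = preferred.get(role, "paired")
        if mode == "stretched" and latency_ms >= STRETCH_MAX_LATENCY_MS:
            mode = "paired"  # link too slow: fall back to Pool Pairing
        plan[role] = mode
    return plan


def roles_needing_sync_scripts(plan):
    """Paired roles act independently, so they rely on synchronization scripts."""
    return [role for role, mode in plan.items() if mode == "paired"]
```

A plan that mixes both modes corresponds to the Hybrid model. A plan where every role is "stretched" corresponds to a full Stretched Pool.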