Save the State: Data Synchronization Between Master and Backup Services

When the center is actively handling calls, minimizing disruptions is key. You don’t want your agents to stop, log out, log back in, and then try to pick up where they left off; that is an efficiency killer. To avoid this, multiple copies of a single service can be set to run on different servers. However, the backup’s data has to be kept current so that no information is lost when the backup takes over from the master. Synchronization of information is therefore essential when setting up redundancy for high availability in the call center.

Two special cases where data must be available are call recordings and voicemail. Recordings have been discussed in the past, especially call recordings in the Cloud. Once recordings are in the Cloud, they are available to redundant systems. For voicemail, database-backed voicemail provides a single point of access for voicemails that originated on multiple servers. What sets these two services apart from the others is that the data is available as a service that is (potentially) outside our system.

To be ready for a disruption, there are three primary systems whose backups need to be kept up to date:

  •  Call and client data (the database)
  •  Telephony connections and sessions (for call survival/recovery)
  •  Agent sessions and states (the ACD software)

Database replication is a well-understood problem. MySQL, for example, allows for easy synchronization of data as it comes into the system. A strict master/backup system is the simplest to set up. Circular replication, where each server of a pair acts as both master and backup, is also a fairly common configuration. In that case the database should be configured so that automatically generated values such as IDs are unique across servers. This is usually done by setting the auto-increment step to at least the number of servers and giving each server a unique offset (in MySQL, the auto_increment_increment and auto_increment_offset settings). When a failure occurs and the system switches to the other database, the data is present and available. When the original master is repaired and replication is restarted, both databases will again have a complete copy of the data.
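To make the auto-increment arrangement concrete, here is a minimal sketch in Python of how a step and a per-server offset keep generated IDs from colliding. It is a simplified simulation, not database code: the function name auto_increment_ids and the two-server setup are assumptions made for illustration, and a real server also takes existing rows into account when choosing the next value.

```python
from itertools import islice


def auto_increment_ids(offset: int, increment: int):
    """Yield the IDs one server would hand out: offset, offset + increment, ..."""
    value = offset
    while True:
        yield value
        value += increment


# Hypothetical pair of servers in circular replication.
SERVER_COUNT = 2

# Each server uses the same step (the number of servers) and its own offset.
server_a = list(islice(auto_increment_ids(offset=1, increment=SERVER_COUNT), 5))
server_b = list(islice(auto_increment_ids(offset=2, increment=SERVER_COUNT), 5))

print(server_a)  # [1, 3, 5, 7, 9]
print(server_b)  # [2, 4, 6, 8, 10]

# The two sequences never overlap, so rows inserted on either server
# can be replicated to the other without primary key conflicts.
assert set(server_a).isdisjoint(server_b)
```

Choosing a step larger than the current number of servers leaves room to add another server later without renumbering.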

Telephony connections (calls) require data redundancy in two different ways. If a telephony server itself goes down, the situation is similar to the database case above. If call survival is desired, then the Indosoft High Availability SIP Proxy (HAASIPP) must be used. The HAASIPP itself is responsible for maintaining the SIP session; if it detects an issue, it can resume the same call by sending the call information and voice data to another Asterisk box and redialing the agent. This keeps the call live, with the client staying on the line the whole time.

For the HAASIPP to maintain redundancy, it must transmit information to the backup HAASIPP in real time. Every state change must be synchronized between the master and backup service. This allows SIP sessions to continue even if the master HAASIPP goes down; the second HAASIPP server has been receiving live data all along and is ready to be promoted to master at any time without a loss of service.
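The general pattern of a backup that mirrors live state and promotes itself can be sketched in a few lines of Python. This is not how the HAASIPP is implemented; the BackupNode class, the heartbeat timeout, and the session names are all hypothetical, and timestamps are passed in explicitly so the example stays deterministic.

```python
class BackupNode:
    """Hypothetical backup that mirrors session state from a master."""

    def __init__(self, heartbeat_timeout: float):
        self.heartbeat_timeout = heartbeat_timeout
        self.sessions = {}          # session id -> last known state
        self.last_heartbeat = 0.0   # time the master was last heard from
        self.role = "backup"

    def on_update(self, now: float, session_id: str, state: str) -> None:
        """Store a state change sent by the master; it also counts as a heartbeat."""
        self.sessions[session_id] = state
        self.last_heartbeat = now

    def on_heartbeat(self, now: float) -> None:
        """Record that the master is still alive."""
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        """Promote this node to master if the master has been silent too long."""
        silent_for = now - self.last_heartbeat
        if self.role == "backup" and silent_for > self.heartbeat_timeout:
            self.role = "master"
        return self.role


backup = BackupNode(heartbeat_timeout=2.0)
backup.on_update(0.0, "call-1", "ringing")
backup.on_update(0.5, "call-1", "connected")
backup.on_heartbeat(1.0)

print(backup.check(2.5))   # backup (master heard from 1.5 seconds ago)
print(backup.check(3.5))   # master (silent for 2.5 seconds, over the timeout)
print(backup.sessions)     # {'call-1': 'connected'}
```

Because every update was applied as it arrived, the promoted node already holds the latest session state and has nothing to reload at takeover time.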

Synchronization is also a requirement for redundancy in your call center’s automatic call distributor (ACD). While the calls themselves continue, the agent states must be maintained. This includes details such as:

  •  who is logged in
  •  which telephony device they are attached to
  •  who is ready to receive calls
  •  who is in an away state
  •  any other details that may be relevant

Of course, this has to be maintained live. As soon as an agent is ready to receive a call, that fact must be recorded on the backup ACD. That way, the agent doesn’t notice a disruption in call handling when the ACD server or service goes down.
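As a rough illustration of keeping agent states live on a backup, the sketch below forwards every state change to a peer as soon as it is applied. The AgentState fields mirror the details listed above, but the AcdNode class, the in-process "replication" call, and the device names are assumptions for illustration; a real ACD would send these changes over the network and handle delivery failures.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentState:
    logged_in: bool = False
    device: str = ""        # telephony device the agent is attached to
    status: str = "away"    # "ready" or "away"


class AcdNode:
    """Hypothetical ACD node that mirrors agent state changes to a peer."""

    def __init__(self, peer=None):
        self.agents = {}    # agent name -> AgentState
        self.peer = peer    # backup node to forward changes to, if any

    def set_state(self, agent: str, **changes) -> None:
        """Apply a state change locally, then forward it to the backup."""
        current = self.agents.get(agent, AgentState())
        self.agents[agent] = replace(current, **changes)
        if self.peer is not None:
            self.peer.set_state(agent, **changes)

    def ready_agents(self):
        """Agents that are logged in and ready to receive calls."""
        return sorted(
            name for name, state in self.agents.items()
            if state.logged_in and state.status == "ready"
        )


backup = AcdNode()
master = AcdNode(peer=backup)

master.set_state("alice", logged_in=True, device="SIP/1001")
master.set_state("bob", logged_in=True, device="SIP/1002", status="ready")
master.set_state("alice", status="ready")
master.set_state("bob", status="away")

# The backup already knows who can take the next call.
print(backup.ready_agents())            # ['alice']
print(backup.agents == master.agents)   # True
```

If the master went down after the last change, the backup could route the next call to alice without asking any agent to log in again.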

Synchronization isn’t needed only for failures, either. When hardware needs replacing or software needs upgrading, the ability to move services to a different server and bring down a service or server while production is ongoing is very beneficial. This capability allows maintenance to be done even in 24/7 contact center configurations.