The Problem with Traditional HA/DR Solutions (Part 1 of 2)

We asked Hugh, the newest addition to the Maxava team, to write about what he learned on his first day about High Availability software solutions!

As a graduate from a non-technical background, joining the team at Maxava was a rather intimidating prospect. But despite knowing very little about HA/DR and IBM i, I was determined to rise to the challenge and so I entered the office on my first day with confidence.

Having been introduced to the various team members, everything was going well until we sat down to do the technical introduction where I would learn all about the software and how it worked. Backing up a computer was something I was familiar with but the thing that really stood out as terrifying was the idea of a ‘Command system intercept’. After many iterations from the patient technical staff I finally understood how the Maxava HA solution worked and the reason it is such a leap ahead in the world of High Availability Disaster Recovery.

How Do Traditional High Availability/Disaster Recovery Replication Software Solutions Work?

My first step towards an appreciation of Maxava HA’s benefits was to understand the alternatives currently available for IBM i or AS400 machines. In a post-tape environment, traditional HA/DR software system processes are detailed in Image 1 below.




At a basic level, the actions of users on the primary system can be categorized as either Object Activity, for example the creation of a new file, or Data Activity, for example writing records to that file.

In order to transfer this information to the backup system, Object Activity is packaged up into the Audit Journal and Data Activity is written into Data Journals. HA/DR replication software works by taking a stream of changes and applying them in the same order to the backup system. The sequence in which Audit and Data journals arrive is extremely important because in order for the system to know where to place the Data Activity, it must first have a record of the Object contained in the Audit Journal.
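To make the ordering point concrete, here is a minimal Python sketch of the idea rather than of any vendor's actual code. The `Entry` class, the `apply_in_order` function and the single shared `seq` number are all assumptions made for illustration; real IBM i journals are more involved than this.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    seq: int           # hypothetical shared sequence number (an assumption)
    kind: str          # "object" (from the Audit Journal) or "data" (from a Data Journal)
    obj: str           # name of the file the entry refers to
    payload: str = ""  # record contents, only used for data entries


def apply_in_order(audit_entries, data_entries):
    """Merge two streams already sorted by seq and apply them to an in-memory backup."""
    backup = {}
    for e in heapq.merge(audit_entries, data_entries, key=lambda e: e.seq):
        if e.kind == "object":
            backup[e.obj] = []                # the object now exists on the backup
        elif e.obj in backup:
            backup[e.obj].append(e.payload)   # the data has somewhere to go
        else:
            raise LookupError(f"data for {e.obj!r} arrived before its object")
    return backup


audit = [Entry(1, "object", "ORDERS")]
data = [Entry(2, "data", "ORDERS", "rec-1"), Entry(3, "data", "ORDERS", "rec-2")]
result = apply_in_order(audit, data)
print(result)
```

If the data entries were applied before the object entry, the sketch would raise an error, which is the situation the ordering requirement is there to prevent.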

Traditional HA/DR software replication solutions typically run a server task whose job is to search through the Audit Journals and locate any new objects which have been marked for replication on the production machine. When these objects are found, they are packaged up and sent across to the backup system for replication. Once both have arrived at the backup server, the Audit Journal entries containing the Objects are merged with the Data Activity to create the backup.
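The searching task described above can be pictured with a short Python sketch. It is only an illustration under assumptions: the function name `find_replicated_objects`, the dictionary layout and the entry type labels are invented here and are not real IBM i audit entry codes.

```python
def find_replicated_objects(audit_journal, replicated):
    """Read every audit entry and keep only creations of objects marked for replication."""
    entries_read = 0
    found = []
    for entry in audit_journal:
        entries_read += 1  # every entry costs a read, relevant or not
        if entry["type"] == "create" and entry["object"] in replicated:
            found.append(entry["object"])
    return found, entries_read


# Hypothetical journal contents: mostly security events, one relevant creation.
audit_journal = [
    {"type": "signon_failed", "object": None},
    {"type": "authority_change", "object": "PAYROLL"},
    {"type": "create", "object": "ORDERS"},
    {"type": "create", "object": "TMPWORK"},
]
found, entries_read = find_replicated_objects(audit_journal, {"ORDERS"})
print(found, entries_read)
```

In this toy journal the task has to read all four entries to find the single object it cares about, and that work happens on the production machine.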

The Problem with Traditional HA Solutions

Although the process seems simple, it carries a number of risks which range from costly to potentially disastrous for an organization.

These issues primarily exist due to the way in which objects and data are handled separately on the production side and then merged upon reaching the backup server.

  1. The first issue is the server task detective and the resources it requires to do its job correctly. The Audit Journal was primarily created as a security measure and as such, when it is used in this way, many of the entries written to it are not relevant to HA/DR. Despite this, the detective must read all of these entries in order to find those marked for replication. This requires a high level of production system resources to be diverted away from core activities to help the detective.
  2. Handling of Objects and Data separately continues to cause issues because if there are separate jobs detecting Object updates and Data updates, then some re-sequencing of these updates must occur in order to ensure that changes to the file are replicated in the correct order.
  3. But the most troubling issue when relying upon the Audit Journal for HA/DR is that replication cannot take place until all the pieces are matched. Traditional HA/DR replication software must wait for the detective to locate the new object before it can begin to record the Data Activity occurring within the object due to be replicated. The Data Activity Journal is then sent to the backup server. However, if the user has the current object open and therefore locked, the Data Activity is essentially left homeless and the backup is incomplete. As well as placing strain on the primary system, this locking means that should a disaster occur while the user has the object open, the homeless Data Activity will not be backed up until the object is closed, even though it is being logged. Until that point the expensive backup solution is effectively redundant. This third piece of information would have been invaluable to have known during my Master’s thesis, because my colleagues and I frequently left our work open on our office computers overnight, believing that the backup software would keep it safe. Unfortunately an earthquake in our city caused a city-wide power cut and our primary machines corrupted the files. The consequence was that we lost approximately 130,000 words between us and close to 3 days of work with our final deadlines looming.
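The third issue in the list above, the "homeless" Data Activity, can be sketched in a few lines of Python. This is a simplified model under assumptions, not a description of how any particular product behaves: the `BackupApplier` class and its method names are hypothetical.

```python
from collections import defaultdict


class BackupApplier:
    """Toy backup side: data can only be applied once its object has arrived."""

    def __init__(self):
        self.objects = {}                 # objects that exist on the backup, with their records
        self.pending = defaultdict(list)  # "homeless" records waiting for their object

    def on_data(self, obj, record):
        if obj in self.objects:
            self.objects[obj].append(record)
        else:
            self.pending[obj].append(record)  # logged, but not yet protected

    def on_object(self, obj):
        # The object has finally been delivered, so the waiting records can be applied.
        self.objects.setdefault(obj, []).extend(self.pending.pop(obj, []))

    def unprotected(self):
        return sum(len(records) for records in self.pending.values())


applier = BackupApplier()
applier.on_data("THESIS", "chapter-1")
applier.on_data("THESIS", "chapter-2")
at_risk = applier.unprotected()  # records that would be lost if disaster struck now
applier.on_object("THESIS")
print(at_risk, applier.unprotected())
```

The gap between the two calls is the window of exposure: the longer the object takes to reach the backup, the more records sit in the pending pile.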

Does Maxava use the Audit Journal for High Availability/Disaster Recovery?

At Maxava we chose NOT to use the Audit Journal as the basis for our software because we found that relying on it in the way described above creates a sub-optimal process.

Reliance upon the Audit Journal requires significant system resources on the production machine. The organization has the choice of either settling for slower processing speeds, or purchasing hardware add-ons to manage the load. With Maxava HA neither is necessary. Maxava believes that it is unacceptable for a backup software solution to have a large footprint which places strain on your system, so we designed a product specifically to utilize the processing on your backup machine and leave your production machine relatively free for use.

Despite the additional strain that traditional HA/DR places upon your system, it offers no significant advantage in return. More worryingly, it may leave your organization at risk of data loss due to the delayed nature of solutions that rely on the Audit Journal. At Maxava we did not feel comfortable with offering our customers a product which compromised safety, leaving them vulnerable due to a flaw in program architecture.

So what does Maxava do instead?

Maxava has minimized the strain on customers’ production machines, and has innovatively created a product that offers real-time data replication on the IBM i platform. To find out more about it and why the ‘command system intercept’ is such a big leap forward in the HA/DR world, look out for Part 2 next week.

Hugh is a recent graduate of the University of Canterbury and holds a Master's degree in Commerce as well as undergraduate degrees in International Business, Marketing, Strategy and Entrepreneurship and a degree in Performance Violin. Outside of his studies Hugh has won both national business and sporting competitions while running his own start-up companies. For more information on Hugh, please check out his LinkedIn.
