As disaster recovery moves to greater levels of efficiency via automation, the GUI front end is becoming obsolete. It is therefore critical to understand how the API works under the hood. Frequently, what an administrator would recognize as the GUI is just a front end that calls the API to do its bidding.
In practical terms, using the API for automated failover enables anything from a highly customized failover of a single group of servers to a mass DR failover, such as when a complete site fails, with all the complexity and configuration handled ahead of time. But there are some important things to bear in mind when using APIs for automated DR failover.
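To make the GUI-versus-API point concrete, here is a minimal sketch of the kind of REST call a "Failover" button might issue under the hood. The endpoint path, payload fields, and bearer-token header are illustrative assumptions, not any specific vendor's API:

```python
import json

def build_failover_request(protection_group: str, target_site: str,
                           planned: bool = True) -> dict:
    """Assemble the hypothetical API request that triggers a failover."""
    return {
        "method": "POST",
        "path": f"/api/v1/protection-groups/{protection_group}/failover",
        "headers": {
            "Authorization": "Bearer <token>",  # placeholder; real calls need a credential
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "targetSite": target_site,
            "failoverType": "planned" if planned else "unplanned",
        }),
    }

request = build_failover_request("erp-servers", "dr-site-2")
```

Anything the GUI can do through a call shaped like this, an automation workflow can do as well, and for many groups of servers at once.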
Choosing the right orchestration engine is critical. Orchestration engines are used to create workflows to ensure all the correct steps are brought together across disparate systems. And there are many options to choose from.
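At its core, what an orchestration engine provides can be illustrated with a toy sketch: ordered steps across disparate systems, with execution halting at the first failure. The step names here are hypothetical; a real engine adds retries, branching, rollback, and audit logging:

```python
def run_workflow(steps):
    """Run (name, action) pairs in order; stop at the first failing step."""
    completed = []
    for name, action in steps:
        if not action():
            return completed, name  # report where the workflow stopped
        completed.append(name)
    return completed, None

# Illustrative failover steps; each lambda stands in for a call to a
# different system (replication, storage, hypervisor, DNS).
steps = [
    ("pause-replication", lambda: True),
    ("promote-dr-storage", lambda: True),
    ("power-on-vms", lambda: False),  # simulate a failure mid-workflow
    ("update-dns", lambda: True),
]
done, failed_at = run_workflow(steps)
```

Knowing exactly which step completed and which step stopped the workflow is the property to look for when evaluating engines.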
Start simple. Build complexity over time. Less is more to start with.
Design the workflow with checks and balances, and validate everything. Assume that everything can fail, and build in safeguards against those failures. There are many potential disaster scenarios, and an administrator should plan and test for as many of them as practical. A bad DR failover can just compound issues.
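One way to express "validate everything" in code is a set of preflight checks that must all pass before the failover proceeds. The check names and thresholds below are illustrative assumptions, not vendor requirements:

```python
def preflight(checks):
    """Return the names of failed checks; an empty list means go."""
    return [name for name, passed in checks.items() if not passed]

# Example preconditions an administrator might verify before failing over.
checks = {
    "replication-lag-under-5-min": True,
    "dr-site-capacity-available": True,
    "runbook-credentials-valid": False,  # simulate one failed precondition
}
blockers = preflight(checks)
if blockers:
    # Abort rather than compound the disaster with a bad failover.
    print(f"Failover blocked by: {blockers}")
```

The design choice worth noting is that the workflow aborts on any failed precondition instead of pressing on and hoping for the best.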
Automating DR tests is a best practice, but exercise caution and ensure the failover infrastructure is isolated. Automating the DR failover into an isolated environment on a regular basis enables an administrator to test the crash consistency of the systems without disrupting the live environment. When paired with automated functional testing, such as robotic process automation tooling, it also enables end-to-end functionality testing of the failed-over systems.
Using an isolated environment will prevent potential IP address conflicts and data corruption or misconfiguration. Bear in mind that many services depend on underlying IP and authentication infrastructure, such as DNS and directory services, in order to authenticate at all, so that infrastructure must exist inside the isolated bubble too.
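A common way to sidestep address collisions in an isolated test bubble is to map each production address to the same host offset in a test-only subnet. This is a minimal sketch of that translation; the subnets shown are illustrative examples:

```python
import ipaddress

def remap_to_test_subnet(ip: str, prod_cidr: str, test_cidr: str) -> str:
    """Translate an address from the production subnet into the test subnet,
    preserving the host offset within the network."""
    prod = ipaddress.ip_network(prod_cidr)
    test = ipaddress.ip_network(test_cidr)
    offset = int(ipaddress.ip_address(ip)) - int(prod.network_address)
    return str(ipaddress.ip_address(int(test.network_address) + offset))

# A production host at 10.1.0.25 keeps its .25 offset in the isolated range.
remap_to_test_subnet("10.1.0.25", "10.1.0.0/24", "192.168.50.0/24")
```

Keeping the host offset stable makes it easier to reason about which test address corresponds to which production system.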
Once implemented, be careful with system upgrades. Small changes can affect the way the infrastructure works, so check and verify the workflows and outcomes to ensure the functionality hasn't changed.
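One lightweight way to verify that an upgrade has not changed behavior is to re-run the DR test workflow and compare each step's outcome against a baseline captured before the upgrade. The step names and results below are illustrative:

```python
def diff_outcomes(baseline: dict, current: dict) -> dict:
    """Return steps whose outcome changed since the baseline run."""
    return {
        step: (baseline.get(step), current.get(step))
        for step in set(baseline) | set(current)
        if baseline.get(step) != current.get(step)
    }

# Outcomes recorded from a DR test run before the upgrade.
baseline = {"replicate": "ok", "boot-vms": "ok", "smoke-test": "ok"}
# Outcomes from the first run after the upgrade.
after_upgrade = {"replicate": "ok", "boot-vms": "failed", "smoke-test": "ok"}

regressions = diff_outcomes(baseline, after_upgrade)
```

Any nonempty result is a signal to investigate before trusting the automation with a real failover.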
With great power comes great responsibility. It is important to understand that the DR API itself is only one component of implementing DR failover. Other factors, such as domain name system changes and firewall changes, should be taken into account as well. Almost all of these items can be scripted via APIs, and those scripts need to be tested extensively. Verify that every required item, or a workaround for it, is available before attempting the failover. Done incorrectly, it can be a recipe for disaster.
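The DNS portion of a failover can be sketched as repointing service records at their DR-site addresses. The zone contents here are illustrative; a real script would push these changes through the DNS provider's update API and then verify propagation:

```python
def cutover_dns(zone: dict, dr_addresses: dict) -> dict:
    """Return a copy of the zone with failed-over hosts pointing at DR IPs."""
    updated = dict(zone)
    for host, dr_ip in dr_addresses.items():
        if host in updated:  # only touch records that exist in the zone
            updated[host] = dr_ip
    return updated

# Hypothetical production records and the DR addresses to swap in.
prod_zone = {"app.example.com": "10.1.0.25", "db.example.com": "10.1.0.26"}
dr_zone = cutover_dns(prod_zone, {"app.example.com": "198.51.100.25"})
```

The same pattern, compute the desired end state and then apply it, works for firewall rules and the other supporting changes as well.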
Done correctly, when the fateful day arrives, the time put into designing and implementing the automation will be directly reflected in the outcome of the DR failover.