Managing and troubleshooting cloud environments can be challenging because the infrastructure is owned by a service provider. Let's examine some ways to make your cloud management easier to tackle.
First, consider cloud automation use cases. Automation tools can streamline oversight of cloud deployments, although the specific features offered depend on the cloud vendor. Some tools hide the differences between cloud providers' APIs, a benefit if you're considering a multi-cloud or hybrid cloud deployment. A number of multi-cloud orchestration tools are commercially available, and companies that want to roll their own can use open source tools such as Ansible or write custom automation in Python.
Whichever approach you take, make sure it masks enough of the differences between cloud vendors to be useful, but not so much detail that it obscures the visibility you need for accurate troubleshooting.
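To illustrate that trade-off, here is a minimal Python sketch of an abstraction layer. It assumes a hypothetical `CloudProvider` interface with in-memory `FakeAdapter` stand-ins rather than any real vendor SDK, and the vendor names and instance-type strings are made up. Automation code calls the same `provision()` method for every vendor, while each `Instance` keeps the vendor-specific `native_size` so that detail remains visible when troubleshooting.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Instance:
    provider: str
    instance_id: str
    size: str          # provider-neutral size, e.g. "small"
    native_size: str   # vendor-specific type, kept for troubleshooting
    region: str


class CloudProvider(ABC):
    """Hypothetical common interface the automation code depends on."""

    @abstractmethod
    def provision(self, size: str, region: str) -> Instance: ...

    @abstractmethod
    def deprovision(self, instance: Instance) -> None: ...


class FakeAdapter(CloudProvider):
    """In-memory stand-in for a vendor SDK; a real adapter would call its API."""

    def __init__(self, name: str, size_map: dict[str, str]):
        self.name = name
        self.size_map = size_map  # neutral size -> vendor instance type
        self.running: dict[str, Instance] = {}
        self._next_id = 1

    def provision(self, size: str, region: str) -> Instance:
        inst = Instance(self.name, f"{self.name}-{self._next_id}", size,
                        self.size_map[size], region)
        self._next_id += 1
        self.running[inst.instance_id] = inst
        return inst

    def deprovision(self, instance: Instance) -> None:
        self.running.pop(instance.instance_id, None)


providers = {
    "vendor_a": FakeAdapter("vendor_a", {"small": "a1.small", "large": "a1.xlarge"}),
    "vendor_b": FakeAdapter("vendor_b", {"small": "B2s", "large": "D8s"}),
}

# The same calls work against either vendor.
web = providers["vendor_a"].provision("small", "us-east")
db = providers["vendor_b"].provision("large", "eu-west")
print(web.native_size, db.native_size)  # a1.small D8s
```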
Manage clouds with workflows
Automating common workflows is likely to be the most valuable mechanism at your disposal. Critical tasks to automate include provisioning, deprovisioning, auditing and troubleshooting. Organizations that have embraced Agile software development will also use workflows that support continuous integration as well as continuous delivery and deployment. The more you automate, the less manual effort is needed to manage cloud computing resources.
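Provisioning workflows usually span several steps, and a failure halfway through can leave orphaned resources behind. The minimal Python sketch below runs steps in order and, if one fails, undoes the completed steps in reverse. The step functions (`create_vm`, `attach_disk` and so on) are hypothetical placeholders that only modify a dictionary; real steps would call a provider API or an orchestration tool.

```python
from typing import Callable

# Each step is (name, action, undo); both functions receive a shared context dict.
Step = tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_workflow(steps: list[Step], ctx: dict) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    log = ctx.setdefault("log", [])
    completed: list[Step] = []
    for step in steps:
        name, action, _ = step
        try:
            action(ctx)
        except Exception as exc:
            log.append(f"failed: {name} ({exc})")
            for done_name, _, undo in reversed(completed):
                undo(ctx)
                log.append(f"undone: {done_name}")
            return False
        log.append(f"ok: {name}")
        completed.append(step)
    return True


# Hypothetical steps for illustration only.
def create_vm(ctx: dict) -> None:
    ctx["vm"] = "vm-1"


def delete_vm(ctx: dict) -> None:
    ctx.pop("vm", None)


def attach_disk(ctx: dict) -> None:
    raise RuntimeError("quota exceeded")  # simulate a failure mid-workflow


ctx: dict = {}
ok = run_workflow([("create_vm", create_vm, delete_vm),
                   ("attach_disk", attach_disk, lambda c: None)], ctx)
print(ok, ctx["log"])
```

Because `create_vm` is rolled back when `attach_disk` fails, the failed run leaves no VM behind for anyone to clean up by hand.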
Cloud management should be focused on the five areas below.
1. Expense reduction
One of the advantages of using a cloud provider is that the expense of maintaining the physical infrastructure shifts to the provider, but this benefit pays off only if the cloud is managed properly. To that end, IT systems and processes must be designed to provision and deprovision resources automatically as needed, keeping manual interaction to a minimum. Otherwise, so-called zombie IT (resources that were provisioned, then forgotten, yet keep running and accruing charges) can easily consume the forecast savings.
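A scheduled job can help catch zombie resources before they erode savings. This minimal sketch assumes each resource carries a hypothetical `owner` tag and a last-activity timestamp, and flags anything untagged or idle longer than a threshold (14 days here, an arbitrary choice). It only reports candidates; whether to deprovision them automatically is a policy decision.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Resource:
    resource_id: str
    owner: Optional[str]   # hypothetical "owner" tag; None means untagged
    last_active: datetime  # e.g. last CPU or network activity observed


def find_zombies(resources: list[Resource], now: datetime,
                 max_idle: timedelta = timedelta(days=14)) -> list[Resource]:
    """Return resources that are untagged or idle longer than max_idle."""
    return [r for r in resources
            if r.owner is None or now - r.last_active > max_idle]


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = [
    Resource("vm-web", "team-web", now - timedelta(hours=2)),
    Resource("vm-test-old", "team-qa", now - timedelta(days=30)),
    Resource("disk-orphan", None, now - timedelta(days=1)),
]
for r in find_zombies(inventory, now):
    print("deprovision candidate:", r.resource_id)
# deprovision candidate: vm-test-old
# deprovision candidate: disk-orphan
```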
2. IT security
A common misconception about cloud computing is that it is inherently more secure than enterprise-hosted computing. It isn't; in fact, it presents some new challenges. To ensure your data is protected, contract with a security company that can provide validated products to protect both data in flight and data at rest. Of all cloud automation use cases, creating and maintaining good cloud security is among the most tangible.
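Automation can't replace validated security products, but it can check continuously that configurations stay compliant. The sketch below assumes a hypothetical `StorageConfig` record per storage endpoint and reports endpoints without encryption at rest or that accept TLS versions older than 1.2 (that threshold is an assumption; set it to your own policy). A real version would pull these settings from the provider's API on a schedule.

```python
from dataclasses import dataclass


@dataclass
class StorageConfig:
    # Hypothetical fields; a real audit would read these from the provider's API.
    name: str
    encrypted_at_rest: bool
    min_tls_version: str  # lowest TLS version the endpoint accepts, e.g. "1.2"


def audit(configs: list[StorageConfig],
          required_tls: tuple[int, int] = (1, 2)) -> list[str]:
    """Return findings for data at rest and data in flight."""
    findings = []
    for c in configs:
        if not c.encrypted_at_rest:
            findings.append(f"{c.name}: data at rest is not encrypted")
        major, minor = (int(part) for part in c.min_tls_version.split("."))
        if (major, minor) < required_tls:
            wanted = f"{required_tls[0]}.{required_tls[1]}"
            findings.append(f"{c.name}: accepts TLS {c.min_tls_version} (minimum {wanted})")
    return findings


configs = [
    StorageConfig("backups", encrypted_at_rest=True, min_tls_version="1.2"),
    StorageConfig("logs", encrypted_at_rest=False, min_tls_version="1.0"),
]
for finding in audit(configs):
    print(finding)
```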
3. Application performance
Application performance can suffer if the cloud environment isn't well designed and implemented. Key parts of an application should be served by a single cloud provider to minimize communication latency among components. Applications that can run on multiple cloud instances in different locations can direct each client to the location that serves it best.
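Client-to-cloud location selection can be as simple as measuring round-trip time (RTT) from the client to each candidate location and choosing the fastest. This sketch assumes the RTT samples have already been collected; the region names and numbers are hypothetical. It uses the median rather than the mean so that one slow sample doesn't steer the choice: here `us-east` wins despite a 400 ms outlier that gives it the worst mean of the three measured regions.

```python
import statistics


def pick_region(rtt_samples_ms: dict[str, list[float]]) -> str:
    """Return the region with the lowest median round-trip time.

    Regions with no samples are skipped; raises ValueError if none have any.
    """
    measured = {region: s for region, s in rtt_samples_ms.items() if s}
    if not measured:
        raise ValueError("no RTT samples for any region")
    return min(measured, key=lambda region: statistics.median(measured[region]))


# Hypothetical RTTs measured from one client, in milliseconds.
samples = {
    "us-east": [42.0, 40.0, 400.0],
    "us-west": [80.0, 78.0, 85.0],
    "eu-west": [110.0, 105.0, 120.0],
    "ap-south": [],
}
print(pick_region(samples))  # us-east
```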
One way to monitor cloud performance is with OpenTelemetry, an open source, vendor-neutral observability framework for collecting telemetry data, such as metrics, logs and traces, from cloud computing systems. Software agents installed on the computing systems collect the data and forward it to a variety of analysis systems, where system performance can be monitored. Check with your cloud provider to see whether it supports OpenTelemetry.
4. Resilience
Good cloud design runs an application in more than one availability zone or on more than one cloud provider. Be careful, though: it is easy to make a mistake and find that an application relies on an overlooked, nonredundant internal component. The best way to validate resilience is to run active tests that take zones or components offline. Consider having an outside organization validate the testing; we've often seen cases where a testing shortcut invalidated the resilience tests.
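A static inventory check is no substitute for the active tests recommended above, but it can flag an overlooked nonredundant component early. This sketch assumes a hypothetical mapping from each component to the availability zones it runs in, and lists components that run in fewer than two zones or that would lose every copy if a given zone failed.

```python
def single_points_of_failure(deployments: dict[str, set[str]],
                             min_zones: int = 2) -> list[str]:
    """Components deployed in fewer than min_zones availability zones."""
    return sorted(c for c, zones in deployments.items() if len(zones) < min_zones)


def lost_if_zone_fails(deployments: dict[str, set[str]], failed_zone: str) -> list[str]:
    """Components left with no running copy if failed_zone goes down."""
    return sorted(c for c, zones in deployments.items() if not zones - {failed_zone})


# Hypothetical inventory: component -> zones where it runs.
deployments = {
    "web": {"zone-a", "zone-b"},
    "api": {"zone-a", "zone-b"},
    "session-cache": {"zone-a"},  # the easy-to-overlook internal component
}
print(single_points_of_failure(deployments))      # ['session-cache']
print(lost_if_zone_fails(deployments, "zone-a"))  # ['session-cache']
print(lost_if_zone_fails(deployments, "zone-b"))  # []
```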
5. Troubleshooting
Troubleshooting infrastructure you don't own or control is challenging. The data you would normally tap to troubleshoot an enterprise network isn't available; instead, you must rely on digital experience (DX) monitoring, a combination of synthetic transactions and real-time traffic monitoring.
Detailed diagnostic information is captured by software agents and correlated, producing a comprehensive view of client-to-server application performance. At NetCraftsmen, we've used DX tools to diagnose problems as varied as ISP routing protocol issues and client-side Wi-Fi signal strength, all without having access to the networks and network devices involved.
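DX tools combine synthetic transactions with real-traffic monitoring and agent-side correlation; the minimal sketch below covers only the synthetic half. It times one scripted transaction and records whether it succeeded within a target. The transactions here are hypothetical stand-ins that sleep or raise; a real probe would exercise the application, for example by fetching a page, and would run on a schedule from several client locations.

```python
import time
from typing import Callable, Optional


def run_synthetic(name: str, transaction: Callable[[], None], target_ms: float) -> dict:
    """Time one scripted transaction and report whether it met its target."""
    error: Optional[str] = None
    start = time.perf_counter()
    try:
        transaction()
    except Exception as exc:  # a failed transaction is a result, not a crash
        error = f"{type(exc).__name__}: {exc}"
    elapsed_ms = (time.perf_counter() - start) * 1000
    return {
        "name": name,
        "elapsed_ms": round(elapsed_ms, 1),
        "success": error is None,
        "within_target": error is None and elapsed_ms <= target_ms,
        "error": error,
    }


# Stand-ins for real transactions. In practice the callable might fetch a
# login page, e.g. with urllib.request.urlopen(url, timeout=5).
def fast_page() -> None:
    time.sleep(0.01)


def broken_page() -> None:
    raise ConnectionError("connection reset")


for name, txn in [("login", fast_page), ("checkout", broken_page)]:
    print(run_synthetic(name, txn, target_ms=200))
```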
Finding what works best
Another step is to use ChatOps automation workflows to streamline troubleshooting and reduce the time to resolution. When a problem is detected, a bot runs predetermined workflows to collect diagnostic information and posts it in a chat space, such as a Slack or Teams channel, where IT team members can begin troubleshooting. More bot workflows can be added as needed, freeing the IT team from having to investigate problems or collect performance data manually.
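At its core, such a bot is a registry of diagnostic workflows keyed by alert type. This minimal sketch assumes hypothetical alert fields (`type`, `host`, `p95_ms` and so on) and workflow names; it runs every workflow registered for an alert and builds a message payload. Slack incoming webhooks accept a JSON body with a `text` field, so the payload follows that shape; Teams and other chat tools expect different formats, and the HTTP POST itself is omitted.

```python
import json
from typing import Callable

# Hypothetical registry: alert type -> diagnostic workflows for the bot to run.
DIAGNOSTICS: dict[str, list[Callable[[dict], str]]] = {}


def diagnostic(alert_type: str):
    """Decorator that registers a function as a workflow for alert_type."""
    def register(func: Callable[[dict], str]) -> Callable[[dict], str]:
        DIAGNOSTICS.setdefault(alert_type, []).append(func)
        return func
    return register


@diagnostic("high_latency")
def summarize_latency(alert: dict) -> str:
    return f"{alert['host']} p95 latency {alert['p95_ms']} ms (target {alert['target_ms']} ms)"


@diagnostic("high_latency")
def recent_changes(alert: dict) -> str:
    # A real workflow might query a CI/CD or change-management system here.
    changes = alert.get("recent_changes", [])
    return "recent changes: " + (", ".join(changes) if changes else "none")


def handle_alert(alert: dict) -> str:
    """Run the registered workflows and return a JSON chat payload."""
    lines = [f"Alert: {alert['type']} on {alert['host']}"]
    for workflow in DIAGNOSTICS.get(alert["type"], []):
        try:
            lines.append("- " + workflow(alert))
        except Exception as exc:  # one broken workflow shouldn't hide the rest
            lines.append(f"- {workflow.__name__} failed: {exc}")
    # A real bot would POST this payload to a Slack incoming-webhook URL.
    return json.dumps({"text": "\n".join(lines)})


alert = {"type": "high_latency", "host": "api-1", "p95_ms": 850,
         "target_ms": 300, "recent_changes": ["deploy 1.4.2"]}
payload = handle_alert(alert)
print(json.loads(payload)["text"])
```

Adding a new diagnostic is then a matter of decorating one more function, which matches the idea of growing the bot's workflows as needed.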
The shift to cloud computing from enterprise computing comes with challenges and opportunities. You can't just take what worked in the enterprise and migrate those systems and workflows to the cloud. What's required is a careful evaluation of cloud automation use cases to determine which tools will give you the ability to scale up resources as needed while avoiding the errors inherent in manual processes.