It is hard to fathom that cloud platforms have been available to the IT community for close to a decade. Their popularity has grown to the point where cloud systems have become the infrastructure of choice for many organizations.
As a database consultant for more than 20 years, I have worked with companies of all types and sizes across a wide range of industries. As their cloud implementations matured, a common theme emerged: several recurring issues were degrading the quality of their cloud database platforms.
Here are a few of the more common issues, along with recommendations to prevent them from occurring.
Cloud database performance monitoring
Ensuring proper performance of any database system is a wonderfully complex task. From disk reads and buffer cache hit ratios to multi-user, concurrent transaction throughput, there is a seemingly endless array of metrics to monitor and root causes of poor database performance to resolve.
In addition to standard database performance challenges, DBaaS and IaaS database platforms add another dimension to performance monitoring and troubleshooting. Transferring information into and out of a cloud database system can be challenging, especially if there are large data volumes and tight time constraints.
A phrase I commonly use with clients is "no database is an island." Most DBaaS and IaaS databases take feeds from various sources, interact with other databases and systems during daily operations, and send output to other applications and end users.
Many IT shops have found that the cloud requires an "all in" strategy. When the application software and the data it accesses are on two different cloud systems or split between cloud and on-premises platforms, data access lag times can negatively impact performance. This is a significant problem for applications that require extremely fast response times.
Best practice recommendations: In addition to reviewing their favorite DBMS performance metrics, cloud platform administrators should also focus on monitoring data transfer volumes into and out of the cloud systems. Document all the inputs and outputs and include them in your monitoring strategy. Although your shop may have estimated the data transfer volumes during initial system design, it's a pretty safe assumption that they will change over time.
Here is a list of starter questions to help identify additional monitoring activities:
- How is the database populated? Is it loaded using flat files or database-to-database data transfers?
- What type of output does the database generate? Does it create large reports, flat files or data streams that other applications use as input? One of the most overlooked data transfers is when the information from the cloud database is used to refresh other systems.
The goal is to forecast future transfer times, then work with network engineers on potential solutions and with application development teams to reschedule large data transfers that impact other jobs.
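To make that forecasting concrete, here is a minimal sketch of one way to project transfer times from historical volume measurements using a simple least-squares linear trend. The volume figures, growth rate and throughput below are illustrative placeholders, not measurements from any real system.

```python
# Hypothetical sketch: project a future transfer window from historical
# data volumes using an ordinary least-squares linear trend.
# All numbers are illustrative, not drawn from a real workload.

def linear_trend(volumes_gb):
    """Fit volume = a + b * period via least squares; return (a, b)."""
    n = len(volumes_gb)
    periods = range(n)
    mean_x = sum(periods) / n
    mean_y = sum(volumes_gb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(periods, volumes_gb))
    var = sum((x - mean_x) ** 2 for x in periods)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return intercept, slope

def projected_transfer_minutes(volumes_gb, periods_ahead, throughput_gb_per_min):
    """Estimate transfer duration periods_ahead past the last observation."""
    intercept, slope = linear_trend(volumes_gb)
    future_volume = intercept + slope * (len(volumes_gb) - 1 + periods_ahead)
    return future_volume / throughput_gb_per_min

# Example: six weekly volume samples growing roughly 5 GB per week,
# projected 12 weeks out at an assumed 2 GB/min of usable bandwidth.
history = [120, 125, 131, 135, 141, 146]
window = projected_transfer_minutes(history, periods_ahead=12,
                                    throughput_gb_per_min=2.0)
```

A projection like this is only a conversation starter: when the projected window begins to collide with downstream job schedules, that is the signal to bring in the network and application teams.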
Regulatory compliance reporting
DBaaS platforms do not expose their underlying architecture to users. In addition, recording the evidence auditors need for vendor, administrator and end-user change control procedural compliance can be challenging when using cloud database systems.
As a result, organizations that adhere to internal, industry-specific or governmental regulatory compliance rules often find that they are unable to provide the supporting evidence their auditors need to verify the system meets the framework's control objectives. Regulatory frameworks such as SSAE16 SOC, PCI DSS, NIST, NERC, GDPR and HIPAA all require system specific settings and change control information as evidence.
Although most of the leading cloud platform vendors provide compliance documentation for some of the more popular regulatory frameworks, smaller competitors may not provide the level of supporting evidence your organization needs. In addition, internal and third-party auditors often lose their sense of humor when they ask for specific compliance evidence and you respond with a generic link to a vendor's website.
Best practice recommendations: Most organizations that store and process data that is subject to one or more regulatory compliance frameworks have classification procedures that categorize the data according to its sensitivity. One of the more common issues that affects both cloud and on-premises systems is that sensitive information tends to propagate its way into other data stores across the organization.
During the creation of new cloud database systems or the migration of existing databases to cloud platforms, meet with security and auditing teams to classify data and agree upon the evidence they need to demonstrate compliance with regulatory frameworks. In addition, you will need to perform a deep-dive review of the cloud vendor's compliance documentation to identify their regulatory agency certifications. One method that will help you to adhere to all compliance frameworks is to create a spreadsheet that contains the following columns:
- Control objective description
- Description of evidence needed for compliance
- Source of evidence -- cloud platform provider, your organization or both
- Evidence location, naming conventions and format
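The spreadsheet above can just as easily be generated and version-controlled as a CSV file. Here is a minimal sketch using Python's standard csv module; the control objectives, sources and file paths shown are illustrative placeholders, not requirements from any actual framework.

```python
# Sketch of the evidence-tracking matrix described above, emitted as CSV.
# Row contents are illustrative examples, not real control objectives.
import csv
import io

COLUMNS = [
    "control_objective",
    "evidence_description",
    "evidence_source",   # cloud provider, your organization, or both
    "evidence_location_and_format",
]

rows = [
    {
        "control_objective": "Encrypt data at rest (example objective)",
        "evidence_description": "Storage encryption settings export",
        "evidence_source": "cloud provider",
        "evidence_location_and_format": "compliance/encryption-report.pdf",
    },
    {
        "control_objective": "Quarterly DBA access review (example objective)",
        "evidence_description": "Signed access review checklist",
        "evidence_source": "your organization",
        "evidence_location_and_format": "compliance/access-review-q1.xlsx",
    },
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
matrix_csv = buf.getvalue()
```

Keeping the matrix in a plain-text format makes it easy to diff between audit cycles, so you can show auditors exactly which controls and evidence locations changed.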
Maintaining business continuity
During the genesis of cloud systems, many in the IT community thought that the vendors' multiple layers of connectivity, computing platforms and data redundancy would make outages a thing of the past. We quickly learned from a series of well-publicized service interruptions that no matter how robust an architecture the vendors created, our organizations would still need to plan for application outages.
Best practice recommendations: Here are a few recommendations that will help you to mitigate the impact of cloud service interruptions. Some of the recommendations may be obvious, but many organizations continue to rely solely upon their cloud vendors to maintain application availability during an outage.
- Classify your applications according to criticality. Since the inception of computers, application uptime has always been directly related to system cost and complexity. The higher the level of uptime your organization requires for a given application, the more costly and complex it becomes. How much availability are you willing to purchase?
- Thoroughly evaluate the cloud provider's high availability features. Although all leading cloud platforms provide a robust set of mechanisms that protect against outages, many of these features will require the customer to purchase, configure and administer them.
- Lessen the impact of any single cloud provider's service interruption by implementing a multi-cloud strategy. Flexera's 2021 State of the Cloud survey of 750 cloud decision-makers and users found that 92% of the respondents now use multiple cloud platforms.
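One way to ground the criticality classification in numbers is to translate each candidate uptime target into its yearly downtime budget, which makes the cost-versus-availability trade-off tangible. The tier names and targets below are illustrative examples, not vendor SLAs.

```python
# Illustrative sketch: convert an uptime percentage into the downtime it
# permits per year, a first step when classifying applications by
# criticality. Tier labels and targets are examples only.

HOURS_PER_YEAR = 365 * 24  # 8,760; leap years ignored for simplicity

def allowed_downtime_hours(uptime_pct):
    """Hours of downtime per year permitted by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - uptime_pct / 100)

example_tiers = {
    "tier 1 (mission critical)": 99.99,
    "tier 2 (business critical)": 99.9,
    "tier 3 (non-critical)": 99.0,
}

for tier, target in example_tiers.items():
    print(f"{tier}: {target}% uptime allows "
          f"{allowed_downtime_hours(target):.2f} h/yr of downtime")
```

Seeing that the jump from 99.9% to 99.99% shrinks the downtime budget from roughly 8.8 hours to under an hour per year helps frame the question in the first bullet: how much availability are you actually willing to purchase?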
As with any high-quality disaster recovery and business continuity program, develop a plan to mitigate the impact of cloud service interruptions. Design, implement and test the actions your organization will perform when an outage occurs. It is important to note that, in many cases, your mean time to resolve (MTTR) will depend entirely on your cloud provider.