

Minimize the high costs of AI in cloud with FinOps

Deploying and managing AI applications can become costly, especially if it is your first generative AI initiative. Learn how FinOps practices can help keep those cloud costs under control.

While the cloud is getting AI into the hands of more enterprises, it's also impacting cloud spending. Your first generative AI initiative demands substantial computing power and data storage beyond what your organization already runs in the cloud.

It's up to you to prepare for the surge in cloud costs that AI brings. Take a look at how you can use cloud financial operations (FinOps) practices to accommodate the possible effects on your cloud costs.

Impact of AI on cloud spending

AI workloads can be resource-intensive and have varying utilization patterns. It is essential to adapt FinOps strategies accordingly. AI can impact cloud spending in a variety of ways:

  • Consumes more resources and time. Complex, unoptimized AI models require more computational resources and time to process data in the cloud, leading to higher cloud costs.
  • Requires more compute and storage. Training AI models is resource-intensive and costly because of increased computational and storage requirements.
  • Performs frequent data transfers. Additional data transfer costs may occur because your AI applications require more frequent data transfers between edge devices and your cloud service provider (CSP).

FinOps practices for AI

FinOps for AI requires ongoing collaboration among IT, finance and the AI development teams for continuous optimization and cost efficiency. Accounting for AI in your cloud cost optimization strategies requires extra attention, especially at the outset. Enterprises need to regularly review and refine their FinOps strategy based on how new AI requirements impact cloud performance and costs:

  • Provisioning and sizing. AI workloads impact how you provision and size resources since AI workloads often require graphics processing units and other accelerated processing units. Prepare to account for those adjustments.
  • Instance type. Your AI requirements dictate how to optimize resource provisioning by selecting the right instance types. Spot instances can save organizations money on noncritical AI workloads. For example, organizations can schedule the execution of AI training tasks to off-peak hours or lower-demand periods.
  • Scaling. AI workloads can experience spikes. Autoscaling dynamically adjusts the number of resources based on workload demand and avoids overprovisioning during idle periods.
  • Monitoring. Monitoring AI infrastructure and usage requires adjustments, such as new reporting from your cloud monitoring tools to identify further cost trends, resource utilization patterns and potential cost optimizations. Add cost allocation tags to your current tagging schema to attribute expenses to specific AI projects or teams.
  • Storage. AI models often generate large volumes of data. This leads to unexpected and significant rises in storage costs, especially if it's your first major AI project as an organization. Continuously review storage options based on access patterns, both before and after you put an AI project into production.
  • Data transfers. AI workloads may involve transferring data between different cloud services or regions. Content delivery networks can help you optimize these costs with your current CSP. In a multi-cloud environment, use automation to keep data transfers between your cloud environments efficient.
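The scaling practice above can be sketched in code. The snippet below is a minimal, illustrative version of a utilization-based autoscaling decision; the target utilization, replica bounds and metric values are assumptions for the example, not settings from any specific cloud provider. Managed autoscalers apply a similar proportional rule.

```python
# Minimal sketch of a utilization-based autoscaling decision.
# Target utilization and replica bounds are illustrative assumptions;
# real autoscalers (e.g., Kubernetes HPA) use a comparable proportional rule.
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Scale replicas proportionally to observed vs. target utilization,
    clamped to bounds so idle periods don't leave capacity overprovisioned."""
    raw = current_replicas * (current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, math.ceil(raw)))

# During a training-job spike, scale out:
print(desired_replicas(4, 0.90, 0.60))  # -> 6
# During an idle period, scale back in to stop paying for unused nodes:
print(desired_replicas(4, 0.15, 0.60))  # -> 1
```

The clamp to `min_replicas`/`max_replicas` is the cost-control piece: it caps spend during spikes and releases resources during idle periods rather than leaving them provisioned.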

Create a cost-aware AI strategy

If your AI project involves algorithm development, it's essential to work with data scientists and AI developers to create cost-aware algorithms that reduce resource consumption and prioritize efficiency without compromising your AI application's performance.

Say your organization launches and runs multiple AI projects that share cloud services and resources. In that case, consider implementing usage-based cost allocation to ensure your FinOps team attributes costs accurately and can identify which of your AI projects is driving the most expenses.
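Usage-based cost allocation like that can be sketched simply: split the shared bill in proportion to a metered usage signal. The project names, GPU-hour figures and bill amount below are hypothetical, and GPU-hours stand in for whatever usage metric your FinOps team actually meters.

```python
# Hedged sketch of usage-based cost allocation for a shared cloud bill.
# Project names, GPU-hours and the total are illustrative assumptions.

def allocate_shared_cost(total_cost: float, usage_by_project: dict) -> dict:
    """Split a shared bill across projects in proportion to metered usage."""
    total_usage = sum(usage_by_project.values())
    return {project: round(total_cost * usage / total_usage, 2)
            for project, usage in usage_by_project.items()}

gpu_hours = {"chatbot": 600, "forecasting": 300, "vision-poc": 100}
print(allocate_shared_cost(10_000.00, gpu_hours))
# -> {'chatbot': 6000.0, 'forecasting': 3000.0, 'vision-poc': 1000.0}
```

A breakdown like this makes it immediately visible which AI project is driving the most spend, which is the point of usage-based allocation.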

When planning your first AI projects, implement policies and guidelines to govern AI resource usage, cost limits and approval processes to help ensure responsible AI spending. Putting such documentation in place early also gives your organization's AI and cloud experts a chance to provide insights and feedback to ensure the policies and guidelines you're putting in place represent reality.

If your organization has predictable AI workloads, you could use your CSP's reserved instances or savings plans to commit to long-term usage and reap the discounts. However, these aren't an immediate solution to your AI workload costs: track your AI spending until you can discern historical trends before committing to reserved instances or savings plans. Stay aware of any discount promotions or incentives from your CSP, which can help optimize your cloud costs.
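The reasoning behind waiting for historical trends can be made concrete with a breakeven check: a reservation only pays off once actual usage covers its upfront cost. The hourly rates and upfront fee below are hypothetical; verify real pricing with your CSP before committing.

```python
# Illustrative breakeven check for a reserved instance vs. on-demand pricing.
# All rates and fees are hypothetical assumptions, not real CSP pricing.

def reserved_breakeven_hours(on_demand_rate: float,
                             reserved_hourly_rate: float,
                             upfront_fee: float) -> float:
    """Hours of usage at which the reservation starts beating on-demand."""
    savings_per_hour = on_demand_rate - reserved_hourly_rate
    if savings_per_hour <= 0:
        return float("inf")  # reservation never pays off
    return upfront_fee / savings_per_hour

# A GPU instance at $3.06/hr on demand vs. $1.80/hr reserved, $2,000 upfront:
hours = reserved_breakeven_hours(3.06, 1.80, 2000.0)
print(round(hours))  # -> 1587
```

If your tracked AI spending shows the workload reliably running past the breakeven point each commitment period, the reservation is worth it; if usage is still spiky or unproven, on-demand or spot capacity remains the safer choice.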
