
Amazon Bedrock users adapt app dev to GenAI

Early adopters of Amazon Bedrock shared lessons learned about incorporating generative AI into software engineering workflows, from managing cloud costs to writing prompts.

Two companies that were among the first to sign on to a large language model hosting service from AWS have encountered similar benefits and drawbacks in generative AI app development.

Both Alida, a Toronto-based customer research platform provider, and Verint Systems, a contact-center-as-a-service provider in Melville, NY, began using Amazon Bedrock when it first launched in preview in April 2023. Now, a year into working with the service, IT pros at both companies have found developing generative AI applications via API an attractive alternative to hosting LLM infrastructure in-house.

Among the big three cloud providers, Microsoft was first to roll out generative AI services for enterprises with the GitHub Copilot coding assistant in 2022. Microsoft's Azure and Google Cloud also now offer model gardens and developer tools for evaluating LLMs and building generative AI applications.

But last year, Amazon staked an early claim to this deeper layer of AI infrastructure with Bedrock. The service was the first to offer an extensive selection of LLMs, including Meta's Llama 2. From the beginning, Bedrock also built in fine-grained data privacy and security options that appealed to enterprises.

OpenAI's ChatGPT API, which launched in March 2023, let enterprise users opt out of having their data used to train models. But Amazon Bedrock lets users privately train their own customizable instances of foundational models, providing control over which data sets are used. That data is encrypted in transit and at rest, and Bedrock also supports identity-based access controls and Amazon CloudWatch audit logs.

At the time, those features made Amazon Bedrock unique, said Sherwin Chu, chief architect at Alida.


"Our primary motivation for reaching out to AWS was just to have a solution that basically would be in compliance with what we provide to our customers [for] … their GDPR, security, privacy and data sovereignty requirements," Chu said. "We were struggling with OpenAI on this one. Even though they were the pioneers in the LLM space and had the most reliable service, they just couldn't comply with any of our GDPR requirements."

However, generative AI app development hasn't been without significant challenges for early adopters, from managing cloud costs to reckoning with the learning curve for prompt engineering as the technology matures.

"One of the things that we've really tried to hammer on the product management side is taking … into account from the beginning how you're going to control for misbehavior," said Ian Beaver, chief scientist at Verint. "Regardless of how great a particular LLM performs on some benchmark, as soon as you start feeding things in, like a 40-minute call transcript, who knows what that model is going to pick up on and do with it?"

Avoiding generative AI sticker shock

For Alida, Amazon Bedrock satisfied compliance and security needs without requiring the company to host LLMs or their training infrastructure, which made it the only affordable way the company could incorporate generative AI into its applications, Chu said.

Some of the bigger LLMs, such as Anthropic's Claude or AI21 Labs' Jurassic, call for GPU-based servers that can cost between $2,000 and $4,000 per month. To host these models in production, Alida would require five separate multi-server clusters of these machines.

Bedrock was also simple to incorporate into the company's existing application development process from a technical perspective because it used a well-understood process of calling into an API endpoint. Alida uses a similar setup to work with other hosted machine learning services, such as Amazon Comprehend for natural language processing and IBM Watson.

Amazon Bedrock LLMs have their own separate DevOps pipeline infrastructure at Alida because of different limitations on pay-as-you-go API calls for generative AI as well as other types of AI and machine learning, Chu said.

"There are token-per-minute and request-per-minute constraints. We have to write logic into the pipeline and make it aware of those," he said. "So we can manage, for example, if [a developer] is going to hit a token limit. They may have to back off whatever data they're sending for analysis or get an error that tells them to retry at a later time. … There's only so much content you can pass to the LLM in a given API call."
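The rate-limit-aware pipeline logic Chu describes can be sketched roughly as follows. This is an illustrative assumption, not Alida's actual implementation: the class names, limits, and the exponential-backoff strategy are invented for the example, and a real pipeline would raise or catch the hosting service's own throttling error.

```python
import random
import time

class ThrottlingError(Exception):
    """Stand-in for the throttling error a hosted LLM API raises."""

class TokenBudget:
    """Track a token-per-minute budget so callers can back off before hitting a limit."""
    def __init__(self, tokens_per_minute):
        self.limit = tokens_per_minute
        self.used = 0
        self.window_start = time.monotonic()

    def allow(self, tokens):
        now = time.monotonic()
        if now - self.window_start >= 60:       # start a new one-minute window
            self.used, self.window_start = 0, now
        if self.used + tokens > self.limit:
            return False                        # caller trims the payload or retries later
        self.used += tokens
        return True

def invoke_with_backoff(call, max_retries=5, base_delay=0.01):
    """Retry a model invocation with exponential backoff and jitter on throttling."""
    for attempt in range(max_retries):
        try:
            return call()
        except ThrottlingError:
            if attempt == max_retries - 1:
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

A caller would check `allow()` with an estimated token count before sending a request, and wrap the actual API call in `invoke_with_backoff`.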

Alida's Amazon Bedrock pipeline is built using an event-driven architecture based on AWS services such as Amazon SQS and CloudWatch to manage the flow of data into LLMs and monitor costs, Chu said. Chu's engineering team is also looking into ways to group Bedrock API calls into batch jobs so more developers within the company can access the service.

"We're looking at ways where we can democratize the usage of LLMs," Chu said. "You can always ask AWS to raise your rate limits, but that can also be prohibitively expensive."
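Grouping API calls into batch jobs, as Chu's team is exploring, could be sketched as a simple chunker that packs prompts under a per-batch token ceiling. The character-based token estimate below is a placeholder assumption; real services count tokens with a model-specific tokenizer.

```python
def batch_prompts(prompts, max_batch_tokens, count_tokens):
    """Pack prompts into batches that each stay under a token ceiling."""
    batches, current, current_tokens = [], [], 0
    for prompt in prompts:
        tokens = count_tokens(prompt)
        if current and current_tokens + tokens > max_batch_tokens:
            batches.append(current)         # flush the full batch
            current, current_tokens = [], 0
        current.append(prompt)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches

# Rough stand-in tokenizer: about four characters per token.
def approx_tokens(text):
    return max(1, len(text) // 4)
```

Each batch could then be submitted as one scheduled job, spreading many developers' requests across the available rate limit instead of raising it.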


For Verint, which is both a customer of AWS cloud services and a partner that offers its contact-center-as-a-service platform in the AWS Marketplace, working with generative AI is nothing new. The company has used generative AI in-house for about three years, and its use has since expanded to include services from all major cloud providers, according to Beaver.

In part, it's necessary for Verint as a global company to work with all the major cloud providers' LLM services because of varying regional availability, Beaver said. It's also necessary to work with many LLMs because some produce higher-quality results in certain languages than others.

That's where Amazon Bedrock first came in for Verint. The company began testing the service before its general launch last year as a way for its R&D team to evaluate performance and compare results across models before using them in its Da Vinci AI products.

Despite the company's experience working with AI and ML, using LLMs comes with entirely new cost considerations, Beaver said.

"The costs are hugely different. Even to this day, there's sticker shock with some product managers," he said. "They say, 'I have this product idea. I think it's gonna be transformational,' and we turn around and find out what they want to do is call GPT-4 10 million times a day, [which] is going to cost us millions of dollars. … Then they need to rethink that use case."

This kind of cost negotiation isn't as common with machine learning, Beaver said.

"Everybody's focused on the capabilities [of generative AI], which are great. But you have to balance that with the cost and how you are going to make a margin on top of that and sell it," he said. "A lot of this pricing is based on tokens. Product managers aren't even sure of what a token is, let alone how to calculate these things. … There's been several projects we've actually just killed because we couldn't figure out how to make them cost effective."
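The back-of-the-envelope math behind the sticker shock Beaver describes is simple to sketch. The prices and token counts below are purely illustrative assumptions, not any provider's actual rates.

```python
def estimate_daily_cost(calls_per_day, input_tokens, output_tokens,
                        price_in_per_1k, price_out_per_1k):
    """Estimate daily spend for a token-priced LLM API."""
    per_call = (input_tokens / 1000) * price_in_per_1k \
             + (output_tokens / 1000) * price_out_per_1k
    return calls_per_day * per_call

# 10 million calls a day at hypothetical per-1K-token prices:
daily = estimate_daily_cost(10_000_000, input_tokens=1_000, output_tokens=500,
                            price_in_per_1k=0.03, price_out_per_1k=0.06)
```

With those hypothetical numbers, a "transformational" product idea works out to hundreds of thousands of dollars per day, which is exactly the conversation Beaver says forces use cases to be rethought.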

It also falls to Verint's platform engineering team to make choices on the fly to accommodate cost considerations when deploying LLMs in-house or in the cloud.

"Because of our MLOps platform, it's completely hidden from the product [team] behind an API that allows us to do cost triaging and say, 'Guess what? Anthropic just cut their prices by 40% on Claude 3. Let's go flip this switch and route traffic in, say, the Americas region.' And suddenly we've just improved the margin on this product by 30% or something like that. That has been huge for us."
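The traffic-flipping Beaver describes could look roughly like this behind an internal API. The model names, prices, and region keys here are hypothetical placeholders, and a production router would also weigh quality and regional availability, not price alone.

```python
class ModelRouter:
    """Route requests per region to whichever registered model is currently cheapest."""
    def __init__(self):
        self.prices = {}   # model id -> price per 1K tokens
        self.routes = {}   # region -> pinned model id (overrides cheapest)

    def register(self, model_id, price_per_1k):
        self.prices[model_id] = price_per_1k

    def set_price(self, model_id, price_per_1k):
        self.prices[model_id] = price_per_1k

    def pin(self, region, model_id):
        self.routes[region] = model_id

    def pick(self, region):
        if region in self.routes:                    # explicit route wins
            return self.routes[region]
        return min(self.prices, key=self.prices.get)  # otherwise cheapest model
```

Because product teams call the router rather than any one provider, a price cut becomes a one-line price update instead of a code change in every product.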

The art of writing prompts

Another key difference between working with LLMs and other kinds of AI is the greater potential for inaccurate results from LLMs, sometimes referred to as hallucinations. This calls for careful prompt engineering that often requires multiple rounds of refinement to generate quality results, both Chu and Beaver said.

"Any time generative models are being employed in a product, there needs to be some kind of a human-in-the-loop aspect where you have some way to control [it] at the UI level so that wrong behaviors can be reported or corrected," Beaver said. "The other thing is setting appropriate expectations with users so they don't just take everything as blind truth."

By building a means for end users to provide detailed feedback on LLM outputs in its products' user interface, Verint can collect data that will help it improve prompts and model training, Beaver said.

In the past, data scientists would handle such tasks at Verint and retrain machine learning models accordingly. But refining LLM prompts requires participation from the entire organization -- from end users of the company's software to the company's customer support agents and software engineers. This is partly because the company wants to avoid the cloud costs associated with customizing LLMs, Beaver said.

"Not only are you paying the training cost, but most of these platform providers like AWS and Azure are charging you additional money for custom model hosting," he said. "If we can get by with the base model and just manipulate it to work through prompt [engineering,] … we can save ourselves a lot of money by not hosting a custom model."


Training is also a critical undertaking. Multiple teams must learn prompt engineering because so much of it is required throughout the process of supporting generative AI applications, making it too big a job for any one team at the company to handle, Beaver said.

"Another thing that's really been changing with the advent of generative AI is [training] professional services and support teams to do this kind of work," he said. "Because it's not scalable to take your research organization [and] your data scientists and have them handling all of these customer support issues at scale."

Alida has looked to hackathons and training services offered by AWS to help developers learn prompt engineering, Chu said.

"There's the art of writing the right prompts to get [an LLM] to generate the output in the format that you want and be able to process the inputs that you're providing," he said. "There's a lot of trial and error, and that is where I would say the bulk of our development efforts have gone."

For example, Amazon subject matter experts taught Alida's Bedrock team in a workshop on prompt engineering that even subtle tweaks to prompts, such as putting certain words in uppercase letters, can alter the output from an LLM, Chu said.

'The promise … is massive'

During one recent hackathon for Alida's Bedrock team, engineers experimented with using LLMs to generate customer experience surveys and saw great potential for that application, Chu said. However, survey generation requires producing layers of nested JSON documents.

"We gave the LLM the schema for the top-level node and then a kind of placeholder where it could insert child nodes. And it was able to use that template to generate the fully structured JSON document," he said.
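The template approach Chu describes might look like the following sketch. The schema fields are invented for illustration, and a real pipeline would call the model where `model_output` appears, then validate the response before using it.

```python
import json

# Hypothetical top-level schema with a placeholder where the model inserts child nodes.
PROMPT_TEMPLATE = """Generate a customer experience survey as JSON.
Follow this structure exactly, replacing <question-node> with nested question objects:
{"survey": {"title": "<string>", "questions": [<question-node>]}}
Return only the JSON document."""

def parse_survey(model_output):
    """Fail fast if the model's output is not well-formed JSON with the expected root."""
    doc = json.loads(model_output)
    if "survey" not in doc or "questions" not in doc["survey"]:
        raise ValueError("response missing expected survey structure")
    return doc
```

Validating the output this way gives the pipeline a concrete signal for the inconsistency problem Chu mentions: a malformed response can be retried rather than passed downstream.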

This hinted at the potential for LLMs to handle complex tasks and multi-stage workflows if prompted properly, Chu said.

"The promise of what's possible in generative AI is massive and impressive," he said. "The challenge that we're still trying to deal with today is inconsistency."

At Verint, generative AI is already changing the nature of work that professional services and customer service teams do, along with software engineers, Beaver said, especially as customers begin to demand more customization from LLMs for their specific industries, clients and languages.

"Another new thing that we're having to deal with is just managing all these prompts at scale," Beaver said. "Prompt management and treating prompts as source code with versioning and access control and gated releases has become basically a product in its own right."
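Treating prompts as versioned source code, as Beaver describes, could be sketched as a small content-addressed registry with gated releases. This is a minimal illustration, not Verint's system; the names and the short-hash scheme are assumptions.

```python
import hashlib

class PromptRegistry:
    """Version prompts like source code: content-addressed commits with gated releases."""
    def __init__(self):
        self.versions = {}   # name -> list of (digest, text) in commit order
        self.released = {}   # name -> digest of the version serving production traffic

    def commit(self, name, text):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        self.versions.setdefault(name, []).append((digest, text))
        return digest

    def release(self, name, digest):
        if not any(d == digest for d, _ in self.versions.get(name, [])):
            raise KeyError(f"unknown version {digest} for prompt {name}")
        self.released[name] = digest

    def current(self, name):
        """Return the released version, or the latest commit if none is released."""
        digest = self.released.get(name)
        versions = self.versions[name]
        if digest is None:
            return versions[-1][1]
        return next(text for d, text in versions if d == digest)
```

Gating which digest serves production traffic is what makes a bad prompt change revertible the same way a bad code deploy is.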

Beth Pariseau, senior news writer for TechTarget Editorial, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
