How to build an enterprise generative AI tech stack

Generative AI tech stacks consist of key components like LLMs, vector databases and fine-tuning tools. The right tech stack can help enterprises maximize their generative AI ROI.

The potential of generative AI and large language models in the enterprise is a growing area of interest. Enterprises are currently in the early stages of adoption, mainly experimenting with LLM APIs available from companies like OpenAI, AI assistants like Microsoft Copilot, and specialist products designed for tasks such as image generation or marketing copywriting.

Pretrained LLMs offer some impressive capabilities, such as language processing, data analysis and content generation. However, these models are trained on public data sets, not specific enterprise data. Without training on or ongoing access to enterprise data, the full potential of LLM applications remains untapped. Furthermore, LLMs have other notable limitations, such as hallucinations, data privacy risks and security concerns.

For custom LLM applications, the generative AI stack comprises several key components: the LLM itself, vector databases, fine-tuning tools, frameworks for connecting models to enterprise data, and tools for deployment and monitoring.

Beyond this application-level stack, of course, there is also the hardware infrastructure required to train AI models and host them in production for real-time inference.

Considerations for planning an enterprise generative AI stack

While LLM technology is promising, it also poses certain risks related to accuracy, relevance and security.

To alleviate these concerns and ensure that your enterprise maximizes its ROI, it's crucial to understand the tradeoffs associated with different technology choices and adopt a modular component stack. Modular generative AI stacks allow for greater flexibility and adaptability, enabling organizations to swap out components as needed and incorporate new technologies as they emerge.

Establishing guidelines and standards for the generative AI stack's key components is crucial to project success. The following are some of the key considerations.

Choice of LLM

The LLM is the cornerstone of the generative AI stack. While a few LLMs, such as those from OpenAI and Google, have gained widespread recognition, the LLM landscape is diverse, offering a range of options. These LLMs vary in terms of their training data, optimal use cases and performance on common tasks.

LLMs also come in different sizes, with model size determined by the number of parameters. Larger models might achieve better accuracy but require more computing power and have longer inference times. Likewise, LLMs differ in the size of the context windows they support. A larger context window allows for more detailed prompts and enables the model to generate more relevant, contextually aware output.
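The context window constraint above can be enforced programmatically. The sketch below uses a crude whitespace word count as a stand-in for real tokenization (actual tokenizers, such as those bundled with each provider's SDK, count tokens differently), and the `reserved_for_output` budget is an illustrative assumption:

```python
# Sketch: keep a prompt within a model's context window.
# Word counts are a rough stand-in for real token counts.

def fits_context(prompt: str, context_window: int, reserved_for_output: int = 500) -> bool:
    """Check whether a prompt leaves room for the model's response."""
    approx_tokens = len(prompt.split())  # crude approximation of token count
    return approx_tokens + reserved_for_output <= context_window

def truncate_to_fit(prompt: str, context_window: int, reserved_for_output: int = 500) -> str:
    """Drop the oldest words until the prompt fits within the window."""
    words = prompt.split()
    budget = context_window - reserved_for_output
    return " ".join(words[-budget:]) if budget > 0 else ""
```

Dropping the oldest content first, as shown here, is one common truncation strategy for chat-style applications; summarizing earlier turns is another.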

LLM usage costs

LLM pricing is based on input size (number of tokens in the original prompt) and output size (number of tokens generated). Users often experiment with multiple prompts or iterate over initial outputs, so it's important to consider these additional costs when budgeting. Exploring different LLM providers and comparing pricing models can also help organizations find the most cost-effective model for their specific use case.
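The token-based pricing model described above can be captured in a simple cost estimator. The per-token prices below are placeholders, not any provider's actual rates; check the provider's current pricing page before budgeting:

```python
# Sketch of per-request LLM cost estimation with assumed, illustrative rates.

PRICE_PER_1K_INPUT = 0.01   # USD per 1,000 input tokens (placeholder)
PRICE_PER_1K_OUTPUT = 0.03  # USD per 1,000 output tokens (placeholder)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of a single LLM call, given input and output token counts."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT + \
           (output_tokens / 1000) * PRICE_PER_1K_OUTPUT

def session_cost(turns: list[tuple[int, int]]) -> float:
    """Total cost of an iterative session: a list of (input, output) token counts.

    Users often iterate over outputs, so a single task can involve several
    calls -- an easy cost to underestimate when budgeting per request.
    """
    return sum(request_cost(i, o) for i, o in turns)
```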

Open source vs. proprietary LLMs

Another important consideration is whether to choose an open source or proprietary (also known as closed source) LLM.

Open source options offer greater flexibility and control. For example, organizations can run them on their own IT infrastructure -- whether on-premises or in a private cloud -- which enables better oversight over data privacy. However, running open source LLMs internally requires a higher degree of in-house technical expertise.

Commercial options come with better support and regular updates, making them easier to implement and maintain. However, when using proprietary LLMs, it's important to carefully review the fine print regarding the provider's data handling practices to ensure they meet organizational regulatory and compliance standards.

Cost is another factor to consider when it comes to open source vs. proprietary generative AI. Open source LLMs won't require organizations to pay usage charges, but IT leaders will need to invest in the necessary infrastructure and personnel to run and maintain them. Conversely, proprietary LLMs typically charge based on usage, which can be more cost effective for organizations with lower or intermittent LLM access needs.
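The fixed-vs.-usage cost tradeoff above can be framed as a break-even calculation. All figures in this sketch are illustrative assumptions, not real prices:

```python
# Break-even sketch: usage-priced proprietary LLM vs. self-hosted open source.
# All dollar figures are illustrative assumptions.

def proprietary_monthly_cost(requests_per_month: int, cost_per_request: float) -> float:
    """Usage-based pricing scales linearly with volume."""
    return requests_per_month * cost_per_request

def self_hosted_monthly_cost(infra_cost: float, staff_cost: float) -> float:
    """Self-hosting is a largely fixed cost: GPU infrastructure plus personnel."""
    return infra_cost + staff_cost

def break_even_requests(cost_per_request: float, infra_cost: float, staff_cost: float) -> float:
    """Monthly request volume above which self-hosting becomes cheaper."""
    return (infra_cost + staff_cost) / cost_per_request
```

For example, with an assumed $0.01 per request against $20,000 per month in fixed infrastructure and staffing costs, self-hosting only pays off above two million requests per month, which is why usage-based pricing tends to suit lower or intermittent access needs.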

Domain-specific vs. horizontal LLMs

Another important consideration is whether a domain-specific or horizontal LLM best suits your needs. Domain- or industry-specific LLMs are trained on data from particular sectors, such as finance, legal or healthcare. These models are also optimized for common tasks and applications within those industries, enabling them to better generate outputs tailored to their specific domain.

However, such LLMs might have limited value outside their intended domain. Use cases that require a broader knowledge base might be better served by a horizontal LLM, which is a more general model trained on a wide range of data across different domains. Horizontal LLMs can also be fine-tuned or adapted to specific domains through techniques like transfer learning.

Approaches to refining enterprise LLM applications

When refining and customizing enterprise LLM applications, typical approaches include prompt engineering, retrieval-augmented generation (RAG) and fine-tuning. The latter two techniques incorporate additional domain- or enterprise-specific data into LLM applications, helping to address concerns around factual accuracy and relevance.

  • Prompt engineering and templates. Prompt engineering is a light-touch approach that focuses on improving model output without changing model weights. Prompt templates and prompting best practices are used to guide the model toward desired outputs. Prompt marketplaces like PromptBase offer a variety of prompts for different AI models.
  • RAG. RAG is a technique that involves retrieving data from enterprise repositories or external sources to generate more contextual and accurate responses. The internal data is first stored as vectors in a vector database such as Pinecone or Chroma. SDKs and frameworks such as LlamaIndex and LangChain facilitate the connection between LLMs and data sources.
  • Fine-tuning. Fine-tuning is the practice of further training a pretrained LLM on enterprise- or domain-specific data. Unlike RAG, which leaves LLM weights unchanged, fine-tuning updates the LLM weights to better capture domain-specific nuances. Tools like Snorkel and Databricks MosaicML can be used to fine-tune LLMs.
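The retrieve-then-prompt flow behind RAG can be sketched end to end. A production stack would use a real embedding model and a vector database such as Pinecone or Chroma; here, a toy bag-of-words embedding and an in-memory list stand in for both, and the sample documents are invented for illustration:

```python
# Minimal RAG retrieval sketch: toy embeddings and an in-memory "vector store"
# stand in for a real embedding model and vector database.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: a word-count vector (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical enterprise documents, stored with their embeddings.
documents = [
    "Quarterly revenue grew 12 percent year over year.",
    "The security policy requires MFA for all employees.",
]
index = [(doc, embed(doc)) for doc in documents]  # in-memory "vector database"

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def build_prompt(query: str) -> str:
    """Augment the user's question with retrieved enterprise context."""
    context = "\n".join(retrieve(query))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"
```

Frameworks like LlamaIndex and LangChain implement this same pattern with production-grade embeddings, chunking and vector store integrations.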

After customizing and refining an enterprise LLM application, the next step is deployment and ongoing monitoring. Tools such as Portkey and Arize are used to deploy and monitor LLM applications in production, including troubleshooting, updates and enhancements.
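At its simplest, the monitoring step amounts to instrumenting each LLM call. The sketch below is a generic wrapper, not an API of Portkey or Arize (which provide richer versions of this out of the box); `llm_call` is a hypothetical stand-in for any real client function:

```python
# Hypothetical monitoring sketch: a decorator that records latency and
# token counts per LLM call. llm_call is a stub standing in for a real client.
import time

metrics_log: list[dict] = []

def monitored(llm_call):
    """Decorator that logs latency and input/output size for each call."""
    def wrapper(prompt: str) -> str:
        start = time.perf_counter()
        output = llm_call(prompt)
        metrics_log.append({
            "latency_s": time.perf_counter() - start,
            "input_tokens": len(prompt.split()),    # rough word-based count
            "output_tokens": len(output.split()),
        })
        return output
    return wrapper

@monitored
def llm_call(prompt: str) -> str:
    # Stub standing in for a real LLM client call.
    return "stub response to: " + prompt
```

Logs like these feed the troubleshooting, update and enhancement loop: drifting latency or ballooning token counts are early signals that a prompt, model or retrieval step needs attention.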

Kashyap Kompella is an industry analyst, author, educator and AI advisor to leading companies and startups across the U.S., Europe and the Asia-Pacific region. Currently, he is the CEO of RPA2AI Research, a global technology industry analyst firm.
