Using AI is no longer as daunting as it was even a few years ago. Powerful models are becoming easier to implement, and more organizational leaders are considering integrating GenAI into their processes, workflows, and strategic plans, hoping to benefit from its efficiency. Yet organizations adopting this transformative technology encounter a common challenge: delivering accurate results.
In the previous blog, we introduced the two steps of building an LLM model: pre-training and fine-tuning. This edition examines how end users interact with LLMs and what steps organizations can take to improve model accuracy.
As we touched on in earlier editions of this blog series, the first task of the data scientist or software, ML, AI, advanced analytics, or GenAI engineer working with LLMs is usually not to train an LLM, or even to fine-tune one, but rather to take an existing LLM (a base model), work out how to get it to accomplish the task your application needs, and measure its accuracy. There are commercial providers of LLMs, such as OpenAI, Anthropic, and Google, as well as open-source LLMs (Llama, Gemma, and others) released free of charge for others to build upon. Adapting an existing LLM to your task is called prompt engineering.1
The user interacts with the LLM via a chat interface using prompts (Fig. 1). Prompting is the bridge between humans and AI, allowing us to communicate and generate results that align with our business needs. Crafting a good prompt is critical to producing an accurate LLM response; indeed, an emerging area of specialization within GenAI, prompt engineering, is devoted to it. The principal components that guide the LLM toward more purposeful and accurate answers include specific questions, examples, persona, tone, instructions, context, and format. We will unpack prompt engineering in future editions of the blog.
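To make the components concrete, here is a minimal sketch of assembling a prompt from persona, context, examples, instructions, and format. The component values are hypothetical placeholders, not a prescribed template.

```python
# A sketch of combining prompt-engineering components into one prompt.
# All persona/context/example strings below are hypothetical placeholders.

def build_prompt(persona: str, context: str, instruction: str,
                 examples: list[str], output_format: str) -> str:
    """Assemble prompt components into a single prompt string."""
    example_block = "\n".join(f"- {e}" for e in examples)
    return (
        f"You are {persona}.\n"
        f"Context: {context}\n"
        f"Examples:\n{example_block}\n"
        f"Task: {instruction}\n"
        f"Respond in {output_format}."
    )

prompt = build_prompt(
    persona="a support analyst for an insurance company",
    context="The customer asks about claim processing times.",
    instruction="Answer the customer's question concisely.",
    examples=["Q: How do I file a claim? A: Use the online portal."],
    output_format="two short sentences",
)
print(prompt)
```

A structured builder like this keeps the persona, context, and format decisions explicit and reviewable rather than buried in ad hoc strings.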

Figure 1: LLM model user interface
Customizing an LLM via fine-tuning improves response accuracy and relevance. However, LLM customization can be complex, time-consuming, and costly, and not all organizations have engineers specialized in this discipline. Another customization option is integrating your data with your generative AI applications. This process helps transform a generic application into one that truly knows your organization. Your organization’s current domain data improves the model’s accuracy by helping it understand your company’s processes, products, customers, and terminology.
An LLM’s response to users is only as accurate and contextual as the quality of the data it has been trained on. Choosing credible, validated internal and external sources, spanning both current and historical data, and following an organization’s data and AI governance policy are key to generating error-free results and improving the accuracy of the output. Knowing the source of the data, and whether it is an authoritative or authentic source, is an essential aspect of data quality. LLMs rely heavily on data quality and availability, and any problems with the data can significantly affect the models’ performance and accuracy.
Enterprises must implement robust data validation and quality control processes and monitor the data sources and pipelines used to train and deploy the models to ensure the data is accurate, relevant, and current: accuracy can be measured with predictive performance metrics, relevance through task-specific evaluations, and currency by tracking data freshness. They should also implement robust monitoring systems and document data lineage to maintain the high data-integrity standards that show customers and other users you know them and their preferences, creating value and building a competitive advantage.
Any LLM is limited to the data it was trained on, sometimes called parametric (or parameterized) knowledge. Thus, even if the LLM can perfectly answer questions about the past, it will not have access to the newest data or to any external sources it was not trained on.
Retrieval-augmented generation (RAG) is a technique that addresses this limitation by enabling the model to retrieve and incorporate new data or information during the generation process. RAG consists of finding relevant pieces of text, known as context (domain-specific facts found in an organization’s proprietary data sources: databases, wikis, search engines, documents, codebases), and including that context in the prompt.
RAG is emerging as a significant addition to the generative AI toolkit. It harnesses LLMs’ intelligence and content generation capabilities and integrates them with a company’s internal data. This method significantly enhances organizational operations, augmenting LLMs’ capabilities and leveraging internal corporate data for strategic advantage.
RAG combines the best of both worlds: the ability to retrieve information from vast datasets about your company and your customers, and the capability to generate coherent, contextually relevant responses, thereby enhancing the relevance and accuracy of its outputs. The Meta team created RAG to improve the accuracy and reliability of LLMs, reduce false information (or “hallucinations”), and increase the relevance of answers.

Figure 2: Multistep LLM training approach
Although there may be exceptions, RAG primarily comes into play in the second step of LLM model building, the fine-tuning and inference stage of the GenAI process. Once the LLM has learned from the data in step one, the pre-training step, RAG is leveraged in step two to improve the results of fine-tuning. Introducing RAG into the GenAI workflow is more cost-effective and requires less expertise than labor-intensive techniques such as fine-tuning and continued pre-training of LLMs. Technical support, financial services, healthcare, and e-commerce are some areas where RAG applications have demonstrated value.
A RAG pipeline typically collects and prepares data by cleaning it, chunking the documents, embedding the chunks (vector embeddings), and storing them in a vector store. The vector store is then queried to augment the user input of a generative AI model before it produces an output.
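The steps above can be sketched end to end in a few lines. This is an illustrative toy, not a production pipeline: the bag-of-words "embedding" stands in for a real embedding model, and an in-memory list stands in for a vector database.

```python
# Toy RAG pipeline: chunk documents, embed them, store the vectors,
# then query the store to augment a user prompt with relevant context.
import math
from collections import Counter

def chunk(text: str, size: int = 60) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> Counter:
    """Toy embedding: word-frequency vector (a real system uses a model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Vector store": list of (chunk, embedding) pairs built at ingestion time.
docs = ["Refunds are processed within five business days.",
        "Support is available Monday through Friday."]
store = [(c, embed(c)) for d in docs for c in chunk(d)]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k stored chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda p: cosine(q, p[1]), reverse=True)
    return [c for c, _ in ranked[:k]]

query = "How long do refunds take?"
context = retrieve(query)[0]
augmented_prompt = f"Context: {context}\nQuestion: {query}"
print(augmented_prompt)
```

In practice, the embedding model, chunking strategy, and vector store are each swappable components, which is exactly what RAG frameworks abstract away.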
RAG boils down to the following five domains and the questions that go with them:
- User Input:
  - A client input layer to collect user input as text queries or decisions.
  - A prompt engineering layer to construct prompts that guide the LLM.
- Data Retrieval:
  - A knowledge base that improves prompt context.
  - Searching for relevant data; data collection and preparation (Retrieval).
  - Optional integration with external services via function APIs, knowledge bases, and reasoning algorithms to augment the LLM’s capabilities.
  - Where is the data coming from? Is it reliable? Is it sufficient? Are there copyright, privacy, and security issues?
- Data Storage:
  - Vector embedding and loading into a vector store (Storing).
  - How will the data be stored before or after processing?
  - What should be stored, and how much?
- Prompt Augmentation:
  - Querying the vectorized dataset to add context to the user prompt (Augmented).
  - How will the correct data be retrieved to augment the user’s input before it reaches the generative model?
  - What type of RAG framework will be successful for the project?
- Response Generation:
  - An LLM backend to analyze prompts and produce relevant text responses.
  - An output parsing layer to interpret LLM responses for the application interface.
  - Using the augmented prompt as input to the LLM to produce a response (Generation); the final response is sent back to the user as output.
  - Which generative AI model will fit the chosen RAG framework?
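The five domains can be wired together as stages of a single flow. The following skeleton is a hedged sketch: the knowledge base is a hard-coded dictionary standing in for a vector store, and `generate` is a stub standing in for a real LLM backend.

```python
# Skeleton wiring the five RAG domains into one flow. The storage,
# retrieval, and generation pieces are stubs for illustration only.

# 3. Data Storage: stand-in for a populated vector store.
knowledge_base = {"returns": "Returns are accepted within 30 days."}

def collect_input(raw: str) -> str:
    """1. User Input: normalize the raw text query."""
    return raw.strip()

def retrieve_context(query: str) -> str:
    """2. Data Retrieval: look up relevant facts (keyword match stub)."""
    return next((v for k, v in knowledge_base.items()
                 if k in query.lower()), "")

def augment_prompt(query: str, context: str) -> str:
    """4. Prompt Augmentation: prepend retrieved context to the query."""
    return f"Context: {context}\nQuestion: {query}"

def generate(prompt: str) -> str:
    """5. Response Generation: stub LLM that echoes the context line."""
    first_line = prompt.split("\n", 1)[0]
    return first_line[len("Context: "):]

query = collect_input("  What is your returns policy? ")
answer = generate(augment_prompt(query, retrieve_context(query)))
print(answer)
```

Each stage maps to one domain in the list above, so a real implementation can replace any stub (e.g., the retrieval stub with a vector-store query) without changing the overall flow.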

Figure 3: Simplified RAG pipeline
The starting point of a RAG ecosystem is thus a data ingestion process, the first step of which is to collect data and store it in vector databases. We will cover the role of vector databases in RAG in future editions of this blog series.
Two commonly used frameworks help build RAG and LLM pipelines: LangChain and LlamaIndex. RAG pipelines can be built from scratch without these frameworks; however, the frameworks allow enterprises to jump-start a project and reduce development time and cost, typically offering a quicker path to deploying a solution with sensible default configurations. These libraries have been refined across many settings and combinations to deliver a ready-to-use, effective solution without requiring significant time or effort. They reduce the complexity of selecting models and of crafting prompt templates for different tasks. Their open-source nature further ensures that their methods are tested and effective. They also offer the convenience of experimenting with various models through a simple code change, personalizing prompt templates, and managing outputs.
One drawback of pre-built frameworks is that they introduce extra dependencies on external libraries, so teams must stay wary of updates and framework changes. Building an end-to-end LLM chatbot with RAG pipelines without these frameworks can be challenging: it requires expertise and an arsenal of tools and technologies to stitch together. Once engineers have developed sufficient expertise with the LLM models and the RAG pipeline and achieved the requisite model accuracy, they can swap framework components for custom modules later if doing so improves the project.
In this blog edition, we examined how users interact with LLMs, the function of RAG pipelines in improving model accuracy, and RAG frameworks.
Embracing GenAI in business means being open to radical change, questioning existing business processes without fear of disrupting the status quo, and being dauntless in throwing out the rulebook and starting anew to achieve better business outcomes. Trailblazers, innovators, and those who are curious and on the lookout for technological developments that lie around the corner will reap the greatest benefit from GenAI. AI will not replace the role of humans in critical functions, but those incapable of embracing AI technologies will find themselves at a disadvantage, unable to partner and collaborate with AI practitioners within their organizations and beyond.
References: