LLMs have become a household name, driving the rise of generative AI and capturing public interest, while serving as a cornerstone for organizations adopting AI across diverse business functions and use cases. Fine-tuning LLMs has emerged as a critical requirement for organizations aiming to tailor AI to their unique needs.
LLMs represent a significant breakthrough in NLP and artificial intelligence, and are easily accessible to the public through interfaces like OpenAI's ChatGPT (built on GPT-3.5 and GPT-4), which is backed by Microsoft. Other examples include Meta's Llama and RoBERTa models, and Google's BERT (Bidirectional Encoder Representations from Transformers) and PaLM models.
The need for fine-tuning LLMs
A general-purpose LLM can handle generic use cases out of the box, for example, when an organization wants to set up a customer support chatbot, or when someone needs help with research work. These use cases can be tackled by these powerful LLM systems as-is.
However, for narrow, domain-specific needs, the model may not perform as well. This is when one needs to consider fine-tuning the LLM to mold it toward accuracy in that domain.
Fine-tuning not only opens up future uses for the LLM, but also expands the range of tasks the model can perform well.
These might seem like fancy words, but fine-tuning a model is as simple as training it further on a dataset. It essentially transforms the model from a generalist into a specialist. The datasets used for fine-tuning teach the model to think in a specific direction, within a specific domain.
Types of Fine-Tuning
Fine-tuning is the process of re-training a pretrained model on data from a specific domain. This ensures the model understands niche concepts and handles edge cases smoothly.
Here are the different LLM fine-tuning techniques:
Supervised Fine-Tuning:
The LLM is trained on a labeled dataset designed for the target task. For example, to fine-tune an LLM for text classification, it is given a dataset of text snippets paired with class labels.
Unsupervised Fine-Tuning:
The LLM is exposed to a large amount of unlabeled text from the target domain. It analyzes the statistical properties of the text to refine its understanding of the language.
Instruction Fine-Tuning:
The LLM is trained on task instructions phrased in natural language, paired with the desired responses. For example, to create a support assistant for an organization, the LLM is given instructions such as "Answer this customer question about our refund policy" along with ideal answers.
Low-rank adaptation (LoRA):
LoRA keeps the original model weights frozen, but adds small, trainable adapter matrices to each layer. It decomposes the weight updates into pairs of much smaller low-rank matrices, which drastically reduces the number of parameters that need to be trained (a short code sketch follows this list).
Parameter-efficient fine-tuning (PEFT):
PEFT is a family of techniques (LoRA among them) designed to adapt pretrained large language models (LLMs) and neural networks to specific tasks or datasets without retraining the entire model. Traditional fine-tuning adjusts all of a model's parameters, which is computationally expensive and memory-intensive, especially for large models like GPT or BERT.
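As a concrete illustration of the LoRA and PEFT entries above, here is a minimal sketch using the Hugging Face peft library. The target_modules names assume a BERT-style model whose attention projections are called "query" and "value"; other architectures use different module names, so treat the specifics as an assumption.

from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Load a small pretrained model (BERT is used here purely as an illustration)
model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-cased", num_labels=2
)

# LoRA: freeze the original weights and train small low-rank
# update matrices injected into the attention projections
lora_config = LoraConfig(
    task_type="SEQ_CLS",                # sequence classification head
    r=8,                                # rank of the low-rank matrices
    lora_alpha=16,                      # scaling factor for the updates
    target_modules=["query", "value"],  # BERT's attention projection names
    lora_dropout=0.05,
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()      # only a small fraction is trainable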
Based on your specific needs, you can pick the type of fine-tuning to apply to your model.
Step-by-Step Process of Fine-Tuning an LLM
For this specific example, we will walk through instruction fine-tuning; the LoRA technique sketched above can be combined with it to keep training lightweight.
Choose the Model
To begin with fine-tuning, you need to have access to your pre-trained model.
Hugging Face is a popular platform for this; its Hub hosts a variety of models like GPT-2, BERT, and T5.
Fine-tuning can be a computationally heavy task if the model is large, which requires access to resources that can handle the workload. For experimentation, it is therefore advisable to use a smaller model and/or a smaller dataset.
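For example, a small pretrained model can be pulled from the Hugging Face Hub in a couple of lines (GPT-2 is used here purely as an example of a lightweight model):

from transformers import AutoModelForCausalLM, AutoTokenizer

# GPT-2 is small enough to experiment with on modest hardware
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")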
Prepare your Dataset
Collect the data that will be fed to the model. For instruction fine-tuning, this data can take the form of two columns, containing an 'Instruction' and a 'Response'.
It can be formatted as a CSV file, with an outline like the one shown below.
instruction,response
"Summarize this article:...", "Here's a summary:..."
"Translate to French:...", "French translation:..."
Other formats like JSON, XML, XLSX, etc. can also be used for the dataset.
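Once the environment is set up (next step), the file can be loaded with the Hugging Face datasets library; the filename below is a placeholder for your own file:

from datasets import load_dataset

# Load the instruction/response CSV into a DatasetDict with a "train" split
dataset = load_dataset("csv", data_files="fine_tune_data.csv")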
Set Up your Environment
First, we install the huggingface_hub package, which provides the Hugging Face CLI and Hub access, along with the libraries needed for the later steps. Hugging Face maintains a wide range of libraries that ease the fine-tuning process.
pip install huggingface_hub transformers datasets evaluate
Begin Fine-Tuning the LLM
Using Python and the necessary libraries, like Hugging Face Transformers, we can begin the fine-tuning process.
We need to set up a tokenizer to process the text of the dataset. This can be done with the following code:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-cased")

def tokenize_function(examples):
    # The dataset prepared earlier uses an "instruction" column, not "text"
    return tokenizer(examples["instruction"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
Next, we set training hyperparameters for our model, which can be done using TrainingArguments, as follows:
from transformers import TrainingArguments
training_args = TrainingArguments(output_dir="test_trainer")
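The TrainingArguments alone don't start anything; a Trainer ties the model, hyperparameters, and tokenized data together and runs the training loop. The sketch below assumes a classification-style setup with two labels and a "labels" column in the dataset, purely for illustration; the compute_metrics function is defined in the evaluation step that follows.

from transformers import AutoModelForSequenceClassification, Trainer

# Two labels are assumed purely for illustration; the tokenized dataset
# must also contain a "labels" column for this classification setup
model = AutoModelForSequenceClassification.from_pretrained(
    "google-bert/bert-base-cased", num_labels=2
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    compute_metrics=compute_metrics,  # defined in the evaluation step below
)
trainer.train()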
Evaluate the Model
To evaluate our model's accuracy, we can use Hugging Face's Evaluate library, which provides a simple accuracy metric we can load with the evaluate.load function.
import numpy as np
import evaluate
metric = evaluate.load("accuracy")
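The loaded metric plugs into the Trainer through a small compute_metrics function, which converts the model's raw logits into predicted classes and compares them against the reference labels:

def compute_metrics(eval_pred):
    # Convert raw model outputs (logits) into predicted class ids
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

Passing this function as compute_metrics when constructing the Trainer (as in the training step above) makes trainer.evaluate() report accuracy on an evaluation split.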
This is the most basic way of fine-tuning a model and checking its accuracy. If the dataset is well prepared, the fine-tuned model's accuracy on the target task will be better than the pretrained one's.
Deploy the Model
After proper evaluation, and re-training if required, we can finally use the fine-tuned model for our use case. We can integrate it into a web application, API, or other platform to make it accessible.
This is a simplified step-by-step process for fine-tuning; other methods involve more in-depth computation and more training data. For this specific example, a dataset with around 100 entries works well.
This keeps training quick, so one can see the final results in the metric scores. As the amount of training data increases, the model becomes more capable in the target domain.
What about RAG?
Retrieval-Augmented Generation (RAG) is another way to make generic models more accurate. One can choose which method to use according to their use case and preference; each method has its own pros and cons.
Even though RAG has lower computational requirements, it may not perform as well as fine-tuning does on deeply domain-specific tasks. RAG is especially useful when the model's training data doesn't include certain up-to-date or niche information.
When to use RAG?
Dynamic or Frequently Updated Information:
RAG is ideal for tasks requiring real-time or frequently changing knowledge, such as accessing live data feeds, news articles, or updated knowledge bases.
Fact-Based Question Answering:
When the model needs to provide accurate answers to specific, fact-based queries that go beyond its training data, such as customer support with changing policies or technical queries referencing the latest standards.
Large External Knowledge Base:
RAG works best when there is access to a structured, rich knowledge base or database that can be easily queried for relevant information.
Cost and Scalability Constraints:
Use RAG when it is impractical to fine-tune or retrain a model repeatedly, especially for domains with rapidly changing data or diverse requirements.
How RAG works
- The model generates a query based on the user’s input.
- It retrieves relevant external documents or data that match the query.
- The retrieved information is passed to the model, which uses it to generate a grounded response (a minimal sketch follows).
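Here is a minimal sketch of that loop, using TF-IDF retrieval as a stand-in for a real embedding model and vector store, and a plain prompt string as a stand-in for the final LLM call; the documents are made up for illustration:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# A tiny in-memory "knowledge base"; a real system would query a vector store
documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am to 5pm.",
    "Premium plans include priority email and phone support.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query, k=1):
    # Rank documents by similarity to the query and return the top k
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top = scores.argsort()[::-1][:k]
    return [documents[i] for i in top]

query = "How long do I have to return a product?"
context = "\n".join(retrieve(query))
# The retrieved context is stuffed into the prompt the LLM finally sees
prompt = f"Answer using the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)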
Is RAG better than fine-tuning an LLM?
RAG offers several advantages that make it a powerful tool for dynamic and knowledge-intensive tasks.
- It provides access to real-time knowledge, allowing the model to retrieve and utilize the most up-to-date and relevant information, even if it was not part of the original training data.
- Additionally, RAG is highly scalable, as its external knowledge base can be updated without the need to retrain the model, making it a flexible solution for fast-changing fields.
- Furthermore, it reduces training requirements by relying on external sources for domain-specific knowledge, minimizing the need for fine-tuning in niche areas.
Conclusion
Fine-tuning pre-trained language models is a powerful technique to enhance their performance on specific tasks. It offers high adaptability and molds a general model to your specific requirements.
If you're trying to improve how your chatbots work, while also balancing their maintenance costs, check this article out.