Overview

I tried building a RAG-based chat app with Azure OpenAI, LlamaIndex, and Gradio; here are my notes.

Azure OpenAI

Create an Azure OpenAI resource.

Then, click “Endpoint: Click here to view endpoint” and note down the endpoint and key.

Next, navigate to the Azure OpenAI Service.

Go to “Model catalog” and deploy “gpt-4o” and “text-embedding-3-small”.

The result is displayed as follows.
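The portal steps above can also be scripted. A hedged sketch using the Azure CLI, assuming a resource named `my-openai` in resource group `my-rg` (both placeholders); model versions vary by region and over time:

```shell
# Deploy the chat model (resource/group names and versions are placeholders)
az cognitiveservices account deployment create \
  --name my-openai --resource-group my-rg \
  --deployment-name gpt-4o \
  --model-name gpt-4o --model-version "2024-05-13" \
  --model-format OpenAI \
  --sku-name Standard --sku-capacity 1

# Deploy the embedding model
az cognitiveservices account deployment create \
  --name my-openai --resource-group my-rg \
  --deployment-name text-embedding-3-small \
  --model-name text-embedding-3-small --model-version "1" \
  --model-format OpenAI \
  --sku-name Standard --sku-capacity 1
```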

Downloading the Text

This time, we target “The Tale of Genji” published on Aozora Bunko (a free digital library of Japanese literature).

https://www.aozora.gr.jp/index_pages/person52.html

Download the texts in bulk using the following script.

import os

import requests
from bs4 import BeautifulSoup

# Catalog of Genji chapters, including links to the Aozora Bunko HTML pages
url = "https://genji.dl.itc.u-tokyo.ac.jp/data/info.json"
response = requests.get(url).json()

selections = response["selections"]

for selection in selections:
    for member in selection["members"]:
        aozora_urls = []
        for metadata in member["metadata"]:
            if metadata["label"] == "aozora":
                aozora_urls = metadata["value"].split(", ")

        for aozora_url in aozora_urls:
            filename = aozora_url.split("/")[-1].split(".")[0]
            opath = f"data/text/{filename}.txt"

            # Skip files that have already been downloaded
            if os.path.exists(opath):
                continue

            response = requests.get(aozora_url)
            # Aozora Bunko pages are not UTF-8; let requests detect the encoding
            response.encoding = response.apparent_encoding

            soup = BeautifulSoup(response.text, "html.parser")
            div = soup.find("div", class_="main_text")
            txt = div.get_text().strip()

            os.makedirs(os.path.dirname(opath), exist_ok=True)
            with open(opath, "w", encoding="utf-8") as f:
                f.write(txt)
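The scraping step above hinges on each Aozora Bunko page wrapping the body text in a `div` with class `main_text`. A minimal self-contained illustration of that extraction, using toy HTML that mimics the page structure:

```python
from bs4 import BeautifulSoup

# Toy HTML imitating the layout of an Aozora Bunko page
html = """
<html><body>
  <div class="metadata">桐壺 紫式部</div>
  <div class="main_text">
    いづれの御時にか、女御、更衣あまたさぶらひたまひけるなかに……
  </div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="main_text")
txt = div.get_text().strip()
print(txt)
```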

Creating the Index

Prepare environment variables.

AZURE_OPENAI_ENDPOINT=xxxx
AZURE_OPENAI_API_KEY=xxxx
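The scripts below read these values with `os.getenv`, which silently returns `None` when a variable is missing. A small helper (not part of the original scripts, just a sketch) that fails fast with a clear message instead:

```python
import os

def require_env(name: str) -> str:
    """Return the value of a required environment variable, or raise."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Usage (assumes the variables have been exported as above):
# api_key = require_env("AZURE_OPENAI_API_KEY")
# azure_endpoint = require_env("AZURE_OPENAI_ENDPOINT")
```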

Then, create the index using the following script.

import os
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex

# Environment variables
api_key = os.getenv("AZURE_OPENAI_API_KEY")
api_version = "2024-05-01-preview"
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")

# LLM
llm = AzureOpenAI(
    model="gpt-4o",
    deployment_name="gpt-4o",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# Embedding
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-3-small",
    deployment_name="text-embedding-3-small",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

Settings.llm = llm
Settings.embed_model = embed_model

# Data Source -> Document conversion step
documents = SimpleDirectoryReader(
    input_dir="./data/text"
).load_data()

# Save
index = VectorStoreIndex.from_documents(documents)
index.storage_context.persist(persist_dir="./data/index")

Gradio

Finally, create an app using Gradio.

import os
import gradio as gr
from llama_index.core import StorageContext, load_index_from_storage, Settings
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding

api_key = os.getenv("AZURE_OPENAI_API_KEY")
api_version = "2024-05-01-preview"
azure_endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")

llm = AzureOpenAI(
    model="gpt-4o",
    deployment_name="gpt-4o",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

# You need to deploy your own embedding model as well as your own chat completion model
embed_model = AzureOpenAIEmbedding(
    model="text-embedding-3-small",
    deployment_name="text-embedding-3-small",
    api_key=api_key,
    azure_endpoint=azure_endpoint,
    api_version=api_version,
)

Settings.llm = llm
Settings.embed_model = embed_model

# rebuild storage context
storage_context = StorageContext.from_defaults(persist_dir="./data/index")

# load index
index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine(similarity_top_k=10)

# Chat handler: fold the conversation history into the query so the
# query engine sees the full conversational context
def echo(message, history):
    context = "\n".join(f"User: {user_msg}\nBot: {bot_msg}" for user_msg, bot_msg in history)
    full_context = f"{context}\nUser: {message}"
    # gr.ChatInterface appends (message, response) to the history itself
    return query_engine.query(full_context).response

demo = gr.ChatInterface(
    fn=echo,
    examples=[
        "What kind of person is Hikaru Genji?",
        "What kind of person is Yugao?"
    ],
    title="Llama Index Chatbot",
)
demo.launch()
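Under the hood, `similarity_top_k=10` asks the index for the ten chunks whose embeddings are most similar to the query embedding. A toy, pure-Python sketch of that top-k cosine-similarity retrieval (the three-dimensional vectors and chunk texts are made up for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunks, k):
    """Return the texts of the k chunks most similar to the query vector."""
    scored = sorted(chunks, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return [c["text"] for c in scored[:k]]

chunks = [
    {"text": "Hikaru Genji is the hero of the tale.",  "embedding": [0.9, 0.1, 0.0]},
    {"text": "Yugao dies suddenly at a deserted villa.", "embedding": [0.1, 0.9, 0.0]},
    {"text": "Murasaki Shikibu wrote the tale.",         "embedding": [0.2, 0.2, 0.9]},
]
print(top_k([1.0, 0.0, 0.0], chunks, 2))
```

The real index does the same thing at scale, with the embeddings produced by `text-embedding-3-small`.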

The chatbot was successfully created as shown below.

Summary

There may be some misunderstandings on my part, but I hope this serves as a helpful reference.