Setting Up a Private Retrieval Augmented Generation (RAG) System with a Local Vector Database
In the realm of AI, access to current and accurate data is paramount. Retrieval
Augmented Generation (RAG) exemplifies this: it is an established technique
that combines large language models with external data sources to deliver
more precise and up-to-date answers. With RAG, you aren't just getting
black-box answers; you're receiving insights grounded in real-time
information, pulled directly from your most recent data sources.
However, the power of RAG is truly unlocked when it has a reliable and
efficient source of data. This is where Unstructured shines. Imagine having a
treasure trove of data spread across various formats and locations, where the
task of harnessing it feels herculean. Unstructured is purpose-built to bridge
this gap. It acts as an ETL pipeline designed specifically for LLMs, connecting
to your data regardless of its format or location, and transforming and
cleansing it into a streamlined, usable format. In essence, Unstructured
empowers your AI models with real-time, actionable data. It's the first, and
arguably the most crucial, step in your AI journey, turning unstructured data
chaos into data-driven insights.
Prerequisites: a working Python environment and a locally running Weaviate
instance reachable at the weaviate_url defined later in this tutorial. The
Python dependencies themselves are installed in the steps below.
With those in place, open up your terminal and run the following commands.
Clone Repo: Clone the tutorial repository, then move into it and install its
dependencies.
cd local-rag
pip install -r requirements.txt
Update Hugging Face Hub: To avoid any unforeseen issues, it’s a good
practice to ensure that you’re running the latest version of huggingface_hub.
pip install --upgrade huggingface_hub
Creating the model directory: This directory will store the required model
files.
mkdir model_files
Restart notebook: If you're following along in a Jupyter notebook, restart the
kernel so the freshly installed packages are picked up.
Before diving into the coding part, let’s first establish the foundation. Here,
we define some key parameters and constants (this tutorial was designed to
be run on Apple devices).
Notice how we’ve set up directories for both input and output files. We also
specify the weaviate_url, a crucial endpoint for our upcoming interactions.
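The concrete values aren't shown in this excerpt; a minimal sketch of these constants, with illustrative local paths and the default local Weaviate endpoint, might look like this:

# Illustrative values; adjust the paths for your own setup
input_dir = "my-docs"                   # raw source documents to process
output_dir = "my-docs-output"           # processed JSON results land here
weaviate_url = "http://localhost:8080"  # default local Weaviate endpoint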
With the constants defined, we run the document-processing command as a subprocess and report its outcome (this assumes command holds the ingestion invocation built from the parameters above):

import subprocess

# Run the ingestion command and capture its output
# (assumes `command` is the argument list assembled earlier)
process = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
output, error = process.communicate()

# Print output
if process.returncode == 0:
    print('Command executed successfully. Output:')
    print(output.decode())
else:
    print('Command failed. Error:')
    print(error.decode())
Once our data is processed, the get_result_files function comes into play.
This utility function scours a directory to fetch a list of all JSON files,
providing us with the much-needed collection of processed results.
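The function body isn't reproduced in this excerpt; a minimal sketch that recursively collects the JSON results (using the output_dir constant from earlier) might be:

import glob
import os

def get_result_files(folder_path):
    # Recursively gather every JSON file produced by the ingestion step
    return glob.glob(os.path.join(folder_path, "**", "*.json"), recursive=True)

files = get_result_files(output_dir)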
import uuid
import weaviate
from weaviate.util import get_valid_uuid

# Imports needed for chunking the processed elements (from the unstructured library)
from unstructured.chunking.title import chunk_by_title
from unstructured.staging.base import elements_from_json
Before diving into the intricate process of document indexing, the primary
step is to set up our Weaviate client. This client will act as a bridge,
connecting us to the Weaviate instance.
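The three helpers used below aren't defined in the excerpt; minimal sketches, assuming a single "Doc" class and a Weaviate instance configured with a vectorizer module so chunks are embedded at ingest, could look like this:

def create_local_weaviate_client(db_url):
    # Connect to the locally running Weaviate instance
    return weaviate.Client(url=db_url)

def get_schema():
    # A single "Doc" class storing the chunk text and its last-modified date
    return {
        "classes": [
            {
                "class": "Doc",
                "properties": [
                    {"name": "last_modified", "dataType": ["text"]},
                    {"name": "text", "dataType": ["text"]},
                ],
            }
        ]
    }

def upload_schema(my_schema, weaviate):
    # Reset any existing schema, then register ours
    weaviate.schema.delete_all()
    weaviate.schema.create(my_schema)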
client = create_local_weaviate_client(db_url=weaviate_url)
my_schema = get_schema()
upload_schema(my_schema, weaviate=client)
With this groundwork, we have our Weaviate system primed and ready to
accept the upcoming document chunks.
The ingestion helper loads each JSON result, strips coordinate metadata that isn't needed for retrieval, chunks the elements by title, and batches the chunks into Weaviate (the function wrapper and batch-insert call below are reconstructed around the original fragment):

def add_data_to_weaviate(files, client, chunk_under_n_chars=250, chunk_new_after_n_chars=500):
    for filename in files:
        elements = elements_from_json(filename=filename)
        for element in elements:
            # Coordinate metadata isn't needed for retrieval; drop it
            if hasattr(element.metadata, "coordinates"):
                delattr(element.metadata, "coordinates")
        chunks = chunk_by_title(
            elements,
            combine_under_n_chars=chunk_under_n_chars,
            new_after_n_chars=chunk_new_after_n_chars
        )
        for i in range(len(chunks)):
            chunks[i] = {"last_modified": chunks[i].metadata.last_modified,
                         "text": chunks[i].text}
        # Batch-insert each chunk as a "Doc" object with a fresh UUID
        for chunk in chunks:
            client.batch.add_data_object(chunk, "Doc", uuid=get_valid_uuid(uuid.uuid4()))
    client.batch.flush()
Using the predefined functions, we’ll ingest our document chunks into the
Weaviate system.
add_data_to_weaviate(
    files=files,
    client=client,
    chunk_under_n_chars=250,
    chunk_new_after_n_chars=500
)
print(count_documents(client=client)['data']['Aggregate']['Doc'])
Upon completion, a quick print statement gives us insight into the total
number of documents we’ve successfully added.
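count_documents isn't shown in the excerpt; a minimal version using Weaviate's aggregate API, matching the response shape printed above, could be:

def count_documents(client):
    # Aggregate query returning the total count of "Doc" objects
    return client.query.aggregate("Doc").with_meta_count().do()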
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from langchain.llms import LlamaCpp

callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])

n_gpu_layers = 1  # Metal set to 1 is enough.
n_batch = 100  # Should be between 1 and n_ctx, consider the amount of RAM of your Apple Silicon chip.

# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="model_files/llama-2-7b-chat.Q4_K_S.gguf",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    n_ctx=2048,  # Context window. By default 512.
    f16_kv=True,  # MUST be set to True, otherwise you will run into problems after a couple of calls.
    callback_manager=callback_manager,
    verbose=True,  # Verbose is required to pass to the callback manager.
)
from langchain.vectorstores import Weaviate

client = weaviate.Client(weaviate_url)
vectorstore = Weaviate(client, "Doc", "text")
question = "Give a summary of NFL Draft 2020 Scouting Reports: RB Jonathan Taylor"
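The step that produces answer and similar_docs isn't shown in the excerpt; a minimal sketch, using the vector store's similarity search to build a context for the LLM, might look like this:

# Retrieve the most relevant chunks, then ask the LLM to answer from them
# (an illustrative sketch; the original helper may differ)
similar_docs = vectorstore.similarity_search(question)
context = "\n".join(doc.page_content for doc in similar_docs)
prompt = f"Answer the question based on the context below.\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
answer = llm(prompt)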
print("\n\n\n-------------------------")
print(f"QUERY: {question}")
print("\n\n\n-------------------------")
print(f"Answer: {answer}")
print("\n\n\n-------------------------")
for index, result in enumerate(similar_docs):
    print(f"\n\n-- RESULT {index+1}:\n")
    print(result)
Results:
Conclusion:
In this walkthrough, we assembled a fully private RAG system: Unstructured
turned raw documents into clean, chunked JSON; Weaviate stored and indexed
those chunks locally; and a local Llama 2 model answered questions grounded
in the retrieved context. Because every component runs on your own machine,
your data never leaves it, and you can swap in new documents, models, or
vector databases as your needs evolve.