Retrieval Augmented Generation

Introduction to Retrieval Augmented Generation (RAG) with LangChain and Ollama

1. Why are we talking about Retrieval Augmented Generation (RAG) today?

Generative AI models (like GPT, Claude, or LLaMA) are capable of producing fluent and relevant text. But they have a major limitation: they can only answer based on what they learned during training. Therefore, they can:

  • Provide outdated information.
  • Invent nonexistent facts (hallucinations).
  • Ignore data specific to a company or domain.

This is where RAG (Retrieval Augmented Generation) comes in: it combines the power of generative models with an external knowledge base (documents, databases, internal files, etc.).

In short: before generating an answer, the model retrieves the relevant information from your data, then uses it to provide an accurate response.

Note: some tools, like ChatGPT with web browsing or Perplexity, give the impression that the model knows the Internet in real time. In reality, they combine a generative model with an external online search mechanism. This approach is already a form of RAG, applied to the public web. The value of a custom RAG is that you can apply the same principle to your own private data (internal documents, client databases, reports, etc.).


2. The principle of Retrieval Augmented Generation

A Retrieval Augmented Generation pipeline works in three steps:

  1. Document indexing
    Text is split into small pieces (chunks), then transformed into numerical vectors (embeddings) so they can be efficiently searched.
  2. Contextual search
    When a question is asked, it is converted into a vector with the same embedding model, and the vector database returns the passages of your documents whose vectors are closest to it.
  3. Augmented generation
    The language model takes these passages as context and generates a more reliable response, possibly including the sources used.
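
To make the three steps concrete, here is a minimal, dependency-free sketch. It is only an illustration under loud assumptions: the "embedding" is a plain bag-of-words count (a real pipeline uses a neural embedding model), the search is a brute-force cosine similarity, and the function names (embed, build_index, retrieve, build_prompt) are hypothetical helpers, not part of LangChain.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption: real systems use a neural model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Step 1 (indexing): store each chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index, question, k=1):
    """Step 2 (contextual search): return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, passages):
    """Step 3 (augmented generation): place the passages in front of the question."""
    context = "\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


chunks = [
    "Clara is fluent in Spanish, English, and French.",
    "Clara enjoys hiking in the Pyrenees on weekends.",
]
index = build_index(chunks)
question = "Which languages is Clara fluent in?"
top = retrieve(index, question)
print(build_prompt(question, top))
```

The prompt printed at the end is what would be sent to the language model. The model call itself is left out so that the sketch runs without any installation.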

3. The tools we will use

For our example, we will build a small RAG with:

  • LangChain: a Python framework that makes it easier to build AI chains (LLM + retrieval + memory…).
  • Ollama: a tool that allows you to easily download and run language models locally (for example LLaMA, Mistral, Gemma). In our code, we use a small local model, gemma3:1b, which must be downloaded beforehand (ollama pull gemma3:1b).
  • FAISS: a library from Meta for efficient similarity search over vectors.
  • HuggingFace Embeddings: to transform our texts into numerical vectors.

ℹ️ Ollama and open models are not mandatory: you can use any model you like, open source (LLaMA, Mistral, Gemma…) or closed (GPT, Claude, etc.).


4. Step-by-step implementation

We will start from a file doc.txt (our knowledge base). Here is the file we use for this example:

Bio of a Fictional Person

Name: Clara Mendoza
Age: 34
Location: Barcelona, Spain
Profession: Environmental Policy Analyst

Clara Mendoza is a passionate environmental policy analyst dedicated to shaping sustainable urban development. With over a decade of experience in climate policy and renewable energy initiatives, she has worked with NGOs, municipal governments, and international organizations to design strategies that balance economic growth with environmental preservation.

She holds a master’s degree in Environmental Policy and Governance from the London School of Economics, where her thesis focused on community-driven renewable energy projects in Southern Europe. Fluent in Spanish, English, and French, Clara has presented her work at global conferences, advocating for greener cities and stronger cross-border collaborations.

Beyond her professional life, Clara is an avid traveler and amateur photographer. Her weekends often involve hiking in the Pyrenees, experimenting with plant-based recipes, or capturing urban landscapes through her camera lens. She also volunteers as a mentor for young women entering the field of environmental science and policy.

a) Load the document

from langchain_core.documents import Document  # current import path (langchain-core)

# Opening the document
with open("./doc.txt", "r", encoding="utf-8") as f:
    input_doc = f.read()

document = Document(page_content=input_doc)

Here, we read our text file and convert it into a Document object usable by LangChain.


b) Split the text into chunks

from langchain_text_splitters import RecursiveCharacterTextSplitter  # pip install langchain-text-splitters

splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=40)
chunks = splitter.split_documents([document])

We split the document into chunks of at most 150 characters, with an overlap of up to 40 characters between consecutive chunks, so that an idea cut at a chunk boundary still appears whole in one of them.
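
To see what chunk_size and chunk_overlap mean, here is a simplified sketch of a fixed-window splitter. It is not how RecursiveCharacterTextSplitter works internally: that class prefers to cut on separators such as paragraph breaks, line breaks and spaces, so its chunks will differ. The function name split_fixed is a hypothetical helper introduced for this illustration only.

```python
def split_fixed(text, chunk_size=150, chunk_overlap=40):
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_fixed(sample, chunk_size=10, chunk_overlap=4)
print(pieces)
```

With a window of 10 and an overlap of 4, the last four characters of each chunk are repeated at the start of the next one, which is the effect the overlap parameter is meant to produce.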


c) Create the vector database

from langchain_huggingface import HuggingFaceEmbeddings  # pip install langchain-huggingface
from langchain_community.vectorstores import FAISS  # pip install langchain-community faiss-cpu

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

Each chunk is transformed into a numerical vector using a HuggingFace model, then stored in a FAISS vector database.
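
Conceptually, a vector store keeps one vector per chunk and, at query time, returns the chunks whose vectors are closest to the query vector. The sketch below shows the simplest version of that idea: an exact, brute-force search by Euclidean distance. Everything in it is an assumption made for illustration: the class TinyFlatIndex is hypothetical, and the three-dimensional vectors are written by hand (the all-MiniLM-L6-v2 model produces 384-dimensional vectors). FAISS offers far more efficient index types for large collections.

```python
import math


class TinyFlatIndex:
    """Hypothetical stand-in for a vector store: exact, brute-force L2 search."""

    def __init__(self):
        self.vectors = []
        self.texts = []

    def add(self, vector, text):
        """Store a vector together with the text it represents."""
        self.vectors.append(vector)
        self.texts.append(text)

    def search(self, query, k=1):
        """Return the k (distance, text) pairs closest to the query vector."""
        scored = [
            (math.dist(query, vector), text)
            for vector, text in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


index = TinyFlatIndex()
index.add([0.9, 0.1, 0.0], "chunk about languages")
index.add([0.1, 0.8, 0.2], "chunk about hiking")
index.add([0.0, 0.2, 0.9], "chunk about education")

query_vector = [0.8, 0.2, 0.1]  # hand-written vector standing in for an embedded question
for distance, text in index.search(query_vector, k=2):
    print(f"{distance:.3f}  {text}")
```

A smaller distance means a closer match, so the first result is the chunk the retriever would pass to the model first.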


d) Build the RAG chain

from langchain_ollama.llms import OllamaLLM
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=OllamaLLM(model="gemma3:1b"),  # local model executed with Ollama
    retriever=vector_store.as_retriever(),
    return_source_documents=True,
    chain_type="stuff",
)

Here, we put everything together:

  • The LLM (gemma3:1b via Ollama).
  • The vector search engine.
  • A chain that sends the retrieved passages to the model.
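
The "stuff" chain type means that all retrieved passages are placed ("stuffed") into a single prompt. The sketch below illustrates the idea only: the prompt wording is made up for this example and is not LangChain's actual template, build_stuff_prompt is a hypothetical helper, and the max_chars guard is an addition of this sketch to show why the combined passages must fit within the model's context window.

```python
def build_stuff_prompt(question, passages, max_chars=2000):
    """Concatenate passages into one context block, then append the question.

    Passages are kept in order until adding the next one would exceed
    max_chars (a hypothetical budget standing in for the context window).
    """
    kept = []
    used = 0
    for passage in passages:
        if used + len(passage) > max_chars:
            break
        kept.append(passage)
        used += len(passage)
    context = "\n\n".join(kept)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


passages = [
    "Fluent in Spanish, English, and French, Clara has presented her work",
    "Clara Mendoza is a passionate environmental policy analyst",
]
prompt = build_stuff_prompt("What languages does Clara speak?", passages)
print(prompt)
```

Because everything goes into one prompt, this strategy is simple and works well for a handful of short chunks, which is the situation in this example.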

e) Ask a question

result = qa_chain.invoke({"query": "What languages does Clara speak?"})
print("Answer:", result['result'])

print("\nSource Documents:")
for doc in result['source_documents']:
    print(f"- {doc.page_content}")

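When we ask the question “What languages does Clara speak?”, the system searches the chunks of doc.txt and returns an answer along these lines (the exact wording and the retrieved passages depend on the model and on the chunking):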

Answer: Clara speaks Spanish, English, and French.
Source Documents:
- Clara Mendoza is an environmental policy analyst [...]
  She is fluent in Spanish, English, and French.

5. Complete code

Here is the full Python module you can run directly:

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_ollama.llms import OllamaLLM
from langchain.chains import RetrievalQA

# Opening the document (knowledge base for the RAG)
with open("./doc.txt", "r", encoding="utf-8") as f:
    input_doc = f.read()

document = Document(page_content=input_doc)

# Automatic split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=40)
chunks = splitter.split_documents([document])

# Transforming chunks into vectors (embeddings)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

qa_chain = RetrievalQA.from_chain_type(
    llm=OllamaLLM(model="gemma3:1b"),  # We use the gemma3:1b model installed locally
    retriever=vector_store.as_retriever(),
    return_source_documents=True,
    chain_type="stuff",
)

# Prompt and retrieval of the answer (and sources)
result = qa_chain.invoke({"query": "What languages does Clara speak?"})
print("Answer:", result['result'])
print("\nSource Documents:")
for doc in result['source_documents']:
    print(f"- {doc.page_content}")

6. Conclusion

Retrieval Augmented Generation (RAG) makes it possible to transform a generic language model into a specialized assistant, capable of relying on your own data.

  • Applications are numerous: enterprise chatbots, research assistants, educational tools…
  • Thanks to frameworks like LangChain and tools like Ollama, it becomes easy to build a working prototype.

And most importantly, you stay in control: you can choose your own data and your own models (open source or closed).

To go further, check out our article on automation and agentic AI.


A gentle introduction to workflow automation and agentic AI

If you’d like to dive deeper into this topic, you can watch the full webinar here: A gentle introduction to workflow automation and agentic AI.

Why workflow automation and agentic AI are essential today

Artificial Intelligence (AI) has become one of the fastest-growing technology trends. Two concepts in particular, workflow automation and agentic AI, are transforming how businesses streamline their operations, improve productivity, and unlock new opportunities.

According to a McKinsey report published in June 2025 [1], generative and agentic AI could unlock between $2.6 and $4.4 trillion in additional value beyond traditional analytical AI. Yet most organizations are still struggling: 78% of companies report deploying generative AI, but 80% see no tangible results, and only 1% consider their AI strategy mature.

This paradox highlights the gap between experimentation and real business impact, a gap that workflow automation and agentic AI can help close.

What is an AI agent?

The term AI agent is widely used but rarely defined consistently. At its core, an AI agent is a system capable of:

  1. Perceiving its environment (input)
  2. Making decisions (reasoning or planning)
  3. Acting to achieve a goal
  4. Adapting through feedback and learning
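
The four capabilities above can be sketched as a simple loop. This is a deliberately tiny toy with no AI model involved: an agent that finds a hidden number by using feedback from its environment. The names GuessingAgent, environment and run are hypothetical and only serve to label where perceiving, deciding, acting and adapting happen.

```python
class GuessingAgent:
    """Toy agent that searches for a hidden number between low and high."""

    def __init__(self, low, high):
        self.low = low
        self.high = high

    def decide(self):
        """2. Decide: pick the middle of the range still considered possible."""
        return (self.low + self.high) // 2

    def adapt(self, guess, feedback):
        """4. Adapt: shrink the range according to the feedback received."""
        if feedback == "higher":
            self.low = guess + 1
        elif feedback == "lower":
            self.high = guess - 1


def environment(secret, guess):
    """Hypothetical environment: reports how the agent's action turned out."""
    if guess < secret:
        return "higher"
    if guess > secret:
        return "lower"
    return "correct"


def run(agent, secret, max_steps=20):
    """Run the loop until the goal is reached; return (final guess, steps used)."""
    for step in range(1, max_steps + 1):
        guess = agent.decide()                 # 2. decide
        feedback = environment(secret, guess)  # 3. act, then 1. perceive the result
        if feedback == "correct":
            return guess, step
        agent.adapt(guess, feedback)           # 4. adapt
    return None, max_steps


print(run(GuessingAgent(1, 100), secret=37))
```

In a real AI agent, the decide step would typically be handled by a language model and the environment would be made of tools, APIs or documents, but the loop structure stays the same.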

AI agents can be placed along a spectrum:

  • Specialized agents: very narrow scope (e.g., email classifier, sentiment detector)
  • Autonomous agents: able to act on goals without step-by-step instructions (e.g., an AI assistant managing a calendar)
  • Multi-agent systems: networks of agents collaborating to solve complex tasks such as supply chain optimization

The automation landscape: where to start

Before diving into AI-enhanced automation, it’s often recommended to begin with traditional workflow automation tools such as Zapier or Make.

These platforms let you connect apps and services with “if this, then that” logic, reducing repetitive manual work.

Pros: easy to use, little or no coding required, accelerates structured workflows.
Cons: rigid rules, fragile when inputs change, unable to process unstructured data like PDFs or free-text emails.
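
To illustrate both the simplicity and the rigidity of "if this, then that" logic, here is a minimal sketch of a rule-based email router. The rules, queue names and the route function are invented for this example and do not come from any particular platform.

```python
RULES = [
    # (keyword that must appear in the subject, destination queue)
    ("invoice", "accounting"),
    ("refund", "support"),
    ("password", "it-helpdesk"),
]


def route(subject):
    """Return the queue of the first rule whose keyword appears in the subject."""
    lowered = subject.lower()
    for keyword, queue in RULES:
        if keyword in lowered:
            return queue
    return "manual-review"  # nothing matched: a human has to look at it


print(route("Invoice #1042 attached"))
print(route("I was charged twice, please give me my money back"))
```

The second message is clearly a refund request, but because it never uses the word "refund", no rule fires and it falls back to manual review. This is the kind of fragility that the AI components described in the next section are meant to address.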

What AI brings to workflow automation

Adding AI components makes automation smarter, more flexible, and closer to human behavior. Key advantages include:

  • Understanding unstructured data (emails, documents, images, videos)
  • Context-aware decision making
  • Dynamic adaptation to new information
  • Natural language understanding and generation
  • Multi-step reasoning and autonomy

Concrete examples:

  1. Email triage with NLP: AI detects intent and routes automatically to the right person or system.
  2. Smart document processing: AI extracts and validates data from PDFs or scanned forms.
  3. AI-generated customer replies: drafts personalized responses that can be reviewed before sending.

From simple automation to agentic AI

The path to agentic AI follows a maturity progression:

  1. Traditional automation: structured, rule-based workflows
  2. Automation + AI: AI modules integrated into workflows (chatbots, NLP, intelligent routing)
  3. Agentic AI: systems able to generate their own action plans, reason across steps, and act autonomously

For example, while Make.com can automate email logging and replies, next-generation platforms like Manus or ChatGPT’s Agent Mode showcase autonomous agents able to plan and execute tasks independently.

Next steps: going more technical

Once you’ve experimented with no-code automation, the next step is to explore agent frameworks and protocols such as:

  • LangChain, LangGraph, LlamaIndex: to build advanced AI-powered workflows.
  • MCP (Model Context Protocol) and A2A (Agent-to-Agent Protocol): to enable agents to collaborate with each other or with traditional systems.

These tools pave the way for scalable and maintainable AI systems beyond simple prototypes.

Conclusion

Workflow automation and agentic AI mark an evolution: from rigid processes to intelligent and adaptive systems. Whether you start with Zapier or Make, or explore multi-agent systems, the key is to experiment while keeping business value and scalability in mind.

However, before diving headfirst into agentic AI, take the time to anchor a clear strategic vision and build leadership awareness, as explained in our guide on integrating AI into business in 2025.

To go further, watch the full webinar: A gentle introduction to workflow automation and agentic AI.

References

[1] https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage