Agentic AI in 2025: why moving to production remains the real challenge



In 2025, agentic AI has established itself as a major trend. But beyond the demonstrations, one question dominates: how can organizations truly move to production in a controlled, sustainable, and responsible way? This article synthesizes the main obstacles observed and the governance, orchestration, and observability requirements needed to industrialize AI agents.


In just a few months, agentic AI has become one of the most visible topics in the AI ecosystem. Autonomous agents, multi-agent systems, orchestration of complex tasks: the promises are strong and the demonstrations are often impressive.

Yet one reality stands out in 2025: moving AI agents into production remains rare, complex, and widely underestimated. Beyond PoCs and “showcase” projects, very few organizations have actually industrialized AI agents in critical, governed, and sustainable environments.

This gap is not anecdotal. It reflects a structural reality: AI agents introduce specific challenges in terms of governance, orchestration, monitoring, observability, and accountability [1][2][3].

The PoC illusion: when the AI agent never leaves the lab

Many agentic AI initiatives remain stuck at the prototype stage. Recent frameworks on AI agent governance reveal a recurring pattern: agents are often designed as experimental artifacts, without explicit requirements for operational deployment [1].

A PoC AI agent may work in a controlled environment, but it usually relies on implicit assumptions:

  • data quality and stability of surrounding systems;
  • a limited scope of action (few integrations, few exceptions);
  • low exposure to failure cases, unexpected behaviors, and unplanned interactions.

In production, these assumptions quickly break down. The agent becomes a full-fledged actor within the system, capable of interacting with resources, humans, other agents, and business processes, with tangible consequences [2].

Moving to production is therefore not just about “scaling up”: it represents a change in the very nature of the system and its associated risks.

What changes when an AI agent goes into production?

The reference publications converge on a key point: an AI agent is not merely a model, nor just a conversational application. It is a system that can perceive, plan, act, and adapt its strategy based on context, sometimes with significant autonomy [1][2].

In production, this requires clarifying levels of autonomy and the safeguards associated with them:

  1. Scope of action: what the agent is allowed to do, on which systems, and with which access rights.
  2. Accountability: who is responsible when the agent triggers an action or decision [2].
  3. Supervision: when and how humans must validate, arbitrate, or take back control [1][3].
  4. Failure management: how the agent fails safely, escalates issues, and remains stoppable in case of anomalies [3].
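To make these four safeguards concrete, here is a minimal Python sketch, assuming a simple in-process setup. AgentMandate, guarded_call, and ActionRefused are hypothetical names, not part of any framework cited here. The mandate lists the tools the agent may use (scope of action), names an accountable owner (accountability), marks tools that need human approval (supervision), and carries a stop flag (failure management).

```python
from dataclasses import dataclass, field
from typing import Callable


class ActionRefused(Exception):
    """Raised when a requested action falls outside the agent's mandate (hypothetical)."""


@dataclass
class AgentMandate:
    """Hypothetical policy object describing what one agent may do."""
    owner: str                                             # 2. accountable human or team
    allowed_tools: set[str]                                # 1. scope of action
    needs_approval: set[str] = field(default_factory=set)  # 3. human validation required
    stopped: bool = False                                  # 4. kill switch


def guarded_call(mandate: AgentMandate, tool: str, run: Callable[[], str],
                 approve: Callable[[str], bool]) -> str:
    """Execute run() only if the mandate allows the tool; otherwise refuse."""
    if mandate.stopped:
        raise ActionRefused(f"agent is stopped; escalate to {mandate.owner}")
    if tool not in mandate.allowed_tools:
        raise ActionRefused(f"'{tool}' is outside the agent's scope")
    if tool in mandate.needs_approval and not approve(tool):
        raise ActionRefused(f"'{tool}' was rejected by a human reviewer")
    return run()
```

Refusing with an explicit exception, rather than silently skipping the action, gives the surrounding system a clear point at which to escalate to the named owner.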

AI agent governance: you cannot govern an agent like a model

Traditional AI governance (focused on models, data, and evaluation) remains necessary, but it is insufficient for agents capable of planning and executing actions in real environments [1][2].

Agent-specific governance frameworks emphasize additional requirements:

  • Defining roles, objectives, and limits: explicit mandates, autonomy thresholds, rules of engagement [1].
  • Assigning human accountability: supervision, validation, arbitration, responsibility [2].
  • Implementing intervention mechanisms: suspension, recovery, escalation, and safety controls [3].
  • Formalizing compliance and accountability: how the organization demonstrates control over the effects produced by agents [2].

In practice, this means designing AI agent governance as a transversal framework: technical, legal, organizational, and operational [1][2][3].

AI agent orchestration: the system-level problem PoCs tend to ignore

Many demonstrations focus on a single agent. However, industrial use cases often lead to more complex architectures: chains of specialized agents, multi-agent systems, and agent–agent or agent–human interactions [1][3].

Orchestration then becomes central:

  • Who triggers the agent, and in what context?
  • How are steps sequenced, and with which dependencies?
  • How are loops, conflicts, or uncontrolled action escalations avoided?
  • Which control points and human validations are required?

Without explicit and governed orchestration, an agent-based system can become unpredictable at scale, even if each individual agent appears to function correctly in isolation [1].
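As an illustration of the questions above, the following Python sketch (hypothetical names, not a real orchestration framework) sequences named steps, stops runaway loops with a step budget, and requires human validation before designated steps. Each step returns the name of the next step, so the dependencies are explicit.

```python
from typing import Callable, Optional

# Hypothetical step type: reads/updates shared state, returns the next step name or None
Step = Callable[[dict], Optional[str]]


def orchestrate(steps: dict[str, Step], start: str, state: dict,
                checkpoints: set[str], validate: Callable[[str, dict], bool],
                max_steps: int = 10) -> list[str]:
    """Run named steps in order, with a step budget and human checkpoints.

    Returns the names of the executed steps, as a simple execution trace."""
    trace: list[str] = []
    current: Optional[str] = start
    while current is not None:
        if len(trace) >= max_steps:  # guard against endless loops between agents
            raise RuntimeError(f"step budget exhausted at '{current}'")
        if current in checkpoints and not validate(current, state):
            raise RuntimeError(f"human reviewer blocked '{current}'")
        trace.append(current)
        current = steps[current](state)  # each step decides what comes next
    return trace
```

Because each step names its successor, a cycle between two steps (for instance a writer agent and a reviewer agent that never agree) is cut off by max_steps instead of running indefinitely.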

Monitoring and observability of AI agents: making actions auditable, not just outputs

Production deployments often fail on one critical point: the inability to observe and audit what an agent actually does. Governance frameworks emphasize that an agent must be observable not only through its outputs, but also through its decisions, actions, and interactions [1][3].

Production-grade observability typically includes:

  • Traceability: logging actions, tools used, resources accessed, and outcomes produced [3].
  • Decision context: the ability to reconstruct decision paths (within reasonable limits) to understand “why” an action was taken [1].
  • Anomaly detection: identifying drifts, loops, escalations, unexpected behaviors, and performance degradation [1][3].
  • Auditability and compliance: producing usable evidence for accountability, incident analysis, and regulatory requirements [2].

Without these mechanisms, securing, scaling, and continuously improving AI agents over time becomes extremely difficult [2][3].
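A minimal sketch of traceability and basic anomaly detection, assuming audit records are kept as JSON Lines in an in-memory list (a real deployment would write to durable, access-controlled storage). The function names, field names, and repetition threshold are illustrative assumptions. The rationale field only records what the agent reports about its own decision, which is one reason decision paths can be reconstructed only within reasonable limits.

```python
import json
import time
from collections import Counter


def log_action(trail: list[str], agent: str, tool: str, resources: list[str],
               outcome: str, rationale: str) -> None:
    """Append one structured audit record, encoded as a JSON Lines entry."""
    trail.append(json.dumps({
        "ts": time.time(),       # when the action happened
        "agent": agent,          # who acted
        "tool": tool,            # which tool was used
        "resources": resources,  # which resources were accessed
        "outcome": outcome,      # what was produced
        "rationale": rationale,  # decision context, as reported by the agent
    }))


def repeated_actions(trail: list[str], threshold: int = 3) -> list[tuple[str, str]]:
    """Flag (agent, tool) pairs seen at least `threshold` times: a crude loop signal."""
    counts: Counter = Counter()
    for line in trail:
        record = json.loads(line)
        counts[(record["agent"], record["tool"])] += 1
    return [pair for pair, n in counts.items() if n >= threshold]
```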

Accountability, compliance, and responsibility: the real wall to production

AI agents fundamentally transform the risk landscape: the issue is no longer just the quality of an answer, but the impact of actions taken. Global governance analyses highlight that the central question becomes: who is responsible for what an agent does, and with which control and accountability mechanisms? [2]

In production, this forces organizations to align:

  • AI governance, IT governance, and business governance;
  • compliance frameworks, internal policies, and escalation processes;
  • human oversight mechanisms, controls, audits, and documentation [1][2][3].

Without this foundation, agentic AI remains confined to experimentation, lacking the guarantees required for large-scale, real-world use [1][2].

Conclusion

In 2025, agentic AI is entering a phase where the differentiator is no longer the demo, but the ability to move into production in a controlled way. The organizations that succeed will not be those accumulating PoCs, but those that invest early in:

  1. governance (mandates, limits, accountability) [1][2];
  2. orchestration (controls, decision points, supervision) [1][3];
  3. monitoring and observability (traceability, auditability, anomaly detection) [1][3].

In other words, industrializing agentic AI is not about building flashy demos; it is about governance and operational engineering. This is where sustainable value creation truly happens.

References

[1] https://partnershiponai.org/resource/preparing-for-ai-agent-governance/
[2] https://partnershiponai.org/resource/ai-agents-global-governance-analyzing-foundational-legal-policy-and-accountability-tools/
[3] https://adoption.microsoft.com/files/copilot-studio/Agent-governance-whitepaper.pdf


Introduction to Retrieval Augmented Generation (RAG) with LangChain and Ollama

1. Why are we talking about Retrieval Augmented Generation (RAG) today?

Generative AI models (like GPT, Claude, or LLaMA) are capable of producing fluent and relevant text. But they have a major limitation: they can only answer based on what they learned during training. Therefore, they can:

  • Provide outdated information.
  • Invent nonexistent facts (hallucinations).
  • Lack knowledge of private data specific to a company or domain.

This is where RAG (Retrieval Augmented Generation) comes in: it combines the power of generative models with an external knowledge base (documents, databases, internal files, etc.).

In short: before generating an answer, the system retrieves the relevant passages from your data and passes them to the model, which uses them to produce a more accurate, grounded response.

Note: some tools like ChatGPT with web browsing or Perplexity give the impression that the model knows the Internet in real time. In reality, they combine a generative model with an external online search mechanism. This approach is already a form of RAG, but applied to the public web. The value of custom RAGs is that you can apply the same principle… to your own private data (internal documents, client databases, reports, etc.).


2. The principle of Retrieval Augmented Generation

A Retrieval Augmented Generation pipeline works in three steps:

  1. Document indexing
    Text is split into small pieces (chunks), then transformed into numerical vectors (embeddings) so they can be efficiently searched.
  2. Contextual search
    When a question is asked, the system searches your documents for the most relevant passages using a vector database.
  3. Augmented generation
    The language model takes these passages as context and generates a more reliable response, possibly including the sources used.
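To see the three steps without any library, here is a toy Python sketch. It swaps the neural embedding model for a bag-of-words term count (an assumption made purely for illustration) and ranks chunks by cosine similarity; the chunks are short hypothetical excerpts about the example person used later in this article.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words term counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# 1. Indexing: store every chunk together with its vector
chunks = [
    "Clara lives in Barcelona, Spain.",
    "Clara is fluent in Spanish, English, and French.",
    "Clara enjoys hiking in the Pyrenees.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]


# 2. Contextual search: rank chunks by similarity to the question
def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# 3. Augmented generation: the retrieved text becomes context in the prompt
question = "Which languages is Clara fluent in?"
context = "\n".join(retrieve(question))
prompt = f"Context:\n{context}\n\nQuestion: {question}"
```

Asked “What languages does Clara speak?”, this toy version ranks the Barcelona chunk first, because the word “speak” never appears in the text. A neural embedding model is designed to place related phrasings such as “speak” and “fluent in Spanish” close together, which is why the implementation in this article uses one.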

3. The tools we will use

For our example, we will build a small RAG with:

  • LangChain: a Python framework that makes it easier to build AI chains (LLM + retrieval + memory…).
  • Ollama: a tool that allows you to easily download and run language models locally (for example LLaMA, Mistral, Gemma). In our code, we use the small gemma3:1b model.
  • FAISS: a library from Meta for efficient similarity search over vectors.
  • HuggingFace Embeddings: a sentence-transformers model from Hugging Face that transforms our texts into numerical vectors.

ℹ️ You don't have to use Ollama or open models: any model works, whether open source (LLaMA, Mistral, Gemma…) or closed (GPT, Claude, etc.).


4. Step-by-step implementation

We will start from a file doc.txt (our knowledge base). Here is the file we use for this example:

Bio of a Fictional Person

Name: Clara Mendoza
Age: 34
Location: Barcelona, Spain
Profession: Environmental Policy Analyst

Clara Mendoza is a passionate environmental policy analyst dedicated to shaping sustainable urban development. With over a decade of experience in climate policy and renewable energy initiatives, she has worked with NGOs, municipal governments, and international organizations to design strategies that balance economic growth with environmental preservation.

She holds a master’s degree in Environmental Policy and Governance from the London School of Economics, where her thesis focused on community-driven renewable energy projects in Southern Europe. Fluent in Spanish, English, and French, Clara has presented her work at global conferences, advocating for greener cities and stronger cross-border collaborations.

Beyond her professional life, Clara is an avid traveler and amateur photographer. Her weekends often involve hiking in the Pyrenees, experimenting with plant-based recipes, or capturing urban landscapes through her camera lens. She also volunteers as a mentor for young women entering the field of environmental science and policy.

a) Load the document

from langchain_core.documents import Document

# Read the knowledge base from disk
with open("./doc.txt", "r", encoding="utf-8") as f:
    input_doc = f.read()

# Wrap the raw text in a LangChain Document
document = Document(page_content=input_doc)

Here, we read our text file and convert it into a Document object usable by LangChain.


b) Split the text into chunks

from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunks of at most 150 characters; neighboring chunks share up to 40 characters
splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=40)
chunks = splitter.split_documents([document])

We segment the document into small pieces of at most 150 characters, with consecutive pieces overlapping by up to 40 characters, so that an idea cut at a chunk boundary is less likely to be split in half.
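To see what chunk_overlap does, here is a simplified fixed-window splitter in plain Python. It is not LangChain's algorithm: RecursiveCharacterTextSplitter first tries to break on paragraphs, lines, and spaces, so its chunks are at most chunk_size characters and the overlap is at most chunk_overlap. The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; each window starts
    chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # last window reached the end of the text
            break
    return chunks
```

For example, split_with_overlap("abcdefghij", 4, 2) returns ['abcd', 'cdef', 'efgh', 'ghij']: each chunk repeats the last two characters of the previous one.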


c) Create the vector database

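from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

# all-MiniLM-L6-v2 maps each chunk to a 384-dimensional vector
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
# Build an in-memory FAISS index from the chunk vectors
vector_store = FAISS.from_documents(chunks, embeddings)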

Each chunk is transformed into a numerical vector using a HuggingFace model, then stored in a FAISS vector database.


d) Build the RAG chain

from langchain_ollama.llms import OllamaLLM
from langchain.chains import RetrievalQA

qa_chain = RetrievalQA.from_chain_type(
    llm=OllamaLLM(model="gemma3:1b"),  # local model executed with Ollama
    retriever=vector_store.as_retriever(),
    return_source_documents=True,
    chain_type="stuff",
)

Here, we put everything together:

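  • The LLM (gemma3:1b via Ollama).
  • The FAISS retriever, which by default returns the 4 chunks most similar to the question.
  • A “stuff” chain, which inserts all the retrieved passages into a single prompt sent to the model.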
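The “stuff” chain type is the simplest strategy: every retrieved passage is pasted into one prompt. The sketch below reproduces the idea in plain Python; the function name stuff_prompt and the prompt wording are illustrative assumptions, not LangChain's exact template.

```python
def stuff_prompt(question: str, passages: list[str]) -> str:
    """'Stuff' strategy: paste all retrieved passages into one prompt,
    numbered so the answer can refer back to its sources."""
    context = "\n\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Use the following pieces of context to answer the question. "
        "If you don't know the answer, say so.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Because everything goes into a single prompt, the retrieved text must fit in the model's context window, which is one reason to keep chunks small and the number of retrieved passages limited.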

e) Ask a question

result = qa_chain.invoke({"query": "What languages does Clara speak?"})
print("Answer:", result['result'])

print("\nSource Documents:")
for doc in result['source_documents']:
    print(f"- {doc.page_content}")

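When we ask “What languages does Clara speak?”, the system retrieves the most relevant chunks of doc.txt and passes them to the model, which answers from them. The output looks something like this (the exact wording depends on the model, and the source list is abridged):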

Answer: Clara speaks Spanish, English, and French.
Source Documents:
- Clara Mendoza is an environmental policy analyst [...]
  She is fluent in Spanish, English, and French.

5. Complete code

Here is the full Python script, which you can run directly once the dependencies are installed and gemma3:1b has been pulled into Ollama:

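# Requires: pip install langchain langchain-community langchain-huggingface langchain-ollama faiss-cpu sentence-transformers
# and a local model downloaded with: ollama pull gemma3:1b
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_ollama.llms import OllamaLLM
from langchain.chains import RetrievalQA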

# Opening the document (knowledge base for the RAG)
with open("./doc.txt", "r", encoding="utf-8") as f:
    input_doc = f.read()

document = Document(page_content=input_doc)

# Automatic split into chunks
splitter = RecursiveCharacterTextSplitter(chunk_size=150, chunk_overlap=40)
chunks = splitter.split_documents([document])

# Transforming chunks into vectors (embeddings)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vector_store = FAISS.from_documents(chunks, embeddings)

qa_chain = RetrievalQA.from_chain_type(
    llm=OllamaLLM(model="gemma3:1b"),  # We use the gemma3:1b model installed locally
    retriever=vector_store.as_retriever(),
    return_source_documents=True,
    chain_type="stuff",
)

# Prompt and retrieval of the answer (and sources)
result = qa_chain.invoke({"query": "What languages does Clara speak?"})
print("Answer:", result['result'])
print("\nSource Documents:")
for doc in result['source_documents']:
    print(f"- {doc.page_content}")

6. Conclusion

Retrieval Augmented Generation (RAG) makes it possible to transform a generic language model into a specialized assistant, capable of relying on your own data.

  • Applications are numerous: enterprise chatbots, research assistants, educational tools…
  • Thanks to frameworks like LangChain and tools like Ollama, it becomes easy to build a working prototype.

And most importantly, you stay in control: you can choose your own data and your own models (open source or closed).

 

To go further, check out our article on automation and agentic AI.

A gentle introduction to workflow automation and agentic AI

If you’d like to dive deeper into this topic, you can watch the full webinar: A gentle introduction to workflow automation and agentic AI.

Why workflow automation and agentic AI are essential today

Artificial Intelligence (AI) has become one of the fastest-growing technology trends. Two concepts in particular, workflow automation and agentic AI, are transforming how businesses streamline their operations, improve productivity, and unlock new opportunities.

According to a McKinsey report published in June 2025 [1], generative and agentic AI could unlock between $2.6 and $4.4 trillion in additional value beyond traditional analytical AI. Yet most organizations are still struggling: 78% of companies report deploying generative AI, but 80% see no tangible results, and only 1% consider their AI strategy mature.

This paradox highlights the gap between experimentation and real business impact, a gap that workflow automation and agentic AI can help close.

What is an AI agent?

The term AI agent is widely used but rarely defined consistently. At its core, an AI agent is a system capable of:

  1. Perceiving its environment (input)
  2. Making decisions (reasoning or planning)
  3. Acting to achieve a goal
  4. Adapting through feedback and learning
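These four capabilities can be read as a loop. The toy Python sketch below makes that loop explicit under simplifying assumptions: the tools are plain functions (hypothetical here) that return None on failure, the “decision” is simply picking the first tool not yet known to fail, and “adaptation” means remembering failures. In a real agent, the decision step would typically be delegated to a language model.

```python
from typing import Callable, Optional

# Hypothetical tool type: takes the goal, returns a result or None on failure
Tool = Callable[[str], Optional[str]]


def run_agent(goal: str, tools: dict[str, Tool],
              max_attempts: int = 5) -> tuple[Optional[str], list[str]]:
    """Toy agent loop: perceive -> decide -> act -> adapt from feedback."""
    failed: set[str] = set()  # what the agent has learned from feedback so far
    log: list[str] = []
    for _ in range(max_attempts):
        # 1. Perceive: which options does the environment still offer?
        options = [name for name in tools if name not in failed]
        if not options:
            break
        # 2. Decide: take the first remaining option (an LLM would plan here)
        choice = options[0]
        # 3. Act
        result = tools[choice](goal)
        log.append(f"{choice}: {'ok' if result is not None else 'failed'}")
        if result is not None:
            return result, log
        # 4. Adapt: remember the failure so the next decision changes
        failed.add(choice)
    return None, log
```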

AI agents can be placed along a spectrum:

  • Specialized agents: very narrow scope (e.g., email classifier, sentiment detector)
  • Autonomous agents: able to act on goals without step-by-step instructions (e.g., an AI assistant managing a calendar)
  • Multi-agent systems: networks of agents collaborating to solve complex tasks such as supply chain optimization

The automation landscape: where to start

Before diving into AI-enhanced automation, it’s often recommended to begin with traditional workflow automation tools such as Zapier or Make.

These platforms let you connect apps and services with “if this, then that” logic, reducing repetitive manual work.

Pros: easy to use, little or no coding required, accelerates structured workflows.
Cons: rigid rules, fragile when inputs change, unable to process unstructured data like PDFs or free-text emails.
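A minimal Python sketch of this “if this, then that” logic, with hypothetical rules and action names, shows both the simplicity and the fragility:

```python
# Each rule: (condition on the incoming email, action to trigger); all names are hypothetical
RULES = [
    (lambda email: "invoice" in email["subject"].lower(), "forward_to_accounting"),
    (lambda email: email["sender"].endswith("@partner.example"), "add_to_crm"),
]


def route(email: dict) -> str:
    """Return the action of the first matching rule, or fall back to a human."""
    for condition, action in RULES:
        if condition(email):
            return action
    return "manual_review"  # rules cannot interpret anything they were not written for
```

An email titled “Payment due for last month” is clearly about billing, yet no rule fires and it falls back to manual review: the rule only recognizes the literal word “invoice”.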

What AI brings to workflow automation

Adding AI components makes automation smarter, more flexible, and closer to human behavior. Key advantages include:

  • Understanding unstructured data (emails, documents, images, videos)
  • Context-aware decision making
  • Dynamic adaptation to new information
  • Natural language understanding and generation
  • Multi-step reasoning and autonomy

Concrete examples:

  1. Email triage with NLP: AI detects intent and routes automatically to the right person or system.
  2. Smart document processing: AI extracts and validates data from PDFs or scanned forms.
  3. AI-generated customer replies: drafts personalized responses that can be reviewed before sending.

From simple automation to agentic AI

The path to agentic AI follows a maturity progression:

  1. Traditional automation: structured, rule-based workflows
  2. Automation + AI: AI modules integrated into workflows (chatbots, NLP, intelligent routing)
  3. Agentic AI: systems able to generate their own action plans, reason across steps, and act autonomously

For example, while Make.com can automate email logging and replies, next-generation platforms like Manus or ChatGPT’s Agent Mode showcase autonomous agents able to plan and execute tasks independently.

Next steps: going more technical

Once you’ve experimented with no-code automation, the next step is to explore agent frameworks and protocols such as:

  • LangChain, LangGraph, LlamaIndex: to build advanced AI-powered workflows.
  • MCP (Model Context Protocol) and A2A (Agent-to-Agent Protocol): to enable agents to collaborate with each other or with traditional systems.

These tools pave the way for scalable and maintainable AI systems beyond simple prototypes.
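For a feel of what such a protocol looks like on the wire: MCP messages are JSON-RPC 2.0 requests, and a client lists a server's tools with the tools/list method and invokes one with tools/call. The sketch below only builds the messages; the helper mcp_request, the tool name get_weather, and its arguments are hypothetical, and the transport (stdio or HTTP) and session initialization are omitted.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC expects a unique id per request


def mcp_request(method: str, params: dict) -> str:
    """Serialize a JSON-RPC 2.0 request, the message format MCP is built on."""
    return json.dumps({"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params})


list_tools = mcp_request("tools/list", {})
call_tool = mcp_request("tools/call", {"name": "get_weather", "arguments": {"city": "Paris"}})
```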

Conclusion

Workflow automation and agentic AI mark an evolution: from rigid processes to intelligent and adaptive systems. Whether you start with Zapier or Make, or explore multi-agent systems, the key is to experiment while keeping business value and scalability in mind.

However, before diving headfirst into agentic AI, take the time to anchor a clear strategic vision and build leadership awareness, as explained in our guide on integrating AI into business in 2025.

To go further, watch the full webinar: A gentle introduction to workflow automation and agentic AI.

References

[1] https://www.mckinsey.com/capabilities/quantumblack/our-insights/seizing-the-agentic-ai-advantage