AI Agents Are Overhyped But I'm Building One Anyway
Every AI company is shipping "agents" now. Autonomous AI that can browse the web, write code, book flights, and solve complex problems without human intervention!
The reality? I've been building agents for a year. Most of the time they:
- Loop infinitely on simple tasks
- Hallucinate tool calls that don't exist
- Confidently produce wrong answers
- Cost $50 to do what a human does in 5 minutes
And yet... I keep building them. Because when they work, they're magic. Here's my honest assessment of agent technology in 2025.
What Agents Actually Are
Strip away the marketing and an "agent" is:
while not done:
    observation = perceive(environment)
    thought = think(observation, goal, memory)
    action = decide(thought, available_tools)
    result = execute(action)
    memory.update(result)
    done = check_if_complete(result, goal)

The AI observes, thinks, acts, and repeats. That's it. Everything else is implementation details and hype.
The Hype vs Reality
The Hype
"Our agent autonomously researches topics, synthesizes information from multiple sources, and produces publication-ready reports!"
The Reality
Agent: I'll search for information about topic X
Agent: *calls search tool*
Agent: I found some results. Let me analyze them.
Agent: *analyzes results*
Agent: I need more information. Let me search again.
Agent: *calls search tool with same query*
Agent: I found some results. Let me analyze them.
Agent: I need more information. Let me search again.
[loops forever]

Or:
Agent: I'll use the database_query tool
Error: No tool named 'database_query' exists
Agent: I'll use the query_database tool
Error: No tool named 'query_database' exists
Agent: I'll use the db.query tool
[gives up after 5 failed attempts]

Why Agents Are Hard
1. Planning is Hard
Humans are good at breaking down complex tasks. LLMs... aren't.
"Book me a flight to Tokyo next week, cheapest option, prefer window seat"
Requires:
- Determine what "next week" means
- Search multiple flight providers
- Compare prices (handling different currencies, layovers)
- Check seat availability
- Handle login/payment
- Confirm booking
Each step has failure modes. The AI needs to handle all of them, recover from errors, and maintain state. Current LLMs struggle with step 1.
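Even step 1 trips models up, because the model has no idea what "today" is unless you inject it. Here's a minimal sketch of the kind of date grounding I pass into the prompt (the helper and variable names are illustrative, not any framework's API):

from datetime import date, timedelta

def resolve_next_week(today=None):
    """Turn the phrase 'next week' into a concrete Monday-to-Sunday date range."""
    today = today or date.today()
    days_until_next_monday = 7 - today.weekday()  # Monday is weekday 0
    start = today + timedelta(days=days_until_next_monday)
    return start, start + timedelta(days=6)

# The grounding has to be handed to the model explicitly, e.g.:
start, end = resolve_next_week()
system_hint = f"Today is {date.today()}. 'Next week' means {start} through {end}."

Resolve relative dates deterministically and pass them in; otherwise the agent will happily book the wrong week.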
2. Tool Use is Fragile
tools = [
    {
        "name": "web_search",
        "description": "Search the web for information",
        "parameters": {
            "query": {"type": "string", "description": "Search query"}
        }
    }
]

The AI must (a validation sketch follows this list):
- Know which tool to use (often guesses wrong)
- Format parameters correctly (often doesn't)
- Handle errors gracefully (often can't)
- Know when NOT to use a tool (hardest part)
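One cheap guardrail (my own sketch, not part of any tool-calling API) is to validate the model's proposed call against the tool registry before running it, and feed the error back as the next observation:

def validate_tool_call(name: str, args: dict, tools: list[dict]) -> str | None:
    """Return an error string the agent can react to, or None if the call looks sane."""
    tool = next((t for t in tools if t["name"] == name), None)
    if tool is None:
        known = ", ".join(t["name"] for t in tools)
        return f"Unknown tool '{name}'. Available tools: {known}"
    expected = set(tool.get("parameters", {}))
    missing = expected - set(args)
    extra = set(args) - expected
    if missing or extra:
        return f"Bad arguments for '{name}': missing {sorted(missing)}, unexpected {sorted(extra)}"
    return None

That single check turns the database_query / query_database / db.query flailing above into something the agent can actually recover from.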
3. Context Window Limits
Agents need memory. Long tasks accumulate context:
- Previous observations
- Failed attempts
- Intermediate results
- Tool outputs
Eventually you hit context limits. Summarization helps but loses detail. There's no great solution.
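My blunt workaround is a rolling window: keep the last few steps verbatim and collapse everything older into one summary entry. A sketch, assuming a summarize_steps helper exists (hypothetical; one cheap LLM call over the old steps):

def compact_history(history: list[dict], keep_last: int = 5) -> list[dict]:
    """Collapse old steps into one summary entry so context stops growing linearly."""
    if len(history) <= keep_last:
        return history
    old, recent = history[:-keep_last], history[-keep_last:]
    summary = summarize_steps(old)  # hypothetical helper: summarize old steps with a cheap model
    return [{"thought": "summary of earlier steps", "result": summary}] + recent

You trade detail for headroom; whether that trade is acceptable depends entirely on the task.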
4. Compounding Errors
If each step has 90% accuracy:
- 2 steps: 81% success
- 5 steps: 59% success
- 10 steps: 35% success
Past a handful of steps, multi-step tasks fail more often than they succeed.
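That's just independent per-step success rates multiplying out; two lines reproduce the numbers:

for steps in (2, 5, 10):
    print(steps, f"{0.9 ** steps:.0%}")  # 81%, 59%, 35%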
What Actually Works (Sort Of)
Despite the problems, some agent patterns are genuinely useful:
1. Constrained Tool Use
Give the AI 2-3 well-defined tools, not 20:
tools = [
    {
        "name": "search",
        "description": "Search documentation for information"
    },
    {
        "name": "answer",
        "description": "Provide final answer to user"
    }
]

Fewer choices = fewer mistakes.
2. Human-in-the-Loop
Don't aim for full autonomy. Instead:
def agent_step(state):
    action = ai.decide(state)
    if action.confidence < 0.8 or action.is_destructive:
        action = human.review(action)
    return execute(action)

The AI proposes, human approves. Much more reliable.
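Concretely, my review gate is nothing fancier than a terminal prompt. A sketch (the action fields and execute call mirror the pseudocode above; none of this is a library API):

def human_review(action) -> bool:
    """Show the proposed action and require an explicit yes before it runs."""
    print(f"Agent wants to call {action.tool} with {action.args}")
    return input("Approve? [y/N] ").strip().lower() == "y"

def agent_step(state):
    action = ai.decide(state)
    if action.confidence < 0.8 or action.is_destructive:
        if not human_review(action):
            return "Rejected by reviewer."
    return execute(action)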
3. Structured Workflows
Instead of "figure it out," define explicit steps:
def research_agent(topic):
    # Step 1: Search (AI)
    search_results = ai.search(topic)
    # Step 2: Extract key points (AI)
    key_points = ai.extract(search_results)
    # Step 3: Synthesize (AI)
    summary = ai.synthesize(key_points)
    # Step 4: Format (deterministic)
    return format_report(summary)

Each step is scoped. Failures are isolated. Much more reliable than "go research X."
4. ReAct Pattern
The Reasoning + Acting pattern works reasonably well:
prompt = """
You have access to these tools: {tools}
Use this format:
Thought: What I need to do next
Action: tool_name
Action Input: input for the tool
Observation: result from the tool
... (repeat as needed)
Thought: I have enough information
Final Answer: the answer
Question: {question}
"""Explicit reasoning steps improve reliability.
My Agent Architecture
Here's what I'm actually running:
import json
from dataclasses import dataclass, field
from typing import Callable, Any

from openai import OpenAI

@dataclass
class Tool:
    name: str
    description: str
    fn: Callable

@dataclass
class AgentState:
    goal: str
    tools: list[Tool]
    history: list[dict] = field(default_factory=list)
    max_steps: int = 10

class Agent:
    def __init__(self, model: str = "gpt-4-turbo"):
        self.model = model
        self.client = OpenAI()

    def run(self, state: AgentState) -> str:
        for step in range(state.max_steps):
            # Think
            response = self.think(state)

            # Check if done
            if response.get("final_answer"):
                return response["final_answer"]

            # Execute tool
            if response.get("tool"):
                result = self.execute_tool(
                    response["tool"],
                    response["tool_input"],
                    state.tools
                )
                state.history.append({
                    "thought": response["thought"],
                    "action": response["tool"],
                    "result": result
                })

            # Safety check
            if self.is_looping(state.history):
                return "Agent appears stuck. Stopping."

        return "Max steps reached without answer."

    def think(self, state: AgentState) -> dict:
        messages = self.build_messages(state)
        response = self.client.chat.completions.create(
            model=self.model,
            messages=messages,
            response_format={"type": "json_object"}
        )
        return json.loads(response.choices[0].message.content)

    def build_messages(self, state: AgentState) -> list[dict]:
        # Minimal prompt assembly; the exact wording is an assumption, tune to taste.
        tool_list = "\n".join(f"- {t.name}: {t.description}" for t in state.tools)
        system = (
            "You are an agent. Respond in JSON with key 'thought' plus either "
            "'tool' and 'tool_input', or 'final_answer'.\n"
            f"Available tools:\n{tool_list}"
        )
        history = "\n".join(
            f"Thought: {h['thought']}\nAction: {h['action']}\nResult: {h['result']}"
            for h in state.history
        )
        return [
            {"role": "system", "content": system},
            {"role": "user", "content": f"Goal: {state.goal}\n\n{history}"}
        ]

    def execute_tool(self, tool_name: str, tool_input: Any, tools: list[Tool]) -> str:
        tool = next((t for t in tools if t.name == tool_name), None)
        if not tool:
            return f"Error: Unknown tool '{tool_name}'"
        try:
            return str(tool.fn(tool_input))
        except Exception as e:
            return f"Error: {str(e)}"

    def is_looping(self, history: list[dict]) -> bool:
        if len(history) < 3:
            return False
        # Check for repeated actions
        recent_actions = [h["action"] for h in history[-3:]]
        return len(set(recent_actions)) == 1

Key Design Decisions
1. Max steps limit. Never let agents run forever. 10-20 steps max.
2. Loop detection. If the agent does the same thing 3 times, stop it.
3. JSON output. Force structured output to make parsing reliable.
4. Error messages as observations. When tools fail, tell the agent why so it can adapt.
What I'm Actually Using Agents For
Use Case 1: Code Review Assistant
tools = [
    Tool("read_file", "Read a file from the repo", read_file),
    Tool("search_code", "Search codebase for pattern", search_code),
    Tool("submit_review", "Submit review comments", submit_review)
]

agent.run(AgentState(
    goal="Review this PR for security issues and bugs",
    tools=tools
))

Works ~70% of the time and catches real issues; the other 30% is wasted effort.
Use Case 2: Documentation Researcher
tools = [
    Tool("search_docs", "Search documentation", search_docs),
    Tool("read_page", "Read a documentation page", read_page),
    Tool("answer", "Provide final answer", lambda x: x)
]

agent.run(AgentState(
    goal="How do I configure authentication in [product]?",
    tools=tools
))

Works well because the scope is limited. Documentation is structured.
Use Case 3: Data Analysis Pipeline
tools = [
    Tool("query_db", "Execute SQL query", query_db),
    Tool("plot", "Create visualization", plot_data),
    Tool("summarize", "Summarize findings", summarize)
]

agent.run(AgentState(
    goal="Analyze user signups for last 30 days, identify trends",
    tools=tools
))

Surprisingly effective. SQL is well-defined. Hard to go too wrong.
The Frameworks (Honest Review)
LangChain
Pros: Lots of integrations, active community
Cons: Abstraction hell, hard to debug, changes constantly
I used LangChain for 6 months. Switched to custom code. No regrets.
AutoGPT/AgentGPT
Pros: Cool demos
Cons: Useless for real work. Full autonomy doesn't work.
CrewAI
Pros: Multi-agent coordination, nice API
Cons: Complexity doesn't add value for most tasks
My recommendation
Build your own minimal framework. 100-200 lines of code. You'll understand what's happening and can debug it.
What's Coming (My Predictions)
2025:
- Better tool use from foundation models
- More reliable reasoning (o1-style thinking)
- Standardized tool calling conventions
- Agents that know their limits
2026:
- Agents as products (not just tech demos)
- Self-correcting agents that learn from failures
- Multi-modal agents (vision + action)
The dream (2027+):
- Agents that can actually replace workflows
- Reliable enough for high-stakes tasks
- Continuous learning from experience
We're not there yet. But the progress is real.
Should You Build Agents?
Yes, if:
- You have a well-defined, narrow task
- Human-in-the-loop is acceptable
- You're okay with 70-80% success rate
- The value justifies the complexity
No, if:
- You need 99%+ reliability
- The task requires true reasoning
- You're not willing to iterate extensively
- A simple script would work
The Bottom Line
AI agents are overhyped. The demos are better than the reality. Full autonomy doesn't work. Most "agent" products are simple tool-calling loops with good marketing.
But.
The core idea—AI that can use tools and iterate—is powerful. With the right constraints, agents solve real problems. They're getting better quickly.
I'm building agents because the future is clearly agentic. The question isn't whether agents will work, it's when. I want to understand the failure modes now so I'm ready when the technology catches up to the vision.
Just don't believe the hype. Build simple, constrain scope, expect failures, and iterate.
The agents of 2027 will be amazing. The agents of 2025 are... educational.
---
The best agents are the ones that know their limits. We're still teaching them where those limits are.