Stop Calling Every Workflow an Agent
Autonomy, recovery, and access boundaries matter. Without them, it is automation wearing new language.
Position
A sharper argument against a common default assumption.
Autonomy, recovery, and access boundaries matter. Without them, it is automation wearing new language.
Language models alone do not make a workflow agentic.
State, recovery, and constrained action are part of the definition.
Loose language leads to weak architecture and weak controls.
Watch the video
Stop Calling Every Workflow an Agent
The word agent is becoming a way to make ordinary automation sound more advanced than it is. That may be useful for demos. It is not useful for architecture.
I've been watching teams deploy "AI agents" that are really just Claude or GPT-4 wrapped in a Python script with hardcoded prompts. When these systems fail (and they always do), the post-mortem reveals what everyone suspected: there was no agent architecture at all. Just a workflow with a language model attached.
This matters because we're about to spend billions of dollars building the wrong infrastructure. If we can't distinguish between a simple automation and an autonomous agent, we'll under-invest in the control planes, observability, and governance these systems actually need.
The Architecture Gap Between Workflows and Agents
Real agent systems require fundamentally different architecture than workflow automation. I've seen this gap destroy projects at three different Fortune 500 companies in the last year alone.
A workflow orchestrator like Temporal or Apache Airflow executes predetermined steps. It might branch based on conditions, but the paths are known at design time. When you add an LLM to call functions, you haven't created an agent. You've created a workflow with natural language parsing.
Consider what LangGraph actually provides versus what teams think it provides. LangGraph gives you a way to define state machines where an LLM can choose transitions. That's powerful, but it's not autonomous agency. The system can't discover new states, modify its own objectives, or reason about resource constraints beyond what you've explicitly programmed.
Here's what I see in production agent architectures that's missing from most "agentic" workflows:
Persistent State Management: Real agents maintain context across sessions, learn from interactions, and build internal models. AutoGPT keeps conversation history, task progress, and discovered resources in vector databases. Most LLM workflows reset completely between runs.
Dynamic Goal Decomposition: Agents break down objectives into subtasks they discover at runtime. When CrewAI agents collaborate, they're not following a predefined DAG. They're negotiating responsibilities based on capabilities and current state.
Resource Awareness: Production agents understand compute costs, API limits, and time constraints. They make tradeoffs. I worked with a team using Anthropic's Claude to build a research agent that would actually stop pursuing low-value paths when approaching token limits.
Recovery and Adaptation: When a workflow fails, it stops or retries the same action. When an agent fails, it should diagnose why, adjust its approach, and try alternative strategies. Microsoft's Semantic Kernel includes planning components specifically for this kind of adaptive behavior.
Why Teams Keep Making This Mistake
I think there are three reasons teams conflate workflows with agents:
First, vendor marketing deliberately blurs the distinction. Every RPA tool, iPaaS platform, and workflow orchestrator now claims "agentic capabilities" because they added LLM function calling. Zapier calls their LLM-powered workflows "AI agents." Make.com does the same thing. These are useful features, but they're not agents.
Second, the tooling makes it too easy to build the wrong thing. You can get a demo working in LangChain in 30 minutes that looks like an agent. It calls tools, responds in natural language, even appears to make decisions. But deploy that to production and you'll discover it has no memory, no recovery logic, no way to handle edge cases beyond what you explicitly coded.
Third, teams underestimate the operational complexity of real agents. They see OpenAI's Assistants API or AWS Bedrock Agents and think that's the whole solution. But those are just the inference layer. You still need:
- Approval workflows for high-stakes actions
- Audit logs that capture reasoning, not just outcomes
- Resource governance to prevent runaway costs
- Rollback mechanisms when agents make bad decisions
- Security boundaries between agent capabilities
The Real Test for Agent Architecture
If a system has no state, no recovery pattern, no scoped access model, and no meaningful decision latitude, it is probably not an agent. It is a workflow with a language model attached.
I prefer a simpler test: can the system interpret context, choose a path, use tools under constraints, recover from failure, and produce auditable actions? If not, call it what it is.
Let me make this concrete with a comparison:
| Capability | Workflow with LLM | True Agent System |
|---|---|---|
| **State Management** | Stateless between runs | Persistent memory across sessions |
| **Decision Making** | Follows predefined paths | Discovers new approaches at runtime |
| **Tool Usage** | Calls predetermined functions | Selects and combines tools dynamically |
| **Failure Handling** | Retry or halt | Diagnose, adapt strategy, try alternatives |
| **Resource Awareness** | None | Tracks costs, time, API limits |
| **Goal Management** | Fixed objectives | Can decompose and modify goals |
| **Observability** | Logs function calls | Captures reasoning and decision process |
This distinction matters because the operating burden is different. Real agents need runtime controls, observability, approval boundaries, and clear definitions of what counts as success or failure.
Building Blocks of Production Agent Systems
When I evaluate agent platforms, I look for five core components that separate toys from production systems:
1. State and Memory Architecture
Google's Vertex AI Agents provides conversation memory and context management out of the box. But that's table stakes. Production systems need hierarchical memory: short-term working memory, episodic memory for specific interactions, and semantic memory for learned concepts. Pinecone or Weaviate for vector storage isn't enough. You need structured state management.
2. Planning and Reasoning Engines
This is where most "agent" frameworks fall apart. They can execute single LLM calls but can't plan multi-step operations. AutoGen from Microsoft Research shows what's possible: agents that generate plans, critique them, revise based on constraints, then execute with monitoring. The planning layer is what enables true autonomy.
3. Tool Integration with Constraints
It's not enough to let an agent call any API. Production systems need:
- Capability-based access control (this agent can read but not write)
- Rate limiting and budget controls
- Approval workflows for sensitive operations
- Sandboxed execution environments
AWS Bedrock Agents gets this partially right with their action group permissions, but you still need to build the approval and monitoring layer yourself.
4. Observability Beyond Logging
Agent observability isn't just about tracking API calls. You need to capture:
- Decision rationale at each step
- Alternative paths considered but not taken
- Confidence scores and uncertainty measures
- Resource consumption patterns
- Deviation from expected behavior
Tools like Langfuse or Helicone provide LLM observability, but agent systems need deeper introspection. The agent should be able to explain why it made specific choices.
5. Governance and Control Planes
This is what separates enterprise-ready agent systems from demos. You need:
- Runtime policy enforcement
- Dynamic capability adjustment based on context
- Audit trails that would satisfy compliance
- Circuit breakers for runaway agents
- Human-in-the-loop integration points
What This Means for Your Architecture Decisions
Loose language creates bad design. Teams under-invest in orchestration and governance because they assume any tool-calling flow is already agentic.
If you're building automation today, be honest about what you need. Most use cases don't require true agents. A well-designed workflow with LLM-powered steps might be exactly right. Temporal with GPT-4 function calling could solve your problem with 10% of the complexity of an agent system.
But if you need actual autonomous behavior, invest in proper agent architecture from the start:
- Choose frameworks that support stateful operation. LangGraph and AutoGen are better starting points than basic LangChain. But even then, plan to build substantial infrastructure around them.
- Design for observability first. Before you write any agent logic, implement comprehensive tracking of decisions, state changes, and resource usage. You'll need this for debugging and compliance.
- Build incremental autonomy. Start with human-in-the-loop approval for all actions. Gradually expand agent authority as you gain confidence in its decision-making. Amazon's approach with Alexa Skills is instructive: constrained capabilities that expand based on proven reliability.
- Implement proper boundaries. Every agent needs clearly defined:
- Action permissions (what it can do)
- Resource limits (how much it can spend)
- Goal constraints (what constitutes success)
- Failure modes (when to stop and ask for help)
- Test adversarially. Agents will find creative ways to achieve goals that violate your intentions. Red team your agent systems like you would any autonomous system. Microsoft's experience with Tay should be burned into our collective memory.
The Path Forward for Agent Infrastructure
I see three trends that will shape agent architecture over the next 18 months:
Standardized Control Planes: Just as Kubernetes standardized container orchestration, we'll see control plane standards emerge for agents. Early versions like SuperAGI and AgentOps are primitive, but they're moving in the right direction.
Agent-Specific Observability: Generic LLM monitoring isn't enough. We need tools that understand agent concepts: goals, plans, beliefs, and intentions. Expect new players in this space by mid-2024.
Formal Verification Methods: As agents handle more critical tasks, we'll need ways to prove they'll behave within bounds. Research from DeepMind and Anthropic on constitutional AI and debate methods will become practical engineering tools.
Naming matters because architecture follows language. If the label is sloppy, the implementation usually will be too.
Don't call your workflow an agent unless it can actually act with autonomy. And if you need true agent behavior, don't settle for a workflow with an LLM bolted on. The difference will become painfully clear the first time your system encounters a scenario you didn't anticipate.
The companies that get this distinction right will build AI systems that actually work. The ones that don't will spend 2024 debugging "agents" that are really just brittle workflows with expensive API calls.
Continue reading