In recent years, the rapid development of artificial intelligence (AI) technology has attracted widespread attention and discussion. We now stand at the threshold of an entirely new era, facing the future that AI Agents will bring. Building agents with a large language model (LLM) as the core controller is a compelling idea: such an agent simulates human workflows and can autonomously search for, analyze, and use information to accomplish its goals.
OpenAI co-founder Andrej Karpathy described AI Agents on Twitter with a metaphor: each invocation of GPT is like a single thought, and by chaining these thoughts together you can build Agent systems that perceive, think, and act.
There have been many cases of AI Agent systems, such as AutoGPT, BabyAGI, Camel, Jarvis, AgentGPT, etc.
How Do AI Agent Systems Work?
Given how powerful AI Agent systems are, how exactly do they work?
In an AI Agent system, the LLM engine acts as the brain, providing powerful processing and reasoning capabilities. Beyond that, the system relies on several key components:
- Task decomposition and Self-reflection: Provides the AI Agent system with the ability to decompose tasks and self-reflect.
- Memory: Provides the AI Agent system with the ability to store and recall additional information over long periods of time.
- Tool: Allows the AI Agent to take actions externally to genuinely impact the real world.
1. Task Decomposition and Self-reflection
Task Decomposition
In the task decomposition phase, AI Agents typically use techniques such as Chain of Thought (CoT) and Tree of Thoughts (ToT).
- CoT breaks a complex task into smaller, simpler steps by prompting the model to "think step by step". It turns a big task into a sequence of achievable sub-tasks and makes the LLM's reasoning process explicit (see the sketch after this list).
- ToT, on the other hand, considers multiple potential plans at once. It explores more possibilities at each step: the problem is first broken into reasoning steps, and multiple candidate thoughts are generated at every step, forming a tree of thoughts.
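To make the difference concrete, here is a minimal Python sketch of both ideas. The call_llm() helper is a hypothetical stand-in for whichever LLM client you use, and the prompts are illustrative rather than the canonical CoT/ToT prompts.

```python
# Minimal sketch of CoT prompting and a toy ToT expansion.
# call_llm() is a hypothetical stand-in for a real LLM client.

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to an LLM and return its text reply."""
    return "..."  # replace with a real API call

def chain_of_thought(question: str) -> str:
    # CoT: ask the model to reason step by step before giving the final answer.
    prompt = (
        f"Question: {question}\n"
        "Let's think step by step, then state the final answer."
    )
    return call_llm(prompt)

def tree_of_thoughts(question: str, branches: int = 3, depth: int = 2) -> list[str]:
    # Toy ToT: at every step, generate several candidate next thoughts for each
    # partial reasoning path, so the paths fan out into a tree.
    frontier = [""]
    for _ in range(depth):
        expanded = []
        for partial in frontier:
            prompt = (
                f"Question: {question}\n"
                f"Reasoning so far: {partial or '(none)'}\n"
                f"Propose {branches} distinct next reasoning steps, one per line."
            )
            thoughts = call_llm(prompt).splitlines()[:branches]
            expanded += [f"{partial}\n{t}".strip() for t in thoughts]
        frontier = expanded
    return frontier  # candidate reasoning paths to score, prune, or continue
```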
Self-Reflection
In the self-reflection phase, the AI Agent reviews past actions and decisions and corrects earlier mistakes to iteratively improve itself. Commonly used self-reflection techniques include ReAct, Reflexion, and Chain of Hindsight.
- ReAct interleaves the LLM's reasoning traces with actions, allowing the agent to plan, track, and update its action plan and handle exceptions (see the loop sketched after this list).
- Reflexion goes a step further than ReAct, adding an evaluation of the reasoning to ReAct's loop so the agent can improve its results over repeated attempts.
- Chain of Hindsight improves its outputs by learning from large amounts of feedback on past results.
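As an illustration only, the loop below sketches the ReAct pattern in Python: the model alternates Thought/Action lines, the agent executes the named tool, and the Observation is appended to the transcript for the next step. call_llm() and search_wiki() are hypothetical stubs, not any particular library's API.

```python
import re

def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; stubbed so the loop terminates immediately."""
    return "Thought: I already know enough.\nAction: Finish[stub answer]"

def search_wiki(query: str) -> str:
    """Hypothetical tool: look something up and return a text snippet."""
    return f"(stub result for '{query}')"

TOOLS = {"search": search_wiki}

def react_agent(task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        # Ask for the next Thought and Action in a fixed, parseable format.
        reply = call_llm(
            transcript
            + "Respond with:\nThought: ...\nAction: tool_name[input] or Finish[answer]"
        )
        transcript += reply + "\n"
        match = re.search(r"Action:\s*(\w+)\[(.*)\]", reply)
        if match is None:
            continue
        action, arg = match.groups()
        if action == "Finish":
            return arg  # the agent's final answer
        if action in TOOLS:
            # Feed the tool result back as an Observation for the next iteration.
            transcript += f"Observation: {TOOLS[action](arg)}\n"
    return "No answer within the step limit."

print(react_agent("Who wrote 'The Little Prince'?"))
```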
2. Memory
Human memory can be divided into three types: sensory, short-term, and long-term.
Sensory memory comes from visual, auditory, tactile, and other feedback and usually lasts only seconds. Short-term memory holds information relevant to the cognitive task currently being performed and usually lasts tens of seconds. Long-term memory consists of past experiences and recollections and can last for decades. Our brains automatically retrieve the relevant memories from long-term memory when needed.
AI Agents also simulate the human memory usage process. For shorter sensory and short-term memories, the AI Agent can directly put them in context. For long-term memories, the AI Agent stores them externally and extracts relevant memories as needed.
We currently tend to use vector databases to store and search these external memories. They rely on maximum inner product search (MIPS) to find relevant memories, typically via approximate nearest neighbor algorithms and libraries such as LSH, ANNOY, HNSW, FAISS, and ScaNN.
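A rough sketch of the retrieval pattern follows: exact brute-force MIPS in NumPy rather than a real vector database, with a hypothetical embed() stub standing in for an embedding model.

```python
# Store memories as embedding vectors and retrieve them by maximum inner
# product search (MIPS). Real systems would use an ANN index (FAISS, HNSW, ...)
# instead of this exact brute-force search, and a real embedding model.
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Hypothetical embedding function; deterministic stub for illustration."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=dim)
    return v / np.linalg.norm(v)

class MemoryStore:
    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Inner product of the query vector against every stored memory,
        # then keep the top-k scoring memories.
        scores = np.stack(self.vectors) @ embed(query)
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]

store = MemoryStore()
store.add("The user prefers replies in French.")
store.add("Project deadline is next Friday.")
print(store.recall("When is the deadline?", k=1))
```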
3. Tool
Tool use is the other essential piece. Task planning, reflection, and memory only give the AI Agent the ability to think; it still needs tools to take concrete actions. Equipping the AI Agent with tools is like giving it limbs, allowing it to complete tasks by leveraging various external tools and resources.
ChatGPT plugins and OpenAI function calling are excellent examples of LLMs using tools. Other tool-use approaches include MRKL, TALM, Toolformer, HuggingGPT, and API Bank.
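For example, a minimal function-calling round trip might look like the sketch below, assuming the openai Python SDK (v1+); the get_weather tool, its implementation, and the model name are placeholders rather than anything from the original text.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def get_weather(city: str) -> str:
    """Hypothetical local tool the model can choose to call."""
    return f"It is sunny in {city}."

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4",  # placeholder; use whichever model you have access to
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    args = json.loads(call.function.arguments)
    result = get_weather(**args)
    # The result would then be sent back in a follow-up message with
    # role="tool" and the matching tool_call_id so the model can answer.
    print(result)
```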
Classic Case of AI Agents: AutoGPT
AutoGPT is an experimental open-source AI agent program. It uses GPT-4 to autonomously manage tasks such as creating websites, writing articles, generating logos, or promoting products. It can access the internet and collect and analyze various kinds of information, learning from the web and completing tasks.
The amazing thing about AutoGPT is its autonomy. It operates completely independently without requiring additional intervention from users. It also has long-term and short-term memory systems, allowing it to remember what it has done in the past, learn from experience, and optimize decisions based on past actions autonomously, enabling it to continuously self-improve over time.
Unresolved Problems of AI Agents
The development and application of AI Agents show great potential and prospects in many fields. However, like any other technology, AI Agents also have some limitations:
- Limited context length: The limited context capacity constrains the effects of AI Agent systems, especially mechanisms like task planning and self-reflection. Although vector storage and retrieval provide access to external information, their representational power is not as strong as full attention.
- Challenges in long-term planning and task decomposition: Although AI Agents may excel at specific tasks, compared to humans, they still have significant gaps in long-term planning and task decomposition.
- Reliability of natural language interfaces: Current AI Agent systems rely on natural language as the interface between the LLM and external components. However, LLM output is not completely reliable; it may contain formatting errors or occasionally refuse to follow an instruction.
Of course, given how quickly AI technology is evolving, we believe these limitations will be overcome in the near future.
How to Implement AI Agents?
Although today's AI Agents are not yet mature enough to be fully entrusted with tasks end to end, we can still build practical, intermediate agent capabilities on the GPTBots platform.
Flow BOT — Visually Planning Task Flows
Task planning is a key component of AI Agents. The GPTBots platform provides Flow BOT, which lets you create AI-BOTs by visually assembling task flows from components. The platform abstracts common, generic AI-BOT development modules into components; through simple drag and drop, developers can "plan" a task by wiring different components together according to their business needs and define the result as an AI-BOT that solves a specific problem.
Flow BOT not only offers flexible task flow configuration, but also exposes many configurable options within its development modules, such as input, output, plugins, knowledge base, and conditional logic, helping developers handle a wide range of business scenarios.
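Flow BOT flows are assembled in the visual editor rather than in code, but conceptually a task flow is just a chain of components passing context along. The Python sketch below is purely illustrative and is not GPTBots' actual API; every name in it is hypothetical.

```python
# Purely illustrative sketch of what a component-based task flow boils down to.
# Each component takes the running context dict and returns it updated.

def input_component(ctx):
    ctx["question"] = ctx["user_message"].strip()
    return ctx

def knowledge_component(ctx):
    # Hypothetical knowledge-base lookup.
    ctx["references"] = ["(retrieved passage related to the question)"]
    return ctx

def llm_component(ctx):
    # Hypothetical LLM call combining the question with retrieved references.
    ctx["answer"] = f"Answer based on {len(ctx['references'])} reference(s)."
    return ctx

def output_component(ctx):
    return ctx["answer"]

FLOW = [input_component, knowledge_component, llm_component, output_component]

def run_flow(user_message: str):
    ctx = {"user_message": user_message}
    for component in FLOW[:-1]:
        ctx = component(ctx)
    return FLOW[-1](ctx)

print(run_flow("How do I reset my password?"))
```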
Plugins — Enable AI-BOTs to Perform Any Task
AI Agents need tools to perform various tasks, and the plugins provided by the GPTBots platform serve that purpose.
The GPTBots platform provides public plugins covering academic, business, life, work and many other fields, for developers to incorporate into AI-BOTs, enabling AI-BOTs to communicate with the outside world and perform corresponding tasks.
At the same time, the GPTBots platform also lets developers build their own plugins, integrate them into AI-BOTs for invocation, and meet the requirements of their specific business scenarios.
Short-term and Long-term Memory — Make Decisions with More Abundant Information
Memory configuration is another GPTBots platform capability that helps realize agent behavior. Developers can expand an AI-BOT's problem-solving capabilities by configuring long-term and short-term memory.
For large-scale problems with substantial amounts of information, long-term memory capabilities are especially critical. For general problems, short-term memory is often sufficient. For single-turn QA problems, long-term and short-term memories may not even be needed.
This gives the feature another layer of value: developers can size an AI-BOT's memory to their actual needs, because longer memories consume more tokens and therefore cost more. The long-term and short-term memory settings give developers a practical lever for controlling AI-BOT costs.
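To illustrate the trade-off in a generic way (this is not GPTBots' actual configuration interface), short-term memory is typically a sliding window over recent conversation turns, and every extra turn kept is extra tokens paid for on every request:

```python
# Generic illustration of memory length as a cost lever: a sliding window of
# recent turns is kept as short-term memory, so a larger window means more
# tokens sent to the model on each call.

def build_context(history: list[str], max_turns: int) -> str:
    """Keep only the most recent `max_turns` turns as short-term memory."""
    return "\n".join(history[-max_turns:])

history = [f"turn {i}: ..." for i in range(1, 21)]
cheap = build_context(history, max_turns=4)    # small window, lower token cost
richer = build_context(history, max_turns=16)  # larger window, higher cost
print(len(cheap.split()), "vs", len(richer.split()), "words (rough token proxy)")
```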
The Future of AI Agents
The powerful capabilities of AI Agents will make them ubiquitous assistants in our future, providing assistance and support for our lives and work. Whether in family life helping us manage daily affairs and household chores, or in the workplace assisting us in processing data and making decisions, AI Agents will play an important role.
In family life, AI Agents can become smart home butlers, learning our preferences and habits, automatically adjusting temperature, lighting and music, providing a personalized living experience. They can also help us manage shopping lists, schedules, and reminders, making our lives more convenient and efficient.
In the workplace, AI Agents can serve as intelligent assistants and data analysts. They can quickly process large amounts of data and provide accurate analytics and predictions to help us make wiser decisions. AI Agents can also automate tedious tasks, improving work efficiency and reducing people's workload.
Of course, as an emerging technology, AI Agents also face some challenges and risks. We need to ensure the safety and reliability of AI Agent assistants, and avoid accidents and adverse consequences. At the same time, we also need to formulate relevant regulations and industry norms to clarify responsibilities and regulatory mechanisms to ensure the reasonable use and development of AI Agents.
Current AI Agents are still in their initial stage, perhaps not yet perfect, but if this direction maintains the same development speed as generative AI, we may soon see commercialized AI Agent assistants appear around us. That day may come very soon.