Enhancing agent capabilities with generative AI
Generative AI is transforming the development of intelligent agents by enhancing learning efficiency, improving their understanding of environments, and enabling more complex interactions through generative models. Some of the major developments at the intersection of generative AI and intelligent agents are as follows:
- Data augmentation: Creating synthetic training data with generative models supplements datasets, improving the robustness and efficiency of machine learning agents. For example, self-driving car agents can use generated scene images to learn better object detection and navigation policies.
- Understanding of context: Generative AI constructs simulations modeling real-world complexities in fine detail, aiding agents in contextual understanding for informed decisions. For example, virtual assistants such as chatbots can use generative AI to simulate conversations in diverse contexts, helping them better understand user intent and provide more accurate, context-aware responses before interacting with real users.
- Natural language processing: Generative language models ease human-agent interaction by improving understanding and generation capabilities. Virtual assistants such as Alexa and chatbots leverage generative NLP for natural conversations.
- Creative problem solving: By generating diverse possible solutions, generative AI allows agents to explore creative ideas and evaluate their feasibility. This could allow AI architects to creatively design innovative building layouts while adhering to structural constraints.
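The data augmentation idea above can be made concrete with a small sketch. Here, a stand-in generate_variants function plays the role of a generative model producing synthetic variants of seed utterances; the function name, templates, and seed phrases are invented for illustration, and a real system would call an actual generative model instead:

```python
# Illustrative sketch of data augmentation for a conversational agent.
# generate_variants is a hypothetical stand-in for a real generative model.

def generate_variants(utterance: str, n: int = 3) -> list[str]:
    # In practice, a generative model would produce diverse paraphrases;
    # simple templates serve as a placeholder here.
    templates = [
        "Could you {u}?",
        "Please {u}.",
        "I would like you to {u}.",
    ]
    return [t.format(u=utterance) for t in templates[:n]]

seed_data = ["book a flight", "cancel my reservation"]
augmented = [v for u in seed_data for v in generate_variants(u)]

# The small seed dataset is supplemented with synthetic examples
print(len(seed_data), "seed ->", len(seed_data) + len(augmented), "total")
```

The augmented examples would then be added to the agent's training set, improving robustness to phrasing the seed data does not cover.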
The deep integration of generative AI with knowledge representation, learning mechanisms, and decision-making processes yields highly responsive and adaptive intelligent agents capable of operating effectively in dynamic, complex environments. Some examples of how this synergistic combination can enable advanced capabilities are as follows:
- Learning: Agents can gather data from various sources such as sensors, human interactions, or simulations to build models based on their operating environment through machine learning techniques such as reinforcement learning
- Knowledge representation: The learned environmental data is structured into usable representations such as semantic networks, logical rules, or probabilistic graphical models to capture relationships, constraints, and uncertainties
- Decision processes: Based on the represented knowledge, agents use planning and decision-making algorithms (for example, Markov decision processes and MCTS) to derive sequences of actions aiming to achieve their objectives optimally
- Generative models: These provide contextual simulations that enhance agents’ understanding through generated scenarios, accounting for complexities such as noisy sensor data, stochastic dynamics, or extraneous factors absent from training data
- Feedback loops: These allow continuous adaptation by feeding real-world interaction outcomes back into the learning mechanisms to refine the agent’s knowledge and decision models based on experience
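The interplay of learning, decision-making, and feedback described above can be illustrated with a toy reinforcement learning loop. Everything in this sketch is invented for illustration (the two-action environment, the reward function, and the learning rate are not from any real system): the agent's learned action values act as its knowledge representation, an epsilon-greedy rule is its decision process, and each interaction outcome feeds back to refine its knowledge:

```python
import random

actions = ["left", "right"]
q_values = {a: 0.0 for a in actions}   # learned knowledge representation
alpha = 0.1                            # learning rate

def environment_reward(action: str) -> float:
    # Hypothetical stochastic environment: "right" is better on average
    base = 1.0 if action == "right" else 0.2
    return base + random.gauss(0, 0.1)

random.seed(42)
for step in range(500):
    # Decision process: epsilon-greedy choice over represented knowledge
    if random.random() < 0.1:
        action = random.choice(actions)          # explore
    else:
        action = max(q_values, key=q_values.get)  # exploit
    # Feedback loop: the interaction outcome refines the learned model
    reward = environment_reward(action)
    q_values[action] += alpha * (reward - q_values[action])

print(max(q_values, key=q_values.get))
```

After enough interactions, the agent's value estimates favor the action that the environment actually rewards, even though it started with no knowledge at all.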
Start building agentic AI
We have learned quite a lot about the characteristics of intelligent agents, how they are built, how they work with different algorithms, and their essential components. It is now time for a gentle introduction to the world of agentic AI and to start building applications using different frameworks.
In subsequent chapters of this book, we will make extensive use of several open source frameworks. The most popular framework for building agentic and multi-agent AI systems is LangChain’s LangGraph, although other noteworthy frameworks (as of this writing) include AutoGen, CrewAI, and MetaGPT. This is not an exhaustive list; these are simply the most popular open source frameworks that allow you to build agentic and multi-agent systems with LLMs. Note that although some of these frameworks support multiple programming languages, we will primarily use the Python programming language for our purposes. For consistency, we will use LangGraph and OpenAI GPT models throughout the book; however, there are a number of other LLMs that can be used with agentic AI frameworks.
Important note
Although the code samples are created specifically with OpenAI GPT models, you can use any model of your choice that is supported by LangGraph. LangGraph also works with LLMs offered via several cloud providers such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). Using AI models or cloud platforms may incur some costs. Refer to the respective AI model documentation for more details.
Now that we have the overview of frameworks and LLMs out of the way, let’s start building our basic travel booking agent. At this stage, we only want the model to respond with greetings and any follow-up questions. For example, if we ask the agent to “Book a flight for me”, we want the model to respond with a follow-up question about travel cities, dates, and so on. For the following code, we will directly use OpenAI’s Python SDK to build this functionality and use its function calling feature, that is, the model’s ability to indicate that a function should be called on the user’s behalf. Here’s the code snippet:
```python
import json
from openai import OpenAI, OpenAIError

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def book_flight(passenger_name: str,
                from_city: str,
                to_city: str,
                travel_date: str) -> str:
    return "A flight has been booked"

# JSON schema describing book_flight to the model
tools = [{
    "type": "function",
    "function": {
        "name": "book_flight",
        "description": "Book a flight for a passenger",
        "parameters": {
            "type": "object",
            "properties": {
                "passenger_name": {"type": "string"},
                "from_city": {"type": "string"},
                "to_city": {"type": "string"},
                "travel_date": {"type": "string"},
            },
            "required": ["passenger_name", "from_city",
                         "to_city", "travel_date"],
        },
    },
}]

def travel_agent(user_message: str, messages: list) -> str:
    messages.append({"role": "user", "content": user_message})
    try:
        response = client.chat.completions.create(
            model="gpt-4-turbo",
            messages=messages,
            tools=tools)
        message = response.choices[0].message
        if message.content:
            # Plain-text reply, such as a greeting or follow-up question
            return message.content
        elif message.tool_calls:
            # The model wants book_flight called with extracted arguments
            messages.append(message)
            for tool_call in message.tool_calls:
                args = json.loads(tool_call.function.arguments)
                confirmation = book_flight(**args)
                messages.append({"role": "tool",
                                 "tool_call_id": tool_call.id,
                                 "content": confirmation})
            # Ask the model to phrase the confirmation for the user
            response = client.chat.completions.create(
                model="gpt-4-turbo",
                messages=messages)
            return response.choices[0].message.content
    except OpenAIError as error:
        return f"Something went wrong: {error}"
```
Let us break down what is happening in this code snippet. We first define a book_flight function; at the moment, this function just returns a message saying that the flight booking is complete. The travel_agent function is where we call the LLM, in this case, OpenAI’s gpt-4-turbo model. We call the LLM’s API using the OpenAI SDK, passing in the conversation messages (including the user’s latest message), the model’s name, and a set of tools. Note that we are using our book_flight function as a tool for our intelligent agent, and the API takes tools as a parameter.
We will discuss tools in greater detail in subsequent chapters, but for now, it is sufficient to understand that tools are a mechanism by which your intelligent agent can interact with the external world (or external systems) to complete a task. In this case, the task is booking a flight ticket. The LLM is smart enough to indicate when the book_flight tool function should be called, that is, once it has all the details from the passenger. In a more complete solution, as we will see in future chapters, functions such as book_flight will be used to interact with external systems, such as calling APIs to complete the flight booking. Here’s how a possible conversation using this code looks:

Figure 3.7 – A sample conversation with the AI agent
A few things to note here: after the first user message, our agent doesn’t directly call the book_flight function because it doesn’t yet have all the parameter values needed to call the function successfully. In a typical heuristics-based approach, you could use string parsing to find out whether the user has provided their name, travel cities, and date of travel, but such logic can become overly complicated and error-prone. This is where the beauty of an intelligent agent comes in. The LLM has strong language understanding capabilities: it knows when to call the book_flight function during the conversation, and if the required values have not been provided by the user, it can prompt them for these values, that is, their name, travel cities, and date of travel. It can also accurately extract these values from the user’s response, which allows us to call the book_flight function. For the full code of the intelligent agent, refer to the Chapter_03.ipynb Python notebook in the GitHub repository.
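To see why the heuristics-based approach is brittle, consider a naive sketch of string parsing for travel cities. The extract_cities function and its regular expression are hypothetical, invented only to illustrate the failure mode; they are not part of the agent code:

```python
import re

def extract_cities(message: str):
    # Naive heuristic: expects the exact pattern "from X to Y"
    match = re.search(r"from (\w+) to (\w+)", message.lower())
    return match.groups() if match else None

print(extract_cities("Book a flight from Paris to Rome"))
# → ('paris', 'rome')

print(extract_cities("I need to fly to Rome, leaving Paris"))
# → None (valid request, but the phrasing defeats the pattern)
```

The pattern also breaks on multiword city names such as New York, and covering every phrasing would require an ever-growing pile of rules. The LLM sidesteps all of this by understanding the request rather than pattern-matching its surface form.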