Ambient Adventures: Teaching ChatGPT on Developing Complex Stories
Zexin Chen* , Eric Zhou* , Kenneth Eaton, Xiangyu Peng, Mark Riedl
Georgia Institute of Technology, Atlanta, GA, 30332, USA
arXiv:2308.01734v1 [cs.CL] 3 Aug 2023
* These authors contributed equally.
Copyright © 2023, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Imaginative play is an area of creativity that could allow robots to engage with the world around them in a much more personified way. Imaginary play can be seen as taking real objects and locations and using them as imaginary objects and locations in virtual scenarios. We use the story generation capability of large language models (LLMs), driven by human-written prompts, to obtain the stories used for imaginary play. The generated stories are then simplified and mapped into action sequences that can guide the agent in imaginary play. To evaluate whether the agent can successfully finish the imaginary play, we also designed a text adventure game that simulates a house as the playground with which the agent interacts.

1 Introduction

In recent years, the domain of agents has experienced extraordinary progress, driving the creation of intelligent machines that connect the realms of science fiction and reality. As researchers, engineers, and innovators collaborate, the evolution of agents keeps pushing the limits of technology. However, how do we ensure that agents have a persistent, yet non-intrusive presence in the household? Consider kids: they are never idle; they find ways to occupy their time through play, and if that play is imaginative play, then the entire home becomes a playground. We propose to develop the computational capability for agents to engage in imaginative play and link that play to navigation through the home. This will increase the presence of the agent in the home without directly demanding attention from people, while using curiosity to invite engagement.

Imaginative play is an exemplar of everyday human creativity in which real-world, mundane objects and locations act as substitutes for imaginary objects and locations as part of a pretend scenario (Zook, Magerko, and Riedl 2011). A terrarium can be a garden for growing magic seeds, a kitchen can be a laboratory, or a broom handle can be a light saber. Imaginative play is fundamental to human creativity. Computational systems that can engage in imaginative play can create a sense of presence and persona and provide opportunities for improvisational interactions.

In this paper, we explore how to guide an agent to carry out imaginary play with large language models such as ChatGPT (OpenAI 2022). Text adventure games serve as useful test beds because they have been demonstrated to transfer to visual and real-world domains (Wang et al. 2022; Shridhar et al. 2021; Peng, Riedl, and Ammanabrolu 2022).

2 Related Work

Text games are turn-based games where players read descriptions of the current scene for information and interact with short descriptions of actions (Côté et al. 2018; Wang et al. 2022; Shridhar et al. 2021). In designing our text adventure game, we follow the text game structure (Peng et al. 2023) to create a house consisting of rooms that contain objects, which serve as the real-world counterparts of the objects in the imaginary play. The difference is that our adventure games do not have NPCs, since we want to focus on whether the story of imaginary play can guide the agent rather than on raising the difficulty of interaction in the game.

ChatGPT is an LLM chatbot developed by OpenAI that can be interacted with in multiple ways, including giving prompts to write stories (OpenAI 2022). Many pre-trained LLMs have had success in story generation; by decomposing textual story data into a series of events, it has been found that these models can generate stories from these events that are more coherent and logical (Martin et al. 2018; Peng et al. 2022b,a). We aim to use this ability, guiding ChatGPT with prompts, to generate stories that can guide the agent in imaginary play.

3 Story Generation

A natural language story is generated automatically as an exemplar of the behavior the agent is to enact in imaginary play. Given the topic of the imaginary play, Large Language Models (LLMs) such as ChatGPT and GPT-4 are used to create the imaginary story, and the real-world objects are transformed into similar objects in an imaginary world (imaginary objects) to facilitate the imaginary play.

3.1 Imaginary Story Generation

First, the agent scans the layout of each house to obtain the real-world objects and rooms, as well as their respective locations in the house. In this process, LLMs (ChatGPT) generate imaginary objects that match the setting of the imaginary world (magical, horror, etc.) and that have similar characteristics to the objects they were mapped from. For example, a “broom” can be transformed into a “wand” in the imaginary setting because they have similar shapes, are both made of wood, and can be held. See Fig 1, where the first ChatGPT prompt is used to map each original object to an imaginary one.

Figure 1: Pipeline Architecture for Text Game and ChatGPT. A sample iteration is demonstrated.

For each real-world object, we also obtain its admissible actions, A_o; an admissible action is one that can be performed with that object. Let O_o denote the set of real-world objects, such as ⟨“broom”, “dresser”, “mug”⟩. For example, a “broom” in the house may have the set of admissible actions ⟨“sweep”, “pick up”⟩. LLMs are then prompted to find the closest, most similar imaginary object for each item in O_o, based on the setting of the story. In this way, we map all objects in the original set O_o to the set of imaginary objects, O_n. Every story also requires a topic, such as “saving a princess”, for the LLM (ChatGPT) to aim to complete. A wide variety of topics were given based on the setting. For example, a magical setting could entail saving a princess, while a horror story could entail finding a key to escape.

To ensure that the story generated by ChatGPT can later be simplified into phrases, we gave it several training samples. These training samples are short, concise imaginary stories of 5-7 sentences that contain several random imaginary objects and a topic (such as “defeating the dragon”). Refer to Fig 1 and the second ChatGPT prompt, which uses these training samples and a list of the imaginary objects to generate new samples. A common limitation of ChatGPT’s story generation is that it will simply create a story where the agent immediately obtains every imaginary item in one sentence and then completes the topic. While this indeed works logically, it is far from interesting. With the training samples given, ChatGPT is prompted to add at most one new item in each sentence. For example, if ChatGPT is given 5 magical items and a topic to save a princess, it would initially obtain all 5 items in the first sentence and then save the princess in the second. With this restriction, ChatGPT has to find a logical way to use objects to get others and continue until it has enough objects to save the princess.

The LLM may sometimes need to be prompted to revise the story. Instead of rewriting the story entirely, ChatGPT is prompted to continue it; refer to Table 1 for the second iteration. In a sample iteration, the last sentence of the original imaginary story from Table 1 is removed and ChatGPT is prompted to generate new sentences from that point to reach the intended topic. At this point, ChatGPT has successfully generated a story, which now needs to be distilled and translated back into admissible actions in the text game.

3.2 Mapping and Filtering

The story is distilled so that its actions can be given to the text game in a form that it can easily understand. Each sentence in the imaginary story is distilled into a phrase by taking the one imaginary object in the sentence and the action verb associated with it. Refer to Figure 1 and the third ChatGPT prompt, as well as the Simplified Story in Table 1. If there is more than one object in a sentence, the newly obtained object is chosen. For example, if a sentence is “open chest to reveal staff”, the distilled phrase will be “reveal staff”, not “open chest”. Once we have all of the phrases, we map them back into admissible actions that can be performed in the real world.

Recall that A_o refers to the admissible actions for original objects, such as ⟨“sweep”, “use”⟩ for a broom. ChatGPT can then identify the admissible action in A_o that best matches the action performed in the imaginary world. For example, suppose a “broom” in the real world is mapped to a “wand” (see Table 1 for an example of an imaginary story). If the “wand” is used to cast a spell, ChatGPT would determine which admissible action is most similar to the action “cast a wand”. If “sweep” is chosen, then “cast a wand” will be mapped into “sweep broom”; see Table 1 once again and the mapping from Simplified Story to Real-World Translation. The agent can then use these mapped admissible actions to interact with the real-world environment.

Table 1: Magical Story Example
Topic: Magical World - Saving a Princess
Imaginary Story (First Iteration): Whisperweaver discovers hidden passage. Uncover ancient chest in hidden passage. Open chest to reveal enchanted staff. Also find Crescent Mirror in chest. Wield enchanted staff for enhanced spellcasting. Use Crescent Mirror for scrying and divination. Harness the power of the enchanted staff and mirror to defeat evil forces and save princess.
Simplified Story: 1. Discovers Whisperweaver 2. Uncover Ancient Chest 3. Reveal Enchanted Staff 4. Find Crescent Mirror 5. Wield Enchanted Staff 6. Use Crescent Mirror 7. Harness Enchanted Staff.
Real-World Translation: 1. Wear clothes 2. Open nightstand 3. Use broom 4. Open dresser 5. Use broom 6. Open dresser 7. Use broom.
Imaginary Story (Second Iteration): Whisperweaver discovers hidden passage. Uncover ancient chest in hidden passage. Open chest to reveal enchanted staff. Also find Crescent Mirror in chest. Wield enchanted staff for enhanced spellcasting. Use Crescent Mirror for scrying and divination. Discover recipe for elixir with Crescent Mirror. Brew elixir in the cauldron. Use enchanted staff to activate the elixir. Use transformed abilities from elixir to defeat the evil threat.

4 Text Adventure Games

The text adventure game is the testbed that shows how the agent carries out imaginary play in the real world. Text games show the events happening within the current scene by describing the existing objects and the actions that have occurred in short sentences. Objects, treated as entities in the game, carry the states and actions used to preset their interactions with the agent. The story might omit details when events happen, but the text game can record the hidden state changes in words. For this reason, we use text games as test playgrounds.

We evaluate the performance of an imaginary story by whether the agent can perform all input actions sequentially, which indicates that the model-translated action sequences can function as guidance to the agent in imaginary play. The reward of each round of the game equals the score of the last action. To tell the game result directly, we set the last action in the sequence as the win state. Thus, if the reward equals the score of the preset win state, the agent has successfully finished all actions given in the sequence.

We developed our text game in TextWorld (Côté et al. 2018), an open-source, extensible engine that both generates and simulates text games. In the game, we mimic the physical environment by mapping out the house floor plan and including pre-scripted interactions with each object in the room to give guidance under different use cases (Narasimhan, Kulkarni, and Barzilay 2015).

4.1 Game Design Details

We design a game by inserting a base map that records the location of each room and the logic objects related to the different rooms (see Fig 2). Each room has its own furniture and appliances, some of which are required, like a light. Each object O_i has its own action set A_i, whose actions can change the state of the object and of the game. For instance, the object Clothes, O_clo, has two states: “washed” and “not washed”. The action wash in the action set A_clo converts the state of O_clo from “not washed” to “washed” when the agent successfully finishes the action “wash cloth”. To execute this action, the agent starts in the parentBedroom to grab the dirty clothes. When it moves to the laundry and finishes washing the clothes, the agent gains 2 points for washing the clothes.

Figure 2: Layout of game “Housework”.

The agent must also handle different verbs with similar meanings and effects; for instance, “wash cloth” and “clean cloth” should be treated as the same action. In that way, we ensure that the agent reacts the same way every time it encounters synonyms.

4.2 Reinforcement Learning Agent in Text Game

The agent has an action sequence that needs to be finished in the text game and obtains rewards for successfully changing the state of the objects or the game. The game process is as follows: the agent always starts in a fixed room with a given input action sequence. The agent obtains a reward when it finishes the input action by interacting with the surrounding environment. As in Section 4.1, if the agent successfully finishes the action “wash cloth”, it obtains 2 points as a reward. The reward the agent gains depends on the difficulty of the action. We categorize the activities into three levels: stand-alone, interactive, and win, corresponding to 2, 3, and 5 points.

1. We define stand-alone actions as actions that the agent is able to finish without using any other objects. For example, when the agent takes the action “turn on the light”, the Light can be turned on directly after locating the Light.
2. We define interactive actions as actions for which the agent must interact with other objects. For example, when the agent takes the action “water plant” and sees the Plant, it cannot directly take the action if there is no water in its hands. The agent needs to get the kettle, then check whether the kettle is full of water. If not, it will fill the kettle, then carry it to water the plant.
3. The win action is the last action that the agent needs to finish. When the agent takes the last action in the given action sequence, such as “clean the oven”, it goes to the kitchen, locates the Oven, and then cleans it. The game ends when the agent successfully goes through the whole action sequence and finishes all actions in it.

5 Findings

Story Generation of Large Language Model. We used ChatGPT as the LLM to generate stories during our experiment (see Section 3). Most stories required several iterations of revision (refer to Table 1) until they included the win state in the action sequence. Two limitations of the current model are limited prompting formats and difficulty in understanding interactive actions in the text game.

The first limitation stems from a drawback of language models: their generation is uncontrolled. Aside from an initial prompt, generative language models are guided by word co-occurrence, which can lead to repetition, as well as a tendency to focus on descriptive details that do not move a story forward. To address this, we iteratively crafted prompts that direct the model to create coherent and executable stories with a clear goal, and settled on a fixed prompting format. This fixed format, however, limits the adaptivity of the agent to varied types of imaginary play. If the setting of the imaginary play is modified, the model needs new prompts for the changes.

The other limitation is that ChatGPT has difficulty understanding connections between objects in the text game. Without detailed prompts, the generated story cannot associate the objects picked up in a previous room with those in the current room. That may lead to action sequences containing actions that are not allowed within the text game. To alleviate such problems, we introduce the missing connections into the prompt and run more iterations of story generation, updating the prompt with the generated output.

Game Results with Prompting. Results from our sample stories indicate that the agent cannot determine the final win state by itself. To increase the chance of winning, we record the result and feed it back to the language model (ChatGPT). The model knows from the previous round’s score whether the agent successfully reached the win state. If the agent does not win, the prompt tells the model to generate more descriptions of directional information in new stories to guide the agent in the next round. Although new instructions might not give the expected results every time, we are still able to identify the pattern and re-prompt ChatGPT to guide it better in the zero-shot setting.

6 Conclusions

Imaginary play is a creative direction for developing agent learning abilities. With the help of story generation from LLMs (such as ChatGPT (OpenAI 2022)), we can use prompts to have the model generate imaginary play stories that guide the agent’s interactions. Story generation allows the agent to develop interesting imaginative stories with the objects and topic given, allowing the agent to engage in imaginative play in the real world.

We use text games to model what happens within a given story and the interactions the agent generates with the setting, making the interaction controllable and explainable. By mapping imaginative play to real-world scenarios through text games, we learned how to use rewards to better prompt the model and construct stories that can guide the agent in imaginary play.
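As an illustration of the distillation and mapping steps of Section 3.2, the following Python sketch keeps one phrase per sentence (preferring the newly obtained object) and then translates each phrase into an admissible real-world action. It is a minimal sketch under stated assumptions, not the pipeline used in the paper: the sentences are assumed to be already parsed into (verb, object) pairs, the object mapping and admissible actions are made-up examples, and the hand-written HINTS table is a hypothetical stand-in for the similarity judgement that ChatGPT makes through prompting.

```python
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (action verb, imaginary object)


def distill(story: List[List[Pair]]) -> List[Pair]:
    """Keep one (verb, object) phrase per sentence, preferring a newly obtained object."""
    seen = set()
    phrases = []
    for sentence in story:
        new_pairs = [pair for pair in sentence if pair[1] not in seen]
        # Assumption: if several objects are new, the last one is the one obtained.
        phrases.append(new_pairs[-1] if new_pairs else sentence[0])
        seen.update(obj for _, obj in sentence)
    return phrases


def to_real_world(phrases: List[Pair],
                  imaginary_to_real: Dict[str, str],
                  admissible: Dict[str, List[str]],
                  choose: Callable[[str, List[str]], str]) -> List[str]:
    """Map each distilled phrase to an admissible action on the real object."""
    actions = []
    for verb, obj in phrases:
        real = imaginary_to_real[obj]
        actions.append(f"{choose(verb, admissible[real])} {real}")
    return actions


# Hypothetical stand-in for the ChatGPT similarity judgement.
HINTS = {"uncover": "open", "reveal": "use", "wield": "use", "cast": "sweep"}


def choose_by_hint(verb: str, options: List[str]) -> str:
    """Pick the hinted admissible action, else fall back to the first one."""
    hint = HINTS.get(verb)
    return hint if hint in options else options[0]


# Made-up example data (not taken from the paper's experiments).
STORY = [
    [("uncover", "chest")],
    [("open", "chest"), ("reveal", "staff")],
    [("wield", "staff")],
]
IMAGINARY_TO_REAL = {"chest": "dresser", "staff": "broom"}
ADMISSIBLE = {"dresser": ["open"], "broom": ["sweep", "use"]}

if __name__ == "__main__":
    phrases = distill(STORY)
    print(phrases)
    print(to_real_world(phrases, IMAGINARY_TO_REAL, ADMISSIBLE, choose_by_hint))
```

With this example data, the second sentence is distilled to “reveal staff” rather than “open chest”, because the chest already appeared in the first sentence. Passing the chooser as a parameter keeps the similarity judgement replaceable, so an LLM call could be substituted for the lookup table.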
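The reward scheme of Section 4.2 and the win check of Section 4 can be sketched in a few lines of Python. This is a hedged illustration rather than the TextWorld game itself: the names Level, play_round, and is_win, the synonym table, and the can_finish callback are all hypothetical, and the sketch assumes that the round reward is the score of the last action the agent actually finished.

```python
from enum import Enum
from typing import Callable, Dict, List


class Level(Enum):
    STAND_ALONE = 2  # finished without using any other object
    INTERACTIVE = 3  # needs another object first (e.g. a filled kettle)
    WIN = 5          # last action in the sequence


# Hypothetical synonym table so that equivalent commands trigger the same reaction.
SYNONYMS: Dict[str, str] = {"clean cloth": "wash cloth"}


def canonical(action: str) -> str:
    """Map a synonym to its canonical action name."""
    return SYNONYMS.get(action, action)


def play_round(sequence: List[str], levels: Dict[str, Level],
               can_finish: Callable[[str], bool]) -> int:
    """Return the score of the last action the agent finished (0 if none)."""
    reward = 0
    for index, raw in enumerate(sequence):
        action = canonical(raw)
        if not can_finish(action):
            break  # actions must be performed in order
        is_last = index == len(sequence) - 1
        reward = (Level.WIN if is_last else levels[action]).value
    return reward


def is_win(reward: int) -> bool:
    """The round is won when the reward equals the score of the win state."""
    return reward == Level.WIN.value


LEVELS = {"wash cloth": Level.STAND_ALONE, "water plant": Level.INTERACTIVE}
SEQUENCE = ["clean cloth", "water plant", "clean the oven"]

if __name__ == "__main__":
    print(play_round(SEQUENCE, LEVELS, lambda action: True))
    print(play_round(SEQUENCE, LEVELS, lambda action: action != "water plant"))
```

When every action succeeds, the round reward is 5 and is_win returns True. If “water plant” fails, the round stops with the 2 points earned for “wash cloth”, so the reward alone tells the game (and the next prompt) that the win state was not reached.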
References
Côté, M.-A.; Kádár, A.; Yuan, X.; Kybartas, B.; Barnes, T.; Fine, E.; Moore, J.; Hausknecht, M.; Asri, L. E.; Adada, M.; et al. 2018. TextWorld: A learning environment for text-based games. In Workshop on Computer Games, 41–75. Springer.
Martin, L.; Ammanabrolu, P.; Wang, X.; Hancock, W.; Singh, S.; Harrison, B.; and Riedl, M. 2018. Event Representations for Automated Story Generation with Deep Neural Nets. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1).
Narasimhan, K.; Kulkarni, T.; and Barzilay, R. 2015. Language Understanding for Text-based Games Using Deep Reinforcement Learning. arXiv:1506.08941.
OpenAI. 2022. ChatGPT: A Large-Scale Open-Domain Chatbot. https://openai.com/blog/chatgpt/.
Peng, X.; Cui, C.; Zhou, W.; Jia, R.; and Riedl, M. 2023. Story Shaping: Teaching Agents Human-like Behavior with Stories. arXiv:2301.10107.
Peng, X.; Li, S.; Wiegreffe, S.; and Riedl, M. 2022a. Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning. In Findings of the Association for Computational Linguistics: EMNLP 2022, 7008–7029.
Peng, X.; Riedl, M.; and Ammanabrolu, P. 2022. Inherently explainable reinforcement learning in natural language. Advances in Neural Information Processing Systems, 35: 16178–16190.
Peng, X.; Xie, K.; Alabdulkarim, A.; Kayam, H.; Dani, S.; and Riedl, M. 2022b. Guiding Neural Story Generation with Reader Models. In Findings of the Association for Computational Linguistics: EMNLP 2022, 7087–7111.
Shridhar, M.; Yuan, X.; Côté, M.-A.; Bisk, Y.; Trischler, A.; and Hausknecht, M. 2021. ALFWorld: Aligning Text and Embodied Environments for Interactive Learning. In Proceedings of the International Conference on Learning Representations (ICLR).
Wang, R.; Jansen, P.; Côté, M.-A.; and Ammanabrolu, P. 2022. ScienceWorld: Is your Agent Smarter than a 5th Grader?
Zook, A.; Magerko, B.; and Riedl, M. 2011. Formally modeling pretend object play. In Proceedings of the 8th ACM Conference on Creativity and Cognition, 147–156.