Proceedings of the Fourth International Conference on
Computational Creativity
edited by
Mary Lou Maher, Tony Veale, Rob Saunders, Oliver Bown
Sydney, New South Wales, Australia
June 2013
Faculty of Architecture, Design and Planning
The University of Sydney
New South Wales
Australia
http://www.computationalcreativity.net/iccc2013/
First published 2013
TITLE: PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON
COMPUTATIONAL CREATIVITY
EDITORS: MARY LOU MAHER, TONY VEALE, ROB SAUNDERS, OLIVER BOWN
ISBN: 978-1-74210-317-4
About the cover: Designed by Rob Saunders. Made with Processing.
About the logo: Designed by Oliver Bown and Rob Saunders
About the photo: "Sydney Opera House HDR Sydney Australia" by Hai Linh Truong, used under a
Creative Commons Attribution license: http://creativecommons.org/licenses/by/2.0/
Preface
The Fourth International Conference on Computational Creativity 2013 represents the growth and
maturity of a conference series that builds on a decade of workshops and the first three
international conferences: the first held in Portugal in 2010, the second in Mexico in 2011,
and the third in Ireland in 2012. The purpose of this conference series is to make a scientific
contribution to the field of computational creativity through discussion and publication on progress
in fully autonomous creative systems, modeling human and computational creativity, computational
support for human creativity, simulating creativity, and human/machine interaction in creative
endeavors. Contributions come from many relevant disciplines, including computer science,
artificial intelligence, engineering design, cognitive science, psychology, and art.
This year the conference received 65 paper submissions and 11 demonstration submissions. The
peer review process for paper submissions had two stages: in the first stage, all paper submissions
were reviewed by three members of the Program Committee; in the second stage, the anonymous
reviews were made available for comment to all members of the Program Committee and the authors.
Decisions about paper acceptances were reviewed and approved by the Steering Committee, and
decisions about demonstration acceptances were approved by the Organizing Committee. The
committees accepted 32 papers and 8 demonstrations from authors representing 13
countries: Australia, Canada, Finland, Germany, Ireland, Malta, Mexico, Portugal, Singapore, Spain,
Sweden, UK, and USA.
In order to provide a snapshot of current progress in computational creativity and a glimpse of
next steps, the conference invites and encourages two kinds of paper submissions: regular papers
addressing foundational issues and describing original research on creative systems development and
modeling, and position papers describing work-in-progress or research directions for computational
creativity. The conference includes a balance of the two: 20 regular papers and 12 position papers.
As in previous years, the conference also includes demonstrations in which conference attendees can
play with specific implementations of computational creativity. The conference is organized into
sessions that reflect this year's topics of interest: areas of creativity in which computation is
playing a significant role (music, visual art, poetry, and narrative) and theoretical contributions to
computational creativity (metaphor, computational evolution, creative processes, evaluating
computational creativity, collective and social creativity, and embodied creativity).
The collection of papers in these conference proceedings shows the maturity of the field through new
examples of computational creativity and theoretical advances in understanding generative systems
and the evaluation of computational creativity. The success of the conference series is demonstrated
by publications that build on the advances of previous years through references to papers published
in earlier editions. We look forward to this publication providing the foundation for future
developments in computational creativity.
Mary Lou, Tony, Rob, and Ollie
June 2013
Conference Chairs
General Chair: Tony Veale, University College Dublin, Ireland
Local Co-Chairs: Rob Saunders & Oliver Bown, University of Sydney, Australia
Program Chair: Mary Lou Maher, University of North Carolina, Charlotte, USA
Local Organizing Committee
Kazjon Grace, University of North Carolina, Charlotte, USA
Roger Dean, University of Western Sydney, Australia
Steering Committee
Amílcar Cardoso, University of Coimbra, Portugal
Simon Colton, Imperial College London, UK
Pablo Gervás, Universidad Complutense de Madrid, Spain
Nick Montfort, Massachusetts Institute of Technology, USA
Alison Pease, University of Edinburgh, UK
Rafael Pérez y Pérez, Autonomous Metropolitan University, México
Graeme Ritchie, University of Aberdeen, UK
Rob Saunders, University of Sydney, Australia
Dan Ventura, Brigham Young University, USA
Tony Veale, University College Dublin, Ireland
Geraint A. Wiggins, Queen Mary, University of London, UK
Program Committee
Alison Pease, University of Edinburgh, UK
Amílcar Cardoso, University of Coimbra, Portugal
Andrés Gómez De Silva, Instituto Tecnológico Autónomo de México
Anna Jordanous, King's College London, UK
Ashok Goel, Georgia Institute of Technology, USA
Dan Ventura, Brigham Young University, USA
David Brown, Worcester Polytechnic Institute, USA
David Moffat, Glasgow Caledonian University, UK
Diarmuid O'Donoghue, National University of Ireland, Maynooth, Ireland
Douglas Fisher, Vanderbilt University, USA
Geraint Wiggins, Goldsmiths College, University of London, UK
Graeme Ritchie, University of Aberdeen, UK
Hannu Toivonen, University of Helsinki, Finland
Henry Lieberman, Massachusetts Institute of Technology, USA
John Barnden, The University of Birmingham, UK
John Gero, Krasnow Institute, George Mason University, USA
Jon McCormack, Monash University, Australia
Kazjon Grace, University of Sydney, Australia
Kyle Jennings, University of California, Berkeley, USA
Mark Riedl, Georgia Institute of Technology, USA
Nick Bryan-Kinns, Queen Mary University of London, UK
Nick Montfort, Massachusetts Institute of Technology, USA
Oliver Bown, University of Sydney, Australia
Oliviero Stock, European Alliance for Innovation, Italy
Pablo Gervas, Universidad Complutense de Madrid, Spain
Paulo Gomes, University Of Coimbra, Portugal
Paulo Urbano, University of Lisboa, Portugal
Philippe Pasquier, Simon Fraser University, Canada
Rafael Pérez y Pérez, Universidad Autónoma Metropolitana, Mexico
Ramon López de Mántaras, Spanish Council for Scientific Research, Spain
Ricardo Sosa, Singapore University of Technology and Design (SUTD), Singapore
Rob Saunders, University of Sydney, Australia
Robert Keller, Harvey Mudd College, USA
Ruli Manurung, University of Indonesia, Indonesia
Sarah Rauchas, Goldsmiths, University of London, UK
Simon Colton, Department of Computing, Imperial College, London, UK
Tony Veale, University College Dublin, Ireland
Win Burleson, Arizona State University, USA
Contents
Keynote: The Mechanics of Thought Trials
Arne Dietrich
Professor of Psychology
Department of Psychology
American University of Beirut
Session 1 Metaphor in Computational Creativity
Computationally Created Soundscapes with Audio Metaphor, Miles Thorogood and Philippe
Pasquier .............................................................................................................................................. 1
Generating Apt Metaphor Ideas for Pictorial Advertising, Ping Xiao and Josep Blat ...................... 8
Once More, With Feeling! Using Creative Affective Metaphors to Express Information Needs,
Tony Veale ....................................................................................................................................... 16
Session 2 Creativity via Computational Evolution
Evolving Figurative Images Using Expression-Based Evolutionary Art, João Correia, Penousal
Machado, Juan Romero and Adrian Carballal ................................................................................. 24
Fitness Functions for Ant Colony Paintings, Penousal Machado and Hugo Amaro ....................... 32
Adaptation of an Autonomous Creative Evolutionary System for Real-World Design Application
Based on Creative Cognition, Steve Dipaola, Kristin Carlson, Graeme McCaig, Sara Salevati and
Nathan Sorenson .............................................................................................................................. 40
Session 3 Creative Processes
A Computational Model of Analogical Reasoning in Dementia Care, Konstantinos Zachos and
Neil Maiden ...................................................................................................................................... 48
Transforming Exploratory Creativity with DeLeNoX, Antonios Liapis, Héctor P. Martínez, Julian
Togelius and Georgios N. Yannakakis ............................................................................................. 56
A Discussion on Serendipity in Creative Systems, Alison Pease, Simon Colton, Ramin Ramezani,
John Charnley and Kate Reed .......................................................................................................... 64
Session 4 Music
Considering Vertical and Horizontal Context in Corpus-based Generative Electronic Dance
Music, Arne Eigenfeldt and Philippe Pasquier ................................................................................. 72
Harmonising Melodies: Why Do We Add the Bass Line First? Raymond Whorley, Christophe
Rhodes, Geraint Wiggins and Marcus Pearce .................................................................................. 79
Automatical Composition of Lyrical Songs, Jukka M. Toivanen, Hannu Toivonen and Alessandro
Valitutti ............................................................................................................................................. 87
Implications from Music Generation for Music Appreciation, Amy K. Hoover, Paul A. Szerlip
and Kenneth O. Stanley .................................................................................................................... 92
Session 5 Visual Art
Autonomously Communicating Conceptual Knowledge Through Visual Art, Derrall Heath, David
Norton and Dan Ventura .................................................................................................................. 97
A Computer Model for the Generation of Visual Compositions, Rafael Pérez y Pérez, María
González de Cossío and Iván Guerrero .......................................................................................... 105
Session 6 Computational Processes for Creativity
Learning How to Reinterpret Creative Problems, Kazjon Grace, John Gero and Rob
Saunders ......................................................................................................................................... 113
Computational Creativity in Naturalistic Decision-Making, Magnus Jändel ............................... 118
Session 7 Evaluating Computational Creativity
Nobody's A Critic: On The Evaluation Of Creative Code Generators - A Case Study In Video
Game Design, Michael Cook, Simon Colton and Jeremy Gow ..................................................... 123
A Model for Evaluating Interestingness in a Computer-Generated Plot, Rafael Pérez y Pérez and
Otoniel Ortiz ................................................................................................................................... 131
A Model of Heteroassociative Memory: Deciphering Surprising Features and Locations,
Shashank Bhatia and Stephan Chalup ............................................................................................ 139
Computational Models of Surprise as a Mechanism for Evaluating Creative Design, Mary Lou
Maher, Douglas Fisher and Kate Brady ......................................................................................... 147
Session 9 Poetry
Less Rhyme, More Reason: Knowledge-based Poetry Generation with Feeling, Insight and Wit,
Tony Veale ..................................................................................................................................... 152
Harnessing Constraint Programming for Poetry Composition, Jukka M. Toivanen, Matti
Järvisalo and Hannu Toivonen ....................................................................................................... 160
Session 10 Narrative
Slant: A Blackboard System to Generate Plot, Figuration, and Narrative Discourse Aspects of
Stories, Nick Montfort, Rafael Pérez y Pérez, D. Fox Harrell and Andrew Campana ................. 168
Using Theory Formation Techniques for the Invention of Fictional Concepts, Flaminia Cavallo,
Alison Pease, Jeremy Gow and Simon Colton ............................................................................... 176
e-Motion: A System for the Development of Creative Animatics, Santiago Negrete-Yankelevich
and Nora Morales-Zaragoza ........................................................................................................... 184
Session 11 Collective and Social Creativity
An Emerging Computational Model of Flow Spaces in Social Creativity and Learning, Shiona
Webster, Konstantinos Zachos and Neil Maiden ........................................................................... 189
Idea in a Bottle - A New Method for Creativity in Open Innovation, Matthias R. Guertler,
Christopher Muenzberg and Udo Lindemann ................................................................................ 194
Multilevel Computational Creativity, Ricardo Sosa and John Gero .............................................. 198
Session 12 Embodied Creativity
Human-Robot Interaction with Embodied Creative Systems, Rob Saunders, Emma Chee and
Petra Gemeinboeck ........................................................................................................................ 205
The Role of Motion Dynamics in Abstract Painting, Alexander Schubert and Katja
Mombaur ........................................................................................................................................ 210
Creative Machine Performance: Computational Creativity and Robotic Art, Petra Gemeinboeck
and Rob Saunders ........................................................................................................................... 215
Demonstrations
An Artificial Intelligence System to Mediate the Creation of Sound and Light Environments,
Claudio Benghi and Gloria Ronchi ................................................................................................ 220
Controlling Interactive Music Performance (CIM), Andrew Brown, Toby Gifford and Bradley
Voltz ............................................................................................................................................... 221
A Flowcharting System for Computational Creativity, Simon Colton and John Charnley ............ 222
A Rogue Dream: Web-Driven Theme Generation for Games, Michael Cook ............................... 223
A Puzzling Present: Code Modification for Game Mechanic Design, Michael Cook and Simon
Colton ............................................................................................................................................. 224
A Metapianist - Serial Music Comproviser, Roger T. Dean .......................................................... 225
Assimilate - Collaborative Narrative Construction, Damian Hills ................................................ 226
Breeding On Site, Tatsuo Unemi .................................................................................................... 227
A Fully Automatic Evolutionary Art, Tatsuo Unemi ...................................................................... 228
Computationally Created Soundscapes with Audio Metaphor
Miles Thorogood and Philippe Pasquier
School of Interactive Arts and Technology
Simon Fraser University
Surrey, BC V3T0A3 CANADA
[email protected]
Abstract
Soundscape composition is the creative practice of processing and combining sound recordings to evoke auditory associations and memories within a listener. We present Audio Metaphor, a system for creating novel soundscape compositions. Audio Metaphor processes natural language queries derived from Twitter to retrieve semantically linked sound recordings from online user-contributed audio databases. We used simple natural language processing to create audio file search queries, and we segmented and classified audio files based on general soundscape composition categories. We used our prototype implementation of Audio Metaphor in two performances, seeding the system with keywords of current relevance, and found that the system produced a soundscape that reflected Twitter activity and kept audiences engaged for more than an hour.
1 Introduction
Creativity is a preeminent attribute of the human condition that is being actively explored in artificial intelligence systems aiming at endowing machines with creative behaviours. Artificial creative systems have simulated or been inspired by human creative processes, including painting, poetry, and music. The aim of these systems is to produce artifacts that humans would judge as creative. Much of the successful research in musical creative systems has focused on symbolic representations of music, often with corpora of musical scores. By contrast, non-symbolic forms of music have not been explored in as much detail.
Soundscape composition is a type of non-symbolic music that aims to rouse listeners' memories and associations of soundscapes using sound recordings. A soundscape is the audio environment perceived by a person in a given locale at a given moment. A listener brings a soundscape to mind with higher cognitive functions like template matching of the perceived world with known sound environments and deriving meaning from the triggered associations (Botteldooren et al. 2011). People communicate their subjective appraisal of soundscapes using natural language descriptions, revealing the semiotic cues of soundscape experiences (Dubois and Guastavino 2006).
Soundscape composition is the creative practice of processing and combining sound recordings to evoke auditory associations and memories within a listener. It is positioned along a continuum with concrete music, which uses found sound recordings, and electro-acoustic music, which uses more abstracted types of sounds. Central to soundscape composition is the processing of sound recordings, and there is a range of approaches to it. One approach is to portray a realistic place and time by using untreated audio recordings, or recordings with only minor editing (such as cross-fades). Another is to evoke imaginary circumstances by applying more intensive processing. In some cases, manufactured sound environments appear imaginary through the combination of largely untreated and more highly processed sound recordings. For example, the soundscape composition Island, by Canadian composer Barry Truax (Truax 2009), adds a mysterious quality to a recognizable sound environment by contrasting clearly discernible wave sounds against less-recognizable background drone and texture sounds.
Soundscape composition requires many decisions about selecting and cutting audio recordings and their artistic combination. These processes become exceedingly time consuming for people when large amounts of audio data are available, as is now the case with online databases. As such, different generative soundscape composition systems have automated many sub-procedures of the composition process, but we have not found any systems in the literature to date that use natural language processing for generative soundscape composition. Likewise, automatic audio segmentation into soundscape-composition-specific categories is an area not yet explored.
The system described here searches online for the most
recent Twitter posts about a small set of themes. Twitter pro-
vides an accessible platform for millions of discussions and
shared experiences through short text-based posts (Becker,
Naaman, and Gravano 2010). In our research, audio file
search queries are generated from natural language queries
derived from Twitter. However, these requests could be a
memory described by a user, a phrase from a book, or a sec-
tion of a research paper.
Audio Metaphor accepts a natural language query (NLQ), which is made into audio file search queries by our algorithm. The system searches online for audio files semantically related to word features in the NLQ. The resulting audio file recommendations are classified and segmented based
upon the soundscape categories background, foreground, and background with foreground. A composition engine autonomously processes and combines segmented audio files.
The title of Audio Metaphor refers to the idea that the audio representations of NL queries that the system generates may not have literal associations. In some cases, an object referenced in the NL query has a direct referential sound, as with "raining outside", which results in a kind of audio analogy. A less direct example such as "A brooding thought struck me down", however, has no such direct referent to an object in the world. In this latter case, Audio Metaphor would create a composition by processing sound recordings that have some semantic relationship with words in the NL query. For example, the sound of a storm and the percussive striking of an object are the types of sounds that would be processed in this case.
Margaret A. Boden actively proposes types of creativity that can be synthesized by computational means (Boden 1998). She states that combinatorial-type creativity involves "novel (improbable) combinations of familiar ideas ... wherein newly associated ideas share some inherent conceptual structure". The artificial creative system here uses semantic inference driven by NLQs as a way to frame the soundscape composition and make use of semantic structures inherent in crowdsourced systems. Further to this, the system associates words with sound recordings for combination into novel representations of texts. For this reason, the system is considered to exhibit combinatorial creative behaviour.
Our contribution is a creative and autonomous soundscape composition system with a novel method of generating compositions from natural language input and crowd-sourced sound recordings. Furthermore, we present a method of audio file segmentation based on soundscape categories, and a soundscape composition engine that contrasts sound recording segments with different levels of processing.
We outline our research in the design of an autonomous soundscape composition system called Audio Metaphor. In the next section, we present related work in the domains of soundscape studies and generative soundscape composition. We go on to describe the system architecture, including natural language processing, classification and segmentation, and the soundscape composition engine. The system is then discussed in terms of a number of performances and presentations. We conclude with our ideas for future work.
2 Related Work
Birchfield, Mattar, and Sundaram (2005) describe a system
that uses an adaptive user model for context-aware sound-
scape composition. In their work, the system has a small
set of hand-selected and hand-labelled audio recordings that
were autonomously mixed together with minimal process-
ing. Similarly, Eigenfeldt and Pasquier (2011) employ a
set of hand-selected and hand-labelled environmental sound
recordings for the retrieval of sounds from a database by au-
tonomous software agents. In their work, agents analyze au-
dio when selecting sounds to mix based on low-level audio
features. In both cases, listening to and searching through audio files for selection is very time consuming.
[Diagram: sourcing (Twitter / user NLQ) feeds the search query generator, which retrieves audio file recommendations from Freesound; processing covers audio file segmentation and the soundscape engine.]
Figure 1: Audio Metaphor system architecture overview.
A different approach to selecting and labelling sound
recordings is to take advantage of collaborative tagging
of online user-contributed collections of sound recordings.
This is a crowdsourcing process where a body of tags is pro-
duced collaboratively by human users connecting terms to
documents (Halpin, Robu, and Shepherd 2007). In online
environments, collaborative tags are part of a shared lan-
guage made manifest by users (Marlow et al. 2006). On-
line audio repositories such as pdSounds (Mobius 2009) and
Freesound (Akkermans et al. 2011) demonstrate collabora-
tive tagging systems applied to sound recordings.
A system that uses collaborative tags to retrieve sound recordings is described by Janer, Roma, and Kersten (2011). In their work, a user defines a soundscape composition by entering locations on a map that has sound tags associated with various locations. As the user navigates the map, a soundscape is produced. In related research, the locations on a map are used as a composition environment (Finney and Janer 2010). Their compositions use hand-selected sounds, which are placed in close and far proximity based upon semantic identifiers derived from tags.
3 System Architecture
Audio Metaphor creates unique soundscape compositions that represent the words in an NLQ using a series of processes as follows:
- Receive an NLQ from a user, or Twitter;
- Transform the NLQ into audio file search queries;
- Search online for audio file recommendations;
- Segment audio files into soundscape regions;
- Process and combine audio segments into a soundscape composition.
In the Audio Metaphor system, these processes are handled sequentially, as shown in Figure 1.[1]

[1] A modular approach was taken for the system design. Accordingly, the system is flexible enough to be used for separate objectives, including making audio file recommendations to a user from an NLQ, and deriving a corpus of audio segments.
Proceedings of the Fourth International Conference on Computational Creativity 2013 2
rainy autumn day vancouver
rainy autumn day
autumn day vancouver
rainy autumn
autumn day
day vancouver
rainy
autumn
day
vancouver
Table 1: All sub-lists generated from the word-feature list for the query "On a rainy autumn day in Vancouver".
3.1 Audio File Retrieval Using Natural Language
Processing
The audio file recommendation module creates audio file search queries given a natural language request and a maximum number of audio file recommendations for each search. The Twitter web API (Twitter API) is used to retrieve the 10 most recent posts related to a theme to find current associations. The longest of these posts is then used as a natural language query. To generate audio file search queries, a list of word features is extracted from the input text, and a queue of all unique sublists is generated. These sublists are used as search queries, starting with the longest first. The aim of the algorithm is to minimize the number of audio files returned while still representing all the word features in the list. When a search query returns a positive result, all remaining queries that contain any of the successful word features are removed from the queue.
To extract the word features from the natural language query, we use essentially the same method as that proposed by Thorogood, Pasquier, and Eigenfeldt (2012), but with some modifications. The algorithm first removes common words listed in the Oxford English Dictionary Corpus, leaving only nouns, verbs, and adjectives. Words are kept in order and treated as a list. For example, for the word-feature list from the natural language query "The angry dog bit the crying man", the ordering "angry dog bit crying man" is more valid than "angry man bit crying dog".
The algorithm for generating audio file queries essentially extracts all the sublists of the NLQ's word-feature list that have a length greater than or equal to 1. For example, a simple request such as "On a rainy autumn day in Vancouver" is first processed to extract the word-feature list: rainy, autumn, day, vancouver. After that, sub-lists are generated as shown in Table 1.
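As an illustrative reconstruction (not the authors' implementation), the sub-lists of Table 1 can be generated longest-first from the word-feature list as follows:

```python
def generate_sublists(features):
    """All contiguous sublists of a word-feature list, longest first.

    >>> generate_sublists(["rainy", "autumn", "day", "vancouver"])[0]
    ['rainy', 'autumn', 'day', 'vancouver']
    """
    n = len(features)
    return [features[start:start + length]
            for length in range(n, 0, -1)           # longest sublists first
            for start in range(0, n - length + 1)]  # keep the original word order
```

For the four-word query of Table 1, this yields exactly the ten sub-lists shown, in the order they are tried.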
Audio Metaphor accesses the Freesound audio repository for audio files with the Freesound API. Freesound is an online collaborative database with over 120,000 audio clips. The indexed data includes user-entered descriptions and tags. The content of an audio file is inferred from user-contributed commentary and social tags. Although there is no explicit user rating of audio files, a download counter for each file provides a measure of its popularity, and search results are presented by descending popularity count.
The sublists are used to search online for semantically related audio files using an exclusive keyword search. Sublists are used in the order created, from largest to smallest. A search is considered successful when it returns one or more recommendations. Additionally, the algorithm optimizes audio file recommendations by ignoring future sublists that contain word features from a previously successful search. The most favourable result is a recommendation for the longest sub-list, with the worst case being no recommendations. In practice, the worst case is, typically, a recommendation for each singleton word feature.
For each query, the URLs of the recommendations are logged in a separate list. The list is constrained to a number specified at system startup. Furthermore, if a list has fewer than the number of files requested, it is considered sparsely populated and no further modification is made to its items. For example, if the maximum number of recommendations specified for each query is five, and there are two queries where one returns nine recommendations and the other three, the longer list will be constrained to five, and the empty items of the second list are ignored.
The separate lists of audio file recommendations are then presented to the audio segmentation module.
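The following sketch reconstructs the retrieval loop described in this subsection; the `search` callable is a hypothetical wrapper around the Freesound API (the real system uses an exclusive keyword search), and the pruning logic follows the text above:

```python
def retrieve_recommendations(sublists, search, max_recs=5):
    """Query sublists longest-first, pruning queries whose word features
    were already covered by an earlier successful search.

    `search` takes a list of keywords and returns a list of result URLs.
    """
    results = {}
    covered = set()
    for sub in sublists:
        if covered & set(sub):       # a word here already matched earlier: skip
            continue
        recs = search(sub)           # hypothetical exclusive keyword search
        if recs:                     # success: keep at most max_recs URLs
            results[tuple(sub)] = recs[:max_recs]
            covered.update(sub)      # prune later queries containing these words
    return results
```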
3.2 Audio File Classification and Segmentation
Audio segmentation is an essential preprocessing step in
many audio applications (Foote 2000). In soundscape com-
position, a composer will choose background and fore-
ground sound regions to combine into new soundscapes.
Background and foreground sounds are general categories that refer to a signal's perceptual class. Background sounds seem to come from farther away than foreground sounds, or occur often enough to belong to the aggregate of all sounds that make up the background texture of a soundscape. This is synonymous with a ubiquitous sound (Augoyard and Torgue 2006): a sound that is diffuse, omnidirectional, constant, and prone to sound absorption and reflection factors having an overall effect on the quality of the sound. Urban drones and the purring of machines are two examples of ubiquitous or background sound. Conversely, foreground
sounds are typically heard standing out clearly against the
background. At any moment in a sound recording, there may
be either background sound, foreground sound, or a combi-
nation of both.
Segmenting an audio file is a process of listening to the recording for salient features and cutting regions for later use. To automate this process, we have designed an algorithm to classify segments of an audio file and concatenate neighbouring segments with the same label. An established technique for the classification of an audio recording is to use a supervised machine learning algorithm trained with examples of classified recordings.
3.3 Audio Features Used for Segmentation
The classifier models the generic soundscape categories background, foreground, and background with foreground. We use a vector of the low-level audio features total loudness and the first three mel-frequency cepstral coefficients (MFCC). These features reflect the behaviour of the human auditory system, which is an important aspect of
soundscape studies. They are extracted at a frame-level from
an audio signal with a window of 23 ms and a step size of
11.5 ms using the Yaafe audio feature extraction software
package (Mathieu et al. 2010).
MFCC audio features represent the spectral characteristics of a sound by a small number of coefficients calculated by the logarithm of the magnitude of a triangular filter bank. We use an implementation of MFCC that builds a logarithmically spaced filter bank according to 40 coefficients mapped along the perceptual Mel-scale by:
M(f) = 1127 log(1 + f / 700)    (1)

where f is the frequency in Hz.
Total loudness is the characteristic of a sound associated
with the sensation of intensity. The human auditory system
affects the perception of intensity of different frequencies.
One model of loudness (Zwicker 1961) takes into account
the disparity of loudness at different frequencies along the
Bark scale, which corresponds to the first 24 critical bands
of hearing. Bands near human speech frequencies have a
lower threshold than those of low and high frequencies. The
conversion from a frequency in Hz f to the equivalent fre-
quency in the Bark scale B is calculated with the following
formula (Traunmüller 1990):
B(f) = 13 arctan(0.00076 f) + 3.5 arctan((f / 7500)^2)    (2)

where f is the frequency in Hz. A specific loudness is
the loudness calculated at each Bark band; the total loudness is the sum of the individual specific loudnesses over all bands. Because a soundscape is perceived by a human not at the sample level, but over longer time periods, we use a so-called bag-of-frames approach (Aucouturier and Defreville 2007) to account for longer signal durations. Essentially, this kind of approach considers that the frames representing a signal may have different values, and that the density distribution of frames provides a more effective representation than any single frame. Statistical summaries, such as the mean and standard deviation of features, recapitulate the texture of an audio signal.
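Equations 1 and 2 transcribe directly into code. The following functions are a direct implementation of the two mappings given above (illustrative; they are not part of the authors' software):

```python
import math

def hz_to_mel(f):
    """Mel-scale mapping of Equation 1."""
    return 1127.0 * math.log(1.0 + f / 700.0)

def hz_to_bark(f):
    """Bark-scale mapping of Equation 2 (Traunmüller 1990)."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)
```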
In our research, audio segments are represented with an eight-dimensional feature vector of the means and standard deviations of the total loudness and the first 3 MFCCs. The mean and standard deviation of these features model the background, foreground, and background with foreground soundscape categories well. For example, sounds distant from the listener and considered background sound will typically have a smaller mean total loudness. Sounds that occur often enough will have a smaller standard deviation than those heard as foreground. The MFCCs take into account the spectrum of the sound as affected by the placement of its source in the environment.
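As a hedged sketch of this feature vector, the following uses the librosa package in place of the Yaafe extractor used by the authors, and substitutes a simple RMS proxy for the Bark-scale total loudness described above, so it is an approximation for illustration only:

```python
import numpy as np
import librosa

def segment_features(y, sr):
    """Eight-dimensional vector: mean and std of a loudness proxy (RMS)
    and of the first 3 MFCCs, over ~23 ms frames with an ~11.5 ms hop."""
    n_fft = int(0.023 * sr)
    hop = int(0.0115 * sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=3, n_fft=n_fft, hop_length=hop)
    rms = librosa.feature.rms(y=y, frame_length=n_fft, hop_length=hop)
    frames = np.vstack([rms, mfcc])  # shape (4, n_frames)
    return np.hstack([frames.mean(axis=1), frames.std(axis=1)])
```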
3.4 Supervised Classifier Used for Segmentation
We used a Support Vector Machine (SVM) classifier to classify audio segments. SVMs have been used in environmental sound classification problems and have consistently demonstrated good classification accuracy. An SVM is a non-probabilistic classifier that learns optimal separating hyperplanes in a higher dimensional space from the input. Typically, classification problems present non-linearly separable data that can be mapped to a higher-dimensional space with a kernel function. We use the C-support vector classification (C-SVC) algorithm shown by Chang and Lin (2011). This algorithm uses a radial basis function as a kernel, which is suited to a vector with a small number of features and takes into account that the relation between class labels and attributes is non-linear.
Training Corpus  The classifier was trained using feature vectors from a pre-labelled corpus of audio segments. The training corpus consists of 30 segments between 2 and 7 seconds long. Audio segments were labelled from a consensus vote by human subjects in an audio segment classification study. The study was conducted online through a web browser. Audio was played to participants using an HTML5 audio player object. This player allowed participants to repeatedly listen to a segment. Depending on the browser software, the audio format of segments was either MP3 at 196 kbps, or Vorbis at an equivalent bit rate. Participants selected a category from a set of radio buttons, and each selection was confirmed when the participant pressed a button to listen to the next segment.
There were 15 unique participants in the study group from
Canada and the United States. Before the study started, an
example for each of the categories, background, foreground,
and background with foreground, was played, and a short
description of the categories was displayed. Participants
were asked to use headphones or audio monitors to listen
to segments. Each participant was asked to listen to the ran-
domly ordered soundscape corpus. On completing the study, the participants' classification results were uploaded into a database for analysis.
The results of the study were used to label the recordings by a majority vote; Figure 2 shows the results of the vote. There are a total of 10 recordings for each of the categories.
A quantitative analysis of the voter results shows the average agreement of recordings for each category as follows: background 84.6% (SD=18.6%); foreground 77.0% (SD=10.4%); and background with foreground 76.2% (SD=13.4%). The overall agreement was 79.3% (SD=4.6%).
Classifier Evaluation  We evaluated the classifier, using the training corpus, with a 10-fold cross validation. The results summary is shown in Table 2. The classifier achieved an overall sample accuracy of 80%, which shows that the classifier was human-competitive against the overall human agreement statistic of 79.3%.
The kappa statistic is a chance-corrected measure showing the accuracy of prediction among each k-fold model. A kappa score of 0 means the classifier is performing only as well as chance; 1 implies perfect agreement; and a kappa score of .7 is generally considered satisfactory. The kappa score of .7 in the results shows that good classification accuracy was achieved using the described method.
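The same training and evaluation setup can be sketched with scikit-learn, whose SVC class wraps LIBSVM's C-SVC. The feature matrix X (30 x 8, from Section 3.3) and the label vector y are assumed to be loaded from the labelled study corpus; this is a reconstruction, not the authors' code:

```python
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score, confusion_matrix

def evaluate_classifier(X, y):
    """10-fold cross-validated accuracy, kappa, and confusion matrix."""
    clf = SVC(kernel="rbf")                      # C-SVC with an RBF kernel
    pred = cross_val_predict(clf, X, y, cv=10)   # 10-fold cross-validation
    print("accuracy:", accuracy_score(y, pred))
    print("kappa:   ", cohen_kappa_score(y, pred))
    print(confusion_matrix(y, pred, labels=["Bg", "Fg", "BgFg"]))
```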
[Figure 2: Audio classification vote results from human participants for 30 sound recordings with three categories: Background, Foreground, and Background with Foreground sound.]
Table 2: Summary of the SVM classifier with the mean and standard deviation of the total loudness and first 3 MFCC features.

  Correctly classified instances     24    80%
  Incorrectly classified instances    6    20%
  Kappa statistic                   0.7
These performance measures are reflected in the confusion matrix in Table 3. All 10 of the audio segments labelled background in the study were classified correctly. The remaining audio segments, labelled foreground and background with foreground, were correctly classified 7 out of 10 times, with the highest level of confusion between these latter two categories.
3.5 Background-Foreground Segmentation
In our segmentation method, we use a 500 ms sliding analysis window with a hop size of 250 ms. We found that, for our application, an analysis window of this length provided reasonable information for the bag-of-frames approach and ran with satisfactory computation time. The resulting feature vector is classified and labelled as belonging to one of the three categories. In order to create labelled regions of more than one window, neighbouring windows with the same label are concatenated, and the start and end times of the new window are logged.
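A minimal sketch of this windowing-and-merge logic, assuming a `classify_window` callable that wraps the classifier of Section 3.4 (names are illustrative, not the authors' code):

```python
def segment_regions(y, sr, classify_window):
    """Label 500 ms windows (250 ms hop) and merge runs with the same label.

    Returns a list of (label, start_seconds, end_seconds) regions.
    """
    win, hop = int(0.5 * sr), int(0.25 * sr)
    regions = []
    for start in range(0, max(1, len(y) - win + 1), hop):
        label = classify_window(y[start:start + win])
        t0, t1 = start / sr, (start + win) / sr
        if regions and regions[-1][0] == label:
            regions[-1] = (label, regions[-1][1], t1)  # extend the current region
        else:
            regions.append((label, t0, t1))            # open a new region
    return regions
```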
To demonstrate the segmentation algorithm, we used a 9 second audio file containing a linear combination of background, foreground, and background with foreground regions. Figure 3 shows the ground truth with the solid black line, and the algorithm's segmentation of the audio file with background, foreground, and background with foreground labelled regions applied. We use the SuperCollider3 software package for visualizing the segmented waveform. This example shows concatenated segments labelled as regions.
Table 3: Confusion matrix of the SVM classifier for the categories background (Bg), foreground (Fg), and background with foreground (BgFg).

          Bg   Fg   BgFg
  Bg      10    0      0
  Fg       0    7      3
  BgFg     1    2      7
[Figure 3: Segmentation of the audio file with ground-truth regions (black line) and segmented regions Background (dark-grey), Foreground (mid-grey), and Background with Foreground (light-grey).]
One of the background with foreground segments was misclassified, resulting in a slightly longer foreground region than in the ground truth classification.
The audio files and the accompanying segmentation data are then presented to the composition module.
3.6 Composition
The composition module creates a layered two-channel soundscape composition by processing and combining classified audio segments. Each layer in the composition consists of processed background, foreground, and background with foreground sound recordings. Moreover, an agent-based model is used in conjunction with a heuristic in order to handle different sound recordings and mimic the decisions of a human composer. Specifically, we based this heuristic on the production notes for the soundscape composition Island, by Canadian composer Barry Truax. In these production notes, Truax gives detailed information on how sound recordings are processed and on the temporal arrangement of sounds.
In our modelling of these processes, we chose to use the first page of the production notes, which corresponds to around 2 minutes of the composition. Furthermore, we framed the model to comply with the protocol of the seg-
mentation labels and aesthetic evaluations by the authors. A
summary of the model is as follows:
- Regions labelled background are played sequentially in the order presented by the segmentation. They are processed to form a dramatic textured background. This processing is carried out by first playing the region at 10% of its original speed and applying a stereo time-domain granular pitch shifter with ratios 1:0.5 (down an octave) and 1:0.667 (down a 5th). We added a Freeverb reverb (Smith 2010) with a room size of 0.25 to give the texture a more spacious quality. A low-pass filter with a cutoff frequency of 800 Hz is used to obscure any persistent high-end detail. Finally, a slow spatialization is applied in the stereo field at a rate of 0.1 Hz.
- Regions labelled foreground are chosen from the foreground pool by a roll of the dice. They are played individually, separated by a period proportional to the duration of the current region played, t = d^0.75 + d + C, where t is the time before the next region is played, d is the duration of the current region, and C is a constant controlling the minimum duration between regions (see the timing sketch after this list). In order to separate them from the background texture, foreground regions are processed by applying a band-pass filter with a resonant frequency of 2,000 Hz and a high Q value of 0.5. Finally, a moderate spatialization is applied in the stereo field at a rate of 0.125 Hz.
- Regions labelled background with foreground are slowly faded in and out to lend a mysterious quality to the soundscape. They are chosen from the pool of regions by a roll of the dice and are played for an arbitrarily chosen duration of between 10 and 20 seconds. Regions with a length less than the chosen duration are looped. In order to achieve a separation from the background texture and foreground sounds, regions are processed by applying a band-pass filter with a resonant frequency of 8,000 Hz and a high Q value of 0.1. The addition of a Freeverb reverb with a room size of 0.125 and a relatively fast spatialization at a rate of 1 Hz further adds to the mysterious quality of the sound.
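As referenced in the foreground rule above, the inter-onset timing reduces to a one-line function; the default value of C below is an arbitrary illustrative choice:

```python
def next_foreground_onset(d, C=2.0):
    """Seconds until the next foreground region: t = d**0.75 + d + C,
    where d is the duration of the region just played and C sets the
    minimum gap (the default here is illustrative, not from the paper)."""
    return d ** 0.75 + d + C
```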
This composition model is deployed individually by each of the agents of the system, each of which is responsible for processing a different audio file. An agent's decisions consist of choosing labelled regions of an audio recording, then processing and combining them in a layered soundscape composition according to the composition model.
Because of the potentially large number of audio files available to the system, and in order to limit the acoustic density of a composition, a maximum number of agents is specified at system start-up. If there are more audio file results than there are agents to handle them, the extra results are ignored. Equally, if the number of results is smaller than the number of agents, agents without tasks are temporarily ignored.
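This truncation rule amounts to zipping the agent pool against the result list; a minimal sketch (illustrative, not the authors' implementation):

```python
def assign_files_to_agents(agent_ids, files):
    """Pair each agent with one audio file. zip truncates at the shorter
    sequence, so extra files are ignored and surplus agents get no task."""
    return list(zip(agent_ids, files))
```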
An agent uses the region labels of the audio file to decide which region to process. An audio file may have a number of labelled regions. If there is no region of a given type, then that type is ignored. The agent can play one of each type of region simultaneously.
4 Qualitative Results
Audio Metaphor has been used in performance environments. In one case, the system was seeded with the words "nature", "landscape", and "environment". There were
roughly 150 people in the audience. They were told that
the system was responding to live Twitter posts and shown
the console output of the search results. During the per-
formance, there was an earthquake off the coast of British
Columbia, Canada, and the current Twitter posts focused
on news of the earthquake. Audio Metaphor used these as
natural language requests, searched online for sound record-
ings related to earthquakes, and created a soundscape com-
position. The sound recordings processed by the system in-
cluded an earthquake warning announcement, the sound of
alarms, and a background texture of heavy destruction. The
audience reacted by checking to see if this event was indeed
real. This illustrated how the semantic space of the sound-
scape composition effectively maps to the concepts of a nat-
ural language request.
In a separate performance, Audio Metaphor was presented to a small group of artists and academics. This took place during the height of the 2012 conflict in Syria, and the system was seeded with the words "Syria", "Egypt", and "conflict". The soundscape composition presented segments of spoken word, traditional instruments, and other sounds. The audience listened to the composition for over an hour without losing its engagement with the listening experience. One comment was, "It was really good, and we didn't get bored." The sounds held people's attention because they were linked to current events, and the processing of sound recordings added to the interest of the composition.
Because the composition model deployed in Audio Metaphor is based on a relatively short section of a composition, there was not a great deal of variation in the processing of sound recordings. The fact that people were engaged for such long periods of time suggests that other factors contributed to the novel stimulus. Our nascent hypothesis is that the dynamic audio signal of the recordings, in addition to the processing of the audio files, contributed to listeners' ongoing engagement.[2]
5 Conclusions and Future Work
We describe a soundscape composition engine that chooses audio segments using natural language queries, segments and classifies the resulting files, processes them, and combines them into a soundscape composition at interactive speeds. This implementation uses current Twitter posts as natural language queries to generate search queries, and retrieves audio files that are semantically linked to the queries from the Freesound audio repository.
The ability of Audio Metaphor to respond to current events was shown to be a strong point for audience engagement. The presence of signifier sounds evoked listeners' associations of concepts. Listener engagement was further reinforced through the artistic processing and combination of sound recordings.
[2] Sound examples of Audio Metaphor using the composition engine can be found at http://www.audiometaphor.ca/aume
Audio Metaphor can be used to help sound artists and autonomous systems retrieve and cut sound field recordings from online audio repositories, although its primary function, as we have demonstrated, is the autonomous machine generation of soundscapes for performance environments and installations. In the future, we will evaluate people's responses to these compositions by distributing them to user-contributed music repositories and analyzing user comments. These comments can then be used to inform the Audio Metaphor soundscape composition engine.
Although the system generates engaging and novel soundscape compositions, the composition structure is tightly regulated by the handling of background and foreground segments. In future work, we aim to equip our system with the ability to evaluate its own audio output, in order to make more in-depth composition decisions. By developing these methods, Audio Metaphor will not only be capable of processing audio files to create novel compositions but will additionally be able to respond to the compositions it has made.
6 Acknowledgments
This research was funded by a grant from the Natural Sci-
ences and Engineering Research Council of Canada. The
authors would also like to thank Barry Truax for his compo-
sition and production documentation.
References
Akkermans, V.; Font, F.; Funollet, J.; de Jong, B.; Roma, G.; Togias, S.; and Serra, X. 2011. Freesound 2: An Improved Platform for Sharing Audio Clips. In International Society for Music Information Retrieval Conference.
Aucouturier, J.-J., and Defreville, B. 2007. Sounds like a park: A computational technique to recognize soundscapes holistically, without source identification. In 19th International Congress on Acoustics.
Augoyard, J., and Torgue, H. 2006. Sonic Experience: A Guide to Everyday Sounds. McGill-Queen's University Press.
Becker, H.; Naaman, M.; and Gravano, L. 2010. Learning similarity metrics for event identification in social media. In Proceedings of the Third ACM International Conference on Web Search and Data Mining, WSDM '10, 291-300. New York, NY, USA: ACM.
Birchfield, D.; Mattar, N.; and Sundaram, H. 2005. Design of a generative model for soundscape creation. In International Computer Music Conference.
Boden, M. A. 1998. Creativity and artificial intelligence. Artificial Intelligence 103(1-2):347-356.
Botteldooren, D.; Lavandier, C.; Preis, A.; Dubois, D.; Aspuru, I.; Guastavino, C.; Brown, L.; Nilsson, M.; and Andringa, T. C. 2011. Understanding urban and natural soundscapes. In Forum Acusticum, 2047-2052. European Acoustics Association (EAA).
Chang, C.-C., and Lin, C.-J. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology 2:27:1-27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
Dubois, D., and Guastavino, C. 2006. In search for soundscape indicators: Physical descriptions of semantic categories.
Eigenfeldt, A., and Pasquier, P. 2011. Negotiated content: Generative soundscape composition by autonomous musical agents in Coming Together: Freesound. In Proceedings of the Second International Conference on Computational Creativity, 27-32. México City, México: ICCC.
Finney, N., and Janer, J. 2010. Soundscape Generation for Virtual Environments using Community-Provided Audio Databases. In W3C Workshop: Augmented Reality on the Web.
Foote, J. 2000. Automatic audio segmentation using a measure of audio novelty. In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on, volume 1, 452-455.
Halpin, H.; Robu, V.; and Shepherd, H. 2007. The complex dynamics of collaborative tagging. In Proceedings of the 16th International Conference on World Wide Web, WWW '07, 211-220. New York, NY, USA: ACM.
Janer, J.; Roma, G.; and Kersten, S. 2011. Authoring augmented soundscapes with user-contributed content. In ISMAR Workshop on Authoring Solutions for Augmented Reality.
Marlow, C.; Naaman, M.; Boyd, D.; and Davis, M. 2006. HT06, tagging paper, taxonomy, Flickr, academic article, to read. In Proceedings of the Seventeenth Conference on Hypertext and Hypermedia, HYPERTEXT '06, 31-40. New York, NY, USA: ACM.
Mathieu, B.; Essid, S.; Fillon, T.; Prado, J.; and Richard, G. 2010. YAAFE, an Easy to Use and Efficient Audio Feature Extraction Software. In Proceedings of the 2010 International Society for Music Information Retrieval Conference (ISMIR). Utrecht, Netherlands: ISMIR.
Mobius, S. 2009. pdSounds. Available online at http://www.pdsounds.org/; visited on April 12th 2012.
Smith, J. O. 2010. Physical Audio Signal Processing. W3K Publishing. Online book.
Thorogood, M.; Pasquier, P.; and Eigenfeldt, A. 2012. Audio Metaphor: Audio information retrieval for soundscape composition. In Proceedings of the 6th Sound and Music Computing Conference.
Traunmüller, H. 1990. Analytical expressions for the tonotopic sensory scale. The Journal of the Acoustical Society of America 88(1):97-100.
Truax, B. 2009. Island. In Soundscape Composition DVD. DVD-ROM (CSR-DVD 0901). Cambridge Street Publishing.
Twitter API. Available online at https://dev.twitter.com/docs/; visited on April 12th 2012.
Zwicker, E. 1961. Subdivision of the Audible Frequency Range into Critical Bands (Frequenzgruppen). The Journal of the Acoustical Society of America 33(2):248.
Generating Apt Metaphor Ideas for Pictorial Advertisements
Ping Xiao, Josep Blat
Department of Information and Communication Technologies
Pompeu Fabra University
C./ Tánger, 122-140, Barcelona 08018, Spain
{ping.xiao, josep.blat}@upf.edu
Abstract
Pictorial metaphor is a popular way of expression in
creative advertising. It attributes a certain desirable quality to the advertised product. We adopt a general two-
stage computational approach in order to generate apt
metaphor ideas for pictorial advertisements. The first
stage looks for concepts which have high imageability
and the selling premise as one of their prototypical
properties. The second stage evaluates the aptness of
the candidate vehicles (found in the first stage) in re-
gard to four metrics, including affect polarity, salience,
secondary attributes and similarity with tenor. These
four metrics are conceived based on the general charac-
teristics of metaphor and its specialty in advertisements.
We developed a knowledge extraction method for the
first stage and utilized an affect lexicon and two seman-
tic relatedness measures to implement the aptness me-
trics of the second stage. The capacity of our computer
program is demonstrated in a task of reproducing the
pictorial metaphor ideas used in three real advertisements. All three original metaphors were replicated, and a few other vehicles were recommended which, we consider, would also make effective advertisements.
Introduction
A pictorial advertisement is a short discourse about the
advertised product, service or idea (all referred to as prod-
uct afterwards). Its core message, namely the selling premise, is a proposition that attributes a certain desirable quality to the product (Maes and Schilperoord 2008). A single proposition can be expressed in a virtually unlimited number of ways, some more effective than others. The "how to say" of an ad is conventionally called "the idea". Pictorial metaphor is the most popular
way of expression in creative advertising (Goldenberg,
Mazursky and Solomon 1999). A pictorial metaphor in-
volves two dimensions, structural and conceptual
(Forceville 1996; Phillips and McQuarrie 2004; Maes and
Schilperoord 2008). The structural dimension concerns
how visual elements are arranged in a 2D space. The con-
ceptual dimension deals with the semantics of the visual
elements and how they together construct a coherent mes-
sage. We see that the operations in the structural and con-
ceptual dimensions are quite different issues. In any of
these two dimensions, computational creativity is not a
trivial issue. In this paper, we are focusing on only one
dimension, the conceptual one.
The conceptual dimension of pictorial metaphors is not
very different from verbal metaphors (Foss 2005). A meta-
phor involves two concepts, namely tenor and vehicle.
The best acknowledged effect of metaphor is highlighting
certain aspect of the tenor or introducing some new infor-
mation about the tenor. Numerous theories have been pro-
posed to account for how metaphors work. The interaction
view is the dominant view of metaphor, which we also
follow. It was heralded by Richards (1936) and further
developed by Black (1962). According to Black, the prin-
cipal and subsidiary subjects of metaphor are regarded as
two systems of associated commonplaces (commonsense
knowledge about the tenor and vehicle). Metaphor works
by applying the system of associated commonplaces of the
subsidiary subject to the principal subject, to construct a
corresponding system of implications about the principal
subject. Any associated commonplaces of the principal
subject that conform to the system of associated common-
places of the subsidiary subject will be emphasized, and
any that do not will be suppressed. In addition, our view
of the subsidiary subject is also altered.
Besides theories, more concrete models have been pro-
posed, mainly the salience imbalance model (Ortony
1979), the domain interaction model (Tourangeau and
Sternberg 1982), the structure mapping model (Gentner
1983; Gentner and Clement 1988), the class inclusion
model (Glucksberg and Keysar 1990, 1993) and the con-
ceptual scaffolding and sapper model (Veale and Keane
1992; Veale, O'Donoghue and Keane 1995). Further-
more, these models suggest what makes good metaphors,
i.e. metaphor aptness, which is defined as the extent to
which a comparison captures important features of the top-
ic (Chiappe and Kennedy 1999).
In the rest of this paper, we first specify the problem of
generating apt metaphor ideas for pictorial advertisements.
Then, the relevant computational approaches in the litera-
ture are reviewed. Next, we introduce our approach to the
stated problem and the details of our implementation. Sub-
sequently, an experiment aimed at reproducing three
pictorial metaphors used in real advertisements, and the
results generated by our computer program, are demon-
strated. In the end, we conclude the work presented in this
paper and give suggestions for future work.
Problem Statement
The whole range of non-literal comparison, from mere-
appearance to analogy (in the terms of Gentner and Mark-
man (1997)), is featured in pictorial advertisements. But,
analogies are rare. What appear most frequently are meta-
phors with the mapping of a few attributes or relations.
This type of pictorial metaphors is the target of this paper.
To generate pictorial metaphors for advertisements, our
specific problem is searching for concepts (vehicles), given
the product (tenor), its selling premise (the property con-
cept) and some other constraints specified in an advertising
brief. The metaphor vehicles generated have to be easy to
visualize and able to establish or strengthen the connection
between the product and the selling premise.
There are two notes specific to advertisements that we
would like to mention. One is about the tenor of metaphor.
In pictorial ads, not only the product, but also the internal
components of the product and the objects that interact
with it are often used as tenors (Goldenberg, Mazursky
and Solomon 1999). The other note is about the selling
premise. Metaphors in advertisements are more relevant to
communicating intangible, abstract qualities than talking
about concrete product facts (Phillips and McQuarrie
2009). Therefore, we are primarily considering abstract
selling premises in this paper. In the next section, we re-
view the computational approaches to metaphor generation
that are related to the problem just stated.
Computational Approaches to Metaphor
Generation
Abe, Sakamoto and Nakagawa (2006) employed a three-
layer feedforward neural network to transform adjective-
modified nouns, e.g. "young, innocent, and fine character",
into "A like B" style metaphors, e.g. "the character is like a
child". The nodes of the input layer correspond to a noun
and three adjectives. The nodes of the hidden layer corres-
pond to the latent semantic classes obtained by a probabil-
istic latent semantic indexing method (Kameya and Sato
2005). A semantic class refers to a set of semantically re-
lated words. Activation of the input layer is transferred to
the semantic classes (and the words in each class) of the
hidden layer. In the output layer, the words that receive
most activation (from different semantic classes) become
metaphor vehicles. In effect, this method outputs concepts
that are the intermediates between the semantic classes to
which the input noun and three adjectives are strongly as-
sociated. If they are associated with different semantic
classes, this method produces irrelevant and hard-to-
visualize vehicles.
A variation of the above model was created by Terai and
Nakagawa (2009), who made use of a recurrent neural
network to explicitly implement feature interaction. It dif-
fers from the previous model at the input layer, where each
feature node has a bidirectional edge to every other feature
node. The performance of these two models was compared
in an experiment of generating metaphors for two tenors.
The model with feature interaction produced better results.
Besides, Terai and Nakagawa (2010) proposed a method
of evaluating the aptness of metaphor vehicles generated
by the aforementioned two computational models. A can-
didate vehicle is judged based on the semantic similarity
between the corresponding generated metaphor and the
input expression. A candidate vehicle is more apt when the
meaning of the corresponding metaphor is closer to the
input expression. The semantic similarity is calculated
based on the same language model used in the metaphor
generation process. The proposed aptness measure was
tested in an experiment of generating metaphors for one
input expression, which demonstrated that it improved the
aptness of generated metaphors.
Veale and Hao (2007) created a system called Sardoni-
cus which can both understand and generate property-
attribution metaphors. Sardonicus takes advantage of a
knowledge base of entities (nouns) and their most salient
properties (adjectives). This knowledge base is acquired
from the web using linguistic patterns like "as ADJ as *"
and "as * as a/an NOUN". To generate metaphors, Sardoni-
cus searches the knowledge base for nouns that are asso-
ciated with the intended property. The aptness of the found
nouns is assessed according to the category inclusion
theory, i.e. only those noun categories that can meaning-
fully include the tenor as a member should be considered
as potential vehicles. For each found noun, a query in the
format "vehicle-like tenor" is sent through a search engine.
If one or more results are returned, the noun is
considered an apt vehicle. Otherwise, it is considered not
apt or extremely novel.
The efforts at metaphor generation reviewed above con-
verge on a two-stage approach. These two stages are:
Stage 1: Search for concepts that are salient in the
property to be highlighted
Stage 2: Evaluate the aptness of the found concepts
as metaphor vehicles
We adopt this two-stage approach to metaphor generation,
providing methods of searching for and evaluating
metaphor vehicles that differ from those in the literature.
In addition, special consideration is given to the aptness of
metaphor in the advertising context.
An Approach to Generating Apt Metaphor
Ideas for Pictorial Advertisements
We adopt a general two-stage computational approach to
metaphor generation (as introduced in the last section) to
generate apt metaphor ideas for pictorial advertisements.
At the first stage, we look for concepts which have high
imageability (Paivio, Yuille and Madigan 1968; Toglia
and Battig 1978) and the selling premise as one of their
prototypical properties. At the second stage, we evaluate
the aptness of the candidate vehicles using four metrics,
including affect polarity, salience, secondary attributes and
similarity with tenor. Vehicles that are validated by all the
four metrics are considered apt for a specific advertising
task. In the following sections, we explain the rationale of
our approach and its computational details.
Stage 1: Search Candidate Metaphor Vehicles
To find entities which have the selling premise as one of
their prototypical properties, our strategy is searching for
concepts that are strong semantic associations of the selling
premise. One note to mention is that the sought-after con-
cepts need not be the strongest associations, because
the meaning of a metaphor, i.e. which aspects of the tenor
and vehicle become prominent, depends not only on
the vehicle, but on the interaction between the tenor and
vehicle. In the past, we developed an automatic knowledge
extraction system, namely VRAC (Visual Representations
for Abstract Concepts), for providing concepts of physical
entities to represent abstract concepts (Xiao and Blat
2011). Here we give a brief introduction of this work.
We look for semantic associations in three knowledge
bases, including word association databases (Kiss,
Armstrong, Milroy and Piper 1973; Nelson, McEvoy and
Schreiber 1998), a commonsense knowledge base called
ConceptNet (Liu and Singh 2004) and Roget's Thesaurus
(Roget 1852). The reason for using all three is that we
want to combine their capacities, in terms of
both vocabulary and type of content. The nature of
these three knowledge bases ensures that the retrieved con-
cepts have close association with the selling premise.
Vehicles of pictorial metaphors should have high im-
ageability, in order to be easily visualized in advertise-
ments. Imageability refers to how easily a piece of text
elicits a mental image of its referent. It is usually measured in
psychological experiments. The available data about word
imageability, at the scale of thousands of words, does not
satisfy our need to handle arbitrary words and phrases. As imagea-
bility is highly correlated with word concreteness, we de-
veloped a method of estimating concreteness using the
ontological relations in WordNet (Fellbaum 1998), as an
approximation of imageability.
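The ontological rules we use are not detailed here, so the following is only a minimal sketch of the underlying idea, assuming NLTK's WordNet interface: a word is treated as concrete to the degree that its noun senses descend from physical_entity.n.01.

# A minimal sketch of WordNet-based concreteness estimation, assuming
# NLTK; the actual ontological rules of VRAC are not reproduced here.
from nltk.corpus import wordnet as wn

PHYSICAL = wn.synset('physical_entity.n.01')

def concreteness(word):
    # Fraction of the word's noun senses whose hypernym paths pass
    # through physical_entity.n.01.
    senses = wn.synsets(word, pos=wn.NOUN)
    if not senses:
        return 0.0
    physical = sum(1 for s in senses
                   if any(PHYSICAL in path for path in s.hypernym_paths()))
    return physical / len(senses)

# concreteness('owl') -> 1.0 (concrete); concreteness('wisdom') -> 0.0 (abstract)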
To evaluate the capacity of VRAC, we collected thirty-
eight distinct visual representations of six abstract concepts
used in past successful advertisements. These abstract con-
cepts have varied parts of speech and word usage frequen-
cy. We checked if these visual representations were in-
cluded in the concepts output by VRAC, with the corres-
ponding abstract concept as input. On average, VRAC
achieved a hit rate of 57.8%. The concepts suggested by
VRAC are mostly single objects. It lacks the concepts of
scenes or emergent cultural symbols, which also play a role
in mass visual communication.
Stage 2: Evaluate the Aptness of Candidate
Vehicles
The aptness of the candidate vehicles generated in Stage 1
is evaluated based on four metrics, including affect polari-
ty, salience, secondary attributes and similarity with tenor.
Affect Polarity Most of the time, concepts with negative
emotions are avoided in advertising (Kohli and Labahn,
1997; Amos, Holmes and Strutton 2008). Even in provoca-
tive advertisements, negative concepts are deployed with
extreme caution (De Pelsmacker and Van Den Bergh 1996;
Vézina and Paul 1997; Andersson, Hedelin, Nilsson and
Welander 2004). In fact, negative concepts are often dis-
carded in the first place (Kohli and Labahn 1997). There-
fore, we separate candidate vehicles having negative impli-
cation from the ones having positive or neutral implication.
For this purpose, affective lexicons, which provide affect
polarity values of concepts, come in handy. We decided to
use SentiWordNet 3.0 (Baccianella, Esuli and Sebastiani
2010), due to its broad coverage (56,200 entries) and fine-
grained values. It provides both positive and negative
valences, which are real values ranging from 0.0 to 1.0. If a
candidate vehicle is found in SentiWordNet 3.0, its affect
polarity is calculated by subtracting the negative valence
from the positive valence. Candidate vehicles which
are not included in SentiWordNet 3.0 are considered
emotionally neutral.
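As an illustrative sketch (assuming NLTK's SentiWordNet reader, and averaging valences over senses, an aggregation choice not prescribed above), the check may look like this:

# A minimal sketch of the affect-polarity filter; averaging over senses
# is an assumption, as is the hypothetical 'candidates' list.
from nltk.corpus import sentiwordnet as swn

def affect_polarity(word):
    senses = list(swn.senti_synsets(word))
    if not senses:
        return 0.0  # absent from SentiWordNet: treated as neutral
    # positive valence minus negative valence, averaged over senses
    return sum(s.pos_score() - s.neg_score() for s in senses) / len(senses)

# separate negative vehicles from positive or neutral ones, e.g.:
# candidates = ['highbrow', 'geek', 'serpent']
# non_negative = [v for v in candidates if affect_polarity(v) >= 0.0]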
Salience Salience refers to how strongly a symbol evokes
a certain meaning in the human mind. The candidate vehicles
found by VRAC vary in the strength of their association with
the selling premise, from very strong to rather weak. The vehicle of a
metaphor has to be more salient in the intended property
than the tenor (Ortony 1979; Glucksberg and Keysar
1990). We interpret salience as a kind of semantic related-
ness (Budanitsky and Hirst 2006), which reflects how far
two concepts are in the conceptual space of a society. We
calculate the semantic relatedness between each candidate
vehicle and the selling premise, and between the product
and the selling premise. Candidate vehicles that are more
remote from the selling premise than the product are dis-
carded. We will talk more about semantic relatedness and
the specific measures we used in a later section.
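A minimal sketch of this hard filter, assuming a relatedness(a, b) function (PMI-IR in the present work) is available:

# Keep only vehicles whose relatedness to the selling premise exceeds
# that of the product itself (the salience-imbalance requirement).
def filter_by_salience(candidates, product, premise, relatedness):
    baseline = relatedness(product, premise)
    return [v for v in candidates if relatedness(v, premise) > baseline]

# e.g. 'horse' is dropped for the Volvo ad whenever
# relatedness('horse', 'intelligence') <= relatedness('car', 'intelligence')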
Secondary Attributes Metaphors that capture the appro-
priate number of relevant features are considered especially
apt (Glucksberg and Keysar 1990, 1993; Chiappe and
Kennedy 1999). Phillips (1997) found that strong implica-
tures as well as weak implicatures were drawn from pic-
torial advertisements. Strong implicatures correspond to
the selling premise of an ad, while we use secondary
attributes for referring to the weak implicatures. We have
not seen literature on the salience of the secondary
attributes in metaphor vehicles. We think the candidate
vehicles should, at least, not contradict the secondary
attributes prescribed to a product. For this end, we use a
semantic relatedness measure to filter candidate vehicles
that are very distant from the secondary attributes. This is
soft filtering, in contrast to the hard filtering used in the
previous two metrics, i.e. affect polarity and salience, in
the sense that the current criterion might need to be tightened in
order to ensure the aptness of generated metaphors.
We compare the above approach with an alternative,
which is using both the selling premise and the secondary
attributes to search for candidate vehicles. This alternative
method indeed looks for concepts that are salient in all
these properties. This is possible, but rare. Most of the
time, no result will be returned. On the other hand, there is
a natural distinction of priority in the attributes (for a prod-
uct) desired by advertisers (recall the strong and weak im-
plicatures just mentioned). To represent this distinction,
weighting of attributes is necessary.
The computational model proposed by Terai and Naka-
gawa (2009) also uses multiple features to generate meta-
phors. The weights of the edges connecting the feature
nodes in the input layer vary with the tenor. Specifically,
the weight of an edge equals the correlation coefficient
between the two features with respect to the tenor. The calcula-
tion is based on a statistical language model built on a Japa-
nese corpus (Kameya and Sato 2005), which means the
weighting of features (of a tenor) is intended to be near
reality. However, this idea does not suit advertising, be-
cause the features attributed to a product are much more
arbitrary. Very often, a product is not thought of as possessing
those features before the appearance of an advertisement.
Similarity with Tenor Good metaphors are those whose
tenor and vehicle are not too different yet not too similar to
each other (Aristotle 1924; Tourangeau and Sternberg
1981; Marschark, Katz and Paivio 1983). For this reason,
we calculate the semantic relatedness between the product
and each candidate vehicle. Firstly, candidate vehicles
which have zero or negative semantic relatedness values
are discarded, because they are considered too dissimilar to
the product. Then, the candidate vehicles with positive
relatedness values are sorted in the descending order of
relatedness. Among this series of values, we look for val-
ues that are noticeably different from the next value, i.e.
turning points. Turning points divide relatedness values
into groups. We use the discrete gradient to measure the
change of value, and take the value with the biggest change
as the turning point. Candidate vehicles with their related-
ness value bigger than or equal to the turning point are
abandoned, for being too similar to the tenor. Figure 1
shows the sorted relatedness values between the candidate
vehicles and the tenor 'child' in the ad of the National Mu-
seum of Science and Technology. The turning point in this
graph corresponds to the concept 'head'.
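A minimal sketch of this turning-point filter, assuming the relatedness scores have already been computed:

# Sort relatedness-to-tenor values in descending order, locate the
# largest drop between consecutive values (the turning point), and
# abandon vehicles at or above that value as too similar to the tenor.
def filter_by_tenor_similarity(scored):
    # scored: list of (vehicle, relatedness-to-tenor) pairs
    kept = [(v, r) for v, r in scored if r > 0.0]   # drop too-dissimilar ones
    kept.sort(key=lambda pair: pair[1], reverse=True)
    if len(kept) < 2:
        return [v for v, r in kept]
    drops = [kept[i][1] - kept[i + 1][1] for i in range(len(kept) - 1)]
    turning = kept[drops.index(max(drops))][1]      # value with the biggest drop
    return [v for v, r in kept if r < turning]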
Semantic Relatedness Measures In general, semantic
relatedness is measured through distance metrics in certain
materialized conceptual space, such as knowledge bases
and raw text. A number of semantic relatedness measures
have been proposed. Each measure has its own merits and
weakness. We employed two different measures in the
current work, including PMI-IR (Pointwise Mutual Infor-
mation and Information Retrieval) (Turney 2001) and LSA
through Random Indexing (Kanerva, Kristofersson and
Holst 2000). PMI-IR is used to compute salience, because
we found it gives more accurate results than other available
measures when dealing with concept pairs of high semantic
relatedness. The relatedness between the selling premise
and candidate vehicles is deemed high. Therefore, we use
PMI-IR to give a fine-grained ordering of their association
strength.

[Figure 1: Similarity between candidate vehicles and 'Child'. Sorted relatedness values between the candidate vehicles and the tenor 'child'; the turning point corresponds to the concept 'head'.]

LSA is employed for the metrics of secondary
attributes and similarity with tenor. The motivation behind
this choice is to capitalize on LSA's capacity for indirect
inference (Landauer and Dumais 1997), i.e. discovering
connections between terms which do not co-occur. Recall
that candidate vehicles are assumed to have strong associa-
tion with the selling premise, but not necessarily the sec-
ondary attributes. In most cases, the association between a
candidate vehicle and a secondary attribute is not high.
Thus, we need a measure which is sensitive to low-
range semantic relatedness. LSA has demonstrated capaci-
ty in this respect (Waltinger, Cramer and Wandmacher
2009). For LSA, values close to 1.0 indicate very similar
concepts, while values close to 0.0 and under 0.0 indicate
very dissimilar concepts. In our computer program, we
utilize the implementation of Random Indexing provided
by the Semantic Vectors package [1]. Two-hundred term vec-
tors are acquired from the LSA process for computing se-
mantic relatedness. In the present work, both PMI-IR and
LSA are based on the Wikipedia corpus, an online encyc-
lopedia of millions of articles. We obtained the English
Wikipedia dumps, offered by the Wikimedia Foundation [2],
on October 10th, 2011. The compressed version of this
resource is about seven gigabytes.
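For reference, PMI can be estimated from corpus counts along the following lines; count() and count_pair() are hypothetical stand-ins for queries against an indexed corpus such as the Wikipedia dump:

import math

# A rough sketch of pointwise mutual information from corpus counts:
# PMI(x, y) = log2( p(x, y) / (p(x) * p(y)) ).
def pmi(x, y, count, count_pair, total):
    p_x = count(x) / total
    p_y = count(y) / total
    p_xy = count_pair(x, y) / total   # co-occurrence within some window
    if p_xy == 0.0 or p_x == 0.0 or p_y == 0.0:
        return float('-inf')          # never co-occur: minimally related
    return math.log2(p_xy / (p_x * p_y))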
An Example
We intend to evaluate our approach to generating apt me-
taphor ideas for pictorial advertisements based on checking
whether this approach can reproduce the pictorial meta-
phors used in past successful advertisements. We have
been collecting a number of real ads and the information
about the product, selling premise, secondary attributes,
and the tenor and vehicle of metaphor in these ads. This
is, however, a tedious process.
1. http://code.google.com/p/semanticvectors/
2. http://download.wikipedia.org/
Rank Vehicle        Rank Vehicle
1    IQ             19   reader
2    Mensa          20   child*
3    brain          21   sage
4    computer       22   serpent
5    cerebrum       23   owl
6    alien          24   car*
7    mankind        25   whale
8    highbrow       26   horse
9    Einstein       27   pig
10   head           28   half
11   professor      29   needle
12   dolphin        30   button
13   chess          31   table
14   lecturer       32   uptake
15   geek           33   storey
16   headpiece      34   loaf
17   newspaper*     35   brainpan
18   atheist        36   latitudinarian

Table 2: Candidate vehicles sorted in descending order of salience (* marks the positions of the products and of the tenor 'child')
In this paper, we use the information of three real ads to
show what our computer program generates. These three
ads are for the Volvo S80 car, The Economist newspaper
and the National Museum of Science and Technology in
Stockholm respectively. Each of them has a pictorial meta-
phor as its center of expression. All three ads have the
same selling premise: 'intelligence'. However, three differ-
ent vehicles are used: 'chess', 'brain' and 'Einstein'
respectively. The selection of these particular ads
aims at testing whether our aptness metrics are able to dif-
ferentiate different tenors.
Table 1 summarizes the three aspects of the three ads,
including product, secondary attributes and the tenor of
metaphor. For both the car and newspaper ads, the te-
nors of metaphor are the products. For the museum ad, the
tenor is the target consumer, children.
We found the secondary attributes of the Volvo S80 car
in its product introduction [3]. For the other two ads, the
Economist newspaper and the National Museum of
Science and Technology, we have not found any secondary
attributes specified. Instead, their subject matter is used to
distinguish them from products of the same categories.
Furthermore, we think it is more accurate to use the
Boolean operations AND and OR in describing the rela-
tion between multiple secondary attributes. As a conse-
quence, candidate vehicles have to be reasonably related to
both attributes on the two sides of an AND, and to at least
one of the two attributes connected by an OR.
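A minimal sketch of this Boolean filtering, assuming an LSA-style relatedness(a, b) function and an illustrative threshold for "reasonably related":

# Evaluate a vehicle against secondary attributes joined by AND / OR.
THRESHOLD = 0.05  # illustrative cut-off, not a value fixed by this paper

def passes_attributes(vehicle, attributes, operator, relatedness):
    related = [relatedness(vehicle, a) >= THRESHOLD for a in attributes]
    return all(related) if operator == 'AND' else any(related)

# e.g. passes_attributes('chess', ['elegance', 'luxury', 'sophisticated'],
#                        'AND', relatedness) for the Volvo S80 ad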
Product         Secondary Attributes                        Tenor
car [4]         elegance AND luxury AND sophisticated       car
newspaper [5]   international politics OR business news     newspaper
museum [6]      science OR technology                       child

Table 1: Information about the three real ads
For the concept 'intelligence', VRAC provides eighty-
seven candidate vehicles, including single words and
phrases. We keep the single-word concepts and extract the
core concept of a phrase, in order to reduce the complexity
of calculating the aptness metrics at the later stage. An
example of the core concept of a phrase is the word 'owl'
in the phrase 'wise as an owl'. The core concepts are ex-
tracted automatically based on syntactic rules. This process
introduces noise, i.e. concepts not related to 'intelligence',
such as 'needle' of the phrase 'sharp as a needle' and 'button'
of the phrase 'bright as a button'. In total, there are thirty-
four single-word candidate vehicles. All three me-
taphor vehicles used in the three real ads are included.
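An illustrative sketch of such core-concept extraction; the actual syntactic rules are not given above, so the patterns below are assumptions:

# A toy sketch of core-concept extraction from simile-like phrases;
# the 'ADJ as a/an NOUN' pattern is an assumed example rule.
import re

def core_concept(phrase):
    # 'wise as an owl' -> 'owl'; single words pass through unchanged
    m = re.match(r'^\w+ as an? (\w+)$', phrase)
    if m:
        return m.group(1)
    return phrase.split()[-1]   # fall back to the final word

# core_concept('wise as an owl')    -> 'owl'
# core_concept('sharp as a needle') -> 'needle'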
3. http://www.volvocars.com/us/all-cars/volvo-s80/pages/5-things.aspx, retrieved on April 1st, 2012.
4. http://adsoftheworld.com/media/print/volvo_s80_iq
5. http://adsoftheworld.com/media/print/the_economist_brain
6. http://adsoftheworld.com/media/print/the_national_museum_of_science_and_technology_little_einstein
As to affect polarity, the majority of the candidate ve-
hicles, thirty out of thirty-four, are emotionally neutral.
Besides, 'highbrow' is marked as positive, while 'geek'
and 'serpent' are marked as negative.
The ranking of candidate vehicles by their salience in the
selling premise is shown in Table 2. The semantic related-
ness calculated by PMI-IR correctly captured the main
trend of salience. 'IQ', 'Mensa' and 'brain' are ranked top,
while 'needle', 'button' and 'table', which are the noise
introduced by the core concept extraction method, are
ranked very low. The positions of the products and of the
tenor 'child' are marked with an asterisk. Only candidate
vehicles having higher salience than a product are seen as
valid. For instance, 'horse', ranked twenty-sixth, is not
selected for the Volvo S80 car ad, since car is judged as
more intelligent than horse by PMI-IR. On the other hand,
all the metaphor vehicles used in the original ads, i.e.
'chess', 'brain' and 'Einstein', have higher rankings than
the corresponding tenors, which supports Ortony's salience
imbalance theory.
Table 3 shows how candidate vehicles are filtered by the
secondary attributes of products, where candidate vehicles
that are not contradictory to the secondary attributes are
presented. Table 4 shows the candidate vehicles that are
not too different yet not too similar to the tenors of the
three ads respectively. For both results, the metaphor ve-
hicles used in the original ads survived the filtering, which
gives support to the domain interaction theory proposed by
Tourangeau and Sternberg. Nevertheless, there are also
flaws in the results produced by the LSA measure. For in-
stance, regarding the museum entries of Table 3, we suspect
it is wrong that 'brain' is judged to have nothing to do with
'science'; we consulted several other semantic relatedness
measures, which confirmed our skepticism.
Product: car
Secondary attributes: elegance AND luxury AND sophisticated
Candidate vehicles: chess, half, geek

Product: newspaper
Secondary attributes: international politics OR business news
Candidate vehicles: IQ, brain, computer, cerebrum, mankind, highbrow, head, professor, dolphin, chess, lecturer, geek, headpiece, atheist, reader, sage, owl, car, whale, horse, half, needle, button, table, uptake, storey, brainpan

Product: museum
Secondary attributes: science OR technology
Candidate vehicles: IQ, Mensa, computer, cerebrum, alien, mankind, highbrow, Einstein, head, professor, chess, lecturer, headpiece, atheist, reader, sage, owl, whale, half, needle, button, table, storey, loaf, brainpan

Table 3: Candidate vehicles NOT contradictory to the secondary attributes of the three products respectively
Tenor: car
Candidate vehicles: pig, storey, mankind, uptake, button, half, serpent, whale, lecturer, chess, latitudinarian, sage, professor, alien, horse

Tenor: newspaper
Candidate vehicles: IQ, professor, loaf, whale, table, atheist, geek, mankind, brainpan, head, Mensa, button, dolphin, brain, sage, pig, headpiece, uptake, storey

Tenor: child (museum)
Candidate vehicles: car, uptake, Einstein, loaf, button, headpiece, mankind, alien, sage, brainpan, highbrow, chess, owl, reader, serpent, cerebrum, professor

Table 4: Candidate vehicles that are not too different yet not too similar to the tenors of the three ads respectively
We show in Table 5 the metaphor vehicles suggested by
our computer program for each of the three ads after apply-
ing all four aptness metrics. For all three ads, the
vehicles used in the original ads are included among those
suggested by our computer program, as marked with an
asterisk. For the Volvo S80 car ad, the original metaphor ve-
hicle is the only one recommended by our program. For the
other two ads, our program also proposed five and seven
additional vehicles respectively. Considering that there are
thirty-four candidate vehicles input to the second stage, we
think the four aptness metrics together did an acceptable
job.
Regarding the generated vehicles other than the ones used
in the original ads: are they equally effective? We will have
a closer look at the metaphor vehicles generated for the ad
of the National Museum of Science and Technology, since
it has the most suggested vehicles. It is easy to spot a se-
mantic cluster among these eight vehicles. Five out of eight
are humans or human-like entities bearing high intellect:
'Einstein', 'mankind', 'alien', 'highbrow' and 'professor'.
'Einstein', as the most prototypical within this cluster, fits
this specific advertising task best. The other vehicles in this
cluster are also highly relevant to a setting like a museum,
where people, especially children, go to increase knowledge
and encounter inspiration. They may be optimal for other
advertising tasks with a slightly different focus. The only
exception is 'mankind', which is a very general concept. As
to the rest of the suggested metaphor vehicles: a certain
headpiece is possibly a symbol of intelligence; playing
chess shows someone is intelligent; and the cerebrum is
strongly associated with intelligence. It is not difficult to
imagine a picture juxtaposing a headpiece and a child, a
child playing chess, or a child whose cerebrum is
emphasized, all of which would be effective in associating
a child with intelligence. However, strictly speaking, they
are not metaphors.
On the other hand, the existence of candidate vehicles
other than the ones used in the original ads may suggest,
firstly, that our implementation of the four aptness metrics
may not sufficiently weed out inapt vehicles, and secondly,
that more metrics, representing other factors that affect
metaphor aptness, may be necessary.
Ad: Volvo S80 car
Tenor: car
Vehicles: chess*

Ad: The Economist newspaper
Tenor: newspaper
Vehicles: professor, mankind, head, dolphin, brain*, headpiece

Ad: National Museum of Science and Technology
Tenor: child
Vehicles: Einstein*, headpiece, mankind, alien, highbrow, chess, cerebrum, professor

Table 5: Metaphor vehicles considered apt for the three ads respectively (* marks the vehicle used in the original ad)
Conclusions
In the work presented in this paper, we adopted a general
two-stage computational approach to generate apt meta-
phor ideas for pictorial advertisements. The first stage
looks for concepts which have high imageability and the
selling premise as one of their prototypical properties. The
second stage evaluates the aptness of the candidate ve-
hicles (found in the first stage) with regard to four aspects,
including affect polarity, salience, secondary attributes and
similarity with tenor. These four metrics are conceived
based on the general characteristics of metaphor and its
particular role in advertising. For the first stage, we developed
an automatic knowledge extraction method to find con-
cepts of physical entities which are strongly associated
with the selling premise. For the second stage, we utilized
an affect lexicon and two semantic relatedness measures to
implement the four aptness metrics. The capacity of our
computer program is demonstrated in a task of reproducing
the pictorial metaphors used in three real advertisements.
All three original metaphors were replicated, and a few
other vehicles were recommended which, we consider,
would make effective, if less optimal, advertisements.
In short, our approach and implementation are promising
in generating diverse and apt pictorial metaphors for adver-
tisements.
On the other hand, to take a more critical view of our
approach and implementation, a larger-scale evaluation is
needed. Continuing the evaluation design introduced in this
paper, more examples of pictorial metaphors used in real
advertisements have to be collected and annotated. This
corpus would not only contribute to building our metaphor
generator, but also be an asset for the research on metaphor
and creativity in general.
Moreover, the results provided by our aptness metrics
support both the salience imbalance theory and the domain
interaction theory.
Future Work
We intend to model computationally more ways of expression
that appear in pictorial advertisements. Firstly, our current implemen-
tation can be readily adapted to generate visual puns. In a
pun, the product (or something associated with it) also has the
meaning of the selling premise. An example is an existing
ad which uses the picture of an owl to convey the message
that a zoo is a place to learn and gain wisdom: the owl is
both a member of the zoo and a symbol of wisdom.
Secondly, we found some other fields of study are very
relevant to computing advertising expression, such as the
research and computational modeling of humor (Raskin
1985; Attardo and Raskin 1991; Ritchie 2001; Binsted,
Bergen, Coulson, Nijholt, Stock, Strapparava, Ritchie, Ma-
nurung, Pain, Waller and O'Mara 2006). Finally, we are
especially interested in investigating hyperbole. Hyperbole
has nearly universal presence in advertisements, but its
theoretical treatment and computational modeling are
minimal. Some ad-hoc approaches exist: for instance, we
could find exaggerations of the selling proposition via the
'AlsoSee' relation in WordNet; alternatively, we could
first develop a cognitive or linguistic model of hyper-
bole.
References
Abe, K., Sakamoto, K., and Nakagawa, M. 2006. A computation-
al model of metaphor generation process. In Proceedings of the
28th Annual Meeting of the Cognitive Science Society, 937-942.
Amos, C., Holmes, G., and Strutton, D. 2008. Exploring the rela-
tionship between celebrity endorser effects and advertising effec-
tiveness: A quantitative synthesis of effect size. International
Journal of Advertising 27(2):209-234.
Andersson, S., Hedelin, A., Nilsson, A., and Welander, C. 2004.
Violent advertising in fashion marketing. Journal of Fashion
Marketing and Management 8:96-112.
Aristotle. 1924. The Art of Rhetoric. Translated by W. Rhys Ro-
berts. The Internet Classics Archive, accessed October 1, 2012.
http://classics.mit.edu//Aristotle/rhetoric.html.
Attardo, S., and Raskin, V. 1991. Script theory revis(it)ed: Joke
similarity and joke representation model. Humor: International
Journal of Humor Research 4(3-4):293-347.
Baccianella, S., Esuli, A., and Sebastiani, F. 2010. SentiWordNet
3.0: An enhanced lexical resource for sentiment analysis and
opinion mining. In Proceedings of the 7th Conference on Lan-
guage Resources and Evaluation (LREC'10), Valletta, Malta.
Binsted, K., Bergen, B., Coulson, S., Nijholt, A., Stock, O.,
Strapparava, C., Ritchie, G., Manurung, R., Pain, H., Waller, A.,
and O'Mara, D. 2006. Computational humor. IEEE Intelligent
Systems March-April.
Black, M. 1962. Models and metaphors. NY: Cornell University
Press.
Budanitsky, A., and Hirst, G. 2006. Evaluating WordNet-based
measures of semantic distance. Journal of Computational Lin-
guistics 32(1):13-47.
Chiappe, D., and Kennedy, J. 1999. Aptness predicts preference
for metaphors or similes, as well as recall bias. Psychonomic
Bulletin and Review 6:668-676.
De Pelsmacker, P., and Van Den Bergh, J. 1996. The communi-
cation effects of provocation in print advertising. International
Journal of Advertising 15(3):203-22.
Fellbaum, C. D., ed. 1998. WordNet: An Electronic Lexical Da-
tabase. Cambridge: MIT Press.
Forceville, C. 1996. Pictorial Metaphor in Advertising. London:
Routledge.
Foss, S. K. 2005. Theory of Visual Rhetoric. In Smith, K., Mo-
riarty, S., Barbatsis, G., and Kenney, K., eds., Handbook of
Visual Communication: Theory, Methods, and Media. Mahwah,
New Jersey: Lawrence Erlbaum. 141-152.
Gentner, D. 1983. Structure-mapping: A theoretical frame-
work for analogy. Cognitive Science 7:155-170.
Gentner, D., and Clement, C. 1988. Evidence for relational se-
lectivity in the interpretation of analogy and metaphor. In Bower,
G., ed., The Psychology of Learning and Motivation, Vol. 22.
Orlando, FL: Academic Press.
Gentner, D., and Markman, A. B. 1997. Structure mapping in
analogy and similarity. American Psychologist 52:45-56.
Glucksberg, S., and Keysar, B. 1990. Understanding Metaphori-
cal comparisons: beyond similarity. Psychological Review 97:3-
18.
Glucksberg, S., and Keysar, B. 1993. How metaphors work. In
Ortony, A. ed., Metaphor and Thought, 2nd ed. Cambridge:
Cambridge University Press. 401- 424.
Goldenberg, J., Mazursky, D., and Solomon, S. 1999. Creativity
templates: towards identifying the fundamental schemes of quali-
ty advertisements. Marketing Science 18(3):333-351.
Kameya, Y., and Sato, T. 2005. Computation of probabilistic
relationship between concepts and their attributes using a statis-
tical analysis of Japanese corpora. In Proceedings of Symposium
on Large-Scale Knowledge Resources, 65-68.
Kanerva, P., Kristofersson, J., and Holst, A. 2000. Random index-
ing of text samples for latent semantic analysis. In Proceedings of
the 22nd Annual Conference of the Cognitive Science Society
(CogSci'00), Erlbaum.
Kiss, G. R., Armstrong, C., Milroy, R., and Piper, J. 1973. An
associative thesaurus of English and its computer analysis. In
Aitken, A. J., Bailey, R. W., and Hamilton-Smith, N. eds., The
Computer and Literary Studies, Edinburgh: University Press.
153-165.
Kohli, C., and Labahn, D. W. 1997. Creating effective brand
names: A study of the naming process. Journal of Advertising
Research 37(1):67-75.
Landauer, T. K., and Dumais, S. T. 1997. A solution to Plato's
problem: The latent semantic analysis theory of the acquisition,
induction, and representation of knowledge. Psychological Re-
view 104:211-240.
Liu, H., and Singh, P. 2004. ConceptNet: A practical common-
sense reasoning toolkit. BT Technology Journal 22(4):211-26.
Maes, A., and Schilperoord, J. 2008. Classifying visual rhetoric:
Conceptual and structural heuristics. In McQuarrie, E. F., and
Phillips, B. J., eds., Go Figure: New Directions in Advertising
Rhetoric. Armonk, New York: Sharpe. 227-257.
Marschark, M., Katz, A. N., and Paivio, A. 1983. Dimensions of
metaphor. Journal of Psycholinguistic Research 12(1):17.
Nelson, D. L., McEvoy, C. L., and Schreiber, T. A. 1998. The
University of South Florida word association, rhyme, and word
fragment norms. University of South Florida Website, accessed
January 13, 2012. http://www.usf.edu/FreeAssociation/.
Ortony, A. 1979. Beyond literal similarity. Psychological Review
86:161-180.
Paivio, A., Yuille, J. C., and Madigan, S. 1968. Concreteness,
imagery, and meaningfulness values of 925 nouns. Journal of
Experimental Psychology 76:1-25.
Phillips, B. J. 1997. Thinking into it: Consumer interpretation of
complex advertising images. Journal of Advertising 26(2):77-87.
Phillips, B. J., and McQuarrie, E. F. 2004. Beyond visual meta-
phor: A new typology of visual rhetoric in advertising. Marketing
Theory 4(1/2):111-134.
Phillips, B. J., and McQuarrie, E. F. 2009. Impact of advertising
metaphor on consumer beliefs: delineating the contribution of
comparison versus deviation factors. Journal of Advertising
38(1):49-61.
Raskin, V. 1985. Semantic Mechanisms of Humor. Dordrecht,
Boston, Lancaster: D. Reidel Publishing Company.
Richards, I. A. 1936. The Philosophy of Rhetoric. London: Ox-
ford University Press.
Ritchie, G. D. 2001. Current directions in computational humour.
Artificial Intelligence Review 16(2):119-135.
Roget, P. M. 1852. Roget's Thesaurus of English Words and
Phrases. Harlow, Essex, England: Longman Group Ltd.
Terai, A., and Nakagawa, M. 2009. A neural network model of
metaphor generation with dynamic interaction. In Alippi, C.,
Polycarpou, M., Panayiotou, C., Ellinas, G., eds., ICANN 2009,
Part I. LNCS 5768, Springer, Heidelberg. 779-788.
Terai, A., and Nakagawa, M. 2010. A computational system of
metaphor generation with evaluation mechanism. In Diamantaras,
K., Duch, W., Iliadis, L. S., eds., ICANN 2010, Part II. LNCS,
6353, Springer, Heidelberg. 142-147.
Toglia, M. P., and Battig, W. F. 1978. Handbook of Semantic
Word Norms. Hillsdale, NJ: Erlbaum.
Tourangeau, R., and Sternberg, R. J. 1981. Aptness in metaphor.
Cognitive Psychology 13: 27-55.
Tourangeau, R., and Sternberg, R. J. 1982. Understanding and
appreciating metaphors. Cognition 11: 203-244.
Turney, P. D. 2001. Mining the Web for synonyms: PMI-IR ver-
sus LSA on TOEFL. In Proceedings of the Twelfth European
Conference on Machine Learning, 491-502. Berlin: Springer-
Verlag.
Veale, T., and Keane, M. T. 1992. Conceptual Scaffolding: A
spatially-founded meaning representation for metaphor compre-
hension. Computational Intelligence 8(3):494-519.
Veale, T., O'Donoghue, D., and Keane, M. T. 1995. Epistemolog-
ical issues in metaphor comprehension: A comparative analysis of
three models of metaphor interpretation. In Proceedings of
ICLC'95, the Fourth Conference of the International Cognitive
Linguistics Association, Albuquerque NM.
Veale, T., and Hao, Y. 2007. Comprehending and generating apt
metaphors: A Web-driven, case-based approach to figurative
language. In Proceedings of the Twenty-Second AAAI Conference
on Artificial Intelligence, Vancouver, Canada.
Vézina, R., and Paul, O. 1997. Provocation in advertising: A con-
ceptualization and an empirical assessment. International Journal
of Research in Marketing 14(2):177-192.
Waltinger, U., Cramer, I., and Wandmacher, T. 2009. From so-
cial networks to distributional properties: A comparative study on
computing semantic relatedness. In Proceedings of the Thirty-
First Annual Meeting of the Cognitive Science Society, CogSci
2009, 3016-3021. Cognitive Science Society, Amsterdam, Neth-
erlands.
Xiao, P., and Blat, J. 2012. Image the imageless: Search for pic-
torial representations of abstract concepts. In Proceedings of the
Seventh International Conference on Design Principles and Prac-
tices, Chiba, Japan.
Once More, With Feeling!
Using Creative Affective Metaphors to Express Information Needs
Tony Veale
Web Science & Technology Division, KAIST / School of Computer Science and Informatics, UCD
Korea Advanced Institute of Science & Technology, South Korea / University College Dublin, Ireland.
[email protected]
Abstract
Creative metaphors abound in language because they
facilitate communication that is memorable, effective and
elastic. Such metaphors allow a speaker to be maximally
suggestive while being minimally committed to any single
interpretation, so they can both supply and elicit information
in a conversation. Yet, though metaphors are often used to
articulate affective viewpoints and information needs in
everyday language, they are rarely used in information
retrieval (IR) queries. IR fails to distinguish between
creative and uncreative uses of words, since it typically
treats words as literal mentions rather than suggestive
allusions. We show here how a computational model of
affective comprehension and generation allows IR users to
express their information needs with creative metaphors that
concisely allude to a dense body of assertions. The key to
this approach is a lexicon of stereotypical concepts and their
affective properties. We show how such a lexicon is
harvested from the open web and from local web n-grams.
Creative Truths
Picasso famously claimed that "art is a lie that tells the
truth". Fittingly, this artful contradiction suggests a
compelling reason for why speakers are so wont to use
artfully suggestive forms of creative language, such as
metaphor and irony, when less ambiguous and more
direct forms are available. While literal language commits
a speaker to a tightly fixed meaning, and offers little scope
to the listener to contribute to the joint construction of
meaning, creative language suggests a looser but
potentially richer meaning that is amenable to collaborative
elaboration by each participant in a conversation.
A metaphor "X is Y" establishes a conceptual pact between
speaker and listener (Brennan & Clark, 1996), one that
says "let us agree to speak of X using the language and
norms of Y" (Hanks, 2006). Suppose a speaker asserts that
"X is a snake". Here, the stereotype "snake" conveys the
speaker's negative stance toward X, and suggests a range
of talking points for X, such as that X is charming and
clever but also dangerous, and is not to be trusted (Veale
& Hao, 2008). A listener may now respond by elaborating
the metaphor, even when disagreeing with the basic
conceit, as in "I agree that X can be charming, but I see no
reason to distrust him". Successive elaboration thus allows
a speaker and listener to arrive at a mutually acceptable
construal of a metaphorical "snake" in the context of X.
Metaphors achieve a balance of suggestiveness and
concision through the use of dense descriptors, familiar
terms like "snake" that evoke a rich variety of stereotypical
properties and behaviors (Fishelov, 1992). Though every
concept has the potential to be used creatively, casual
metaphors tend to draw their dense descriptors from a large
pool of familiar stereotypes shared by all speakers of a
language (Taylor, 1954). A richer, more conceptual model
of the lexicon is needed to allow any creative uses of
stereotypes to be inferred as needed in context. We will
show here how a large lexicon of stereotypes is mined
from the web, and how stereotypical representations can be
used selectively and creatively, to highlight relevant
aspects of a given target concept in a specific metaphor.
Because so many familiar stereotypes have polarizing
qualities (think of the endearing and not-so-endearing
qualities of babies, for instance), metaphors are ideal
vehicles for conveying an affective stance toward a topic.
Even stereotypes that are not used figuratively, as in the
claim "Steve Jobs was a great leader", are likely to elicit
metaphors in response, such as "yes, a true pioneer" or
"what an artist!", or even "but he could be such a tyrant!".
Proper-names can also be used as evocative stereotypes, as
when Steve Jobs is compared to the fictional inventor Tony
Stark, or Apple is compared to Scientology, or Google to
Microsoft. We use stereotypes effortlessly, and their
exploitations are common currency in everyday language.
Information retrieval, however, is a language-driven
application where the currency of metaphor has little or no
exchange value, not least because IR fails to discriminate
literal from non-literal language (Veale 2004, 2011, 2012).
Speakers use metaphor to provide and elicit information in
casual conversation, but IR reduces any metaphoric query
to literal keywords and key-phrases, which are matched
near-identically in texts (Salton, 1968; Van Rijsbergen
1979). Yet everyday language shows that metaphor is an
ideal form for expressing our information needs. A query
like "Steve Jobs as a good leader" can be viewed by an IR
system as a request to consider all the ways in which
leaders are stereotypically good, and to then consider all
the metaphors that are typically used to convey these
viewpoints. The IR staple of query expansion (Vernimb,
1977; Voorhees, 1994, 1998; Navigli & Velardi, 2003; Xu &
Croft, 1996) can be made both affect-driven and metaphor-
aware. In this paper we show how an affective stereotype-
based lexicon can both comprehend and generate affective
metaphors that capture or shape a user's feelings, and show
how this capability can lead to more creative forms of IR.
Related Work and Ideas
Metaphor has been studied within computer science for
four decades, yet it remains at the periphery of NLP
research. The reasons for this marginalization are, for the
most part, pragmatic ones, since metaphors can be as
varied and challenging as human creativity will allow. The
greatest success has been achieved by focusing on
conventional metaphors (e.g., Martin, 1990; Mason, 2004),
or on very specific domains of usage, such as figurative
descriptions of mental states (e.g., Barnden, 2006).
From the earliest computational forays, it has been
recognized that metaphor is essentially a problem of
knowledge representation. Semantic representations are
typically designed for well-behaved mappings of words to
meanings (what Hanks (2006) calls "norms") but metaphor
requires a system of soft preferences rather than hard (and
brittle) constraints. Wilks (1978) thus proposed his
preference semantics model, which Fass (1991,1997)
extended into a collative semantics. In contrast, Way
(1990) argues that metaphor requires a dynamic concept
hierarchy that can stretch to meet the norm-bending
demands of figurative ideation, though her approach lacks
computational substance.
More recently, some success has been obtained with
statistical approaches that side-step the problems of
knowledge representation, by working instead with implied
or latent representations that are derived from word
distributions. Turney and Littman (2005) show how a
statistical model of relational similarity can be constructed
from web texts for retrieving the correct answer to
proportional analogies, of the kind used in SAT tests. No
hand-coded knowledge is employed, yet Turney and
Littman's system achieves an average human grade on a
set of 376 real SAT analogies.
Shutova (2010) annotates verbal metaphors in corpora
(such as "to stir excitement", where "stir" is used
metaphorically) with the corresponding conceptual
metaphors identified by Lakoff and Johnson (1980).
Statistical clustering techniques are then used to generalize
from the annotated exemplars, allowing the system to
recognize and retrieve other metaphors in the same vein
(e.g. "he swallowed his anger"). These clusters can also be
analyzed to identify literal paraphrases for a metaphor
(such as "to provoke excitement" or "suppress anger").
Shutova's approach is noteworthy for operating with
Lakoff & Johnson's inventory of conceptual metaphors
without using an explicit knowledge representation.
Hanks (2006) argues that metaphors exploit
distributional norms: to understand a metaphor, one must
first recognize the norm that is exploited. Common norms
in language are the preferred semantic arguments of verbs,
as well as idioms, clichés and other multi-word
expressions. Veale and Hao (2007a) suggest that
stereotypes are conceptual norms that are found in many
figurative expressions, and note that stereotypes and
similes enjoy a symbiotic relationship that has some
obvious computational advantages. Similes use stereotypes
to illustrate the qualities ascribed to a topic, while
stereotypes are often promulgated via proverbial similes
(Taylor, 1954). Veale and Hao (2007a) show how
stereotypical knowledge can be acquired by harvesting
Hearst patterns of the form as P as C (e.g. as smooth
as silk) from the web (Hearst, 1992). They show in
(2007b) how this body of stereotypes can be used in a web-
based model of metaphor generation and comprehension.
Veale (2011) employs stereotypes as the basis of a new
creative information retrieval paradigm, by introducing a
variety of non-literal wildcards in the vein of Mihalcea
(2002). In this system, @Noun matches any adjective that
denotes a stereotypical property of Noun (so e.g. @knife
matches sharp, cold, etc.) while @Adj matches any noun
for which Adj is stereotypical (e.g. @sharp matches sword,
laser, razor, etc.). In addition, ?Adj matches any property
or behavior that co-occurs with, and reinforces, the
property denoted by Adj; thus, ?hot matches humid, sultry
and spicy. Likewise, ?Noun matches any noun that denotes
a pragmatic neighbor of Noun, where two words are
neighbors if they are seen to be clustered in the same ad-
hoc set (Hanks, 2005), such as "lawyers and doctors" or
"pirates and thieves". The knowledge needed for @ is
obtained by mining text from the open web, while that for
? is obtained by mining ad-hoc sets from Google n-grams.
There are a number of shortcomings to this approach.
For one, Veale (2011) does not adequately model the
affective profile of either stereotypes or their properties.
For another, the stereotype lexicon is static, and focuses
primarily on adjectival properties (like sharp and hot). It
thus lacks knowledge of everyday verbal behaviors like
cutting, crying, swaggering, etc. So we build here on the
work of Veale (2011) in several important ways.
First, we enrich and enlarge the stereotype lexicon, to
include more stereotypes and behaviors. We determine an
affective polarity for each property or behavior and for
each stereotype, and show how polarized +/- viewpoints on
a topic can be calculated on the fly. We show how proxy
representations for ad-hoc proper-named stereotypes (like
Microsoft) can be constructed on demand. Finally, we
show how metaphors are mined from the Google n-grams,
to allow the system to understand novel metaphors (like
"Google is another Microsoft" or "Apple is a cult") as well as
to generate plausible metaphors for users' affective
information needs (e.g., "Steve Jobs was a great leader",
"Google is too powerful", etc.).
Once more, with feeling!
If a property or behavior P is stereotypical of a concept
C, we should expect to frequently observe P in instances of
C. In linguistic terms, we can expect to see collocations of
P and C in a resource like the Google n-grams (Brants
and Franz, 2006). Consider these 3-grams for "cowboy"
(the trailing numbers are Google database frequencies).
a lonesome cowboy 432
a mounted cowboy 122
a grizzled cowboy 74
a swaggering cowboy 68
N-gram patterns of the above form allow us to find
frequent ascriptions of a quality to a noun-concept, but
frequently observed qualities are not always noteworthy
qualities (e.g., see Almuhareb and Poesio, 2004,2005).
However, if we also observe these qualities in similes
such as "swaggering like a cowboy or as grizzled as a
cowboy this suggests that speakers see these as typical
enough to anchor a figurative comparison. So for each
hypothesis P is stereotypical of C that we derive from the
Google n-grams, we generate the corresponding simile
form: we use the like form for verbal behaviors such as
swaggering, and the as-as form for adjectival
properties such as lonesome. We then dispatch each
simile as a phrasal query to Google: a hypothesis is
validated if the corresponding simile is found on the web.
This mining process gives us over 200,000 validated
hypotheses for our stereotype lexicon. We now filter these
hypotheses manually, to ensure that the contents of the
lexicon are of the highest quality (investing just weeks of
labor produces a very reliable resource; see Veale 2012 for
more detail). We obtain rich descriptions for commonplace
ideas, such as the dense descriptor Baby, whose 163 highly
salient qualities a set denoted typical(Baby) includes
crying, drooling and guileless. After this manual phase, the
stereotype lexicon maps 9,479 stereotypes to a set of 7,898
properties / behaviors, to yield more than 75,000 pairings.
Determining Nuanced Affect
To understand the affective uses of a property or behavior,
we employ the intuition that those which reinforce each
other in a single description (e.g. as lush and green as a
jungle or as hot and humid as a sauna) are more likely
to have the same affect than those which do not. To
construct a support graph of mutually reinforcing
properties, we gather all Google 3-grams in which a pair of
stereotypical properties or behaviors X and Y are linked
via coordination, as in hot and spicy or kicking and
screaming. A bidirectional link between X and Y is added
to the graph if one or more stereotypes in the lexicon
contain both X and Y. If this is not so, we consider whether
both descriptors ever reinforce each other in web similes,
by posing the web query "as X and Y as". If this query has
a non-zero hit set, we still add a link between X and Y.
Next, we build a reference set -R of typically negative
words, and a disjoint set +R of typically positive words.
Given a few seed members for -R (such as sad, evil,
monster, etc.) and a few seed members for +R (such as
happy, wonderful, hero, etc.), we use the ? operator of
Veale (2011) to successively expand this set by suggesting
neighboring words of the same affect (e.g., "sad and
pathetic", "happy and healthy"). After three iterations in
this fashion, we populate +R and -R with approx. 2000
words each. If we can anchor enough nodes in the graph
with + or - labels, we can interpolate a nuanced positive /
negative score for all nodes in the graph. Let N(p) denote
the set of neighboring terms to a property or behavior p in
the support graph. Now, we define:
(1) N+(p) = N(p) ∩ +R

(2) N-(p) = N(p) ∩ -R

We assign positive / negative affect scores to p as follows:

(3) pos(p) = |N+(p)| / |N+(p) ∪ N-(p)|

(4) neg(p) = 1 - pos(p)
Thus, pos(p) estimates the probability that p is used in a
positive context, while neg(p) estimates the probability that
p is used in a negative context. The "X and Y" 3-grams
approximate these contexts for us.
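A minimal sketch of (1)-(4) over the support graph; graph[p] holds the neighbor set N(p), POS_R and NEG_R are the (disjoint) seed sets +R and -R, and the neutral default for unanchored nodes is our assumption:

def pos_affect(p, graph, POS_R, NEG_R):
    n_pos = graph[p] & POS_R            # (1) N+(p)
    n_neg = graph[p] & NEG_R            # (2) N-(p)
    # since +R and -R are disjoint, |N+(p) u N-(p)| = |N+(p)| + |N-(p)|
    anchored = len(n_pos) + len(n_neg)
    if anchored == 0:
        return 0.5                      # unanchored node: assume neutral
    return len(n_pos) / anchored        # (3) pos(p)

def neg_affect(p, graph, POS_R, NEG_R):
    return 1.0 - pos_affect(p, graph, POS_R, NEG_R)   # (4) neg(p)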
Now, if a term S denotes a stereotypical idea that is
described in the lexicon with the set of typical properties
and behaviors denoted typical(S), then:
(5) pos(S) = ( Σ p∈typical(S) pos(p) ) / |typical(S)|

(6) neg(S) = 1 - pos(S)
So we simply calculate the mean affect of the properties
and behaviors of S, as represented in the lexicon via
typical(S). Note that (5) and (6) are simply gross defaults.
One can always use (3) and (4) to separate the elements of
typical(S) into those which are more negative than positive
(a negative spin on S) and those which are more positive
than negative (a positive spin on S). Thus, we define:
(7) posTypical(S) = {p ∈ typical(S) | pos(p) > neg(p)}

(8) negTypical(S) = {p ∈ typical(S) | neg(p) > pos(p)}
For instance, the positive stereotype of Baby contains
qualities such as smiling, adorable and cute, while the
negative stereotype contains qualities such as crying,
wailing and sniveling. As we'll see next, this ability to
affectively spin a stereotype is key to automatically
generating affective metaphors on demand.
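A minimal sketch of (5)-(8) in the same vein; `pos` is assumed to map each property to its score from (3), so the test pos(p) > neg(p) simplifies to pos(p) > 0.5 since neg(p) = 1 - pos(p).

```python
# A minimal sketch of (5)-(8): a stereotype's polarity is the mean
# polarity of its typical properties, and a stereotype is "spun" by
# keeping only its more-positive or more-negative properties.
def pos_stereotype(s, typical, pos):
    props = typical[s]
    return sum(pos[p] for p in props) / len(props)     # (5)

def pos_typical(s, typical, pos):
    return {p for p in typical[s] if pos[p] > 0.5}     # (7)

def neg_typical(s, typical, pos):
    return {p for p in typical[s] if pos[p] < 0.5}     # (8)
```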
Generating Affective Metaphors, N-gram style
The Google n-grams are also a rich source of copula
metaphors of the form "Target is Source", such as
"politicians are crooks", "Apple is a cult", "racism is a
disease" and "Steve Jobs is a god". Let src(T) denote the
set of stereotypes that are commonly used to describe T,
where commonality is defined as the presence of the
corresponding copula metaphor in the Google n-grams. To
also find metaphors for proper-named entities like Bill
Gates, we analyse n-grams of the form "stereotype First
[Middle] Last", such as "tyrant Adolf Hitler". For example:
src(racism) = {problem, disease, joke, sin, poison,
crime, ideology, weapon}
src(Hitler) = {monster, criminal, tyrant, idiot, madman,
vegetarian, racist, …}
We do not try to discriminate literal from non-literal
assertions, nor indeed do we try to define literality at all.
Rather, we assume each putative metaphor offers a
potentially useful perspective on a topic T.
Let srcTypical(T) denote the aggregation of all
properties ascribable to T via metaphors in src(T):
(9) srcTypical(T) = ∪_{M ∈ src(T)} typical(M)
We can also use the posTypical and negTypical variants of
(7) and (8) to focus only on metaphors that place a positive
or negative spin on a topic T. In effect, (9) provides a
feature representation for topic T as viewed through the
creative lens of metaphor. This is useful when the source S
in the metaphor "T is S" is not a stereotype in the lexicon, as
happens when one describes Rasputin as Karl Rove, or
Apple as Scientology. When the set typical(S) is empty,
srcTypical(S) may not be, so srcTypical(S) can act as a
proxy representation for S in these cases.
The properties and behaviors that are salient to the
interpretation of "T is S" are given by:
(10) salient(T, S) = [srcTypical(T) ∪ typical(T)] ∩ [srcTypical(S) ∪ typical(S)]
In the context of "T is S", the metaphorical stereotype M ∈
src(S) ∪ src(T) ∪ {S} is an apt vehicle for T if:
(11) apt(M, T, S) = |salient(T, S) ∩ typical(M)| > 0
and the degree to which M is apt for T is given by:
(12) aptness(M, T, S) = |salient(T, S) ∩ typical(M)| / |typical(M)|
We can now construct an interpretation for "T is S" by
considering the stereotypes in src(T) that are apt for T in
the context of "T is S", and also the stereotypes that are
commonly used to describe S and are potentially apt for T:
(13) interpretation(T, S) = {M ∈ src(S) ∪ src(T) ∪ {S} | apt(M, T, S)}
In effect, the interpretation of the creative metaphor "T is S"
is itself a set of more conventional metaphors that are apt
for T and which expand upon S. The elements {M_i} of
interpretation(T, S) can be sorted by aptness(M_i, T, S) to
produce a ranked list of interpretations (M_1 … M_n). For a
given interpretation M, the salient features of M are thus:
(14) salient(M, T, S) = typical(M) ∩ salient(T, S)
So if "T is S" is a creative IR query to find documents in
which T is viewed as S, then interpretation(T, S) is an
expansion of "T is S" that includes the common metaphors
that are consistent with T viewed as S. In turn, for any
viewpoint M_i in interpretation(T, S), salient(M_i, T, S)
is an expansion of M_i that includes all of the qualities that
T is likely to exhibit when it behaves like M_i.
A Worked Example: Metaphor Generation for IR
Consider the creative query "Google is Microsoft", which
expresses a user's need to find documents in which Google
exhibits qualities typically associated with Microsoft. Now,
both Google and Microsoft are complex concepts, so there
are many ways in which they can be considered similar or
dissimilar, whether in a good or a bad light. However, we
can expect the most salient aspects of Microsoft to be those
that underpin our common metaphors for Microsoft, i.e.,
the stereotypes in src(Microsoft). These metaphors will
provide the talking points for an interpretation.
The Google n-grams yield up the following metaphors,
57 for Microsoft and 50 for Google:
"
Proceedings of the Fourth International Conference on Computational Creativity 2013 19
src(Microsoft) = {king, master, threat, bully, giant,
leader, monopoly, dinosaur, …}
src(Google) = {king, engine, threat, brand, giant,
leader, celebrity, religion, …}
So the following qualities are aggregated for each:
srcTypical(Microsoft) = {trusted, menacing, ruling,
threatening, overbearing,
admired, commanding, …}
srcTypical(Google) = {trusted, admired, reigning,
lurking, crowned, shining,
ruling, determined, …}
Now, the salient qualities highlighted by the metaphor,
namely salient(Google, Microsoft), are:
{celebrated, menacing, trusted, challenging, established,
threatening, admired, respected, …}
Finally, interpretation(Google, Microsoft) contains:
{king, criminal, master, leader, bully, threatening, giant,
threat, monopoly, pioneer, dinosaur, …}
Let's focus on the expansion "Google is king": according
to (12), aptness(king, Google, Microsoft) = 0.48, and this
is the highest-ranked element of the interpretation.
Now, salient(king, Google, Microsoft) contains:
{celebrated, revered, admired, respected, ruling,
arrogant, commanding, overbearing, reigning, …}
Note that these properties / behaviours are already implicit
in our consensus perception of Google, insofar as they are
highly salient aspects of the stereotypical concepts to
which Google is frequently compared on the web. These
properties / behaviours can now be used to perform query
expansion for the query term Google, to find documents
where the system believes Google is acting like Microsoft.
The metaphor "Google is Microsoft" is diffuse and
lacks an affective stance. So let's consider instead the
metaphor "Google is -Microsoft", where - is used to
impart a negative spin (and where + can likewise impart a
positive spin). In this case, negTypical is used in place of
typical in (9) and (10), so that:
srcTypical(-Microsoft) =
{menacing, threatening, twisted, raging, feared,
sinister, lurking, domineering, overbearing, …}
and
salient(Google, -Microsoft) =
{menacing, bullying, roaring, dreaded}
Now, interpretation(Google, -Microsoft) becomes:
{criminal, giant, threat, bully, evil, victim, devil, …}
In contrast, interpretation(Google, +Microsoft) is:
{king, master, leader, pioneer, classic, partner, …}
More focus is achieved when this query takes the form of a
simile: "Google is as -powerful as Microsoft". For explicit
similes, we need to focus on just a sub-set of salient
properties, as in this variant of (10):
{p ∈ salient(Google, Microsoft) | p ∈ N-(powerful)}
In this case, the final interpretation becomes:
{bully, threat, giant, devil, monopoly, dinosaur, …}
A few simple concepts can thus yield a wide range of
options for the creative IR user who is willing to build
queries around affective metaphors and similes.
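As a hypothetical usage sketch, the +/- spin prefix can be implemented by filtering each stereotype's property set before running the interpretation, after which the salient qualities are paired with the topic to form expanded search terms; the function names reuse the sketches above.

```python
# A hypothetical sketch of spin-driven query expansion, reusing the
# interpretation() and salient() sketches defined earlier.
def expand_query(topic, source, src, typical, pos, spin=None):
    if spin in ("+", "-"):
        wanted = (lambda p: pos.get(p, 0.5) > 0.5) if spin == "+" \
            else (lambda p: pos.get(p, 0.5) < 0.5)
        typical = {k: {p for p in v if wanted(p)} for k, v in typical.items()}
    vehicles = interpretation(topic, source, src, typical)  # ranked metaphors
    qualities = salient(topic, source, src, typical)        # spun qualities
    return vehicles, sorted("%s %s" % (topic, q) for q in qualities)

# e.g. expand_query("google", "microsoft", src, typical, pos, spin="-")
# might yield vehicles like "bully" and terms like "google menacing".
```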
Empirical Evaluation
The affective stereotype lexicon is the cornerstone of the
current approach, and must reliably assign meaningful
polarity scores both to properties and to the stereotypes
that exemplify them. Our affect model is simple in that it
relies principally on +/- affect, but as demonstrated above,
users can articulate their own expressive moods to suit
their needs: for example, one can express
disdain for too much power with the term -powerful, or
express admiration for guile with +cunning and +devious.
The Effect of Affect: Stereotypes and Properties
Note that the polarity scores assigned to a property p in (3)
and (4) do not rely on any prior classification of p, such as
whether p is in +R or -R. That is, +R and -R are not used
as training data, and (3) and (4) receive no error feedback.
Of course, we expect that pos(p) > neg(p) for p ∈ +R,
and that neg(p) > pos(p) for p ∈ -R, but (3) and (4) do not
iterate until this is so. Measuring the extent to which these
simple intuitions are validated thus offers a good
evaluation of our graph-based affect mechanism.
Just five properties in +R (approx. 0.4% of the 1,314
properties in +R) are given a positivity of less than 0.5
using (3), leading those words to be misclassified as more
negative than positive. The misclassified property words
are: evanescent, giggling, licking, devotional and fraternal.
Just twenty-six properties in -R (approx. 1.9% of the
1,385 properties in -R) are assigned a negativity of less
than 0.5 via (4), leading these to be misclassified as more
positive than negative. The misclassified words are: cocky,
dense, demanding, urgent, acute, unavoidable, critical,
startling, gaudy, decadent, biting, controversial, peculiar,
disinterested, strict, visceral, feared, opinionated,
humbling, subdued, impetuous, shooting, acerbic,
heartrending, ineluctable and groveling.
Because +R and -R have been populated with words
that were chosen for their perceived +/- slants, this
result is hardly surprising. Nonetheless, it does validate the
key intuition that underpins (3) and (4): the affective
polarity of a property p can be reliably estimated as a
simple function of the affect of the co-descriptors with
which it is most commonly used in descriptive contexts.
The sets +R and -R are populated with adjectives, verbal
behaviors and nouns. +R contains 478 nouns denoting
positive stereotypes (such as saint and hero) while -R
contains 677 nouns denoting negative stereotypes (such as
tyrant and monster). When these reference stereotypes are
used to test the effectiveness of (5) and (6) (and thus,
indirectly, of (3) and (4) and of the stereotype lexicon itself),
96.7% of the positive stereotype exemplars are correctly
assigned a mean positivity of more than 0.5 (so, pos(S) >
neg(S)) and 96.2% of the negative exemplars are correctly
assigned a mean negativity of more than 0.5 (so, neg(S) >
pos(S)). Though it may seem crude to assess the affect of a
stereotype as the mean of the affect of its properties, this
does appear to be a reliable measure of polar affect.
The Representational Adequacy of Metaphors
We have argued that metaphors can provide a collective
representation of a concept that has no other representation
in a system. But how good a proxy is src(S) or
srcTypical(S) for an S like Karl Rove or Microsoft? Can we
reliably estimate the +/- polarity of S as a function of
src(S)? We can estimate these from metaphors as follows:
(15) pos(S) = Σ_{M ∈ src(S)} pos(M) / |src(S)|
(16) neg(S) = Σ_{M ∈ src(S)} neg(M) / |src(S)|
Testing this estimator on the exemplar stereotypes in +R
and -R, the correct polarity (+ or -) is estimated 87.2% of
the time. Metaphors in the Google n-grams are thus
broadly consistent with our perceptions of whether a topic
is positively or negatively slanted.
When we consider all stereotypes S for which |src(S)| >
0 (there are 6,904 in the lexicon), srcTypical(S) covers, on
average, just 65.7% of the typical properties of S (that is,
of typical(S)). Nonetheless, this shortfall is precisely why
we use novel metaphors. Consider this variant of (9) which
captures the longer reach of these novel metaphors:
(17) srcTypical²(T) = ∪_{S ∈ src(T)} srcTypical(S)
Thus, srcTypical²(T) denotes the set of qualities that are
ascribable to T via the expansive interpretation of all
metaphors "T is S" in the Google n-grams, since S can now
project onto T any element of srcTypical(S). Using macro-
averaging over all 6,904 cases where |src(S)| > 0, we find
that srcTypical²(S) covers 99.2% of typical(S) on average.
A well-chosen metaphor enables us to emphasize almost
any quality of a topic T we might wish to highlight.
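Formulas (15)-(17) translate directly into the same set-based idiom. In this sketch, `pos_s` is assumed to map a stereotype to its pos(S) score from (5), and `src_typical` is the function sketched earlier for (9).

```python
# A minimal sketch of (15)-(17).
def pos_from_metaphors(t, src, pos_s):
    vehicles = src[t]
    return sum(pos_s[m] for m in vehicles) / len(vehicles)    # (15)

def src_typical_2(t, src, typical):
    # (17): union of srcTypical(S) over every S in src(T)
    return set().union(*(src_typical(s, src, typical)
                         for s in src.get(t, set())))
```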
Affective Text Retrieval with Creative Metaphors
Suppose we have a database of texts {D_1 … D_n} in which
each document D_i offers a creative perspective on a topic
T. We might have texts that view politicians as crooks,
popes as kings, or hackers as heroes. So given a query +T,
can we retrieve only those texts that view T positively, and
given -T can we retrieve only the negative texts about T?
We first construct a database of artificial figurative
texts. For each stereotype S in the lexicon, and for each
M ∈ src(S) ∩ (+R ∪ -R), we construct a text D_SM in which
S is viewed as M. The title of document D_SM is "S is M",
while the body of D_SM contains all the words in src(M).
D_SM uses the typical language of M to talk about S. For
each D_SM, we know whether D_SM conveys a positive or
negative viewpoint on S, since M sits either in +R or in -R.
The affect lexicon contains 5,704 stereotypes S for
which src(S) ∩ (+R ∪ -R) is non-empty. On average, each of
these stereotypes is described in terms of 14 other
stereotypes (5.8 are negative and 8.2 are positive,
according to +R and -R) and we construct a representative
document for each of these viewpoints. We construct a set
of 79,856 artificial documents in total, to convey figurative
perspectives on 5,704 different stereotypical topics:
Table 1. Macro-average P/R/F1 scores for affective retrieval of
+ and - viewpoints for 5,704 topics.

Macro Average (5,704 topics)   Positive viewpoints   Negative viewpoints
Precision                      .86                   .93
Recall                         .95                   .78
F-Score                        .90                   .85
For each document retrieved for T, we estimate its polarity
as the mean of the polarity of the words it contains. Table 1
presents the results of this experiment, in which we attempt
to retrieve only the positive viewpoints for T with a query
+T, and only the negative viewpoints for T using -T. The
results are sufficiently encouraging to support the further
development of a creative text retrieval engine that is
capable of ranking documents by the affective figurative
perspective that they offer on a topic.
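A minimal sketch of this retrieval experiment follows, under the assumption that each artificial document is available as a (topic, words, gold label) triple and that `pos` maps words to their polarity scores; precision and recall then follow by comparing the retrieved set against the gold labels.

```python
# A minimal sketch of affective retrieval: a document is retrieved for
# +T (or -T) when its mean word polarity is above (or below) 0.5.
def retrieve(docs, topic, spin, pos):
    hits = []
    for doc_topic, words, gold in docs:
        if doc_topic != topic:
            continue
        scores = [pos[w] for w in words if w in pos]
        mean = sum(scores) / len(scores) if scores else 0.5
        if (spin == "+" and mean > 0.5) or (spin == "-" and mean < 0.5):
            hits.append((doc_topic, words, gold))
    return hits
```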
"
Proceedings of the Fourth International Conference on Computational Creativity 2013 21
Concluding Thoughts: The Creative Web
Metaphor is a creative knowledge multiplier that allows us
to expand our knowledge of a topic T by using knowledge
of other ideas as a magnifying lens. We have presented
here a robust, stereotype-driven approach that embodies
this practical philosophy. Knowledge multiplication is
achieved using an expansionary approach, in which an
affective query is expanded to include all of the metaphors
that are commonly used to convey this affective viewpoint.
These viewpoints are expanded in turn to include all the
qualities that are typically implied by each. Such an
approach is ideally suited to a creative re-imagining of IR.
An implementation of these ideas is available for use
on the web. Named Metaphor Magnet, the system allows
users to enter queries of the form shown here (such as
"Google is Microsoft", "Steve Jobs as Tony Stark", "Rasputin
as Karl Rove", etc.). Each query is expanded into a set of
apt metaphors mined from the Google n-grams, and each
metaphor is expanded into a set of contextually apt
qualities. In turn, each quality is expanded into an IR query
that is used to retrieve relevant hits from Google. In effect,
the system (still an early prototype) allows users to
interface with a search engine like Google using metaphor
and other affective language forms. The system can
currently be accessed at this URL:
http://boundinanutshell.com/metaphor-magnet
Metaphor Magnet is just one possible application of the
ideas presented here, which constitute not so much a
philosophical or linguistic theory of metaphor, but an
engineering-oriented toolkit of reusable concepts for
imbuing a wide range of text applications with a robust
competence in linguistic creativity. Human speakers do not
view metaphor as a problem but as a solution. It is time our
computational systems took a similarly constructive view
of this remarkably creative cognitive tool.
In this vein, Metaphor Magnet continues to evolve as a
creative web service. In addition to providing metaphors
on demand, the service now also provides a poetic framing
facility, whereby the space of possible interpretations for a
given metaphor is crystallized into a single poetic form.
More generally, poetry can be viewed as a means of
reducing information overload, by summarizing a complex
metaphor (or the set of texts retrieved using that metaphor
via creative IR) whose interpretation entails a rich space
of affective possibilities. A poem can thus be seen in
functional terms both as an information summarization tool
and as a visualization device. Metaphor Magnet adopts a
simple, meaning-driven approach to poetry generation:
given a topic T, a set of candidate metaphors with the
desired affective slant is generated. One metaphor is
chosen at random, and the elements of its interpretation are
sampled to produce the different lines of the resulting poem.
Each element, and the sentiment it best evokes, is rendered
in natural language using one of a variety of poetic tropes.
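The sampling step might be sketched as follows; the templates are purely illustrative stand-ins for the system's poetic tropes, which the paper does not enumerate, and the function names reuse earlier sketches.

```python
# A hypothetical sketch of poem-line sampling: one apt metaphor is
# chosen at random and its qualities are rendered with toy templates.
import random

def poem(topic, source, src, typical, lines=5):
    m = random.choice(interpretation(topic, source, src, typical))
    qualities = random.sample(sorted(typical[m]),
                              min(lines, len(typical[m])))
    templates = ["My {T} is a {q} {m}",
                 "O {T}, you overwhelm me with your {q} {m}",
                 "Let your {q} nature move me"]
    return [random.choice(templates).format(T=topic, q=q, m=m)
            for q in qualities]
```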
For example, Metaphor Magnet produces the following
as a distillation of the space of feelings and associations
that arise from the interpretation of "Marriage is a Prison":
The legalized regime of this marriage
My marriage is a tight prison
The most unitary federation scarcely organizes so much
Intimidate me with the official regulation of your prison
Let your close confines excite me
O Marriage, you disgust me with your undesirable security
Each time we dip into the space of possible interpretations,
a new poem is produced. One can use Metaphor Magnet to
sample the space at will, hopping from one interpretation
to the next, or from one poem to another. Here is an
alternate rendition of the same metaphor in poetic form:
The official slavery of this marriage
My marriage is a legitimate prison
No collective is more unitary, or organizes so much
Intimidate me with the official regulation of your prison
Let your sexual degradation charm me
O Marriage, you depress me with your dreary consecration
In the context of our earlier worked example, which
generated a space of metaphors to negatively describe
Microsoft's perceived misuse of power, consider the
following, which distills the assertion "Microsoft is a
Monopoly" into an aggressive ode:
No Monopoly Is More Ruthless
Intimidate me with your imposing hegemony
No crime family is more badly organized,
or controls more ruthlessly
Haunt me with your centralized organization
Let your privileged security support me
O Microsoft, you oppress me with your corrupt reign
Poetry generation in Metaphor Magnet is a recent addition
to the service, and its workings are beyond the scope of the
current paper (though they may be observed in practice by
visiting the aforementioned URL). For details of a related
approach to poetry generation, one that also uses the
stereotype-bearing similes described in Veale (2012), the
reader is invited to read Colton, Goodwin & Veale (2012).
Metaphor Magnet forms a key element in our vision of a
Creative Web, in which web services conveniently provide
creativity on tap to any third-party software application
that requests it. These services include ideation (e.g. via
metaphor generation & knowledge discovery), composition
(e.g. via analogy, bisociation & conceptual blending) and
framing (via poetry generation, joke & story generation,
etc.). Since CC does not distinguish itself through distinct
algorithms or representations, but through its unique goals
and philosophy, such a pooling of services will not only
help the field achieve a much-needed critical mass, it will
also facilitate a greater penetration of CC ideas and
approaches into the commercial software industry.
Acknowledgements
This research was supported by the WCU (World Class
University) program under the National Research
Foundation of Korea (Ministry of Education, Science and
Technology of Korea, Project no. R31-30007).
References
Almuhareb, A. and Poesio, M. 2004. Attribute-Based and Value-
Based Clustering: An Evaluation. In Proc. of EMNLP 2004.
Barcelona.
Almuhareb, A. and Poesio, M. 2005. Concept Learning and
Categorization from the Web. In Proc. of the 27th Annual
Meeting of the Cognitive Science Society.
Barnden, J. A. 2006. Artificial Intelligence, figurative language
and cognitive linguistics. In: G. Kristiansen, M. Achard, R.
Dirven, and F. J. Ruiz de Mendoza Ibáñez (Eds.), Cognitive
Linguistics: Current Application and Future Perspectives,
431-459. Berlin: Mouton de Gruyter.
Brants, T. and Franz, A. 2006. Web 1T 5-gram Ver. 1. Linguistic
Data Consortium.
Brennan, S. E. and Clark, H. H. 1996. Conceptual Pacts and
Lexical Choice in Conversation. Journal of Experimental
Psychology: Learning, Memory and Cognition, 22(6):1482-93.
Colton, S., Goodwin, J. and Veale, T. 2012. Full-FACE Poetry
Generation. In Proc. of ICCC 2012, the 3rd International
Conference on Computational Creativity. Dublin, Ireland.
Fass, D. 1991. Met*: a method for discriminating metonymy and
metaphor by computer. Computational Linguistics 17(1):49-90.
Fass, D. 1997. Processing Metonymy and Metaphor.
Contemporary Studies in Cognitive Science & Technology.
New York: Ablex.
Fishelov, D. 1992. Poetic and Non-Poetic Simile: Structure,
Semantics, Rhetoric. Poetics Today, 14(1), 1-23.
Hanks, P. 2005. Similes and Sets: The English Preposition "like".
In: Blatná, R. and Petkevič, V. (Eds.), Languages and
Linguistics: Festschrift for Fr. Čermák. Charles Univ., Prague.
Hanks, P. 2006. Metaphoricity is gradable. In: Anatol
Stefanowitsch and Stefan Th. Gries (Eds.), Corpus-Based
Approaches to Metaphor and Metonymy, 17-35. Berlin:
Mouton de Gruyter.
Hearst, M. 1992. Automatic acquisition of hyponyms from large
text corpora. In Proc. of the 14th Int. Conf. on Computational
Linguistics, 539-545.
Martin, J. H. 1990. A Computational Model of Metaphor
Interpretation. New York: Academic Press.
Mason, Z. J. 2004. CorMet: A Computational, Corpus-Based
Conventional Metaphor Extraction System. Computational
Linguistics, 30(1):23-44.
Mihalcea, R. 2002. The Semantic Wildcard. In Proc. of the
LREC Workshop on Creating and Using Semantics for
Information Retrieval and Filtering. Spain, May 2002.
Navigli, R. and Velardi, P. 2003. An Analysis of Ontology-based
Query Expansion Strategies. In Proc. of the Workshop on
Adaptive Text Extraction and Mining (ATEM 2003), at
ECML 2003, the 14th European Conf. on Machine Learning, 42-49.
Salton, G. 1968. Automatic Information Organization and
Retrieval. New York: McGraw-Hill.
Shutova, E. 2010. Metaphor Identification Using Verb and Noun
Clustering. In Proc. of the 23rd International Conference
on Computational Linguistics, 1001-1010.
Taylor, A. 1954. Proverbial Comparisons and Similes from
California. Folklore Studies 3. Berkeley: University of
California Press.
Turney, P.D. and Littman, M.L. 2005. Corpus-based learning of
analogies and semantic relations. Machine Learning 60(1-
3):251-278.
Van Rijsbergen, C. J. 1979. Information Retrieval. Oxford:
Butterworth-Heinemann.
Veale, T. 2004. The Challenge of Creative Information Retrieval.
Computational Linguistics and Intelligent Text Processing:
Lecture Notes in Computer Science, Vol. 2945/2004, 457-467.
Veale, T. and Hao, Y. 2007a. Making Lexical Ontologies
Functional and Context-Sensitive. In Proc. of the 46th Annual
Meeting of the Assoc. of Computational Linguistics.
Veale, T. and Hao, Y. 2007b. Comprehending and Generating
Apt Metaphors: A Web-driven, Case-based Approach to
Figurative Language. In Proc. of AAAI 2007, the 22nd AAAI
Conference on Artificial Intelligence. Vancouver, Canada.
Veale, T. and Hao, Y. 2008. Talking Points in Metaphor: A
concise, usage-based representation for figurative processing.
In Proceedings of ECAI 2008, the 18th European Conference
on Artificial Intelligence. Patras, Greece, July 2008.
Veale, T. 2011. Creative Language Retrieval: A Robust Hybrid of
Information Retrieval and Linguistic Creativity. In Proc. of
ACL 2011, the 49th Annual Meeting of the Association for
Computational Linguistics: Human Language Technologies.
Veale, T. 2012. Exploding the Creativity Myth: The
Computational Foundations of Linguistic Creativity. London:
Bloomsbury Academic.
Vernimb, C. 1977. Automatic Query Adjustment in Document
Retrieval. Information Processing & Mgmt, 13(6):339-53.
Voorhees, E. M. 1994. Query Expansion Using Lexical-Semantic
Relations. In Proc. of SIGIR '94, the 17th International
Conference on Research and Development in Information
Retrieval. Berlin: Springer-Verlag, 61-69.
Voorhees, E. M. 1998. Using WordNet for text retrieval.
WordNet: An Electronic Lexical Database, 285-303. MIT Press.
Way, E. C. 1991. Knowledge Representation and Metaphor.
Studies in Cognitive systems. Holland: Kluwer.
Wilks, Y. 1978. Making Preferences More Active, Artificial
Intelligence 11.
Xu, J. and Croft, B. W. 1996. Query expansion using local and
global document analysis. In Proc. of the 19th Annual
International ACM SIGIR Conference on Research and
Development in Information Retrieval.
Evolving Figurative Images Using Expression-Based Evolutionary Art
João Correia and Penousal Machado
CISUC, Department of Informatics Engineering
University of Coimbra
3030 Coimbra, Portugal
[email protected], [email protected]
Juan Romero and Adrian Carballal
Faculty of Computer Science
University of A Coruña
Coruña, Spain
[email protected], [email protected]
Abstract
The combination of a classifier system with an evolutionary
image generation engine is explored. The framework is
composed of an object detector and a general purpose,
expression-based, genetic programming engine. Several object
detectors are instantiated to detect faces, lips, breasts and leaves.
The experimental results show the ability of the system to
evolve images that are classified as the corresponding objects.
A subjective analysis also reveals the unexpected nature and
artistic potential of the evolved images.
Introduction
Expression-based Evolutionary Art (EA) systems have, in
theory, the potential to generate any image (Machado and
Cardoso 2002; McCormack 2007). In practice, the evolved
images depend on the representation scheme used. As a
consequence, the results of expression-based EA systems tend
to be abstract images. Although this does not represent a
problem, there has been a desire to evolve figurative images by
evolutionary means since the start of EA. An early example of
such an attempt can be found in the work of Steven Rooke
(World 1996).
McCormack (2005; 2007) identified the problem of finding
a symbolic expression that corresponds to a known target
image as one of the open problems of EA. More exactly,
the issue is not finding a symbolic expression, since
this can be done trivially as demonstrated by Machado and
Cardoso (2002); the issue is finding a compact expression
that provides a good approximation of the target image
and that takes advantage of its structure. We address this
open problem by generalizing it: instead of
trying to match a target image, we evolve individuals that
match a given class of images (e.g. lips).
The issue of evolving figurative images has been tackled
by two main types of approach: (i) developing tailored EA
systems which resort to representations that promote the
discovery of figurative images, usually of a certain kind; (ii)
using general purpose EA systems and developing fitness
assignment schemes that guide the system towards figurative
images. In the scope of this paper we are interested in
the second approach.
Romero et al. (2003) suggest combining a general purpose
evolutionary art system with an image classifier trained
to recognize faces, or other types of objects, to evolve
images of human faces. Machado, Correia, and Romero
(2012a) presented a system that allowed the evolution of
images resembling human faces by combining a general-
purpose, expression-based, EA system with an off-the-shelf
face detector. The results showed that it was possible
to guide evolution and evolve images evocative of human
faces.
Here, we demonstrate that other classes of objects can
be evolved, generalizing previous results. The autonomous
evolution of figurative images using a general purpose EC
system has rarely been accomplished. As far as we know,
evolving different types of figurative images using the same
expression-based EC system and the same approach has
never been accomplished (with the exception of user-
guided systems).
We show that this can be attained with off-the-shelf
classifiers, which indicates that the approach is generalizable,
and also with purpose-built ones, which indicates
that it is relatively straightforward to customize it to specific
needs. We chose a rather ad-hoc set of classifiers in an
attempt to demonstrate the generality of the approach.
The remainder of this paper is structured as follows: a
brief overview of related work is given in the next section;
afterwards we describe the approach for the evolution
of objects, covering the framework, the Genetic Programming
(GP) engine, the object detection system, and fitness
assignment; next we explain the experimental setup, the
results attained and their analysis; finally we draw overall
conclusions and indicate future research.
Related Work
The use of Evolutionary Computation (EC) for the evolution
of figurative images is not new. Baker (1993) focuses on the
evolution of line drawings, using a GP approach. Johnston
and Caldwell (1997) use a Genetic Algorithm (GA) to
recombine portions of existing face images, in an attempt to
build a criminal sketch artist. With similar goals, Frowd,
Hancock, and Carson (2004) use a GA, Principal Components
Analysis and eigenfaces to evolve human faces. The
evolution of cartoon faces (Nishio et al. 1997) and cartoon
face animations (Lewis 2007) through GAs has also been
explored. Additionally, Lewis (2007) evolved human figures.
The previously mentioned approaches share two common
aspects: the systems have been specifically designed for the
evolution of a specific type of image, and the user guides
evolution by assigning fitness. The work of Baker (1993) is
an exception: the system can evolve other types of line
drawings; however, it is initialized with hand-built line
drawings of human faces.
These approaches contrast with those where general
purpose evolutionary art tools, which have not been
designed for a particular type of imagery, are used to evolve
figurative images. Although the images created by their
systems are predominantly abstract, Steven Rooke (World
1996) and Machado and Romero (see, e.g., 2011), among
others, have successfully evolved figurative images using
expression-based GP systems and user-guided evolution.
More recently, Secretan et al. (2011) created Picbreeder, a
user-guided collaborative evolutionary engine. Some of the
images evolved by the users are figurative, resembling
objects such as cars, butterflies and flowers.
The evolution of figurative images using hardwired fitness
functions has also been attempted. The works of
Ventrella (2010) and DiPaola and Gabora (2009) are akin
to a classical symbolic regression problem in the sense that
a target image exists and the similarity between the evolved
images and the target image is used to assign fitness. In
addition to similarity, DiPaola and Gabora (2009) also consider
expressiveness when assigning fitness. This approach results
in images with artistic potential, which was the primary goal
of these approaches, but that would hardly be classified as
human faces. As far as we know, the difficulty of evolving a
specific target image, using symbolic regression inspired
approaches, is common to all classical expression-based GP
systems.
The concept of using a classifier system to assign fitness
is also a researched topic: in the seminal work of Baluja,
Pomerlau, and Todd (1994) an Artificial Neural Network
trained to replicate aesthetic assessments is used; Saunders
and Gero (2001) employ a Kohonen Self-Organizing
network to determine novelty; Machado, Romero, and Manaris
(2007) use a bootstrapping approach, relying on a neural
network, to promote style changes among evolutionary runs;
Norton, Darrell, and Ventura (2010) train Artificial Neural
Networks to learn to associate low-level image features to
synsets that function as image descriptors, and use the
networks to assign fitness.
Overview of the Approach
Figure 1 depicts an overview of the framework, which is
composed of two main modules, an evolutionary engine and
a classifier.
The approach can be summarized as follows:
1. Random initialization of the population;
2. Rendering of the individuals, i.e., genotype-phenotype
mapping;
3. Apply the classifier to each phenotype;
4. Use the results of the classification to assign fitness; this
may require assessing internal values and intermediate
results of the classification;
5. Select progenitors; apply genetic operators to create
descendants; use the replacement operator to update the
current population;
6. Repeat from 2 until some stopping criterion is met.

Figure 1: Overview of the system.

The framework was instantiated with a general-purpose
GP-based image generation engine and with a Haar Cascade
Classifier. To create a fitness function able to guide evolution
it is necessary to convert the binary output of the detector
into one that can provide a suitable fitness landscape. This
is attained by accessing internal results of the classification
task that give an indication of the degree of certainty in the
classification. In the following sections we explain the
components of the framework, namely the evolutionary engine,
the classifier and the fitness function.
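A minimal sketch of the loop above, assuming the rendering, classification, variation and selection operators are supplied elsewhere; `classify` is taken to return whatever internal cascade results the fitness function needs.

```python
# A minimal sketch of the six-step evolutionary loop described above.
def evolve(pop_size, generations, random_tree, render, classify, fitness,
           select, crossover, mutate):
    population = [random_tree() for _ in range(pop_size)]        # step 1
    for _ in range(generations):                                 # step 6
        phenotypes = [render(ind) for ind in population]         # step 2
        results = [classify(ph) for ph in phenotypes]            # step 3
        scores = [fitness(r) for r in results]                   # step 4
        offspring = []                                           # step 5
        while len(offspring) < pop_size:
            a = select(population, scores)
            b = select(population, scores)
            offspring.append(mutate(crossover(a, b)))
        population = offspring                                   # replacement
    return population
```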
Genetic Programming Engine
The EC engine used in these experiments is inspired by the
works of Sims (1991). It is a general purpose, expression-
based, GP image generation engine that allows the evolution
of populations of images. The genotypes are trees composed
from a lexicon of functions and terminals. The function set is
composed of simple functions such as arithmetic, trigonometric
and logic operations. The terminal set is composed
of two variables, x and y, and randomly initialized constants.
The phenotypes are images that are rendered by evaluating
the expression-trees for different values of x and y, which
serve both as terminal values and image coordinates. In
other words, to determine the value of the pixel at the (0,0)
coordinates one assigns zero to x and y and evaluates the
expression-tree (see figure 2). A thorough description of the
GP engine can be found in (Machado and Cardoso 2002).
Figure 3 displays typical imagery produced via interactive
evolution using this EC system.
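The genotype-phenotype mapping can be sketched as follows; the nested-tuple tree encoding and the tiny function set are assumptions made for brevity, not the engine's actual encoding.

```python
# A minimal sketch of genotype-phenotype mapping: an expression tree
# is evaluated at every (x, y) coordinate to yield a greyscale image.
import math

def evaluate(node, x, y):
    if node == "x":
        return x
    if node == "y":
        return y
    if isinstance(node, (int, float)):
        return node                        # random constant
    op, *args = node
    vals = [evaluate(a, x, y) for a in args]
    if op == "+":
        return vals[0] + vals[1]
    if op == "*":
        return vals[0] * vals[1]
    if op == "sin":
        return math.sin(vals[0])
    raise ValueError("unknown op: %s" % op)

def render(tree, size=64):
    # map pixels to [-1, 1], evaluate, and clamp the result to [0, 255]
    img = []
    for j in range(size):
        row = []
        for i in range(size):
            v = evaluate(tree, 2 * i / (size - 1) - 1, 2 * j / (size - 1) - 1)
            row.append(max(0, min(255, int((v + 1) * 127.5))))
        img.append(row)
    return img

# e.g. render(("sin", ("+", "x", "y"))) yields a 64x64 diagonal wave.
```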
Object Detection
For classification purposes we use Haar Cascade classifiers
(Viola and Jones 2001). The classifier assumes the form of a
cascade of small and simple classifiers that use a set of Haar
features (Papageorgiou, Oren, and Poggio 1998) in
combination with a variant of Adaboost (Freund and Schapire
1995), and is able to attain efficient classifiers. This
classification approach was chosen due to its state-of-the-art
relevance and its fast classification. Both code and
executables are integrated in the OpenCV API (http://opencv.org/).
The face detection process can be summarized as follows:
Figure 2: Representation scheme with examples of functions
and the corresponding images.
Figure 3: Examples of images generated by the evolutionary
engine using interactive evolution.
1. Define a window of size w (e.g. 20×20).
2. Define a scale factor s greater than 1. For instance, 1.2
means that the window will be enlarged by 20%.
3. Define W and H as the size of the input image.
4. From (0, 0) to (W, H) define a sub-window with a
starting size of w for calculation.
5. For each sub-window, apply the cascade classifier. The
cascade has a group of stage classifiers, as represented in
figure 4. Each stage is composed, at its lower level, of
a group of Haar features (figure 5). Apply each feature of
each corresponding stage to the sub-window. If the resulting
value is lower than the stage threshold, the sub-window is
classified as a non-object and the search terminates for that
sub-window. If it is higher, continue to the next stage. If all
cascade stages are passed, the sub-window is classified as
containing an object.
6. Apply the scale factor s to the window size w and repeat
5 until the window size exceeds the image in at least one
dimension.

Figure 4: Cascade of classifiers with N stages, adapted from
(Viola and Jones 2001).

Figure 5: The set of possible features, adapted from (Lienhart
and Maydt 2002).
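In OpenCV, this whole multi-scale scan is packaged behind the built-in CascadeClassifier. A usage sketch follows, where the cascade file name and image path are assumptions; scaleFactor=1.2 matches the 20% enlargement step and minSize=(20, 20) the window size described above.

```python
# A usage sketch of multi-scale detection with OpenCV's
# CascadeClassifier (file names are illustrative assumptions).
import cv2

cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
img = cv2.imread("phenotype.png", cv2.IMREAD_GRAYSCALE)
detections = cascade.detectMultiScale(img, scaleFactor=1.2,
                                      minNeighbors=3, minSize=(20, 20))
for (x, y, w, h) in detections:
    print("object at (%d, %d), size %dx%d" % (x, y, w, h))
```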
Fitness Assignment
The process of fitness assignment is crucial from an
evolutionary point of view, and it is therefore of great
importance for the success of the described system. The goal is to
evolve images that the object detector classifies as an object
of the positive class. However, the binary output of the
detector is inappropriate to guide evolution. A binary function
gives no information about how close an individual is to being
a valid solution to the problem and, as such, the EA would
be performing, essentially, a random search. It is necessary
to extract additional information from the detection
process in order to build a suitable fitness function.
This is attained by accessing internal results of the
classification task that give an indication of the degree of certainty
in the classification. Based on results of past experiments
(Machado, Correia, and Romero 2012a; 2012b) we employ
the following fitness function:

fitness(x) = Σ_{i=1}^{nstages_x} stagedif_x(i) × i + nstages_x × 10    (1)
The underlying rationale is the following: images that
go through several classification stages, and are closer to
being classified as an object, have higher fitness than those
rejected in early stages. Variables nstages_x and stagedif_x(i)
are extracted from the object detection algorithm. Variable
nstages_x holds the number of stages that image x has
successfully passed. That is, an image that passes several stages
is likely to be closer to being recognized as containing an object
than one that passes fewer stages. In other words, passing
several stages is a pre-condition for being classified as containing
the object. Variable stagedif_x(i) holds the maximum
difference between the threshold necessary to overcome stage i
and the value attained by the image at the i-th stage. Images
that are clearly above the thresholds are preferred over ones
that are only slightly above them. Obviously, this fitness
function is only one of several possible ones.

Table 1: Haar training parameters.

Parameter                            Setting
Number of stages                     30
Min true positive rate per stage     99.9%
Max false positive rate per stage    50%
Object width                         20 or 40 (breasts, leaf)
Object height                        20 or 40 (leaf)
Haar features                        ALL
Number of splits                     1
Adaboost algorithm                   GentleAdaboost
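Under the assumption that the detector has been instrumented to expose nstages_x and the per-stage margins, fitness function (1) reduces to a weighted sum; a sketch:

```python
# A minimal sketch of fitness function (1). stagedif[i] is the margin
# by which the image exceeded the threshold of stage i+1; later stages
# weigh more, and every stage passed adds a flat bonus of 10.
def fitness(nstages, stagedif):
    return sum(d * (i + 1) for i, d in enumerate(stagedif)) + nstages * 10
```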
Experimentation
Within the scope of this paper we intend to evolve the
following objects: faces, lips, breasts and leaves. For the first
two we use off-the-shelf classifiers that were already trained
and used by other researchers in different lines of investigation
(Lienhart and Maydt 2002; Lienhart, Kuranov, and Pisarevsky
2003; Santana et al. 2008). For the last two we created
our own classifiers, by choosing suitable datasets and
training the respective object classifier.
In order to construct an object classifier we need to
construct two datasets: (i) positive examples, images that
contain the object we want to detect; (ii) negative examples,
images that do not contain the object. Furthermore, for the
positive examples, we must identify the location of the object
in the images (see figure 6) in order to build the ground truth
file that will be used for training.
For these experiments, the negative dataset was attained
by picking images from a random search using image search
engines, and from the Caltech-256 Object Category dataset
(Griffin, Holub, and Perona 2007). Figure 7 depicts some
of the images used as negative instances. In what concerns
the positive datasets: the breast object detector was built by
searching images on the web; the leaf dataset was obtained
from the Caltech-256 Object Category dataset and from web
searches. As previously mentioned, the face and lip detectors
are off-the-shelf classifiers. Besides choosing datasets we
must also define the training parameters. Table 1 presents
the parameters used for training the cascade classifiers.
The success of the approach is related to the performance
of the classifier itself. By defining a high number of stages
we create several stages that the images must overcome
to be considered a positive example. The high true
positive rate ensures that almost every positive example is
learned per stage. The max false positive rate creates some
margin for error, allowing the training to achieve the minimum
true positive rate per stage and a low false positive rate at
the end of the cascade. Similar parameters were used and
discussed in (Lienhart, Kuranov, and Pisarevsky 2003).

Figure 6: Examples of images used to train a cascade classifier
for leaf detection. On the top row the original image, on
the bottom row the cropped example used for training.
Once the classifiers are obtained, they are used to assign
fitness in the course of the evolutionary runs, in an attempt
to find images that are recognized as faces, lips, breasts and
leaves. We performed 30 independent evolutionary runs for
each of these classes. In summary, we have 4 classifiers, with
30 independent EC runs each, totaling 120 EC runs.
The settings of the GP engine, presented in table 2, are
similar to those used in previous experimentation in different
problem domains. Since the classifiers used only deal with
greyscale information, the GP engine was also limited to the
generation of greyscale images. The population size used
in these experiments is 100, while in previous experiments we
used a population size of 50 (Machado, Correia, and Romero
2012a). This allows us to sample a larger portion of the
search space, contributing to the discovery of images that fit
the positive class.
In all evolutionary runs the GP engine was able to evolve
images classified as the respective objects. Similarly to
the behavior reported by Machado, Correia, and Romero
(2012a), the GP engine was able to exploit weaknesses
of the classifier; that is, the evolved images are classified
as the object but, from a human perspective, they often
fail to resemble the object. In figure 8 we present examples
of such failures. As can be observed, it is hard to
recognize breasts, faces, leaves or lips in the presented
images. It is important to notice that these weaknesses are
neither a byproduct of the fitness assignment scheme, and as
such they cannot be solved by using a different fitness function,
nor particular to the classifiers used.
Figure 7: Examples of images belonging to the negative
dataset used for training the cascade classifiers.
Table 2: Parameters of the GP engine. See (Machado and
Cardoso 2002) for a detailed description.

Parameter                Setting
Population size          100
Number of generations    100
Crossover probability    0.8 (per individual)
Mutation probability     0.05 (per node)
Mutation operators       sub-tree swap, sub-tree replacement,
                         node insertion, node deletion,
                         node mutation
Initialization method    ramped half-and-half
Initial maximum depth    5
Mutation max tree depth  3
Function set             +, −, ×, /, min, max, abs, neg, warp,
                         sign, sqrt, pow, mdist, sin, cos, if
Terminal set             x, y, random constants
Although different classifiers have different weaknesses,
we confirmed that several of
the evolved images that do not resemble faces are also
recognized as faces by commercially available and widely used
classifiers.
These results have opened a series of possibilities, including
the use of this approach to assess the robustness of object
detection systems, and also the use of evolved images as part
of the training sets of these classifiers in order to overcome
some of their shortcomings. Although we are already
pursuing that line of research and promising results have been
obtained (Machado, Correia, and Romero 2012b), it is
beyond the scope of the current paper.
When one builds a face detector, for instance, one is
typically interested in building one that recognizes faces of all
types, sizes, colors and sexes, in different lighting conditions,
against clear and cluttered backgrounds, etc. Although the
inclusion of all these examples may lead to a robust
classifier that is able to detect all faces present in an image, it
also means that this classifier will be prone to recognize
faces even when only relatively few features are present. In
contrast, when building classifiers for the purpose described
in this paper, one may select as positive examples clear and
iconic images. Such classifiers would probably fail to identify
a large portion of real-world images containing the object.
However, they would be extremely selective and,
as such, the evolutionary runs would tend to converge to
images that clearly match the desired object. Thus, although
this was not explored, building a selective classifier can
significantly reduce the number of runs that converge to
atypical images such as the ones depicted in figure 8.

Figure 8: Examples of evolved images identified as objects
by the classifiers that do not resemble the corresponding
objects from a human perspective. These images were
recognized as breasts (a), faces (b), leaves (c) and lips (d).
According to our subjective assessment, some runs were
able to find images that actually resemble the object that we
are trying to evolve. These add up to 6 runs for the face
detector, 5 for the lip detector, 4 for the breast detector and
4 for the leaf detector.
In figures 9, 10, 11 and 12 we show, according to our
subjective assessment, some of the most interesting images
evolved. These results allow us to state that, at least in some
instances, the GP engine was able to create figurative images
evocative of the objects that the object detector was designed
to recognize as belonging to the positive class.
By looking at the faces in figure 9, we can observe the
presence of at least 3 facial features per image (such as eyes,
lips, nose and head contour). The images from the first row
have been identified by users as resembling Wolverine.
Figure 9: Examples of some of the most interesting images
that have been evolved using face detection to assign fitness.
The ones of the second row, particularly the one on the left,
have been identified as masks (more specifically, African masks).
In what concerns the images from the last row, we believe
that their resemblance to ghost-like cartoons is striking.
In what concerns the images resulting from the runs where
a lip detector was used to assign fitness, we consider that
their resemblance to lips, caricatures of lips, or lip logos,
is self-evident. The iconic nature of the images from the last
row is particularly appealing to us.
The results obtained with the breast detector reveal
images with well-defined or exaggerated features. We found
little variety in these runs, with changes occurring mostly
at the pixel intensity and contrast level. As previously
mentioned, most of these runs resulted in unrecognizable images
(see figure 8), which is surprising since the nature of the
function set would lead us to believe that it should be
relatively easy to evolve such images. Nevertheless, the
successful runs present images that are clearly evocative of breasts.
Finally, the images from the leaf detector vary in type and
shape. They share, however, a common feature: they tend to
be minimalist, resembling logos. In each of the images of
the first row the detector identified two leaf shapes. On the
others a single leaf shape was detected.

Figure 10: Examples of some of the most interesting images
that have been evolved using a detector of lips to assign
fitness.

Figure 11: Examples of some of the most interesting images
that have been evolved using a detector of breasts to assign
fitness.

Figure 12: Examples of some of the most interesting images
that have been evolved using a detector of leaves to assign
fitness.
In general, when the runs successfully evolve images that
actually resemble the desired object, they tend to generate
images that exaggerate the key features of the class. This
is entirely consistent with the fitness assignment scheme,
which values images that are recognized with a high degree
of certainty. This constitutes a valuable side effect of the
approach, since the evolution of caricatures and logos fits
our intention to further explore these images from an artistic
and design perspective. The convergence to iconic,
exaggerated instances of the class may indicate the occurrence of
the Peak Shift Principle, but further testing is necessary to
confirm this interpretation of the results.
Conclusions
The goal of this paper was to evolve different figurative
images by evolutionary means, using a general-purpose
expression-based GP image generation engine and object
detectors. Using the framework presented by Machado, Correia,
and Romero (2012a), several object detectors were used to
evolve images that resemble faces, lips, breasts and leaves.
The results from 30 independent runs per classifier
show that it is possible to evolve images that are detected as
the corresponding objects and that also resemble those objects
from a human perspective. The images tend to depict an
exaggeration of the key features of the associated object,
allowing the exploration of these images in design and artistic
contexts.
The paper makes 3 main contributions, addressing: (i) a
well-known open problem in evolutionary art; (ii) the
evolution of figurative images using a general-purpose
expression-based EC system; (iii) the generalization of previous
results.
The open problem of finding a compact symbolic expression
that matches a target image is addressed by generalization:
instead of trying to match a target image we evolve
individuals that match a given class. Previous results (see
Machado, Correia, and Romero (2012a)) concerned only the
evolution of faces. Here we demonstrate that other classes
of objects can be evolved. As far as we know, this is the
first autonomous system that has proved able to evolve
different types of figurative images. Furthermore, the
experimental results show that this is attainable with off-the-shelf and
purpose-built classifiers, demonstrating that the approach is
both generalizable and customizable.
Currently, we are performing additional tests with different
object detectors in order to expand the types of imagery
produced.
The next steps will comprise the following: combine, refine
and explore the evolved images, using them in user-
guided evolution and automatic fitness assignment schemes;
combine multiple object detectors to help refine the evolved
images (for instance, use a face detector first and an eye or a
lip detector next); use the evolved examples that expose
shortcomings of the classifiers to refine the training sets and
boost the existing detectors.
Acknowledgements
This research is partially funded by: the Portuguese
Foundation for Science and Technology in the scope of project
SBIRC (PTDC/EIA-EIA/115667/2009) and of the iCIS
project (CENTRO-07-ST24-FEDER-002003), which is
co-financed by QREN, in the scope of the Mais Centro Program
and the European Union's FEDER; Xunta de Galicia, project
XUGA-PGIDIT10TIC105008PR.
References
Baker, E. 1993. Evolving line drawings. Technical Report
TR-21-93, Harvard University Center for Research in
Computing Technology.
Baluja, S.; Pomerlau, D.; and Todd, J. 1994. Towards
automated artificial evolution for computer-generated images.
Connection Science 6(2):325-354.
DiPaola, S. R., and Gabora, L. 2009. Incorporating
characteristics of human creativity into an evolutionary art
algorithm. Genetic Programming and Evolvable Machines
10(2):97-110.
Freund, Y., and Schapire, R. E. 1995. A decision-theoretic
generalization of on-line learning and an application to
boosting. In Proceedings of the Second European Conference
on Computational Learning Theory, EuroCOLT '95,
23-37. London, UK: Springer-Verlag.
Frowd, C. D.; Hancock, P. J. B.; and Carson, D. 2004.
EvoFIT: A holistic, evolutionary facial imaging technique
for creating composites. ACM Transactions on Applied
Perception 1(1):19-39.
Griffin, G.; Holub, A.; and Perona, P. 2007. Caltech-256
object category dataset. Technical Report 7694, California
Institute of Technology.
Johnston, V. S., and Caldwell, C. 1997. Tracking a criminal
suspect through face space with a genetic algorithm. In
Bäck, T.; Fogel, D. B.; and Michalewicz, Z., eds., Handbook
of Evolutionary Computation. Bristol, New York: Institute
of Physics Publishing and Oxford University Press. G8.3:1-8.
Lewis, M. 2007. Evolutionary visual art and design. In
Romero, J., and Machado, P., eds., The Art of Artificial
Evolution: A Handbook on Evolutionary Art and Music.
Springer Berlin Heidelberg. 3-37.
Lienhart, R., and Maydt, J. 2002. An extended set of Haar-
like features for rapid object detection. In International
Conference on Image Processing, volume 1, I-900-I-903.
Lienhart, R.; Kuranov, E.; and Pisarevsky, V. 2003. Empirical
analysis of detection cascades of boosted classifiers for
rapid object detection. In DAGM 25th Pattern Recognition
Symposium, 297-304.
Machado, P., and Cardoso, A. 2002. All the truth about
NEvAr. Applied Intelligence, Special Issue on Creative
Systems 16(2):101-119.
Machado, P., and Romero, J. 2011. On evolutionary
computer-generated art. The Evolutionary Review: Art,
Science, Culture 2(1):156-170.
Machado, P.; Correia, J.; and Romero, J. 2012a. Expression-
based evolution of faces. In Evolutionary and Biologically
Inspired Music, Sound, Art and Design - First International
Conference, EvoMUSART 2012, Málaga, Spain, April 11-13,
2012. Proceedings, volume 7247 of Lecture Notes in
Computer Science, 187-198. Springer.
Machado, P.; Correia, J.; and Romero, J. 2012b. Improving
face detection. In Moraglio, A.; Silva, S.; Krawiec, K.;
Machado, P.; and Cotta, C., eds., Genetic Programming -
15th European Conference, EuroGP 2012, Málaga, Spain,
April 11-13, 2012. Proceedings, volume 7244 of Lecture
Notes in Computer Science, 73-84. Springer.
Machado, P.; Romero, J.; and Manaris, B. 2007. Experiments
in computational aesthetics: An iterative approach
to stylistic change in evolutionary art. In Romero, J., and
Machado, P., eds., The Art of Artificial Evolution: A Handbook
on Evolutionary Art and Music. Springer Berlin
Heidelberg. 381-415.
McCormack, J. 2005. Open problems in evolutionary music
and art. In Rothlauf, F.; Branke, J.; Cagnoni, S.; Corne,
D. W.; Drechsler, R.; Jin, Y.; Machado, P.; Marchiori, E.;
Romero, J.; Smith, G. D.; and Squillero, G., eds., EvoWorkshops,
volume 3449 of Lecture Notes in Computer Science,
428-436. Springer.
McCormack, J. 2007. Facing the future: Evolutionary
possibilities for human-machine creativity. In Romero, J., and
Machado, P., eds., The Art of Artificial Evolution: A Handbook
on Evolutionary Art and Music. Springer Berlin
Heidelberg. 417-451.
Nishio, K.; Murakami, M.; Mizutani, E.; and N., H. 1997.
Fuzzy fitness assignment in an interactive genetic algorithm
for a cartoon face search. In Sanchez, E.; Shibata, T.; and
Zadeh, L. A., eds., Genetic Algorithms and Fuzzy Logic
Systems: Soft Computing Perspectives, volume 7. World
Scientific.
Norton, D.; Darrell, H.; and Ventura, D. 2010. Establishing
appreciation in a creative system. In Proceedings of the First
International Conference on Computational Creativity, 26-35.
Papageorgiou, C. P.; Oren, M.; and Poggio, T. 1998. A
general framework for object detection. In Sixth International
Conference on Computer Vision, 555-562.
Romero, J.; Machado, P.; Santos, A.; and Cardoso, A. 2003.
On the development of critics in evolutionary computation
artists. In Günther, R., et al., eds., Applications of Evolutionary
Computing, EvoWorkshops 2003: EvoBIO, EvoCOMNET,
EvoHOT, EvoIASP, EvoMUSART, EvoSTOC, volume
2611 of LNCS. Essex, UK: Springer.
Santana, M. C.; Déniz-Suárez, O.; Antón-Canals, L.; and
Lorenzo-Navarro, J. 2008. Face and facial feature detection
evaluation: performance evaluation of public domain
Haar detectors for face and facial feature detection. In
Ranchordas, A., and Araújo, H., eds., VISAPP (2), 167-172.
INSTICC - Institute for Systems and Technologies of
Information, Control and Communication.
Saunders, R., and Gero, J. 2001. The digital clockwork
muse: A computational model of aesthetic evolution. In
Wiggins, G., ed., AISB'01 Symposium on Artificial Intelligence
and Creativity in Arts and Science, 12-21.
Secretan, J.; Beato, N.; D'Ambrosio, D. B.; Rodriguez, A.;
Campbell, A.; Folsom-Kovarik, J. T.; and Stanley, K. O.
2011. Picbreeder: A case study in collaborative evolutionary
exploration of design space. Evolutionary Computation
19(3):373-403.
Sims, K. 1991. Artificial evolution for computer graphics.
ACM Computer Graphics 25:319-328.
Ventrella, J. 2010. Self portraits with Mandelbrot genetics.
In Proceedings of the 10th International Conference
on Smart Graphics, SG '10, 273-276. Berlin, Heidelberg:
Springer-Verlag.
Viola, P., and Jones, M. 2001. Rapid object detection using
a boosted cascade of simple features. Computer Vision and
Pattern Recognition, IEEE Computer Society Conference on
1:511.
World, L. 1996. Aesthetic selection: The evolutionary art of
Steven Rooke. IEEE Computer Graphics and Applications
16(1).
Proceedings of the Fourth International Conference on Computational Creativity 2013 31
Fitness Functions for Ant Colony Paintings
Penousal Machado and Hugo Amaro
CISUC, Department of Informatics Engineering
University of Coimbra
3030 Coimbra, Portugal
[email protected], [email protected]
Abstract
A creativity-support tool for the creation of non-photorealistic renderings of images is described. It employs an evolutionary algorithm that evolves the parameters governing the behavior of ant species, and the paintings are produced by simulating the behavior of these artificial ants. The design of fitness functions, using both behavioral and image features, is discussed, with emphasis on the rationale and intentions that guided the design. The analysis of the experimental results obtained with different fitness functions focuses on assessing whether they convey the intentions of the fitness function designer.
Introduction
Machado and Pereira (2012) presented a non-photorealistic rendering (NPR) algorithm inspired by ant colony approaches: the trails of artificial ants were used to produce a rendering of an original input image. One of the novel characteristics of this algorithm is the adoption of scalable vector graphics, which contrasts with the pixel-based approaches used in most ant painting algorithms and enables the creation of resolution-independent images. The trail of each ant is represented by a continuous line of varying width, contributing to the expressiveness of the NPRs.
In spite of the potential of this generative approach, the number of parameters controlling the behavior of the ants, and their interdependencies, soon proved too large to allow tuning by hand. Such attempts revealed that only a small subset of the creative possibilities allowed by the algorithm was being explored.
To tackle this problem, Machado and Pereira (2012) presented a human-in-the-loop Genetic Algorithm (GA) to evolve the parameters, allowing users to guide the algorithm according to their preferences and avoiding the need to understand its intricacies. Thus, instead of being forced to perform low-level changes, the users of this creativity-support tool become breeders of species of ants that produce results they find valuable. The experimental results highlight the range of imagery that can be evolved by the system, showing its potential for the production of large-format artworks.
This paper describes a further step in the automation of the space-exploration process and a departure from low-level modification and assessment. The users become designers of fitness functions, which are used to guide evolution, leading to results that are consistent with the users' intentions. To this end, while the ants paint, statistics describing their behavior are gathered. Once each painting is completed, image features are calculated. These behavioral and image features are the basis for the creation of the fitness functions.
Human-in-the-loop evolutionary art systems are often used as creativity-support tools and are thought to have potential for exploratory creativity. Allowing users to design fitness functions by specifying desired combinations of characteristics provides an additional level of abstraction, enabling them to focus on their intents and overcoming the user-fatigue problem. Additionally, this approach opens the door to evaluating the system by comparing the intents of the user with the outcomes of the process.
We begin with a short survey of related work. Next, in
the third section, we describe the system, focusing on the
behavior of the ants and on the evolutionary algorithm. In
the fourth section we present experimental results, making a
brief analysis. Finally, we draw some conclusions and dis-
cuss aspects to be addressed in future work.
State of the Art
In this section we survey related work, focusing on systems that use artificial ants for image-generation purposes and on systems where evolutionary computation is employed for NPR purposes.
Tzafestas (2000) presents a system where artificial ants pick up and deposit food, which is represented by paint, and studies the self-regulation properties and complexity of the system and resulting images. Ramos and Almeida (2000) explore the use of ant systems for pattern-recognition purposes. The artificial ants successfully detect the edges of the images, producing stylized renderings of the originals and smooth transitions between different images. The artistic potential of these approaches is explored in later works (Ramos 2002) and through his collaboration with the artist Leonel Moura, resulting in several robotic swarm drawings (Moura 2002). Urbano (2005; 2007; 2011) presents several multi-agent systems based on artificial ants.
Aupetit et al. (2003) introduce an interactive GA for the creation of ant paintings. The algorithm evolves parameters of the rules that govern the behavior of the ants. The artificial ants deposit paint on the canvas as they move, thus producing a painting. In a later study, Monmarché et al. (2007) refine this approach, exploring different rendering modes. Greenfield (2005) presents an evolutionary approach to the production of ant paintings and explores the use of behavioral statistics of the artificial ants to automatically assign fitness. Later, Greenfield (2006) adopted a multiple-pheromone model where ant movements and behaviors are influenced (attracted or repelled) by both an environmentally generated pheromone and an ant-generated pheromone.

Figure 1: Screenshot of the graphic user interface. Control panel on the left and current population of ant paintings on the right.
The use of evolutionary algorithms to create image filters and NPRs of source images has been explored by several researchers. Focusing on works with an artistic goal, we can mention the research of: Ross et al. (2006) and Neufeld et al. (2007), where Genetic Programming (GP), multi-objective optimization techniques, and an empirical model of aesthetics are used to automatically evolve image filters; Lewis (2004), who evolves live-video processing filters through interactive evolution; Machado et al. (2002), who use GP to evolve image-coloring filters from a set of examples; Yip (2004), who employs GAs to evolve filters that produce images matching certain features of a target image; Collomosse (2006; 2007), who uses image salience metrics to determine the level of detail for portions of the image, and GAs to search for painterly renderings that match the desired salience maps; Hewgill and Ross (2003), who use GP to evolve procedural textures for 3D objects; and Machado and Graca (2008), who employ GP to evolve assemblages of 3D objects that are an artistic representation of an input image.
The Framework
The system is composed of two main modules: the evolu-
tionary engine and the painting algorithm. A graphic user
interface gives access to these modules (see Fig. 1). Each
genotype of the GA population encodes the parameters of
a species of ants. These parameters determine how that ant
species reacts to the input image. Each painting is produced
by simulating the behavior of ants of a given species while
they travel across the canvas, leaving a trail of varying width
and transparency.
In the following sections we describe the framework. First, we present the painting algorithm. Next, we describe the evolutionary component. Finally, we detail the behavioral and image features that are gathered.

Figure 2: On the left, an ant with five sensory vectors. In the middle, the living canvas of an ant species. On the right, its painting canvas.
The Painting Algorithm
Our ants live in the 2D world provided by the input image and paint on a painting canvas that is initially empty (i.e., black). Both the living and the painting canvas have the same dimensions, and the ants move simultaneously on both canvases. The painting canvas is used exclusively for depositing ink and does not interfere with the behavior of the ants. Each ant has a position, color, deposit transparency and energy; all the remaining parameters are shared by the entire species. If the energy of an ant is below a given threshold it dies; if it is above a given threshold it generates offspring.
The luminance of an area of the living canvas represents the available energy, i.e., food, at that point. Therefore, ants may gain energy by traveling through bright areas. The energy consumed by an ant is removed from the living canvas, as will be explained later in detail.
The ants' movement is determined by how they react to light. Each ant senses the environment by looking in several directions (see Fig. 2). We use 10 sensory vectors; each vector has a length and a direction relative to the current direction of the ant. The sensory organs return the luminance value of the area where each vector ends. To update the position of an ant, one performs a weighted sum: each sensory vector is divided by its norm, multiplied by the luminance of its end point and by the weight the ant gives to that sensor, and the resulting vectors are summed. The result of this operation is multiplied by a scaling scalar that represents the ant's base speed. Subsequently, to represent inaccuracy of movement and of the sensory organs, the direction is perturbed by adding Perlin (1985) noise to its angle.
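To make the update concrete, here is a minimal sketch in Python (the data layout and helper names are our assumptions, and the heading-relative rotation of the sensory vectors is omitted for brevity; the paper gives no code):

    import math

    def update_position(ant, luminance_at, perlin_noise):
        # Weighted sum of the sensory vectors: each vector is divided by its
        # norm and scaled by the luminance at its end point and by the weight
        # the ant assigns to that sensor. (Rotation of the vectors by the
        # ant's current heading is omitted here for brevity.)
        dx = dy = 0.0
        for (vx, vy), weight in zip(ant.sensory_vectors, ant.sensory_weights):
            norm = math.hypot(vx, vy)
            lum = luminance_at(ant.x + vx, ant.y + vy)  # living-canvas luminance
            dx += weight * lum * vx / norm
            dy += weight * lum * vy / norm
        # Scale by the species' base speed, then perturb the direction angle
        # with Perlin noise to model sensory/motor inaccuracy.
        speed = math.hypot(dx, dy) * ant.species.vel
        angle = math.atan2(dy, dx) + perlin_noise()
        ant.x += speed * math.cos(angle)
        ant.y += speed * math.sin(angle)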
The ant simulation algorithm is composed of the following steps:
1. Initialization: n ants are placed on the canvas at pre-established positions; each ant assumes the color of the area where it was placed; their energy and deposit transparencies are initialized using the species parameters;
2. For each ant:
(a) Update the ant's energy;
(b) Update the energy of the environment;
(c) Place ink on the painting canvas;
(d) If the ant's energy is below the death threshold, remove the ant from the colony;
(e) If the ant's energy is above the reproduction threshold, generate an offspring; the offspring assumes the color of the position where it was created and a percentage of the energy of the progenitor (which loses this energy); the offspring inherits the velocity of the parent, but a perturbation is added to the angular velocity by randomly choosing an angle between descvel_min and descvel_max (both values are species parameters); likewise, the deposit transparency is inherited from the progenitor, but a perturbation is included by adding a randomly chosen value between dtransp_min and dtransp_max;
(f) Update the ant's position;
3. Repeat from 2 until no living ants exist;
Steps (b) and (c) require further explanation. The consumption of energy is attained by drawing on the living canvas a black circle of size equal to energy × cons_rate and of a given transparency (cons_trans). Ink is deposited on the painting canvas by drawing a circle of the color of the ant (which is attributed when the ant is born) with a size given by energy × deposit_rate and a given transparency (deposit_transp). Fig. 2 depicts the living and painting canvas of an ant species during the simulation process. It is important to notice that the color of an ant is determined at birth. Thus, the ants may carry this color to areas of the canvas that possess different colors in the original image. A detailed description of the painting algorithm can be found in Machado and Pereira (2012).
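The steps above condense into a short simulation loop. The following sketch reuses update_position and the perlin_noise source from the previous snippet; the helper names (spawn, offspring, draw_circle) and the exact energy-update formula are our assumptions, not the authors' API:

    def simulate(species, living, painting, max_steps=1000):
        # Step 1: place the starting ants at the pre-established positions.
        ants = [spawn(species, pos, living) for pos in species.initial_positions]
        for step in range(max_steps):
            if not ants:                       # step 3: stop when no ants live
                break
            for ant in list(ants):
                # (a) energy update: gain from local luminance, minus decay
                ant.energy += species.gain * living.luminance(ant.x, ant.y) \
                              - species.decay
                # (b) consume food: black circle on the living canvas
                living.draw_circle(ant.x, ant.y, ant.energy * species.cons_rate,
                                   color='black', alpha=1 - species.cons_trans)
                # (c) deposit ink: circle of the ant's birth color
                painting.draw_circle(ant.x, ant.y,
                                     ant.energy * species.deposit_rate,
                                     color=ant.color, alpha=1 - ant.deposit_transp)
                if ant.energy < species.death_threshold:      # (d) death
                    ants.remove(ant)
                elif ant.energy > species.birth_threshold:    # (e) reproduction
                    ants.append(offspring(ant, living))
                update_position(ant, living.luminance, perlin_noise)  # (f)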
Evolutionary Engine
As previously mentioned, we employ a GA to evolve the ant species parameters. The genotypes are tuples of floating-point numbers which encode the parameters of the ant species. The size of the genotype depends on the experimental settings. Table 1 presents an overview of the encoded parameters. We use a two-point crossover operator for recombination purposes and a Gaussian mutation operator. We employ tournament selection and an elitist strategy: the highest-ranked individual proceeds unchanged to the next population.

Table 1: Parameters encoded by the genotype

Name                      #    Comments
gain                      1    scaling for energy gains
decay                     1    scaling for energy decay
cons_rate                 1    scaling for size of circles drawn on the living canvas
cons_trans                1    transparency of circles drawn on the living canvas
deposit_rate              1    scaling for size of circles drawn on the painting canvas
deposit_transp            1    base transparency of circles drawn on the painting canvas
dtransp_min, dtransp_max  2    limits for perturbation of deposit transparency when offspring are generated
initial_energy            1    initial energy of the starting ants
death_threshold           1    death energy threshold
birth_threshold           1    generate-offspring energy threshold
descvel_min, descvel_max  2    limits for perturbation of angular velocity when offspring are generated
vel                       1    base speed of the ants
noise_min, noise_max      2    limits for the Perlin noise generator function
initial_positions         2n   initial coordinates of the n ants placed on the canvas
sensory_vectors           2m   direction and length of the m sensory vectors
sensory_weights           m    weights of the m sensory vectors
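Under the stated operators, one plausible rendering of the engine is the following sketch (population size, tournament size and probabilities follow the experimental setup reported below; the gene bounds and mutation sigma are our assumptions):

    import random

    def evolve(fitness, genome_len, pop_size=25, generations=50,
               tournament=5, p_cross=0.9, p_mut=0.1, sigma=0.1):
        pop = [[random.random() for _ in range(genome_len)]
               for _ in range(pop_size)]
        for _ in range(generations):
            best = max(pop, key=fitness)
            new_pop = [best[:]]                      # elitism: best is kept
            def select():                            # tournament selection
                return max(random.sample(pop, tournament), key=fitness)
            while len(new_pop) < pop_size:
                a, b = select()[:], select()[:]
                if random.random() < p_cross:        # two-point crossover
                    i, j = sorted(random.sample(range(genome_len + 1), 2))
                    a[i:j], b[i:j] = b[i:j], a[i:j]
                for child in (a, b):
                    for k in range(genome_len):      # per-gene Gaussian mutation
                        if random.random() < p_mut:
                            child[k] += random.gauss(0.0, sigma)
                    if len(new_pop) < pop_size:
                        new_pop.append(child)
            pop = new_pop
        return max(pop, key=fitness)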
The Features
During the simulation of each ant species the following behavioral statistics are collected:

avg(ants) Average number of living ants;
coverage Proportion of the living canvas visited by the ants; an area is considered to be visited if at least one ant consumed resources from that area;
deposited_ink The total amount of ink deposited by the ants; this is calculated by multiplying the area of each circle drawn by the ants by the opacity (i.e., 1 − transparency) used to draw it;
avg(trail), std(trail) The average trail length and the standard deviation of the trail lengths, respectively;
avg(life), std(life) The average life span of the ants and its standard deviation, respectively;
avg(distance) The average Euclidean distance between the position where the ant was born and the one where it died;
avg(avg(width)), std(avg(width)) For each trail we calculate its average width; then we calculate the average width of all trails, avg(avg(width)), and the standard deviation of the averages, std(avg(width));
avg(std(width)), std(std(width)) For each trail we calculate the standard deviation of its width; then we calculate their average, avg(std(width)), and their standard deviation, std(std(width));
avg(avg(av)), std(avg(av)), avg(std(av)), std(std(av)) These statistics are analogous to the ones regarding trail width, but pertain to the angular velocity of the ants.
When the simulation of each ant species ends we calculate the following image features (a code sketch of the first and last of these follows the list):

complexity The image produced by the ants, I, is encoded in JPEG format, and its complexity is estimated using the following formula:

complexity(I) = rmse(I, jpeg(I)) × s(jpeg(I)) / s(I),

where rmse stands for the root mean square error, jpeg(I) is the image resulting from the JPEG compression of I, and s is the file size function;

fract_dim, lac The fractal dimension of the ant painting estimated by the box-counting method and its lacunarity value estimated by the Sliding Box method (Karperien 2012), respectively;

inv(rmse) The similarity between the ant painting and the original image, estimated as follows:

inv(rmse) = 1 / (1 + rmse(I, O)),

where I is the ant painting and O is the original image.
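A possible implementation of the complexity and inv(rmse) features, using Pillow and NumPy (our choice of libraries; the JPEG quality setting and the use of a lossless PNG encoding for s(I) are assumptions the paper does not specify):

    import io
    import numpy as np
    from PIL import Image

    def encoded_size(img, fmt, **opts):
        # Encode img in the given format in memory; return (bytes, size).
        buf = io.BytesIO()
        img.save(buf, format=fmt, **opts)
        return buf.getvalue(), buf.tell()

    def rmse(a, b):
        x = np.asarray(a, dtype=np.float64)
        y = np.asarray(b, dtype=np.float64)
        return float(np.sqrt(np.mean((x - y) ** 2)))

    def complexity(img):
        # complexity(I) = rmse(I, jpeg(I)) * s(jpeg(I)) / s(I)
        img = img.convert('RGB')
        jpeg_bytes, jpeg_size = encoded_size(img, 'JPEG', quality=75)
        decoded = Image.open(io.BytesIO(jpeg_bytes)).convert('RGB')
        _, raw_size = encoded_size(img, 'PNG')  # s(I): assumed lossless size
        return rmse(img, decoded) * jpeg_size / raw_size

    def inv_rmse(painting, original):
        # inv(rmse) = 1 / (1 + rmse(I, O)); higher means closer to the original
        return 1.0 / (1.0 + rmse(painting.convert('RGB'),
                                 original.convert('RGB')))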
Experimental Results
The results presented in this section were obtained using the following experimental setup: population size = 25; tournament size = 5; crossover probability = 0.9; mutation probability = 0.1 (per gene); initial position of the ants: the image is divided into 3 × 3 rectangles of the same size and one ant is placed at the center of each of these rectangles; initial number of ants = 9; maximum number of ants = 250; maximum number of simulation steps = 1000. Thus, when the drawing stage starts, each ant species is represented by nine ants. However, these ants may generate offspring during simulation, increasing the number of ants on the canvas.
Typically, interactive runs had 30 to 40 generations, although some were significantly longer. The runs conducted using explicit fitness functions lasted 50 generations. For each fitness function we conducted 10 independent runs.
User Guided Runs
Machado and Pereira (2012) describe and analyze results attained in the course of user guided runs. In Fig. 3 we depict some of the individuals evolved in those runs, with the goal of giving a flavor of the different types of imagery that were evolved.
Figure 3: Examples from user guided runs.
Using Features Individually
To test the evolutionary algorithm we performed runs where each feature, with the exception of fract_dim and lac, was used as the fitness function. Maximizing the values of fractal dimension and lacunarity would lead to results that we find uninteresting. Therefore, we established target values for these features by measuring the fractal dimension and lacunarity of one of our favorite ant paintings evolved in user guided runs, 1.5 and 0.95, respectively; the maximum fitness is obtained when these values are reached. For these two features, fitness is assigned by the following formula:

fitness = 1 / (1 + |target_value − feature_value|)
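In code, this assignment is a one-liner (the argument names are ours):

    def target_fitness(feature_value, target_value):
        # Peaks at 1.0 when the measured feature equals the target,
        # decaying smoothly with the absolute distance from it.
        return 1.0 / (1.0 + abs(target_value - feature_value))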
In Fig. 4 we present the evolution of fitness across the evolutionary runs. To avoid clutter we only present a subset of the considered fitness functions. In general, the evolutionary algorithm was able to find, in all runs and for all features, individuals with high fitness in relatively few generations. Unsurprisingly, and although it is subjective to say so, the runs tended to converge to ant paintings that, at least in our eyes, are inferior to the ones created in the course of interactive runs. Fig. 5 depicts the individuals that obtained the maximum fitness value for the corresponding image features. These individuals are representative of the imagery evolved in the corresponding runs.

[Figure 4 chart: normalized maximum fitness versus generation for avg(ants), avg(avg(width)), avg(distance), avg(std(av)), avg(life), avg(trail), coverage, deposited_ink, fract_dim and inv(rmse).]
Figure 4: Evolution of the maximum fitness. The results are averages of 10 independent runs and have been normalized to allow the presentation of distinct fitness functions in the same chart.

It is worth noticing that high complexity is obtained by evolving images with abrupt transitions from black to white. This results in high frequencies that make JPEG compression inefficient, thus yielding high complexity estimates. The results attained with lacunarity yield paintings with gaps between lines, revealing the black background, which matches the texture of the image from which the target lacunarity value was collected. This contrasts with the results obtained using fract_dim: while the algorithm was able to match the target fractal dimension value, the images produced are radically different from the target image. The inv(rmse) runs revealed images that reproduce the original with some degree of fidelity, showing that this feature can promote similarity between the painting and the original.
The results obtained using a single behavioral feature are uninteresting in the context of NPR. They tend to fall into two categories: either they constitute poor variations of the original or they are unrecognizable versions of it.
Combining Behavioral and Image Features
From the beginning it was clear that it would be necessary to combine several features to attain our goals. To make the fitness function design process easy to understand, and thus allow inexperienced users to design their own fitness functions, we decided that all fitness functions should assume the form of a weighted sum.
Since different features have different ranges of values, it is necessary to normalize them; otherwise some features would outweigh the others. Additionally, angular velocity may be negative, so we consider absolute values. Considering these issues, normalization is attained by the following formula:

norm(feature) = abs(feature / offlinemax(feature)),

where offlinemax returns the maximum value found, for the feature in question, in the course of the runs described in the previous section.
This modification is not sufficient to prevent the evolutionary algorithm from focusing exclusively on a single feature. To minimize this problem, we adopt a logarithmic scale so that the evolutionary advantage decreases as the feature value becomes higher, promoting the discovery of individuals that use all features employed in the fitness function. This is accomplished as follows:

lognorm(feature) = log(1 + norm(feature))
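A sketch of the resulting scheme (only the two formulas come from the paper; the dictionary-based bookkeeping is our own):

    import math

    def lognorm(value, offline_max):
        # norm(feature) = abs(feature / offlinemax(feature));
        # lognorm(feature) = log(1 + norm(feature))
        return math.log(1.0 + abs(value / offline_max))

    def weighted_fitness(weights, features, offline_max):
        # Weighted sum of lognorm'ed features; e.g. f2 below corresponds to
        # weights = {'inv_rmse': 1.0, 'complexity': -0.5}.
        return sum(w * lognorm(features[name], offline_max[name])
                   for name, w in weights.items())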
Figure 5: The individuals that obtained the maximum fitness value for: (a) complexity; (b) inv(rmse); (c) lac; (d) fract_dim.
All the fitness functions that combine several features are weighted sums of the lognorm of each of the features used. However, for the sake of simplicity, we will only mention the feature names when writing their formulas; from here onwards, feature should be read as lognorm(feature).
Next we describe several fitness functions that combine a variable number of features. The analysis of the experimental results of evolutionary art systems is subjective by nature. As such, rather than presenting measures of performance that would be meaningless when considering the goals of our system, we focus on describing the intentions behind the design of each fitness function, and we make a subjective analysis of the results based on the comparison between the evolved paintings and our original design intentions.
f1: coverage + complexity + lac
The design of this fitness function was prompted by the results obtained in previous tests. The goal is to evolve ant paintings where the entire canvas is visited, with high complexity, and with a lacunarity value of 0.95.
As can be observed in Fig. 6, the evolved paintings successfully match these criteria. By comparing them with the ones presented in Fig. 5, one can observe how lacunarity influences texture, complexity leads to high frequencies, and coverage promotes visiting most of the canvas.
f2: inv(rmse) − 0.5 × complexity
The rationale for this fitness function is obtaining a good approximation to the original image while keeping the complexity low. Thus, we wish to obtain a simplified version of the original. Preliminary tests indicated the tendency of the algorithm to focus exclusively on minimizing complexity, which it achieved by producing images that were entirely black. Since this sort of image exists in the initial populations of the runs, this is a case of premature convergence. To circumvent it we decreased the weight given to complexity, which allowed the algorithm to escape this local optimum.
Although the results are consistent with the design (see Fig. 7), they do not depict the degree of abstraction and simplification we intended. As such, they should be considered a failure, since they do not match our design intentions.

Figure 6: Two of the fittest images evolved using f1.
Figure 7: Two of the fittest images evolved using f2.
Figure 8: Two of the fittest images evolved using f3.
f3: avg(std(width)) + std(avg(width)) − avg(avg(width)) + inv(rmse)
Here we focus on the width of the lines being drawn, promoting the evolution of ant paintings with lines with high variations of width, avg(std(width)), heterogeneous widths among lines, std(avg(width)), and thin lines, −avg(avg(width)). To avoid radical deviations from the original drawing we also value inv(rmse).
The experimental results, Fig. 8, depict these characteristics; however, to fully observe the intricacies of the ant paintings, a resolution higher than the space constraints of this paper allow would be required.

Figure 9: Two of the fittest images evolved using f4 (first row), f5 (second row) and f6 (third row).
f4: avg(std(av)) + inv(rmse) + coverage
f5: avg(avg(av)) − avg(std(av)) + inv(rmse) + coverage
f6: −avg(avg(av)) + avg(std(av)) + inv(rmse) + coverage
When designing f4–f6 we focused on controlling line direction. In f4 we use avg(std(av)) to promote the appearance of lines that often change direction. In f5 we use avg(avg(av)) − avg(std(av)) to encourage the appearance of circular motifs (high angular velocity and low variation of velocity). Finally, f6 is a refinement of f4, with −avg(avg(av)) preventing the appearance of circular patterns, valuing trails that curve in both directions and attaining an average angular velocity close to zero.
In all cases, the addition of inv(rmse) and coverage serves the goal of evolving ant paintings with some similarity to the original and that visit a large portion of the canvas.

Figure 10: Results obtained by applying an individual from the f4 runs to different input images.
In Fig. 9 we present some of the outcomes of these experiments. As can be observed, the evolved images closely match our expectations and, as such, we consider them to be among the most successful runs.
Once the individuals are evolved, the ant species may be applied to different input images, hopefully resulting in ant paintings that share the characteristics we value. This is one of the key aspects of the system: although finding a valuable ant species may be time-consuming, once it is found it can be applied with ease to other images, producing large-scale NPRs of them. In Fig. 10 we present ant paintings created by this method.
Conclusions
We presented a creativity-support tool that aids users by providing a wide variety of paintings which are arguably consistent with the intentions of the users and which they would be unlikely to imagine on their own. While using this tool the users become designers of fitness functions, which are built using a combination of behavioral and image features. We reported the results obtained, focusing on the comparison between the evolved ant paintings and the design intentions that led to the development of each fitness function.
Overall, the results indicate that it is possible, to some extent, to convey design intention through fitness functions, leading to the discovery of individuals that match these intentions. This allows the users to operate at a higher level of abstraction than in user guided runs, circumventing the user-fatigue problem typically associated with interactive evolution. The analysis of the results also reveals the discovery of high-quality ant paintings that are radically different from the ones obtained through interactive evolution.
Although the system serves the user's intents, different runs converge to different, and sometimes highly dissimilar, images. Each fitness function can be maximized in a multitude of ways, some of which are quite unexpected. As such, we argue that the system opens the realm of possibilities that are consistent with the intents expressed by the user, often surprising him/her in the process.
On the downside, as the f2 runs reveal, in some cases the design intentions are not fully conveyed by the evolved ant paintings. It is also worth mentioning that interactive runs allow opportunistic reasoning, which may enable the discovery of unexpected and highly valued ant paintings.
The adoption of a semi-automatic fitness assignment scheme, such as the one presented by Machado et al. (2005), is one of the directions for further research. It has also become obvious that we have only begun to scratch the surface of the vast number of possibilities provided by the design of fitness functions. In the future, we will invite users who are not familiar with the system to design their own fitness functions, which will allow us to assess the difficulty of the task for regular users.
Acknowledgements
This research is partially funded by the Portuguese Foundation for Science and Technology in the scope of project SBIRC (PTDC/EIA-EIA/115667/2009) and of the iCIS project (CENTRO-07-ST24-FEDER-002003), which is co-financed by QREN, in the scope of the Mais Centro Program and the European Union's FEDER.
References
Aupetit, S.; Bordeau, V.; Monmarché, N.; Slimane, C.; and Venturini, G. 2003. Interactive evolution of ant paintings. In IEEE Congress on Evolutionary Computation, volume 2, 1376–1383.
Collomosse, J. P. 2006. Supervised genetic search for parameter selection in painterly rendering. In Applications of Evolutionary Computing, EvoWorkshops 2006, 599–610.
Collomosse, J. 2007. Evolutionary search for the artistic rendering of photographs. In Romero, J., and Machado, P., eds., The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music. Springer Berlin Heidelberg. 39–62.
Greenfield, G. 2005. Evolutionary methods for ant colony paintings. In Applications of Evolutionary Computing, EvoWorkshops 2005: EvoBIO, EvoCOMNET, EvoHOT, EvoIASP, EvoMUSART, EvoSTOC, volume 3449 of LNCS, 478–487. Lausanne, Switzerland: Springer Verlag.
Greenfield, G. 2006. Ant paintings using a multiple pheromone model. In Bridges.
Hewgill, A., and Ross, B. J. 2003. Procedural 3D texture synthesis using genetic programming. Computers and Graphics 28:569–584.
Karperien, A. 2012. FracLac for ImageJ, version 2.5. http://rsb.info.nih.gov/ij/plugins/fraclac/FLHelp/Introduction.htm.
Lewis, M. 2004. Aesthetic video filter evolution in an interactive real-time framework. In Applications of Evolutionary Computing, EvoWorkshops 2004: EvoBIO, EvoCOMNET, EvoHOT, EvoIASP, EvoMUSART, EvoSTOC, volume 3005 of LNCS, 409–418. Coimbra, Portugal: Springer Verlag.
Machado, P., and Graca, F. 2008. Evolutionary pointillist modules: Evolving assemblages of 3D objects. In Applications of Evolutionary Computing, EvoWorkshops 2008: EvoCOMNET, EvoFIN, EvoHOT, EvoIASP, EvoMUSART, EvoNUM, EvoSTOC, and EvoTransLog, Naples, Italy, March 26-28, 2008. Proceedings, volume 4974 of Lecture Notes in Computer Science, 453–462. Springer.
Machado, P., and Pereira, L. 2012. Photogrowth: Non-photorealistic renderings through ant paintings. In Soule, T., and Moore, J. H., eds., Genetic and Evolutionary Computation Conference, GECCO '12, Philadelphia, PA, USA, July 7-11, 2012, 233–240. ACM.
Machado, P.; Romero, J.; Cardoso, A.; and Santos, A. 2005. Partially interactive evolutionary artists. New Generation Computing, Special Issue on Interactive Evolutionary Computation 23(2):143–155.
Machado, P.; Dias, A.; and Cardoso, A. 2002. Learning to colour greyscale images. The Interdisciplinary Journal of Artificial Intelligence and the Simulation of Behaviour (AISB Journal) 1(2):209–219.
Monmarché, N.; Mahnich, I.; and Slimane, M. 2007. Artificial art made by artificial ants. In Romero, J., and Machado, P., eds., The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music. Springer Berlin Heidelberg. 227–247.
Moura, L. 2002. Swarm paintings non-human. In ARCHITOPIA Book, Art, Architecture and Science. 1–24.
Neufeld, C.; Ross, B.; and Ralph, W. 2007. The evolution of artistic filters. In Romero, J., and Machado, P., eds., The Art of Artificial Evolution: A Handbook on Evolutionary Art and Music. Springer Berlin Heidelberg. 335–356.
Perlin, K. 1985. An image synthesizer. In Cole, P.; Heilman, R.; and Barsky, B. A., eds., Proceedings of the 12th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH 1985, 287–296. ACM.
Ramos, V., and Almeida, F. 2000. Artificial ant colonies in digital image habitats - a mass behaviour effect study on pattern recognition. In Dorigo, M.; Middendorf, M.; and Stützle, T., eds., From Ant Colonies to Artificial Ants - 2nd Int. Wkshp on Ant Algorithms, 113–116.
Ramos, V. 2002. On the implicit and on the artificial - morphogenesis and emergent aesthetics in autonomous collective systems. In ARCHITOPIA Book, Art, Architecture and Science. 25–57.
Ross, B. J.; Ralph, W.; and Hai, Z. 2006. Evolutionary image synthesis using a model of aesthetics. In Yen, G. G.; Lucas, S. M.; Fogel, G.; Kendall, G.; Salomon, R.; Zhang, B.-T.; Coello, C. A. C.; and Runarsson, T. P., eds., Proceedings of the 2006 IEEE Congress on Evolutionary Computation, 1087–1094. Vancouver, BC, Canada: IEEE Press.
Tzafestas, E. 2000. Integrating drawing tools with behavioral modeling in digital painting. In Ghandeharizadeh, S.; Chang, S.-F.; Fischer, S.; Konstan, J. A.; and Nahrstedt, K., eds., ACM Multimedia Workshops, 39–42. ACM Press.
Urbano, P. 2005. Playing in the pheromone playground: Experiences in swarm painting. In Rothlauf, F.; Branke, J.; Cagnoni, S.; Corne, D. W.; Drechsler, R.; Jin, Y.; Machado, P.; Marchiori, E.; Romero, J.; Smith, G. D.; and Squillero, G., eds., EvoWorkshops, volume 3449 of Lecture Notes in Computer Science, 527–532. Springer.
Urbano, P. 2007. Mimetic variations on stigmergic swarm paintings. In Monmarché, N.; Talbi, E.-G.; Collet, P.; Schoenauer, M.; and Lutton, E., eds., Artificial Evolution, volume 4926 of Lecture Notes in Computer Science, 62–72. Springer.
Urbano, P. 2011. The T. albipennis sand painting artists. In EvoApplications (2), volume 6625 of Lecture Notes in Computer Science, 414–423. Springer.
Yip, C. 2004. Evolving Image Filters. Master's thesis, Imperial College of Science, Technology, and Medicine.
Adaptation of an Autonomous Creative Evolutionary System for Real-World
Design Application Based on Creative Cognition
Steve DiPaola, Graeme McCaig, Kristin Carlson, Sara Salevati and Nathan Sorenson
School of Interactive Arts and Technology
Simon Fraser University
[email protected], [email protected], [email protected], [email protected], [email protected]
Abstract
This paper describes the conceptual and implementation shift from a creative research-based evolutionary system to a real-world evolutionary system for professional designers. The initial system, DarwinsGaze, is a Creative Genetic Programming system based on creative cognition theories. It generated artwork that tens of thousands of viewers perceived as human-created art during its successful run at peer-reviewed, solo shows at noted museums and art galleries. In an effort to improve the system for use with real-world designers, and with multi-person creativity in mind, we began working with a noted design firm, exploring potential uses of our technology to support multi-variant creative design iteration. This second-generation system, titled Evolver, provides designers with fast, unique creative options that expand beyond their habitual selections and that can be inserted/extracted from the system process at any time for modular use at varying stages of the creative design process. We describe both systems and the design decisions made in adapting our research system, whose goal was to incorporate creativity automatically within its algorithms, to our second-generation system, which attempts to take elements of human creativity theories and populate them as tools back into the process. We report on our study with the design firm on the adapted system's effectiveness.
Introduction
Creativity is a complex set of cognitive processes theorized to involve, among other elements, attention shifts between associative and analytical focus (Gabora, 2010), novel goals (Luo and Knoblich, 2007), and situated actions and difficult definitions of evaluation (Christoff et al, 2011). Computational creative systems strive to model a variety of creativity's aspects using computer algorithms, from evolutionary small-step modifications to intelligent autonomous composition and big-leap innovation, in an effort to better understand and replicate the creative process (Boden, 2003). The focus by some researchers on replicating creativity in computational algorithms has been instrumental in learning more about human cognition (individual and collaborative) and how creative support tools might be used to enhance and augment human creative individuals and teams. All these aspects continue to evolve our perceptions of creativity and its role in computation in the current technology-saturated world.
Systems modeling creativity computationally have
gained acceptance in the last two decades, situated mainly
as artistic and research projects. Several researchers in
computational creativity have addressed questions around
such computational modeling by outlining different dimen-
sions of creativity and proposing schema for evaluating a
"level of creativity" of a given system, for example (Ritch-
ie, 2007; Jennings, 2010; Colton, Pease and Charnley,
2011). While there is ongoing research and scholarly dis-
course about how a system is realized, how the results are
generated, selected and adjusted and how the process and
product are evaluated, there is less research about direct
applications of creative cognitive support systems in real-
world situations. Now that more autonomous, generative
creative systems have been developed, we are re-
evaluating the role of the human collaborator(s) when de-
signing a creative system for real-world applications in an
iterative creative design process environment (Shneider-
man, 2007).
We explore creativity from theories of cognition that
attempt to understand attentional shifts between associative
and analytical focus. The existence of two stages of the
creative process is consistent with the widely held view
that there are two distinct forms of thought (Dartnell, 1993;
Neisser, 1963; Piaget, 1926; Rips, 2001; Sloman, 1996). It
has been proposed that creativity involves the ability to
vary the degree of conceptual fluidity in response to the
demands of any given phase of the creative process (Gabo-
ra, 2000; 2002a; 2002b; 2005). This dimension of variabil-
ity in focus is referred to as contextual focus. Focused at-
tention produces analytic thought, which is conducive to
manipulating symbolic primitives and deducing laws of
cause and effect, while defocused attention produces fluid
or associative thought which is conducive to analogy and
unearthing relationships of correlation. Thus, creativity is
not just a matter of eliminating rules but of assimilating
and then breaking free of them where warranted.
This paper focuses first on the implementation and ap-
plicability of contextual focus through our research system,
DarwinsGaze, developed to use an automatic fitness func-
tion. Second, we present our effort to adapt this successful
but specific research system for more general use with re-
al-world designers, and with multi-person creativity in
mind. We worked with a noted design firm to examine
potential uses of our technology for supporting multi-
variant creative design iteration. Our analysis of their process, combined with our knowledge of the cognitive aspects of creativity (gleaned from our early research), was used to completely rewrite the DarwinsGaze system into an interactive creativity support tool within a production pipeline. This 2nd-generation system, Evolver, provides designers with fast, unique options that expand beyond their habitual selections and that can be inserted and extracted from the system process at any time for modular use at varying stages of the creative design process. The changes focused firstly on usability needs, but became more important when we saw opportunities for affecting the designer's shifts between contextual and analytical focus through the Evolver system. This process required evaluating the real-world iterative process of designers and testing various prototypes with designers from the firm Farmboy Fine Arts (FBFA) to see how they engaged with interactive creativity support. Lastly, we evaluated, via a user study, the effectiveness of this conversion process and how non-technical designers appreciated and used this Creative Evolutionary System. We hope that our experience and evaluation can serve as a guide for other researchers adapting creative research systems into more robust and user-centric real-world production tools.
The DarwinsGaze System
The DarwinsGaze system (DiPaola and Gabora, 2007) is a
Creative Evolutionary System (CES) (Bentley and Corne,
2002) (see Figure 1) based on a variant of Genetic Pro-
gramming (GP). Unlike typical Genetic Programming sys-
tems this system favors exploration over optimization,
finding innovative or novel solutions over a preconceived
notion of a specific optimal solution. It uses an automatic
fitness function (albeit one specific to portrait painting)
allowing it to function without human intervention be-
tween being launched and obtaining the final, often unan-
ticipated and pleasing set of results; in this specific and
limited sense we refer to DarwinsGaze as "autonomous".
The inspiration for this work is to directly explore to what
extent computer algorithms can be creative on their own
(Gabora and DiPaola, 2012). Related work has begun to use creative evolutionary systems with automatic fitness functions in design and music (Bentley and Corne, 2002), as well as the building of a creative invention machine (Koza, 2003). A contribution of the DarwinsGaze work is to model, in software, newly theorized aspects of human creativity, especially in terms of fluid contextual focus (see Figure 2).
Figure 1. Source Darwin image with examples of evolved abstract portraits created using the DarwinsGaze autonomous creative system.

DarwinsGaze capitalizes on recent developments in GP by employing a form of GP called Cartesian Genetic Programming (CGP) (Miller and Thomson, 2000; Walker and Miller, 2005). CGP uses GP techniques (crossover, mutation, and survival), but differs in certain key respects. The program is represented by a directed graph of indexed nodes. Each node has a number of inputs and a function that gives an output based on the inputs. The genotype is a list of integers determining the connectivity and functionality of the nodes, which can be mutated and mated to create new directed graphs.
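As a rough sketch of how such a genotype decodes into a program (the function set and genome layout here are illustrative assumptions; DarwinsGaze's actual function set is image-oriented):

    import operator

    # Illustrative two-input function set, indexed by the function gene.
    FUNCS = [operator.add, operator.sub, operator.mul, min]

    def eval_cgp(genome, inputs, n_nodes, arity=2):
        # Decode: each node uses `arity` connection genes plus one function
        # gene; a final output gene selects which node's value is returned.
        values = list(inputs)              # first entries are program inputs
        for n in range(n_nodes):
            g = n * (arity + 1)
            conns = genome[g:g + arity]    # indices of earlier nodes/inputs
            func = FUNCS[genome[g + arity] % len(FUNCS)]
            args = [values[c % len(values)] for c in conns]
            # Nodes not on the path to the output still exist in the genome
            # but are unexpressed in the phenotype: the neutrality noted below.
            values.append(func(*args))
        return values[genome[-1] % len(values)]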
CGP has several features that foster creativity, including: 1) its node-based structure facilitates the creation of visual mapping modules; 2) its structure can represent complex computational input/output connectivity, thus accommodating our sophisticated tone- and temperature-based color space model, which enables designerly decision making; and, most importantly, 3) its component-based approach favors exploration over optimization by allowing different genotypes to map to the same phenotype. The last technique uses redundancy at the input, node, and functional levels, allowing the genotype to contain nodes that are not connected to the output nodes and so are not expressed in the phenotype. Having different genotypes (recipes) map to the same phenotype (output) provides CGP with greater neutrality (Yu and Miller, 2005). Our work is based on Ashmore and Miller's (2004) CGP application to evolve visual algorithms for enhanced image complexity or circular objects in an image. Most of their efforts involve initializing a population and then letting the user take over. Our initial prototype was based upon their approach, but expanded it with a more sophisticated similarity and creativity function, and revised their system for a portrait painter process.
Since the advent of photography, portrait painting has not just been about accurate reproduction, but also about using modern painterly goals to achieve a creative representation of the sitter. We have created a fitness function that mainly rewards accurate representation, but in certain situations it also rewards visual painterly aesthetics using simple rules of art creation as well as a portrait knowledge space. Specifically, the painterly portion of our fitness function 1) weighs face versus background composition, 2) uses tonal similarity rather than exact color similarity, matched with a sophisticated artistic color space model which weighs warm-cool color temperature relationships based on analogous and complementary color harmony rules, and 3) employs unequal dominant and subdominant tone and color rules and other artistic rules based on a portrait painter knowledge domain (DiPaola and Gabora, 2007), as illustrated in Figure 2. We weight heavily towards resemblance, which gives us a structured system, but one that can, under the influence of functional triggers, allow for artistic creativity. The approach gives us novelty and innovation from within, or better said, responding to, a structured system: a trait of human creative individuals.
Figure 2. The DarwinsGaze fitness function mimics human creativity by moving between restrained focus (resemblance) and more unstructured associative focus (resemblance plus more ambiguous art rules of composition, tonality and color theory).
Generated portrait programs at the beginning of the run will look less like the sitter but may be highly desirable from an aesthetic point of view, since the function set has been built with painterly rules. Specifically, the fitness function in the DarwinsGaze system calculates four scores (resemblance and the three painterly rules) separately and fluidly combines them in different ways to mimic human creativity, moving between restrained focus (resemblance) and more unstructured associative focus (the three rules of composition, tonality and color theory). In its default state the fitness function uses a ratio of 80% resemblance to 20% non-proportional scoring of our three painterly rules. Several functional triggers can alter this ratio in different ways. The system will also allow very high-scoring painterly-rule individuals to be accepted into the next population. When a plateau or local minimum is reached for a certain number of epochs, the fitness function ratio switches course: painterly rules are weighted higher than resemblance (on a sliding scale) and work in conjunction with redundancy at the input, node, and functional levels. Using this method, in the wider associative mode, high-resemblance individuals are always part of the mix, and when these individuals show a marked improvement in resemblance, a trigger is set to return to the more focused 80/20 resemblance ratio.
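A minimal sketch of this switching logic (our reading of the description, not published code; for brevity it uses a hard 80/20 flip rather than the sliding scale the paper describes, and the plateau constants are assumptions):

    def blended_fitness(resemblance, painterly, associative):
        # Focused (analytic) mode: 80% resemblance / 20% painterly rules.
        # Associative mode flips the emphasis toward the painterly rules.
        if associative:
            return 0.2 * resemblance + 0.8 * painterly
        return 0.8 * resemblance + 0.2 * painterly

    class FocusController:
        # Widen focus after `plateau_epochs` epochs without a marked
        # improvement in resemblance; refocus once resemblance jumps.
        def __init__(self, plateau_epochs=10, improvement=0.05):
            self.best = 0.0
            self.stale = 0
            self.plateau_epochs = plateau_epochs
            self.improvement = improvement
            self.associative = False

        def update(self, best_resemblance):
            if best_resemblance > self.best + self.improvement:
                self.best = best_resemblance
                self.stale = 0
                self.associative = False   # marked improvement: back to 80/20
            else:
                self.stale += 1
                if self.stale >= self.plateau_epochs:
                    self.associative = True  # plateau: widen to associative mode
            return self.associative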
For a CES used to create fine art paintings, the evaluation was based less on the process and more on the output. Could a closed process (with no human intervention once the evolutionary process was started) produce artwork that was judged as creative using the methods by which real human artists are judged? Example pieces from the output over 30 days were framed and submitted to galleries as a related set of work. Care was taken by the author to select representational images of the evolved unsupervised process; however, creative human bias obviously exists in the representational editing process. This is similar to how a curator chooses a subset of pieces from their artists, so it was deemed that it does not diminish the soft evaluation process.
The framed artwork (darwinsgaze.com) was accepted and exhibited at six major galleries and museums including the TenderPixel Gallery in London, Emily Carr Gallery in Vancouver, and King's Art Centre at Cambridge University, as well as the MIT Museum and the High Museum in Atlanta, all either peer-reviewed, juried or commissioned shows at institutions that typically only accept human artwork. This gallery of abstract portraits of Darwin has been seen by tens of thousands of viewers, who have commented with dated quotes in a gallery journal that they see the artwork as an aesthetic piece that ebbs and flows through creative ideas, even though the pieces were solely created by an evolutionary art computer program using contextual focus. Note that no attempt to create a formalized creativity Turing Test was made. Most of the thousands of casual viewers assumed they were looking at human-created art. The work was also selected for its aesthetic value to accompany an opinion piece in the journal Nature (Padian, 2008), and was given a strong critical review by the Harvard humanities critic Browne (2009). While these are subjective measures, they are standards in the art world. The fact that the computer program produced novel creative artifacts, both as single art pieces and as a gallery collection of pieces with interrelated themes, is compelling evidence that the process passed a type of informal creativity Turing test.
The Shift from Autonomous Creative System
to Creative Support Tool: the Evolver System
To move forward from the DarwinsGaze system, we began to explore a real-world application of creativity in computation by leveraging concepts of contextual focus to integrate with a collaborative process. The opportunity arose to work with FBFA, an international art consultancy firm that designs site-specific art collections for the luxury hotel and corporate sectors, to develop software that could complement and provoke their current iterative design processes. The focus on visual design for hotel decor was an interesting perspective that enabled us to consider what we had achieved with visual creative design in prior work, and how we could engage with the designers' intuitive yet visual (and hence somewhat parameterized) creative process.
In the effort to evaluate a CES within a visual design domain, we explored the use and adaptation of Evolver. Evolver is a computational creative tool modified from the DarwinsGaze project structure. Evolver was created as a result of in-depth research and observations to support a specific design process at FBFA by automating some of the design tasks and restructuring the contextual search space. It provides a platform for brainstorming by generating various versions of original artwork provided by designers, through specific features such as controlling the color scheme or marrying different artworks together. It also offers some production capabilities by automating repetitive tasks, such as cropping for mass quantities of artworks, traditionally performed by designers in programs such as Adobe Photoshop. Evolver incorporates a user-friendly GUI (see Figure 3) paired with a flexible internal image representation format for ease of use by the designer. The designer provides the seed material and selects preferred results while the system generates a population of artwork candidates, then cross-breeds and mutates the candidates under user control to generate new design products. The designer may select and extract any resulting candidate piece at any stage of the process for use in other areas or as generative fodder for later projects. System parameters of Evolver include shapes, colors, layers, patterns, symmetries and canvas dimensions.
Developing the Evolver System to Fit the
Needs and Process of a Design Firm
FBFA takes design briefs from hotel interior designers and, based on their extensive photo and graphic design database as source, designs specific art and design objects in a multitude of materials (although typically wall hanging), often in unique sizes, shapes and multiples, to specifically work with the hotel's needs (typically for their large lobby and restaurants). They do this by incorporating a number of designers who, using digital systems like Adobe Illustrator, significantly rework a source design to refit the space, shape and material specifics.
We began by demonstrating to them an interactive version of our DarwinsGaze system, which was mocked up on the darwinsgaze.com website, called Evolve It, to show what a potentially fully interactive new system would look like. The designers' process to create a successful prototype for the client was a multi-step, iterative and somewhat inefficient process which relied on the designer's feel for the problem context, the potential solution contexts, and their intuitive exploration and selection process. In this particular situation designers would discuss a project with a client, then go to physical boxes or their digital database containing immense amounts of image material, find seed material that fits the feeling of the multiple contexts, and then manipulate it to better fit the design problem in Adobe Illustrator. The designer's manipulation adjusts size, scale, shape, multiples and color, in layers, by hand. This process is highly labor-heavy, and we felt it was most receptive to computational support because the designer had already defined the contextual focus for this problem through their own interpretation of the available options, constraints and aesthetic preferences (which had already been confirmed by the client engaging with this company).
While the designers were reluctant to give up control of their intuitive, creative knowledge, they readily engaged with the Evolver system once they saw how CESs could support the restructuring of the designers' contextual space while also reducing the labor-intensive prior process. This shift freed up the designers' ability to creatively engage with the problem at hand. We strove to make the new system flexible to the different creative processes and paths that different designers might have.
Figure 3. The Evolver Interface
Evolver's cognitive aspect provides designers with a platform to externalize and visualize their ideas. Artwork generated through Evolver can be used for different purposes in different phases of the design process, from conceptual design through to presentation. During the early phase of conceptual design, free-hand, non-rigid sketching techniques have an important role in the formation of creative ideas, as designers externalize their ideas and interact with them spatially and visually (Suwa, Gero and Purcell, 1998). Evolver supports flexibility of ideas in this phase by enabling designers to easily produce an extensive range of alternatives. The ambiguous nature of the multiple generations produced supports the uncertain and fuzzy nature of conceptual design as designers discover, frame early ideas and brainstorm. The alternatives produced relieve cognitive load from the designer by separating them from the manual task of manipulating the design parameters, but do not separate them so far from the process that they cannot use their psychomotor and affective design knowledge.
Evolver is structured to support the shift between contextual and analytical focus by restructuring the contextual space users are working in. Users can choose to relinquish a degree of control while broadening their focus, gaining the ability to be inspired or provoked by novel generations from the system. On the other hand, it is possible to guide successive evolutions in a more deliberate, analytical way, and the ability of Evolver to import/export individuals to/from a precisely editable format (SVG - Adobe Illustrator) allows tightly focused design directions to be pursued. At later stages in the design process, artwork generated through Evolver can be used as mockups for clients and prototyping, and also as a communication tool, such as in presentations at the very end of the design process. The work produced by Evolver can be incorporated directly into the tool-chain leading to a finished piece.
Evolver Genetic Encoding: Moving to a More
Linear Scheme
One of the most far-reaching design decisions involved in
the construction of an evolutionary system is the specifica-
tion of the genetic encoding. A particular choice of encod-
ing delineates the space of possible images and dramatical-
ly influences the manner in which images can change dur-
ing the course of evolution. The genotype induces a metric
on the space of potential images: certain choices of repre-
sentation will cause certain styles or images to be genet-
ically related and others to be genetically distant. The re-
lated images will appear with much more probability, even
if the distant images are technically possible to represent in
the encoding system. For this reason, it is important that
the genotype causes images that are aesthetically similar to
be genetically related. Relevant aspects of the aesthetic
merit of a work can then be successfully selected for and
combined throughout the course of the evolutionary run.
This property is referred to as gene linkage (Harik et al,
2006). We identified this property as especially important
to an interactive creativity support tool, for designers who
are used to exerting a high degree of creative control over
their output and in a scenario where a certain sense of
high quality design is to be maintained.
A genetic encoding can either be low level, representing extremely basic atomic elements such as pixels and color values, or high level, representing more complex structures such as shapes and patterns. A common low-level encoding is to represent images as the composition of elemental mathematical functions (Bentley and Corne 2002). Though any image can in principle be represented as a composition of such functions, this encoding typically results in recognizable geometric patterns that readily signal the algorithmic nature of the process. A higher-level encoding can be seen in shape grammars that represent not individual pixels but aggregates of primitive shapes (Machado et al. 2010). This approach can theoretically produce a much narrower range of images, but the images that are produced do not exhibit the highly mathematical character of lower-level encodings.
Compared to the CGP genetic structure of DarwinsGaze, Evolver uses a list-based, tree-structured encoding that draws some inspiration from CGP but operates on higher-level components in order to maximize the property of gene linkage and user interpretability.
We viewed this new genetic representation as broadly linear in the sense that the genotype could be decomposed into elements and recombined, leading to a corresponding effect in the phenotype of recombining visually identifiable elements. The genetic representation is based on a collection of "design elements" (DEs), which are objects that denote particular aspects of the image. For example, a major component of our image representation is that of a symbol: a shape that can be duplicated and positioned on the canvas according to position, rotation, and scaling parameters. DEs are defined in terms of atomic values and composite collections. The DE for a symbol, for example, is represented as a tuple consisting of two floats representing the x and y coordinates of the shape, a float representing the rotation, a float representing the scale, and an enumerable variable representing the particular shape graphic of the symbol. An image is then described by a list of these symbols. The genetic operations of mutation and crossover are derived from the structure of the DE definitions. Mutation is defined for the atomic values as a perturbation of the current value. Crossover is defined for the collection structures. The genotype is "strongly typed" so only genes of the same type can cross over. (For example, "position" may cross over with the "position" of another stamp's record, and "color" may cross over with "color"; however, "position" will never cross over with "color".) Figure 4 shows an example of Evolver system output.
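As a concrete illustration, the following minimal Python sketch shows one way such a strongly typed, list-based genotype could be structured. It is our reconstruction, not the authors' implementation; the names SymbolDE, mutate and crossover are hypothetical.

    import random
    from dataclasses import dataclass, replace

    @dataclass(frozen=True)
    class SymbolDE:
        # One "design element": a placed instance of a shape graphic.
        x: float          # canvas position
        y: float
        rotation: float   # in degrees
        scale: float
        shape_id: int     # index into a palette of extracted shape graphics

    def mutate(s, sigma=0.1):
        # Mutation perturbs one atomic value of the tuple.
        field = random.choice(["x", "y", "rotation", "scale"])
        return replace(s, **{field: getattr(s, field) + random.gauss(0.0, sigma)})

    def crossover(a, b):
        # Strongly typed crossover: position only crosses with position,
        # rotation with rotation, and so on, across aligned symbol records.
        child = []
        for s1, s2 in zip(a, b):
            x, y = random.choice([(s1.x, s1.y), (s2.x, s2.y)])
            child.append(SymbolDE(
                x=x, y=y,
                rotation=random.choice([s1.rotation, s2.rotation]),
                scale=random.choice([s1.scale, s2.scale]),
                shape_id=random.choice([s1.shape_id, s2.shape_id])))
        return child

An image genotype is then simply a list of SymbolDE records, and recombining two genotypes recombines visually identifiable elements, which is the gene-linkage property described above.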
Evolver User Interface: Optimizing Creative
Support
To make the power of this flexible encoding system avail-
able to designers, we constructed an automatic import tool
that analyzed existing images and parsed their structure
into DEs that formed initial seed populations for the inter-
active evolution. This approach served to bootstrap the
evolutionary search with images that are known to demon-
strate artistic merit. Source artwork is converted to the
SVG vector image format, which is a tree-based descrip-
tion of the shapes and curves that comprise a vector based
image. The hierarchical grouping of art elements in the
original work is preserved in the SVG format, and is used
in determining which pieces are isolated to form symbol
DEs. We also make use of heuristics that take into account the size of various potential groupings of art elements and any commonly duplicated patterns to identify candidates for extraction.
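A rough sketch of such an import step, using Python's standard ElementTree API, is shown below; this is our simplification, and the real tool and its heuristics are more elaborate.

    import xml.etree.ElementTree as ET
    from collections import Counter

    SVG = "{http://www.w3.org/2000/svg}"

    def symbol_candidates(svg_path):
        # Each <g> group preserved from the source artwork is a
        # candidate symbol DE.
        root = ET.parse(svg_path).getroot()
        return list(root.iter(SVG + "g"))

    def duplicated_shapes(svg_path):
        # Heuristic: outline data that recurs suggests a commonly duplicated
        # pattern, hence a good candidate for extraction as a symbol.
        root = ET.parse(svg_path).getroot()
        counts = Counter(p.get("d") for p in root.iter(SVG + "path"))
        return [d for d, n in counts.items() if d and n > 1]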
The interactive evolution proceeds from a seed population constructed from these parsed image elements. The user interface, by default, depicts a population of 8 pieces of generated art. These individuals can be selected to become the parents of the next generation, as is typical in interactive evolution. An added feature, which proved useful, was the ability to bookmark individuals, which placed them in a separate collection outside the evolutionary run. This collection of bookmarked individuals allowed users to store any interesting images discovered during the run while proceeding to guide the evolution in a different direction.
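In outline, the interaction loop looks something like the following sketch. The functions render, ask_user and breed are placeholders for Evolver's rendering, selection UI and genetic operators; this is an illustration of the flow, not the shipped code.

    def interactive_run(seed_population, render, ask_user, breed, max_generations=50):
        # The user acts as the fitness function: ask_user shows the rendered
        # designs and returns the chosen parents plus any bookmarked designs.
        population, bookmarks = list(seed_population), []
        for _ in range(max_generations):
            images = [render(g) for g in population]
            parents, marked = ask_user(images, population)
            bookmarks.extend(marked)   # stored outside the evolutionary run
            if not parents:            # user ends the session
                break
            population = [breed(parents) for _ in range(8)]  # default population of 8
        return population, bookmarks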
Figure 4. Example Evolver Output Image
Evaluating Designers' Usage and Opinions of
the Evolver System
Some months after the end of the project, with Evolver still in use and available for real-world production at FBFA, we invited a small group of FBFA and associated designers to our labs, now under controlled study conditions. There we conducted a 45-minute, questionnaire-based qualitative study in two phases: it began with a uniform re-introduction and re-demonstration of Evolver and its functionalities, followed by a short session in which each designer had the opportunity to re-explore the tool and answer a series of nine structured interview questions that concentrated on the adoption of Evolver within their current and future work practices. The specific questions in phase two were:
1. What is your first impression of Evolver?
2. How and in which stage would you use this tool in
your current practice?
3. How does this tool change your design process? Can
you provide an existing scenario of your current practice
and how you envision Evolver would change that?
4. Which features of this tool do you find most interest-
ing? Why?
5. What features would you like to change and/or add in
the future? Why?
6. How would you use this tool apart from the design-thinking stage of your process?
7. How does it help with the conceptualization of ideas?
8. What do you think of the role of computational tools
such as Evolver within the Visual Design domain?
9. Do you have any further comments/suggestions for
the future of this research?
The full discursive results of the qualitative study are beyond the scope of this paper; however, we have included an exemplary set of these results, based on direct quotes from the designers and our assessment of the dominant themes in the designers' responses. Our main takeaways from this study were:
1. Designers saw Evolver as a creative partner that could
suggest alternatives outside of the normal human cognitive
capacity:
"[The] Human brain is sometimes limited, I find Evolver
to have this unlimited capacity for creativity." (KK, In-
terview)
"Evolver introduces me to design options I never
thought of before, it enhances my design thinking and
helps me to produce abstract out of the norm ideas."
(LA, Interview)
2. Evolver also enhanced the human user's ability to enter a more intuitive or associative mode of thought by easing some of the effort of manually visualizing alternative design concepts:
"Sketching stuff out on paper takes more energy and
tweaking - Evolver allows me to visualize easier, have a
dialogue and collaborate with the design space." (RW,
Interview)
3. Evolver could be used flexibly at different stages of the design process to support different tasks and modes of thought, including both the generation and communication of ideas:
"The best part about the Evolver is that you can stop it at
any stage of generation, edit and feed it back to the en-
gine, also it is mobile and you can take it to meetings
with clients and easily communicate various ideas and
establish a shared understanding. It provides a frame of
reference- what is in your head now." (RW, Interview)
"#$%&'()#* &*+ ,()-.))(#*
We compare the details of the decisions made to shift from the autonomous DarwinsGaze system to the interactive Evolver system and describe their importance (see Table 1). One of the first changes was to shift the genetic representation (or the gene structure). The DarwinsGaze system has genes which work together in a tree structure to evolve output as a bitmap of the whole piece. The Evolver system's genes were more linear and 'predictably recombinable' in order to minimize contextual focus within the system while prioritizing a variety of potentially successful solutions. DarwinsGaze used automatic fitness-function-based Cartesian Genetic Programming while Evolver shifted to a simpler and interactive Genetic Algorithm in order to engage the designer in the system and support their intuitive decision-making process. In DarwinsGaze there is no control over pieces, layers or options for interactive involvement. The Evolver system has many layers and elements and is built on the standards-based vector language SVG. Using a design-shelf structure, the user has more subtle control, including feature navigation, text, symmetry and rotation. The user can either import many small SVG files as seed material or import a single large file and the system will automatically separate and label the elements. With the user acting as the fitness function, the population size can be adjusted, and desired results can be bookmarked and set aside for manual iteration or re-inserted into the Evolver system's gene pool. So, for instance, work that designers create traditionally can be used as partial seed material, used fully at the start, output at any time from the system as raw inspiration to be reworked traditionally, or used as a final result. A careful effort was made to iteratively develop the graphical user interface based on feedback from the designers about how they think within a creative process, what metaphors they use, and which perspectives and skills they rely on based on their backgrounds and experience. Finally, we integrated additional post-processing options to give added novelty if needed (outside of the Genetic Algorithm), with effects such as kaleidoscope and multiple panels.
DarwinsGaze System | Evolver System
Genes specific to image resemblance and art rules | Genes linear, strongly typed, focused on existing parameters
Automatic CGP: complex fitness function / functional triggering | Interactive Genetic Algorithm: simple structured forms
Bitmap, evolved as a whole | SVG, evolved as labeled layers
Operates autonomously, no import/export of material | Ability to import/export labeled semantic material; HCI-based
Research system with specific 'evolve towards the sitter images' goals | Communicates at any point of the process with traditional design tools, supporting a wide range of creative styles
Innovative/complex automatic functional triggers: analytical to associative and back | Simpler user interaction: population size, bookmarks to support human creative triggers
One system: full process of creativity, no external communication | Integrated system: built to work with other tools and processes; supports creativity as an adaptive human process
Informed by creativity theory and simulates it internally in complex ways | Informed by creativity theory but uses it to support a real-world meta-system with humans
Table 1. Comparison between the DarwinsGaze and Evolver systems
The study of Evolver in use also made apparent an attitude shift of visual designers towards CESs, which changed their role from sole creators to editors and collaborators. The designers became more receptive to tools such as Evolver as they came to view them not as replacing designers or automating the creative process, but rather as promoting new ways of design thinking, assisting and taking designers' abilities to the next level by providing efficiency and encouraging more 'aha' moments. The visual designers in the study described Evolver as an 'invisible teammate' with whom they could collaborate at any stage of their design process. Evolver became a center of dialogue among designers and helped them communicate their mental models and understanding of design situations to clients and other stakeholders.
Conclusions
Many significant research CES systems exist that are both innovative and useful. However, as the field matures, there will be an increasing need to make CESs production-worthy and able to work within a creative industry environment such as a digital design firm. To support others in this effort of production-targeted transformation, in this paper we described the shift from an autonomous fitness-function-based creative system, DarwinsGaze, to an interactive fitness-function-based creative support system, Evolver, for real-world design collaboration. DarwinsGaze operates using a complex automatic fitness function to model contextual focus as well as other aspects of human creativity simulated internally. In shifting to the Evolver project we found that the contextual focus perspective remained relevant, but now re-situated to overlay the collaborative process between designer and system. Four design principles developed on this basis were: 1) support analytic focus by providing tools tailored to the designers' specific needs and aesthetic preferences; 2) support associative or intuitive focus by relieving the designers' cognitive load, enabling a quick and serendipitous workflow when desired, and offering a large variety of parameterized options to utilize; 3) support a triggering of focus-shift between the designer and the system through options to bookmark and save interesting pieces for later, as well as to move creative material from and to the system while retaining the work's semantic structure and editability; and 4) support a joint 'train of thought' between system and user by structuring a genotype representation compatible with human visual/cognitive intuition.
We found that the shift to a real-world design scenario required attention to the collaboration and creative processes of designers who value their experience-developed expertise. The system design had to act as both a support tool taking on some of the cognitive load of the process and a flexible, interactive repository of potentially successful options. Future real-world design work can explore methods for adapting intelligent operations to the cognitive processes and constraints of the situations at hand, taking into account the expertise of collaborators.
Acknowledgements
This research was supported by the Natural Sciences and
Engineering Research Council of Canada and Mitacs
(Canada). We would like to thank the design firm Farmboy
Fine Arts, Liane Gabora, Nahid Karimaghalou, Robb Lov-
ell and Sang Mah for agreeing to work on the industri-
al/academic partnership part of the work.
References
Ashmore, L., and Miller, J. 2004. Evolutionary Art with
Cartesian Genetic Programming. Technical Online Report.
http://www.emoware.org/evolutionary_art.asp.
Bentley, P., and Corne, D. eds. 2002. Creative Evolution-
ary Systems, San Francisco, CA.: Morgan Kaufmann.
Boden, M. 2003. The Creative Mind. London: Abacus.
Brown, J. 2009. Looking at Darwin: portraits and the making of an icon. Isis, Sept, 100(3): 542–570.
Dartnell, T. 1993. Artificial intelligence and creativity: An
introduction. Artificial Intelligence and the Simulation of
Intelligence Quarterly 85.
DiPaola, S., and Gabora, L. 2007. Incorporating characteristics of human creativity into an evolutionary art algorithm. In (D. Thierens, Ed.), Proc Genetic and Evol Computing Conf, 2442–2449. July 7-11, Univ College London.
Feist, G. J. 1999. The influence of personality on artistic and scientific creativity. In Handbook of Creativity, R. J. Sternberg, Ed. Cambridge, UK: Cambridge University Press.
Gabora, L. 2000. Toward a theory of creative inklings. In
(R. Ascott, Ed.) Art, Technology, and Consciousness, In-
tellect Press, Bristol, UK.
Gabora, L. 2002. The beer can theory of creativity, in (P.
Bentley and D. Corne, Eds.) Creative Evolutionary Sys-
tems, 147-161. San Francisco, CA.: Morgan Kaufmann.
Gabora, L. 2002. Cognitive mechanisms underlying the
creative process. In (T. Hewett and T. Kavanagh, Eds.)
Proceedings of the Fourth International Conference on
Creativity and Cognition, Oct 13-16, UK, 126-133.
Gabora, L. 2005. Creative thought as a non-Darwinian evolutionary process. Journal of Creative Behavior, 39(4), 65–87.
Gabora, L. 2010. Revenge of the neurds: Characterizing
creative thought in terms of the structure and dynamics of
memory. Creativity Research Journal, 22(1), 1-13.
Gabora, L., and DiPaola, S. 2012. How did humans be-
come so creative? Proceedings of the International Confer-
ence on Computational Creativity, 203-210. May 31 - June
1, Dublin, Ireland.
Harik, G. R., Lobo, F. G., and Sastry, K. 2006. Linkage Learning via Probabilistic Modeling in the Extended Compact Genetic Algorithm (ECGA). In M. Pelikan, K. Sastry, & E. Cantú-Paz (Eds.), Scalable Optimization via Probabilistic Modeling, 39–61. Springer Berlin Heidelberg.
Jennings, K. E. 2010. Developing creativity: Artificial barriers in artificial intelligence. Minds and Machines, 1–13.
Koza, J. R., Keane, M. A., and Streeter, M. J. 2003. Evolving inventions. Scientific American, February, 288(2): 52–59.
Lawson, B. 2006. How designers think: the design process
demystified (4th Edition). Oxford: Elsevier.
Luo, J., and Knoblich, G. 2007. Studying insight with neu-
roscientific methods. Methods, 42, 77-86.
Machado, P., Nunes, H., & Romero, J. 2010. Graph-Based Evolution of Visual Languages. In C. Di Chio, A. Brabazon, G. A. D. Caro, M. Ebner, M. Farooq, A. Fink, N. Urquhart (Eds.), Applications of Evolutionary Computation, 271–280. Springer Berlin Heidelberg.
Miller, J. 2011. Cartesian Genetic Programming, Springer.
Miller, J., and Thomson, P. 2000. Cartesian Genetic Pro-
gramming. Proceedings of the 3rd European Conference on
Genetic Programming, 121-132. Edinburgh, UK.
Neisser, U. 1963. The multiplicity of thought. British Jour-
nal of Psychology, 54, 1-14.
Padian, K. 2008. Darwin's enduring legacy. Nature, 451(7179), 632–634.
Piaget, J. 1926. The Language and Thought of the Child.
Routledge and Kegan Paul, London.
Pease, A. and Colton, S. 2011. On Impact and Evaluation
in Computational Creativity: A Discussion of the Turing
Test and an Alternative Proposal. In Proceedings of the
AISB symposium on AI and Philosophy.
Rips, L. J. 2001. Necessity and natural categories. Psycho-
logical Bulletin, 127(6), 827-852.
Ritchie, G. 2007. Some empirical criteria for attributing
creativity to a computer program. Minds and Machines 17.
Shneiderman, B. 2007. Creativity support tools: accelerating discovery and innovation. Commun. ACM, 50(12), 20–32.
Schön, D. 1988. Designing: Rules, Types, and Worlds. Design Studies, 9(3), 181–190.
Sloman, S. 1996. The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.
Suwa, M., Gero, J. S., and Purcell, T. 1998. The roles of
sketches in early conceptual design processes. In Proceed-
ings of Twentieth Annual Meeting of the Cognitive Sci-
ence Society, 1043-1048.
Walker, J. and Miller, J. 2005. Improving the Evolvability
of Digital Multipliers Using Embedded Cartesian Genetic
Programming and Product Reduction. Evolvable Systems:
From Biology to Hardware, 6th International Conference,
ICES 2005, Proceedings, Sitges, Spain. Springer.
Yu, T., and Miller, J. 2001. Neutrality and the Evolvability of Boolean function landscape. Proceedings of the Fourth European Conference on Genetic Programming, 204-217. Berlin: Springer-Verlag.
A Computational Model of Analogical Reasoning in Dementia Care
Konstantinos Zachos and Neil Maiden
Centre for Creativity in Professional Practice
City University London
Northampton Square, London EC1V 0HB, UK
{k.zachos, N.A.M.Maiden}@city.ac.uk
Abstract
This paper reports a practical application of a computa-
tional model of analogical reasoning to a pressing social
problem, which is to improve the care of older people
with dementia. Underpinning the support for carers of people with dementia is a computational model of analogical reasoning that retrieves information about cases
from analogical problem domains. The model imple-
ments structure-mapping theory adapted to match
source and target domains expressed in unstructured
natural language. The model is implemented as a com-
putational service invoked by a mobile app used by car-
ers during their care shifts.
Dementia Care and Creativity
Dementia is a condition related to ageing. After the age of 65 the proportion of people with dementia doubles for every 5 years of age, so that one fifth of people over the age of 85 are affected (Alzheimer's Society 2010). This equates to a current total of 750,000 people in the UK with dementia, a figure projected to double by 2051, when it is predicted to affect a third of the population either as a sufferer, relative or carer (Wimo and Prince 2010). Dementia care is often delivered in residential homes. In the UK, for example, two in three of all home residents have some form of dementia (e.g. Wimo and Prince 2010), and delivering the required care to them poses complex and diverse problems for carers that new software technologies have the potential to overcome. However, this potential is still to be tapped.
The prevailing paradigm in dementia care is person-centered care. This paradigm takes an individualized approach that recognizes the uniqueness of each resident and seeks to understand the world from the perspective of the person with dementia (Brooker 2007).
role for creative problem solving that produces novel and
useful outcomes (Sternberg 1999), i.e. care activities that
both recognize a sense of uniqueness and are new to the
care of the resident and/or carer. However, there is little
explicit use of creative problem solving in dementia care,
let alone with the benefits that technology can provide.
Therefore, the objective of our research was to enable more
creative problem solving in dementia care through new
software technologies.
This paper reports two computational services developed to support carers in managing challenging behaviors in person-centered dementia care: a computational analogical matching service that retrieves similar challenging behavior cases from less-constrained domains, and a second service that automatically generates creativity prompts based on the computed analogical mappings. Both are delivered to carers through a mobile software app. The next two sections summarize results from one pre-design study that motivates the role of analogical matching in managing challenging behavior in dementia care, then describe the two computational creativity services.
A Pre-Design Study
Creative problem solving is not new to care work. Osborn
(1965) reported that creative problem solving courses were
introduced in nursing and occupational therapy programs
in the 1960s. Le Storti et al. (1999) developed a program
that fostered the personal creative development of student
nurses, challenging them to use creativity techniques to
solve nursing problems. This required a shift in nursing education from task- to role-orientation and established a higher level of nursing practice, a level that treated nurses as creative members of health care teams. There have been calls for creative approaches to be used in the care of people with dementia. Successful creative problem solving was recognized to counteract the negative and stressful effects that are a frequent outcome of caring for people with dementia (Help the Aged, 2007). Several current dementia care learning initiatives can be considered creative in their approaches. These include the adoption of training courses in which care staff are put physically into residents' shoes, and exercises to encourage participants to experience life mentally through the eyes of someone with dementia (Brooker 2007). Caring for people with late-stage dementia is recognized to require more creative approaches, and a common theme is the need to deliver care specific to each individual's behavioral patterns and habits.
To discover the types of dementia care problem most amenable to this model of creative problem solving, we observed care work and conducted interviews with carers at one UK residential home; these revealed different roles for creative problem solving in dementia care. One of these roles was to reduce the instances of challenging behavior in residents. Challenging behavior is defined as 'culturally abnormal behavior(s) of such an intensity, frequency or duration that the physical safety of the person or others is likely to be placed in serious jeopardy, or behavior which is likely to seriously limit use of, or result in the person being denied access to, ordinary community facilities' (Bromley and Emerson 1995). Examples include the refusal of food or medication, and verbal aggression.
Interviews with carers revealed that creative problem
solving has the potential to generate possible solutions to
reduce instances of challenging behavior. For example, if a
resident is uncooperative with carers when taking medication, one means to reduce it might be to have a carer wear a doctor's coat when giving the medication. The means is creative because it can be useful, novel to the resident if not applied to him before, and novel to the care team who have not applied it before. Therefore, with carers in the
pilot home, we explored the potential of different creativity
techniques to reduce challenging behavior.
During one half-day workshop with 6 carers we ex-
plored the effectiveness and potential of different creativity
techniques to manage a fictional challenging behavior.
During a three-stage process the carers were presented with
the fictional resident and challenging behavior, generated
ideas to reduce the behavior, then prepared to implement
these ideas. They used different creativity techniques, pre-
sented to them as practical problem solving techniques, to
reduce the fictional challenging behavior. The carers demonstrated the greatest potential and appetite for one exploratory creativity technique, called Other Worlds (Innovation Story 2002). During the workshop, the carers sought to generate ideas to reduce the challenging behavior in four different, less constrained domains: social life, research, word of mouth and different cultures. These ideas were then transferred to the care domain to explore their effectiveness in it. Other Worlds was judged to be the most effective as well as the most interesting to carers. It created more ideas than any of the other techniques, and two of the ideas from the session were deemed sufficiently useful to implement in the pilot home immediately. Carers singled out the technique because, unlike the others, it purposefully transferred knowledge and ideas via similarity-based reasoning from sources outside of the immediate problem spaces: the resident, the residential home and the dementia care domain.
The Carer App
To implement Other Worlds in care work we decided to
develop a mobile software app, called Carer, which carers
can use during their work. In the place of human facilita-
tion, the software retrieves then guides carers to explore
other worlds that are retrieved by the app, and in place of
face-to-face communication, the software was to support
asynchronous communication between carers who would
digitally share information about care ideas and practices
via the software.
The Carer app accesses a digital repository to retrieve natural language descriptions of cases of good care practice, stored in XML based on the structure of dementia care case studies reported by the Social Care Institute for Excellence (Owen and Meyer 2009), as well as challenging behavior cases in non-care domains such as teen parenting, student mentoring and prison life. Each case has two main parts of up to 150 words of prose each: the situation encountered and the care plan enhancement applied. Each case is attributed to one class of domain to which it belongs. The current version of the repository contains 115 case descriptions.
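The papers do not publish the exact schema, but a case record of this shape can be illustrated roughly as follows; the element and attribute names here are hypothetical, and the snippet is parsed with Python's standard ElementTree API.

    import xml.etree.ElementTree as ET

    # Hypothetical record layout, for illustration only.
    CASE_XML = """
    <case domain="childcare">
      <name>Managing a disrespectful child</name>
      <situation>An intelligent 13-year-old boy voices opinions that are
        hurtful and embarrassing ...</situation>
      <enhancement>The parents are going to set clear rules on acceptable
        behaviour ...</enhancement>
    </case>
    """

    case = ET.fromstring(CASE_XML)
    print(case.get("domain"), "-", case.findtext("name"))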
Figure 1. The Carer mobile app showing how carers describe
challenging behaviors (on the left-hand side) and a detailed de-
scription of one of these cases (on the right-hand side)
The Carer app automatically retrieves previous cases using different services in response to natural language entries typed and/or spoken by a carer into the app. One service supports case-based reasoning with literally similar cases based on information retrieval techniques, similar to strategies applied to people with chronic diseases (Houts et al. 1996). A second supports the Other Worlds technique more generally by automatically generating different domains, such as traveling or cooking, in which to generate care plan enhancements to a current situation without the constraints of the care domain (Innovation Company 2002); the user is encouraged to think about how to solve the aggression situation in, for example, the school playground, and a simple flick of the screen will generate a different other world, such as parachuting from an aircraft. A third service automatically generates creativity prompts from retrieved case content.
Lastly, the Carer app invokes AnTiQue, an analogical reasoning discovery service that matches the description of a challenging behavior situation to descriptions in the repository of challenging behavior cases in non-care domains. To do this, the service implements a computational analogical reasoning algorithm based on Structure-Mapping Theory (Gentner 1983; Falkenhainer et al. 1989) with natural language parsing techniques and a domain-independent verb lexicon called VerbNet (Kipper et al. 2000). A carer can then record new ideas resulting from creative thinking in audio form, then reflect on them by playing them back to change them, generate further ideas, compose them into a care plan and share the plan with other carers. Some of these features are depicted in Figure 1. The right-hand side of Figure 1 shows one retrieved analogical case description, 'Managing a disrespectful child', as it is presented to a carer using the app. The Carer app is described at length in Maiden (2012). The next section describes two of the computational creativity services: the analogical reasoning discovery service and the creativity prompt generation service.
The Analogical Reasoning Discovery Service
This service (called AnTiQue) matches a description of challenging behavior in dementia care to descriptions of challenging behavior problems and resolutions in other domains, for example good policing practices to manage disorderly revelers and good teaching practices to manage disruptive children. AnTiQue's design seeks to solve two research problems: (i) matching incomplete and ambiguous natural language descriptions of challenging behaviour in dementia care against challenging behaviour problems and resolutions in other domains that use different lexical terms; (ii) computing complex analogical matches between descriptions without a priori classification of the described domains.
Analogical service retrieval can increase the number of
cases that are useful to the care staff by retrieving descrip-
tions of cases solved successfully in other domains, for
example good policing practices to manage disorderly
revelers and good teaching practices to manage disruptive
children. The problem and solution description of each
case might have aspects that, through analogical reasoning,
can trigger discovery of new ideas on the current challeng-
ing behaviour. For example, a description of good policing
practice to manage disorderly revellers can provide analog-
ical insights with which to manage challenging behaviour
in dementia care. AnTiQue seeks to leverage these new
sources of knowledge in dementia care.
Analogical retrieval in AnTiQue uses a similarity model called the Structure Mapping Theory (SMT), which seeks to transfer a network of related facts, rather than unrelated ones, from a source to a target domain (Gentner 1983). To enable structure-matching, AnTiQue transforms natural language queries and case descriptions into predicates that express propositional networks of nodes (objects) and edges (predicate values). Attributional predicates state properties of objects in the form PredicateValue(Object), such as asleep(resident) and absent(relative). Relational predicates express relations between objects as PredicateValue(Object1, Object2), such as abuse(resident, care-staff) and remain(resident, room). According to the SMT, a literal similarity is a comparison in which attributional and relational predicates can both be mapped from a source to a target. In contrast, an analogy is a comparison in which relational predicates but few or no attributional predicates can be mapped. Therefore AnTiQue retrieves cases with high match scores for relational predicates and low match scores for attributional predicates, for example a match with the predicate abuse(detainee, police-officer) but no match with the predicate drunk(detainee).
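This distinction can be made concrete with a small sketch. The code below is our illustration, not AnTiQue's implementation; it simply counts mapped relational and attributional predicates separately, with a toy synonym-based matcher standing in for the matching components described in the following subsections.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Predicate:
        value: str
        args: tuple   # one argument = attributional, two = relational

        def is_relational(self):
            return len(self.args) == 2

    def analogy_profile(query, case, match):
        # An analogy shows high relational but low attributional overlap.
        rel = sum(1 for p in query if p.is_relational()
                  and any(match(p, q) for q in case))
        att = sum(1 for p in query if not p.is_relational()
                  and any(match(p, q) for q in case))
        return rel, att

    SYNONYMS = {"abuse": {"abuse", "attack", "insult"}}
    def naive_match(p, q):
        return q.value in SYNONYMS.get(p.value, {p.value})

    query = [Predicate("abuse", ("resident", "care-staff")),
             Predicate("asleep", ("resident",))]
    case = [Predicate("insult", ("child", "neighbours")),
            Predicate("drunk", ("detainee",))]
    print(analogy_profile(query, case, naive_match))  # (1, 0): analogical, not literal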
Figure 2. Internal structure of AnTiQue: NLP Parser, Predicate Parser, Predicate Expansion, Predicate Matcher and Similarity components, supported by WordNet, the SSParser and Stanford Parser, VerbNet verb classes, and the Dependency Thesaurus.
Figure 2 depicts AnTiQue's five components. When invoked, the service first divides the query and case problem description text into sentences, which are then part-of-speech tagged, shallow parsed to identify sentence constituents, and chunked into noun phrases. It then applies 21 syntax structure rules and 7 lexical extraction heuristics to identify predicates and extract lexical content in each sentence. Natural language sentences are represented as predicates in the form PredicateValue(Object1, Object2). The service then expands each query predicate with additional predicate values that have similar meaning according to verb classes found in VerbNet, to increase the likelihood of a match with predicates describing each case. For example, the predicate value abuse is in the same verb class as attack. The service then matches all expanded predicates to a similar set of predicates that describe the problem description of each case in the repository. This is achieved using XQuery text-searching functions to discover an initial set of cases that satisfy global search constraints. Finally, it applies semantic and dependency-based similarity measures to refine the candidate case set. The service returns an ordered set of analogical cases based on the match score with the query.
The components use WordNet, VerbNet, and the De-
pendency Thesaurus to compute attributional and relational
similarities. WordNet is a lexical database inspired by psy-
cholinguistic theories of human lexical memory (Miller
Proceedings of the Fourth International Conference on Computational Creativity 2013 50
1993). Its word senses and definitions provide the data
with which to disambiguate terms in queries and case prob-
lem descriptions. Its semantic relations link terms to other
terms with similar meanings with which to make service
queries more complete. For example a service query with
the term car is expanded with other terms with similar
meaning, such as automobile and vehicle, to increase
matches with web service descriptions.
VerbNet (Kipper et al. 2000) is a domain independent
verb lexicon. It organizes terms into verb classes that refine
Levin classes (Levin 1993) and add sub-classes to achieve
syntactic and semantic coherence among members of a
verb class. AnTiQue uses it to expand query predicate val-
ues with different members from the same verb class. For
example, queries with the verb abuse are expanded with
other verbs with similar meaning such as attack.
The Dependency Thesaurus supports dependency-based
word similarity matching to detect similar words from text
corpora. Lin (1998) used a 64-million word corpus to com-
pute pair-wise similarities between all of the nouns, verbs,
adjectives and adverbs in the corpus using a similarity
measure. Given an input word the Dependency Thesaurus
can retrieve similar words and group them automatically
into clusters. AnTiQue used the Dependency Thesaurus to
compute the relational similarity between 2 sets of predi-
cates.
In the remainder of this section we demonstrate the An-
TiQue components using text from the following example
challenging behaviour situation:
A resident acts aggressively towards care staff and
the resident verbally abuses other residents at
breakfast. Suspect underlying insecurities to new
people.
Natural Language Processing
This component prepares the structured natural language (NL) service query for predicate parsing and expansion. In the first step the text is split into sentences. In the second, a part-of-speech tagging process is applied that marks up the words in each sentence as corresponding to a particular lexical category (part of speech) using its definition and context. In the third step the algorithm applies an NL processing technique called shallow parsing that attempts to provide some machine understanding of the structure of a sentence without parsing it fully into a parse tree. The output is a division of the text's sentences into series of words that, together, constitute grammatical units. In our example, the tagged sentence 'a resident acts aggressively towards care staff and the resident verbally abuses other residents at breakfast' is shown in Figure 3. Tags that follow a word after a forward slash (e.g. driver/NN) correspond to lexical categories including noun, verb, adjective and adverb. For example, the NN tag means 'noun, singular or mass', DT means 'determiner' and VBZ means 'verb, present tense, 3rd person singular'. Tags attached to each chunk (e.g. [The/DT driver/NN]NP) correspond to phrasal categories. For instance, the NP tag denotes a noun phrase, VP a verb phrase, S a simple declarative clause, PP a prepositional phrase and ADVP an adverb phrase.

[A/DT resident/NN]NP [acts/VBZ]VP [aggressively/RB]ADVP [towards/]PP [care staff/NN]NP.

Figure 3. The sentence 'a resident acts aggressively towards care staff' after part-of-speech tagging and chunking

The component then decomposes each sentence into its phrasal categories, which are used in the next component to identify predicates in each sentence.
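This kind of preprocessing can be approximated with off-the-shelf tools. The sketch below uses NLTK as a stand-in (the actual service uses WordNet, the SSParser and the Stanford parser, as shown in Figure 2) and assumes the NLTK tokenizer and tagger models have been downloaded.

    import nltk
    # First run only:
    # nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

    sentence = "A resident acts aggressively towards care staff."
    tagged = nltk.pos_tag(nltk.word_tokenize(sentence))
    # e.g. [('A', 'DT'), ('resident', 'NN'), ('acts', 'VBZ'), ...]

    # A simple regular-expression grammar stands in for shallow NP chunking.
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
    print(chunker.parse(tagged))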
Predicate Parsing
This component automatically identifies predicate structures within each annotated NL sentence based on syntax structure rules and lexical extraction heuristics. Syntax structure rules break down a pre-processed NL sentence into sequences of phrasal categories where each sequence contains 2 or more phrasal categories. Lexical extraction heuristics are applied to each identified sequence of phrasal categories to extract the lexical content used to generate one or more predicates.
Firstly the algorithm applies the 21 syntax structure rules. Each rule consists of a phrasal category sequence of the form Ri → [Bj], meaning that the rule Ri consists of a phrasal category sequence B1, B2, ..., Bj. For example, the rule R4 → [NP, VP, S, VP, NP] reads: rule R4 consists of an NP followed by a VP, an S, a VP, and an NP, where NP, VP and S mean a noun phrase, a verb phrase and a simple declarative clause respectively. The method takes a phrasal category list as input and returns a list containing each discovered syntax structure rule and its starting point in the corresponding phrasal category list, e.g. {(R1,3), (R5,1)}. In our example, the input for the pre-processed sentence shown in Figure 3 corresponds to a list Input = (NP, VP, ADVP, PP, NP). Starting from the first list position, the method recursively checks whether there exists a sequence within the phrasal category list that matches one of the syntax structure rules. The output after applying the algorithm to list Input is a list of only one matched syntax structure rule, i.e. Output = {(R2,1)}.
Secondly, the algorithm applies lexical extraction heuristics to a syntax-structure-rule-tagged sentence to extract content words for generating one or more predicates. For each identified syntax structure rule in a sentence the algorithm: (1) determines the position of both noun and verb phrases within the phrasal category sequence; (2) applies the heuristics to extract the content words (verbs and nouns) from each phrase category; (3) converts each verb and noun to its morphological root (e.g. abusing to abuse); and (4) generates the corresponding predicate p in the form PredicateValue(Object1, Object2), where PredicateValue is the verb and Object1 and Object2 are the nouns. To illustrate this, the algorithm identified rule R2 for our example sentence in Figure 3. According to one heuristic, R2 corresponds to the phrasal category sequence [NP, VP, ADVP, PP, NP]. Therefore the algorithm determines the position of both noun and verb phrases within this sequence, i.e. noun phrases in {NP,1} and {NP,5} and a verb phrase in {VP,2}. Lexical extraction heuristics are applied to extract the content words from each phrase category, i.e. {NP,1} → resident, {NP,5} → care staff, {VP,2} → act. Returning to our example, the algorithm generates two predicates for the sentence 'a resident acts aggressively towards care staff and the resident verbally abuses other residents at breakfast', namely act(resident,care_staff) and abuse(resident,resident).
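A compressed sketch of the rule-matching and extraction steps follows; it is our illustration, showing only two of the 21 rules, and its lemmatizer is a deliberate toy.

    RULES = {
        "R2": ["NP", "VP", "ADVP", "PP", "NP"],   # illustrative subset of the rules
        "R4": ["NP", "VP", "S", "VP", "NP"],
    }

    def find_rules(categories):
        # Return (rule, start) pairs wherever a rule's category sequence occurs.
        hits = []
        for name, seq in RULES.items():
            for i in range(len(categories) - len(seq) + 1):
                if categories[i:i + len(seq)] == seq:
                    hits.append((name, i + 1))    # 1-based start position
        return hits

    def to_predicate(verb, noun1, noun2):
        # Steps 3-4: reduce to a morphological root, then emit
        # PredicateValue(Object1, Object2). A real system lemmatizes properly.
        return f"{verb.rstrip('s')}({noun1},{noun2})"

    print(find_rules(["NP", "VP", "ADVP", "PP", "NP"]))    # [('R2', 1)]
    print(to_predicate("acts", "resident", "care_staff"))  # act(resident,care_staff)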
Predicate Expansion
Predicate expansion and matching are key to the service's effectiveness. In AnTiQue, queries are expanded using words with similar meaning. AnTiQue uses ontological information from VerbNet to extract semantically related verbs for the verbs in each predicate.
VerbNet classes are organised to ensure syntactic and semantic coherence among members; for example, the verb abuse, as 'repeatedly treat a victim in a cruel way', is one of 24 members of the judgement class. Other members include attack, assault and insult, plus 20 other verbs. Thus VerbNet provides 23 verbs as potential expansions for the verb abuse. Although classes group together verbs with similar argument structures, the meanings of the verbs are not necessarily synonymous. For instance, the degree of attributional similarity between abuse and reward is very low, whereas the similarity between abuse and assault is very high. The service constrains the use of expansion to verb members that achieve a threshold on the degree of attributional similarity computed with WordNet-based similarity measurements (Simpson and Dao 2005). Given 2 sets of text, T1 and T2, the measurement determines how similar the meanings of T1 and T2 are, scored between 0 and 1. For example, for the verb abuse, the algorithm computes the degree of attributional similarity between abuse and each co-member of the judgement class. In our example, verbs such as attack, assault and insult, but not honour and doubt, are used to generate additional predicates in the expanded query.
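A sketch of this expansion step is given below, using NLTK's VerbNet and WordNet corpus readers. The WordNet path similarity and the 0.3 threshold are our stand-ins for the Simpson and Dao (2005) measure and the service's actual threshold, neither of which is reproduced here.

    from nltk.corpus import verbnet, wordnet as wn
    # First run only: nltk.download("verbnet"); nltk.download("wordnet")

    def expand_verb(verb, threshold=0.3):
        # Keep only same-class VerbNet members that are attributionally
        # similar enough to the original predicate value.
        expansions = set()
        for class_id in verbnet.classids(verb):
            for member in verbnet.lemmas(class_id):
                if member == verb:
                    continue
                sims = [a.path_similarity(b)
                        for a in wn.synsets(verb, wn.VERB)
                        for b in wn.synsets(member, wn.VERB)]
                if any(s is not None and s >= threshold for s in sims):
                    expansions.add(member)
        return expansions

    print(expand_verb("abuse"))   # should retain close members such as 'attack'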
Predicate Matching
Coarse-grained Matching The expanded query is fired at the problem descriptions of cases in the repository as an XQuery. Prior to executing the XQuery, we pre-process all problem descriptions of cases in the repository using the Natural Language Processing and Predicate Parsing components and store them locally. The XQuery includes functions to match each original and expanded predicate value to equivalent representations in candidate problem descriptions of cases. The service retrieves an initial set of matched cases.
Fine-grained Matching The Predicate Matcher applies semantic and dependency-based similarity measures to assess the quality of the candidate case set. It computes relational similarity between the query and each case retrieved during coarse-grained matching. To compute relational similarities that indicate analogical matches between case and query predicate arguments, the Predicate Matcher uses the Dependency Thesaurus to select cases that are relationally similar to mapped predicates in the query.
In our example, the case 'Managing a disrespectful child', which describes a good childcare practice to manage a disrespectful child, is one candidate case retrieved during coarse-grained matching. Figure 4 shows the problem and solution description of the case.
Name: Managing a disrespectful child

Problem: An intelligent 13-year-old boy voices opinions that are hurtful and embarrassing. The child refuses to consider the views of others and often makes discriminatory statements. The parents have removed his privileges and threatened to take him out of the school he loves. This approach has not worked. He now makes hurtful comments to his mother about her appearance. The child insults neighbours and guests at their home. He is rude and mimics their behaviour. The child shows no remorse for his actions. His mother is at the end of her tether.

Solution: The son needs very clear boundaries set. The parents are going to set clear rules on acceptable behaviour. They will state what they are not prepared to tolerate. They will highlight rude comments in a firm tone with the boy. He will receive an explanation as to why the comments are hurtful. Both parents will agree punishments for rule breaking that are realistic. They will work as a team and follow through on punishments. The son can then regain his privileges as rewards for consistent good behaviours.

Figure 4. A retrieved case describing a good childcare practice to manage a disrespectful child
The algorithm receives as inputs a pre-processed sentence list for the query and for the problem description of the case. It compares each predicate in the pre-processed query sentence list, Pred(j)Query, with each predicate in the pre-processed problem description sentence list, Pred(k)Case, to calculate the relevant match value, where

Pred(j)Query = PredValQuery(Arg1Query, Arg2Query)

and

Pred(k)Case = PredValCase(Arg1Case, Arg2Case).

The following conditions must be met in order to accept a match between the predicate pair:
1. PredValCase exists in the list of expanded predicate values of PredValQuery;
2. Arg1Query and Arg1Case (or Arg2Query and Arg2Case, respectively) are not the same;
3. Arg1Case (or Arg2Case) exists in the Dependency Thesaurus result set when using Arg1Query (or Arg2Query) as the query to the Thesaurus;
4. the attributional similarity value resulting from step 3 is below a specified threshold.
If all conditions are met, PredCase is added to the list of matched predicates for the current case. If not, the algorithm rejects PredCase and considers the next list item.
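The acceptance test can be sketched directly from these four conditions. In the sketch below, the helper functions are placeholders for the expansion, thesaurus and similarity components, the 0.4 threshold is invented for illustration, and the numbers mirror the worked example that follows.

    def accept_match(pred_q, pred_c, expansions, dep_similar, attr_sim,
                     threshold=0.4):   # 0.4 is an invented illustrative threshold
        # pred_* are (value, (arg1, arg2)) tuples.
        val_q, (a1_q, a2_q) = pred_q
        val_c, (a1_c, a2_c) = pred_c
        if val_c not in expansions(val_q):                    # condition 1
            return False
        if a1_q == a1_c or a2_q == a2_c:                      # condition 2
            return False
        if a1_c not in dep_similar(a1_q) or \
           a2_c not in dep_similar(a2_q):                     # condition 3
            return False
        return attr_sim(a1_q, a1_c) < threshold and \
               attr_sim(a2_q, a2_c) < threshold               # condition 4

    ok = accept_match(
        ("abuse", ("resident", "resident")),
        ("insult", ("child", "neighbours")),
        expansions=lambda v: {"abuse", "attack", "assault", "insult"},
        dep_similar=lambda w: {"child", "neighbour", "neighbours"},
        attr_sim=lambda a, b: {"child": 0.33, "neighbours": 0.25}[b],
    )
    print(ok)   # True: an analogical match is accepted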
AnTiQue queries the Dependency Thesaurus to retrieve a list of dependent terms. Terms are grouped automatically according to their degree of dependency-based similarity. Firstly, the algorithm checks whether the case predicate argument exists in this list. If so, it uses the semantic similarity component to further refine and assess the quality of the case predicate with regard to relational similarity.
Using this 2-step process, AnTiQue returns an ordered set of analogical cases based on their match scores with the query. In our example, consider Pred(j)Query = abuse(resident,resident), extracted from the sentence 'the resident verbally abuses other residents at breakfast', and Pred(k)Case = insult(child,neighbours), from the sentence 'The child insults neighbours and guests at their home', taken from the description of the 'Managing a disrespectful child' good childcare practice case in Figure 4. In this example all conditions for an analogical match are met: the predicate values abuse and insult are semantically equivalent, whilst the object names resident and child, and resident and neighbours, are not the same. According to the Dependency Thesaurus, child is similar based on dependencies to resident, and neighbour is similar based on dependencies to resident. Finally, the attributional similarity value for resident and child is 0.33, and for resident and neighbour 0.25, both below the specified threshold. As a result, the predicate insult(child,neighbours) is added to the list of matched predicates for the predicate abuse(resident,resident).
At the end of each invocation, the service returns an or-
dered set of the descriptions of the highest-scoring cases
for the app component to display to the care staff.
The Creativity Trigger Generation Service
Although care staff can generate new resolutions directly from retrieved case descriptions, formative usability testing of the app revealed that users were often overwhelmed by the volume of text describing each case and uncertain how to start idea generation. Therefore we developed an automated service that care staff can invoke to generate creativity triggers: prompts that extract content from the retrieved descriptions to conjecture new ideas that care staff can consider for the resident. Each trigger expresses a single idea that care staff can use to initiate creative thinking. The service uses the attributional predicates generated by the analogical matching discovery service to generate prompts that encourage analogical transfer of knowledge using the object-pair mappings identified in each predicate. Each prompt has the form 'Think about a new idea based on the', followed by mapped subject and object names in the target domain. To illustrate, referring back to the 'Managing a disrespectful child' good practice case retrieved from the childcare domain shown in Figure 1, Figure 5 shows how the prompts are presented in the Carer mobile app while Figure 6 lists all creativity prompts that the service generates for the analogical case.
Figure 5. The Carer mobile app showing creativity prompts generated for the 'Managing a disrespectful child' case
Think about a new idea based on the boundaries
Think about a new idea based on the clear rules
Think about a new idea based on the acceptable behaviour
Think about a new idea based on the rude comments
Think about a new idea based on the firm tone
Think about a new idea based on the explanation
Think about a new idea based on the comments
Think about a new idea based on the punishment
Think about a new idea based on the rule breaking
Think about a new idea based on the rewards
Think about a new idea based on the privileges
Think about a new idea based on the consistent good behaviour
Figure 6. Creativity prompts generated for the 'Managing a disrespectful child' case
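Generation itself is a simple templating step over the mapped content. Roughly (our sketch; the noun phrases are those extracted from the case solution text, for example by a chunker like the one shown earlier):

    def creativity_prompts(noun_phrases):
        # Template each mapped noun phrase from the retrieved case as a trigger.
        return [f"Think about a new idea based on the {np}" for np in noun_phrases]

    for p in creativity_prompts(["boundaries", "clear rules", "acceptable behaviour"]):
        print(p)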
Discovering Novel Ideas
Our design of the Carer app builds on Kerne et al.'s (2008) notion of human-centered creative cognition, in which information gathering and idea discovery occur concurrently, and information search and idea generation reinforce each other. The computational model of analogical reasoning searches for and retrieves information from analogical domains, and the creativity trigger generation service manipulates this information to support more effective idea generation from it; however, the generation of new ideas remains a human cognitive activity undertaken by carers, supported by bespoke features implemented in the app.
For example, a carer can audio-record a new idea at any time in response to retrieved analogical cases and/or presented creativity triggers by pressing the red button visible in Figure 1, then verbalizing and naming the idea. Record-
ed ideas can be selected and ordered to construct a new
care enhancement plan that can be extended with more
ideas and comments at any time. The carer can also play
back the audio-recorded ideas and care enhancement plans
to reflect and learn about them, inspired by similar use of
the audio channel in digitally supported creative brain-
storming (van Dijk et al. 2011). Reflection about an idea is
supported with guidance from the app to reflect on why the
idea is needed, what the idea achieved, and how and when
the idea should be implemented. Reflection about a care
enhancement plan is more sophisticated. A carer can drag-
and-drop ideas in and out of the plan and into different
sequences in it. Then, during play back of the plan, the app
concatenates the individual idea audio files and plays the
plan as a single recording, allowing the carer to listen to
and reflect on each version of the plan as a different narrat-
ed story. Moreover, s/he can reflect collaboratively with
colleagues using the app to share the plan as e-mail at-
tachments, thereby enabling asynchronous communication
between carers.
Formative Evaluation of the Carer App
The Carer app was made available for evaluation over a prolonged period with carers in a residential home. At the start of the evaluation, 7 nurses and care staff in the residential home were given an iPod Touch for their individual use during their care work over a continuous 28-day period. All 7 carers received face-to-face training in how to use the device and both apps before the evaluation started. A half-day workshop was held at the residential home to allow them to experiment with all of both apps' features. The carers were also given training and practice with the 3 forms of the Other Worlds creativity technique, through practice and facilitation, to demonstrate how it can lead to idea generation. We deemed this training in the creativity technique an essential precondition for successful uptake of the app.
Even though it lasted only 4 weeks, the reported evaluation of the Carer app in one residential home provided valuable data about the use of mobile computing and creativity techniques in dementia care. Figure 7 depicts the results.
Residential cases: 27 | Analogical domain cases: 5 | Ideas generated: 14 | Enhancement plans generated: 10

Figure 7. Situations, ideas and care enhancement plans generated by care staff using the Carer app
The focus group revealed that the nurses and carers implemented at least one major change to the care of one resident based on ideas generated using the app.
However, most of this success was not based on the analogical cases retrieved by the computational model. Whilst carers using the app did use the analogical matching service, and the service did retrieve relevant cases from analogical domains such as childcare and student management, the carers were unable to map and transfer knowledge from each of these source domains to the current dementia-related challenging behavior. The log data recorded only 5 uses of the analogical reasoning service to retrieve descriptions of cases of challenging behaviors from non-care domains. Rather, the carers appeared to use the case-based reasoning service to retrieve descriptions of challenging behavior cases from the care domain: the log data recorded 28 uses of this service, and most of the 114 recorded uses of the creativity prompt generation service were generated from these same-domain dementia cases.
The focus group revealed that the carers did not use re-
trieved non-care domain cases because they were unable to
recognize analogical similarities between them and the
challenging behavior situation. We identified two possible
reasons for this. Firstly, AnTiQue implements an approach
that approximates analogical retrieval, hence there is al-
ways the possibility of computing seemingly wrong as-
sociations and retrieving cases that do not have analogical
similarities. A previous evaluation of AnTiQue's precision
and recall (Zachos & Maiden, 2008) revealed a recall score
of 100% and a precision score of 66.6%, highlighting one
potential limitation of computing attributional similarity
using WordNet-based similarity measures.
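For illustration, the following minimal Python sketch shows the kind of WordNet-based attributional similarity discussed above, using NLTK's WordNet interface as a stand-in (this is not the AnTiQue implementation; it assumes nltk is installed and the wordnet corpus downloaded):

    # Minimal sketch, assuming: pip install nltk; nltk.download("wordnet").
    from nltk.corpus import wordnet as wn

    def max_wup_similarity(word_a, word_b):
        """Best Wu-Palmer similarity over all noun-sense pairs of two words."""
        best = 0.0
        for sa in wn.synsets(word_a, pos=wn.NOUN):
            for sb in wn.synsets(word_b, pos=wn.NOUN):
                best = max(best, sa.wup_similarity(sb) or 0.0)
        return best

    # Terms from a dementia-care case and a childcare case can score
    # highly even when carers see no analogical connection between them.
    print(max_wup_similarity("patient", "child"))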
Secondly, the results suggest that carers will require
more interactive support based on results generated by the
computational model to support cognitive analogical rea-
soning, consistent with previously reported empirical find-
ings (e.g. Gick 1983). Examples of such increased interac-
tive support include explicitly reporting each computed
analogical mapping to the carer, use of graphical depic-
tions of structured knowledge to transfer from the source to
the target domain, and more deliberate analogical support
prompts, for example based on the form A is to B as C is to
D. We are extending the Carer app with such features and look
forward to reporting these extensions in the near future.
Related Work
Since the 1980s, the efforts of many Artificial Intelligence
researchers and psychologists have contributed to an
emerging agreement on many issues relating to analogical
reasoning. In various ways and with differing emphases, all
current computational analogical reasoning techniques use
underlying structural information about the sources and the
target domains to derive analogies. However, at the algo-
rithmic level, they achieve the computation in many differ-
ent ways (Keane et al. 1994).
Based on Structure Mapping Theory (SMT), Gentner and
colleagues constructed a computer model of the theory,
the Structure Mapping Engine (SME) (Falkenhainer,
Forbus & Gentner 1989). The method assumes that both
target and source situations are represented using a certain
symbolic representation. The SME also uses only syntactic
structures of the two situations as its main input knowledge;
it has no knowledge of any kind of semantic similarity
between the various descriptions and relations in the two
situations. All processing is based on syntactic structural
features of the two given representations.
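The purely syntactic character of this matching can be illustrated with a toy Python sketch (a simplification for illustration, not the actual SME algorithm): entities are paired solely because they fill the same slot of identically named relations, with no semantic knowledge consulted.

    # Toy sketch of syntactic structure matching: facts are
    # (relation, arg1, arg2) tuples; candidate mappings pair source
    # and target entities only because they fill the same slot of a
    # relation with the same name and arity. Illustrative example data.
    def candidate_mappings(source_facts, target_facts):
        pairs = set()
        for s in source_facts:
            for t in target_facts:
                if s[0] == t[0] and len(s) == len(t):  # same relation, same arity
                    pairs.update(zip(s[1:], t[1:]))    # align arguments slot-wise
        return pairs

    # Source: water flows from a full beaker to an empty vial.
    source = [("flow", "beaker", "vial"),
              ("greater", "pressure(beaker)", "pressure(vial)")]
    # Target: heat flows from hot coffee to a cold ice cube.
    target = [("flow", "coffee", "ice"),
              ("greater", "temp(coffee)", "temp(ice)")]

    print(candidate_mappings(source, target))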
The application of analogical reasoning to software re-
use is not new. For example, Massonet and van
Lamsweerde (1997) applied analogy-making techniques to
complete partial requirements specifications using a rich,
well-structured ontology combined with formal assertions.
The method was based on query generalization for com-
pleting specifications. Because the proposed approach re-
lies on ontologies, the absence of effective ontologies and
taxonomies would expose its weaknesses. Pisan
(2000) tried to overcome this weakness by applying the
SME to expand semi-formal specifications. The idea was
to find mappings from specifications for problems similar
to the one in hand and use the mappings to adapt an exist-
ing specification without requiring domain specific
knowledge. The research presented in this paper over-
comes limitations of the above-mentioned approaches by
using additional knowledge bases to extend the mapping
process with semantic similarity measures.
Conclusion and Future Work
This paper reports a practical application of a computation-
al model of analogical reasoning to a pressing social prob-
lem, which is to improve the care of older people with de-
mentia. The result is a mobile app that is capable technical-
ly of accepting spoken and typed natural language input
and retrieving analogical domain cases that can be present-
ed with creativity triggers to support analogical problem
solving.
The evaluation results reported here revealed that our model
of creative problem solving in dementia care did not de-
scribe all observed carer behavior, so we are currently re-
peating the rollout and evaluation of Carer in other resi-
dential homes to validate this finding. Carer is being ex-
tended with new creativity support features that include
web images that match generated creativity prompts, and
more explicit support for analogical reuse of cases from
non-dementia care domains. We are extending the reposi-
tory with new cases that are semantically closer to demen-
tia care and, therefore, easier to recognize analogical simi-
larities with.
Acknowledgment
The research reported in this paper is supported by the EU-
funded MIRROR integrated project 257617, 2010-14.
References
Alzheimer's Society, 2010. Statistics.
http://www.alzheimers.org.uk/site/scripts/documents_info.p
hp?documentID=341
Wimo A. & Prince M., 2010, World Alzheimer Report 2010: The Global
Economic impact of Dementia,
http://www.alz.co.uk/research/worldreport/
Brooker, D., 2007, Person-centred dementia care: Making Services Bet-
ter, Bradford Dementia Group Good Practice Guides. Jessica Kingsley
Publishers, London and Philadelphia.
Osborn A., 1965. The Creative Trend in Education. In: Source Book For
Creative Problem Solving: A Fifty Year Digest of Proven Innovation
Processes. Creative Education Foundation Press, New York.
Le Storti A.J., Cullen P.A., Hanzlik E.M., Michiels J.M., Piano L.A.,
Lawless Ryan P., Johnson W., 1999. Creative Thinking In Nursing
Education: Preparing for Tomorrow's Challenges. Nursing Outlook. Vol.
47, no. 2, 62-66.
Help The Aged, 2007, My Home Life: Quality of Life In Care Homes; A
Review of the Literature London. [online]
http://myhomelifemovement.org/ downloads/mhl_review.pdf [Accessed 5
Jan 2011].
Bromley J. and Emerson E., 1995, Beliefs and Emotional Reactions of
Care Staff working with People with Challenging Behaviour, Journal of
Intellectual Disability Research 39(4), 341-352
Innovation Company, 2002. Sticky Wisdom (?What If!). Capstone Pub-
lishing Company Limited, Chichester.
Owen T. and Meyer J., 2009, Minimizing the use of restraint in care
homes: Challenges, dilemmas and positive approaches, Adult Services
Report 25, Social Care Institute of Excellence, http://www.scie.org.uk/
publications/reports/report25.pdf.
Houts P.S., Nezu A.M., Maguth Nezu C., Bucher J.A., 1996, The
prepared family caregiver: a problem-solving approach to family care-
giver education, Patient Education and Counselling, 27, 1, 63-73.
Gentner D., 1983, 'Structure-Mapping: A Theoretical Framework for
Analogy', Cognitive Science 7(2), 155-170.
Falkenhainer B., Forbus K.D. & Gentner D., 1989, 'The Structure-
Mapping Engine: Algorithm and Examples', Artificial Intelligence 41, 1-
63.
Kerne, A., Koh, E., Smith, S. M., Webb, A., Dworaczyk, B., 2008,
combinFormation: Mixed-Initiative Composition of Image and Text
Surrogates Promotes Information Discovery, ACM Transactions on
Information Systems, 27(1), 1-45.
Kipper K., Dang H.T. and Palmer M., 2000, Class-based Construction of
a Verb Lexicon, Proceedings AAAI/IAAI Conference 2000, 691-696.
Miller K., 1993, Introduction to WordNet: an On-line Lexical Database
Distributed with WordNet software.
Levin B., 1993, English Verb Classes and Alternations: A Preliminary
Investigation, University Chicago Press.
Lin D., 1998, Automatic retrieval and clustering of similar words, In
COLING-ACL, 768-774.
Maiden N.A.M., 2012, D5.2 Deliverable: Techniques and Software Apps
for Integrated Creative Problem Solving and Reflective Learning Version
1, Technical Report. Available at http://www.mirror-
project.eu/showroom-a-publications/deliverables.
Simpson, T. and Dao, T. (2005). Wordnet-based semantic similarity
measurement. codeproject.com/cs/library/semanticsimilaritywordnet.asp
van Dijk J., van der Roest J., van der Lugt R. & Overbeeke K., NOOT: A
Tool for Sharing Moments of Reflection during Creative Meetings, Pro-
ceedings 10th ACM Creativity and Cognition Conference, Atlanta Geor-
gia, Nov 2011, ACM Press.
Gick M.L., 1989, Two Functions of Diagrams in Problem Solving by
Analogy, Knowledge Acquisition from Text and Pictures, eds. H. Mandl
& J.R. Levin, Elsevier Science Publishers B.V. North-Holland, 215-231.
Keane, M.T., Ledgeway, T. & Duff, S. (1994). Constraints on analogical
mapping: A comparison of three models. Cognitive Science, 18, 387-
438.
Massonet, P. and van Lamsweerde, A. (1997). Analogical reuse of re-
quirements framework. 3rd IEEE International Symposium on Require-
ments Engineering.
Pisan, Y. (2000). Extending requirement specifications using analogy.
22nd International Conference on Software Engineering (ICSE), Limerick,
Ireland.
Zachos K. & Maiden N.A.M., 2008, Inventing Requirements from Soft-
ware: An Empirical Investigation with Web Services, Proceedings 16th
IEEE International Conference on Requirements Engineering, IEEE
Computer Society Press, 145-154.
Transforming Exploratory Creativity with DeLeNoX
Antonios Liapis¹, Héctor P. Martínez², Julian Togelius¹ and Georgios N. Yannakakis²
1: Center for Computer Games Research
IT University of Copenhagen
Copenhagen, Denmark
2: Institute of Digital Games
University of Malta
Msida, Malta
[email protected], [email protected], [email protected], [email protected]
Abstract
We introduce DeLeNoX (Deep Learning Novelty Ex-
plorer), a system that autonomously creates artifacts in
constrained spaces according to its own evolving inter-
estingness criterion. DeLeNoX proceeds in alternating
phases of exploration and transformation. In the explo-
ration phases, a version of novelty search augmented
with constraint handling searches for maximally diverse
artifacts using a given distance function. In the trans-
formation phases, a deep learning autoencoder learns to
compress the variation between the found artifacts into
a lower-dimensional space. The newly trained encoder
is then used as the basis for a new distance function,
transforming the criteria for the next exploration phase.
In the current paper, we apply DeLeNoX to the cre-
ation of spaceships suitable for use in two-dimensional
arcade-style computer games, a representative problem
in procedural content generation in games. We also sit-
uate DeLeNoX in relation to the distinction between ex-
ploratory and transformational creativity, and in relation
to Schmidhuber's theory of creativity through the drive
for compression progress.
Introduction
Within computational creativity research, many systems
have been designed that create artifacts automatically
through search in a given space for predefined objectives,
using evolutionary computation or some similar stochastic
global search/optimization algorithm. Recently, the novelty
search paradigm has aimed to abandon all objectives, and
simply search the space for a set of artifacts that is as diverse
as possible, i.e. for maximum novelty (Lehman and Stanley
2011). However, no search is without biases. Depending on
the problem, the search space often contains constraints that
limit and bias the exploration, while the mapping from geno-
type space (in which the algorithm searches) to phenotype
space (in which novelty is calculated) is often indirect, intro-
ducing further biases. The result is a limited and biased nov-
elty search, an incomplete exploration of the given space.
But what if we could characterize the bias of the search
process as it unfolds and counter it? If the way space is be-
ing searched is continuously transformed in response to de-
tected bias, the resulting algorithm would more thoroughly
search the space by cycling through or subsuming biases. In
applications such as game content generation, it would be
particularly useful to sample the highly constrained space of
useful artifacts as thoroughly as possible in this way.
In this paper, we present the Deep Learning Novelty Ex-
plorer (DeLeNoX) system, which is an attempt to do exactly
this. DeLeNoX combines phases of exploration through
constrained novelty search with phases of transformation
through deep learning autoencoders. The target applica-
tion domain is the generation of two-dimensional spaceships
which can be used in space shooter games such as Galaga
(Namco 1981). Automatically generating visually diverse
spaceships which however fulfill constraints on believability
addresses the content creation bottleneck of many game ti-
tles. The spaceships are generated by compositional
pattern-producing networks (CPPNs) evolved via augment-
ing topologies (Stanley 2006).
In the exploration phases, DeLeNoX finds the most diverse
set of spaceships possible given a particular distance func-
tion. In the transformation phases, it characterizes the found
artifacts by obtaining a low-dimensional representation of
their differences. This is done via autoencoders, a novel
technique for nonlinear principal component analysis (Ben-
gio 2009). The features found by the autoencoder are or-
thogonal to the bias of the current CPPN complexity, ensur-
ing that each exploratory phase has a different bias than the
previous. These features are then used to derive a new dis-
tance function which drives the next exploration phase. By
using constrained novelty search for features tailored to the
concurrent complexity, DeLeNoX can create content that is
both useful (as it lies within constraints) and novel.
We will discuss the technical details of DeLeNoX shortly,
and show results indicating that a surprising variety of space-
ships can be found given the highly constrained search
space. But first we will discuss the system and the core idea
in terms of exploratory and transformational creativity, and
in the context of Schmidhuber's theory of creativity as an
impulse to improve the compressibility of growing data.
Between exploratory and
transformational creativity
A ubiquitous distinction in creativity theory is that between
exploratory and transformational creativity. Perhaps the
most well-known statement of this distinction is due to Bo-
den (1990) and was later formalized by Wiggins (2006) and
others. However, similar ideas seem to be present in al-
[Flowchart: Transformation (Denoising Autoencoder) and Exploration (Feasible-Infeasible Novelty Search), linked by a Fitness Function and a Training Set.]
Figure 1: Exploration transformed with DeLeNoX: the
flowchart includes the general principles of DeLeNoX
(bold) and the methods of the presented case study (italics).
every major discussion of creativity, such as thinking
outside the box (De Bono 1970), paradigm shifts (Kuhn
1962), etc. The idea requires that creativity is conceptual-
ized as some sort of search in a space of artifacts or ideas. In
Boden's formulation, exploratory creativity refers to search
within a given search space, and transformational creativ-
ity refers to changing the rules that bind the search so that
other spaces can be searched. Exploratory creativity is often
associated with the kind of pedestrian problem solving that
ordinary people engage in every day, whereas transforma-
tional creativity is associated with major breakthroughs that
redefine the way we see problems.
Naturally, much effort has been devoted to thinking up
ways of modeling and implementing transformational cre-
ativity in a computational framework. Exploratory creativity
is often modeled simply as objective-driven search, e.g.
using constraint satisfaction techniques or evolutionary al-
gorithms (including interactive evolution).
We see the distinction between exploratory and transfor-
mational creativity as quantitative rather than qualitative.
In some cases, exploratory creativity is indeed limited
by hard constraints that must be broken in order to transcend
into unexplored regions of search space (and thus achieve
transformational creativity). In other cases, exploratory cre-
ativity is instead limited by biases in the search process. A
painter might have a particular painting technique she de-
faults to, a writer a common set of plot devices he returns to,
and an inventor might be accustomed to analyze problems
in a particular order. This means that some artifacts are in
practice never found, even though finding them would not
break any constraints: those artifacts are contained within
the space delineated by the original constraints. Analo-
gously, any search algorithm will over-explore some regions
of search space and in practice never explore other areas be-
cause of particularities related to e.g. evaluation functions,
variation operators or representation (cf. the discussion of
search biases in machine learning (Mitchell 1997)). This
means that some artifacts are never found in practice, even
though the representation is capable of expressing them and
there exists a way in which they could in principle be found.
DeLeNoX and Transformed Exploration
As mentioned above, the case study of this paper is two-
dimensional spaceships. These are represented as images
generated by Compositional Pattern-Producing Networks
(CPPNs) with constraints on which shapes are viable space-
ships. Exploration is done through a version of novelty
search, which is a type of evolutionary algorithm that seeks
to explore a search space as thoroughly as possible rather
than maximizing an objective function. In order to do this,
it needs a measure of difference between individuals. The
distance measure inherently privileges some region of the
search space over others, in particular when searching at
the border of feasible search space. Additionally, CPPNs
with different topologies are likely to create specific pat-
terns in generated spaceships, with more complex CPPNs
typically creating more complex patterns. Therefore, in dif-
ferent stages of this evolutionary complexification process,
different regions of the search space will be under-explored.
Many artifacts that are expressible within the representation
will thus most likely not be found; in other words, there are
limitations to creativity because of search biases.
In order to alleviate this problem and achieve a fuller cov-
erage of space, we algorithmically characterize the biases
from the search process and the representation. This is what
the autoencoders do. These autoencoders are applied on a
set of spaceships resulting from an initial exploration of the
space. A trained autoencoder is a function from a complete
spaceship (phenotype) to a relatively low-dimensional array
of real values. We then use the output of this function to
compute a new distance measure, which differs from pre-
vious ones in that it better captures typical patterns at the
current representational power of the spaceship-generating
CPPNs. Changing the distance function amounts to chang-
ing the exploration process of novelty search, as novelty
search is now in effect searching along different dimensions
(see Fig. 1). We have thus transformed exploratory creativ-
ity, not by changing or abandoning any constraints, but by
adjusting the search bias. This can be seen as analogous to
changing the painting technique of a painter, the analysis se-
quence of an inventor, or introducing new plot devices for a
writer. All of the spaceships that are found by the newsearch
process could in principle have been found by the previous
processes, but were very unlikely to be.
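The overall loop can be sketched in a few lines of Python. This is a deliberately simplified stand-in: artifacts are plain vectors, novelty search is reduced to a greedy hill-climb on mean distance to an archive, and the autoencoder is replaced by a linear PCA projection via SVD; none of these stand-ins is the machinery DeLeNoX actually uses.

    # Runnable toy sketch of alternating exploration/transformation.
    import numpy as np

    rng = np.random.default_rng(0)
    DIM, CODE, POP = 20, 3, 60

    def explore(distance, generations=200):
        """Greedy stand-in for novelty search: keep mutants that raise
        mean distance to the archive under the current `distance`."""
        archive = [rng.normal(size=DIM) for _ in range(POP)]
        for _ in range(generations):
            i = rng.integers(POP)
            mutant = archive[i] + rng.normal(scale=0.3, size=DIM)
            novelty = lambda x: np.mean([distance(x, a) for a in archive])
            if novelty(mutant) > novelty(archive[i]):
                archive[i] = mutant
        return np.array(archive)

    def transform(artifacts):
        """PCA stand-in for the autoencoder: encode into CODE dimensions."""
        mean = artifacts.mean(axis=0)
        _, _, vt = np.linalg.svd(artifacts - mean, full_matrices=False)
        basis = vt[:CODE]
        return lambda x: basis @ (x - mean)

    distance = lambda a, b: float(np.linalg.norm(a - b))  # initial distance
    for phase in range(3):
        artifacts = explore(distance)                     # exploration phase
        encode = transform(artifacts)                     # transformation phase
        distance = lambda a, b, e=encode: float(np.linalg.norm(e(a) - e(b)))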
Schmidhuber's theory of creativity
Schmidhuber (2006; 2007) advances an ambitious and in-
fluential theory of beauty, interestingness and creativity that
arguably holds explanatory power at least under certain cir-
cumstances. Though the theory is couched in computational
terms, it is meant to be applicable to humans and other an-
imals as well as artificial agents. In Schmidhuber's theory,
a beautiful pattern for a curious agent A is one that can suc-
cessfully be compressed to much smaller description length
by that agent's compression algorithm. However, perfect
beauty is not interesting; an agent gets bored by environ-
ments it can compress very well and cannot learn to com-
press better, and also by those it cannot compress at all. In-
teresting environments for A are those which A can com-
press to some extent but where there is potential to improve
the compression ratio, or in other words potential for A to
learn about this type of environment. This can be illustrated
by tastes in reading: beginning readers like to read linguis-
tically and thematically simple texts, but such texts are seen
by advanced readers as predictable (i.e. compressible),
and the curious advanced readers therefore seek out more
Proceedings of the Fourth International Conference on Computational Creativity 2013 57
complex texts. In Schmidhuber's framework, creative indi-
viduals such as artists and scientists are also seen as curi-
ous agents: they seek to pose themselves problems that are
on the verge of what they can solve, learning as much as pos-
sible in the process. It is interesting to note the close links
between this idea and the theory of flow (Csikszentmihalyi
1996) but also theories of learning in children (Vygotsky et
al. 1987) and game-players (Koster and Wright 2004).
The DeLeNoX system fits very well into Schmidhuber's
framework and can be seen as a novel implementation of
a creative agent. The system proceeds in phases of ex-
ploration, carried out by novelty search which searches for
interesting spaceships, and transformation, where autoen-
coders learn to compress the spaceships found in the previ-
ous exploration phase (see Fig. 1) into a lower-dimensional
representation. In the exploration phases, "interesting"
amounts to "far away from existing solutions" according to
the distance function defined by the autoencoder in the pre-
vious transformation phase. This corresponds to Schmidhu-
ber's definition of interesting environments as those where
the agent can learn (improve its compression for the new en-
vironment); the more distant the spaceships are, the more
they force the autoencoder to change its compression algo-
rithm (the weights of the network) in the next transformation
phase. In the transformation phase, the learning in the au-
toencoder directly implements the improvement in capacity
to compress recent environments (compression progress)
envisioned in Schmidhuber's theory.
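As a toy illustration only (not part of DeLeNoX; the order-0 byte model below is an assumption made for brevity), compression progress can be operationalised as the drop in a compressor's code length for an observation after training on it:

    import math
    from collections import Counter

    class FrequencyCompressor:
        """Order-0 model: a symbol's code length is -log2 of its
        estimated probability (add-one smoothing over 256 byte values)."""
        def __init__(self):
            self.counts = Counter()
            self.total = 0
        def code_length(self, data):
            return sum(-math.log2((self.counts[s] + 1) / (self.total + 256))
                       for s in data)
        def train(self, data):
            self.counts.update(data)
            self.total += len(data)

    compressor = FrequencyCompressor()
    compressor.train(b"abababab")
    novel = b"abcdabcd"
    before = compressor.code_length(novel)
    compressor.train(novel)
    after = compressor.code_length(novel)
    interestingness = before - after   # compression progress on this data
    print(round(interestingness, 2))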
There are two differences between our model and
Schmidhuber's model of creativity, however. In Schmid-
huber's model, the agent stores all observations indefinitely
and always retrains its compressor on the whole history of
previous observations. As DeLeNoX resets its archive of
created artifacts in every exploration phase, it is a rather for-
getful creator. A memory could be implemented by keeping
an archive of artifacts found by novelty search in all pre-
vious exploration phases, but this would incur a high and
constantly increasing computational cost. It could however
be argued that the dependence of each phase on the previous
represents an implicit, decaying memory. The other differ-
ence from Schmidhuber's mechanism is that novelty search al-
ways looks for the solution/artifact that is most different to
those that have been found so far, rather than the one pre-
dicted to improve learning the most. Assuming that the au-
toencoder compresses relatively better the more diverse the
set of artifacts is, this difference vanishes; this assumption is
likely to be true at least in the current application domain.
A case study of DeLeNoX:
Spaceship Generation
This paper presents a case study of DeLeNoX for the cre-
ation of spaceship sprites, where exploration is performed
via constrained novelty search which ensures a believable
appearance, while transformation is performed via a denois-
ing autoencoder which finds typical features in the space-
ships' current representation (see Fig. 1). Search is per-
formed via neuroevolution of augmenting topologies, which
changes the representational power of the genotype and
! "
#
(a) CPPN
!
"
#
$ &' ( )#*+,
- &' ( #*+,
#
.
/ " # /
!
#
.
(b) Spaceship representation
Figure 2: Fig 2a shows a sample CPPN using the full range
of pattern-producing activation functions available. Fig. 2b
shows the process of spaceship generation: the coordinates
0 to x
m
, normalized as 0 to 1 (respectively) are used as input
x of the CPPN. Two C values are used for each x, resulting
in two points, top (t) and bottom (b) for each x. CPPN input
x and output y are treated as the coordinates of t and b; if t
has a higher y value than that of b then the column is empty,
else the hull extends between t and b. The generated hull is
reected vertically along x
m
.
warrants the transformation of features which bias the search.
Domain Representation
Spaceships are stored as two-dimensional sprites; the space-
ship's hull is shown as black pixels. Each spaceship is
encoded by a Compositional Pattern-Producing Network
(CPPN), which is able to create complex patterns via func-
tion composition (Stanley 2006). A CPPN is ideal for vi-
sual representation as it can be queried with arbitrary spa-
tial granularity (infinite resolution); however, this study uses
a fixed resolution for simplicity. Unlike standard artificial
neural networks where all nodes have the same activation
function, each CPPN node may have a different, pattern-
producing function; six activation functions bound within
[0, 1] are used in this study (see Fig. 2a). To generate a
spaceship, the sprite is divided into a number of equidistant
columns equal to the sprite's width (W) in pixels. In each
column, two points are identied as top (t) and bottom (b);
the spaceship extends from t to b, while no hull exists if t is
below b (see Fig. 2b). The y coordinate of the top and bottom
points is the output of the CPPN; its inputs are the point's x
coordinate and a constant C which differentiates between t
and b (with C = 0.5 and C = -0.5, respectively). Only
half of the sprite's columns, including the middle column
at x_m = ⌈W/2⌉, are used to generate t and b; the remaining
columns are derived by reflecting vertically along x_m.
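The generation scheme can be sketched as follows (illustrative Python; the cppn function below is a fixed stand-in for an evolved network, and the y axis is taken to grow downwards):

    import math

    W, H = 21, 16
    mid = W // 2                                 # middle column x_m

    def cppn(x, c):
        """Fixed stand-in for an evolved CPPN; maps (x, C) to y in [0, 1]."""
        return 0.5 - 0.3 * c + 0.1 * math.sin(4.0 * x + c)

    sprite = [[" "] * W for _ in range(H)]
    for col in range(mid + 1):
        x = col / mid                            # normalize 0..x_m to 0..1
        t = int(cppn(x, +0.5) * (H - 1))         # top point (C = +0.5)
        b = int(cppn(x, -0.5) * (H - 1))         # bottom point (C = -0.5)
        if t <= b:                               # otherwise the column is empty
            for row in range(t, b + 1):
                sprite[row][col] = "#"
                sprite[row][W - 1 - col] = "#"   # mirror along x_m
    print("\n".join("".join(row) for row in sprite))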
A sufficiently expanded CPPN, as a superset of a multi-
layer perceptron, is theoretically capable of representing any
function. This means that any image could in principle be
produced by a CPPN. However, the interpretation of CPPN
output we use here means that images are severely limited to
those where each column contains at most one vertical black
bar. Additionally, the particularities of the NEAT complexi-
fication process, of the activation functions used and of the
distance function which drives evolution make the system
heavily biased towards particular shapes. It is this latter bias
that is characterized within the transformation phase.
[Figure: a denoising autoencoder with an input layer P (units p_1, ..., p_M) and a hidden layer Q (units q_1, ..., q_N), an encoder mapping P to Q and a decoder reconstructing P. An adjoining formula fragment reads max{0, 1 - 2w/W} + max{0, 1 - 2h/H} + A_s/A.]
... -(1/n) sum_{i=1}^{n} log_2 P_m(S_i | C_{i,m}). Jurafsky and Martin
(2000) note that because the cross-entropy of a sequence of
symbols (according to some model) is always higher than its
true entropy, the most accurate model (i.e., the one closest
to the true entropy) must be the one with the lowest cross-
entropy. In addition, because it is a per symbol measure,
it is possible to similarly compare generated harmonisations
of any length. Harmonisations with a low cross-entropy are
likely to be simpler and more predictable to a listener, while
those with a high cross-entropy are likely to be more com-
plex, more surprising and in the extreme possibly unpleas-
ant. See Manning and Schütze (1999) for more details on
cross-entropy.
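Concretely, the per-symbol cross-entropy can be computed as in the following sketch (illustrative Python; the probabilities are invented):

    import math

    def cross_entropy(symbol_probs):
        """Mean of -log2 of the model probabilities P_m(S_i | C_i,m)
        assigned to each symbol, in bits per prediction."""
        return -sum(math.log2(p) for p in symbol_probs) / len(symbol_probs)

    simple = cross_entropy([0.5, 0.4, 0.5, 0.6])       # predictable harmonisation
    surprising = cross_entropy([0.1, 0.05, 0.2, 0.1])  # complex harmonisation
    print(f"{simple:.2f} vs {surprising:.2f} bits/prediction")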
Model Construction
Cross-entropy is also used to guide the automatic construc-
tion of multiple viewpoint systems. Viewpoints are added
to (and sometimes removed from) a system stage by stage.
Each candidate system is used to calculate the average cross-
entropy of a ten-fold cross-validation of the corpus. The sys-
tem producing the lowest cross-entropy goes on to the next
stage of the selection process. For example, starting with the
basic system {Duration, Pitch}, of all the viewpoints
tried let us assume that ScaleDegree lowers the cross-
entropy most on its addition. Our system now becomes
{Duration, Pitch, ScaleDegree}. Duration can-
not be removed at this stage, as a Duration-predicting
viewpoint must be present. Assuming that on remov-
ing Pitch the cross-entropy rises, Pitch is also re-
tained. Let us now assume that after a second round
of addition we have the system {Duration, Pitch,
ScaleDegree, Interval}. Trying all possible dele-
tions, we may now find that the cross-entropy decreases on
the removal of Pitch, giving us the system {Duration,
ScaleDegree, Interval}. The process continues until
no addition can be found to lower the cross-entropy by a pre-
determined minimum amount. When selection is complete,
the biases are optimised.
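The selection procedure can be sketched as follows (illustrative Python; cv_cross_entropy stands in for the ten-fold cross-validation of the corpus and returns made-up scores chosen to reproduce the worked example above):

    def cv_cross_entropy(system):
        """Stand-in for ten-fold cross-validation; invented scores."""
        scores = {frozenset({"Duration", "Pitch"}): 6.0,
                  frozenset({"Duration", "ScaleDegree"}): 5.9,
                  frozenset({"Duration", "Pitch", "ScaleDegree"}): 5.6,
                  frozenset({"Duration", "Pitch", "ScaleDegree", "Interval"}): 5.5,
                  frozenset({"Duration", "ScaleDegree", "Interval"}): 5.4}
        return scores.get(frozenset(system), 7.0)

    def select(candidates, min_gain=0.05):
        system = {"Duration", "Pitch"}              # the basic system
        best = cv_cross_entropy(system)
        while True:
            # Addition stage: the candidate system with the lowest
            # cross-entropy goes on to the next stage.
            trials = {v: cv_cross_entropy(system | {v})
                      for v in candidates - system}
            if not trials:
                break
            v, h = min(trials.items(), key=lambda kv: kv[1])
            if best - h < min_gain:                 # no worthwhile addition left
                break
            system, best = system | {v}, h
            # Deletion stage: drop any viewpoint whose removal lowers the
            # cross-entropy (a Duration-predicting viewpoint must remain).
            for u in list(system - {"Duration"}):
                h2 = cv_cross_entropy(system - {u})
                if h2 < best:
                    system, best = system - {u}, h2
        return system, best

    print(select({"ScaleDegree", "Interval", "Cont"}))
    # ({'Duration', 'ScaleDegree', 'Interval'}, 5.4), as in the example.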
Development of Multiple Viewpoints
The modelling of melody is relatively straightforward,
in that a melody comprises a single sequence of non-
overlapping notes. Such a sequence is ideal for creating
N-grams. Harmony is much more complex, however. Not
Proceedings of the Fourth International Conference on Computational Creativity 2013 80
only does it consist (for our purposes) of four interrelated
parts, but it usually contains overlapping notes. In other
words, music is usually not homophonic; indeed, very few of
the major key hymn tune harmonisations (Vaughan Williams
1933) in our corpora are completely homophonic. Some pre-
processing of the music is necessary, therefore, to make it
amenable to modelling by means of N-grams. We use full
expansion on our corpora (corpus A and corpus B each
contain fifty harmonisations), which splits notes where nec-
essary to achieve a sequence of block chords (i.e., without
any overlapping notes). This technique has been used be-
fore in relation to viewpoint modelling (Conklin 2002). To
model harmony correctly, however, we must know which
notes have been split. Basic viewpoint Cont is therefore
introduced to distinguish between notes which are freshly
sounded and those which are a continuation of the preced-
ing one. Currently, the basic viewpoints (or attributes) are
predicted at each point in the sequence in the following or-
der: Duration, Cont and then Pitch.
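A minimal sketch of full expansion (illustrative Python; notes are (onset, duration, pitch) triples and the data are invented) splits overlapping notes at every onset and marks each resulting note with its Cont value:

    def full_expansion(parts):
        """Split overlapping notes into block chords; each chord note
        carries a Cont flag: True if it continues the preceding note."""
        onsets = sorted({n[0] for part in parts for n in part} |
                        {n[0] + n[1] for part in parts for n in part})
        chords = []
        for start, end in zip(onsets, onsets[1:]):
            chord = []
            for part in parts:
                note = next(n for n in part if n[0] <= start < n[0] + n[1])
                chord.append((note[2], note[0] < start))  # (pitch, Cont)
            chords.append((start, end - start, chord))
        return chords

    soprano = [(0, 2, 69)]                    # a minim
    alto = [(0, 1, 64), (1, 1, 62)]           # two crotchets
    print(full_expansion([soprano, alto]))
    # [(0, 1, [(69, False), (64, False)]), (1, 1, [(69, True), (62, False)])]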
Version 1
The starting point for the definition of the strictest possible
application of viewpoints is the formation of vertical view-
point elements (Conklin 2002). An example of such an el-
ement is ⟨69, 64, 61, 57⟩, where all of the values are from
the domain of the same viewpoint (i.e., Pitch, as MIDI
values), and all of the parts (SATB) are represented. This
method reduces the entire set of parallel sequences to a sin-
gle sequence, thus allowing an unchanged application of the
multiple viewpoint framework, including its use of PPM.
Only those elements containing the given soprano note are
allowed in the prediction probability distribution, however.
This is the base-level model, to be developed with the aim
of substantially improving performance.
Version 2
In this version, it is hypothesised that predicting all unknown
symbols in a vertical viewpoint element at the same time is
neither necessary nor desirable. It is anticipated that by di-
viding the overall harmonisation task into a number of sub-
tasks (Allan and Williams 2005; Hild, Feulner, and Menzel
1992), each modelled by its own multiple viewpoint system,
an increase in performance can be achieved. Here, a subtask
is the prediction or generation of at least one part; for exam-
ple, given a soprano line, the first subtask might be to predict
the entire bass line. This version allows us to experiment
with different arrangements of subtasks. As in version 1,
vertical viewpoint elements are restricted to using the same
viewpoint for each part. The difference is that not all of the
parts are now necessarily represented in a vertical viewpoint
element.
Comparison of Subtask Combinations
In this section we carry out the prediction of bass given
soprano, alto/tenor given soprano/bass, tenor given so-
prano, alto/bass given soprano/tenor, alto given soprano, and
tenor/bass given soprano/alto (i.e., prediction in two stages),
in order to ascertain the best performing combination for
subsequent comparisons. Prediction in three stages is not
considered here because of time limitations.
Earlier studies in the realm of melodic modelling re-
vealed that the model which performed best was an LTM up-
dated after every prediction in conjunction with an STM (a
BOTH+ model) using weighted geometric distribution com-
bination. Time constraints dictate the assumption that such a
model is likely to perform similarly well with respect to the
modelling of harmony. In addition, only corpus A, a bias
of 2 and an L-S bias of 14 are used for viewpoint selection
(as for the best melodic BOTH+ runs using corpus A). As
usual, the biases are optimised after completion of selection.
Here, we predict Duration, Cont and Pitch together
(i.e., using a single multiple viewpoint system at each pre-
diction stage). We also use the seen Pitch domain at this
juncture (i.e., the domain of Pitch vertical viewpoint el-
ements seen in the corpus, as opposed to all possible such
elements).
It is appropriate at this point to make some general obser-
vations about the bar charts presented in this paper. Compar-
isons are made for a range of h (maximum N-gram order)
from 0 to 5. Each value of h may have a different automati-
cally selected multiple viewpoint system. Please note that all
bar charts have a cross-entropy range of 2.5 bits/prediction,
often not starting at zero. All bars have standard errors as-
sociated with them, calculated from the cross-entropies ob-
tained during ten-fold cross-validation (using final multiple
viewpoint systems and optimised biases).
Figure 1 compares the prediction of alto given soprano,
tenor given soprano, and bass given soprano. The first thing
to notice is that the error bars overlap. This could be taken
to mean that we cannot (or should not) draw conclusions in
such cases; however, the degree of overlap and the consis-
tency of the changes across the range of h is highly sugges-
tive of the differences being real. A clinching quantitative
argument is reserved until consideration of Figure 3. Pre-
diction of the alto part has the lowest cross-entropy and pre-
diction of the bass has the highest across the board. This is
very likely to be due to the relative number of elements in
the Pitch domains for the individual parts (i.e., 18, 20 and
23 for alto, tenor and bass respectively). The lowest cross-
entropies occur at an h of 1 except for the bass, which has
its minimum at an h of 2 (this cross-entropy is only very
slightly lower than that for an h of 1, however).
There is a completely different picture for the final stage
of prediction. Figure 2 shows that, having predicted the alto
part with a low cross-entropy, the prediction of tenor/bass
has the highest. Similarly, the high cross-entropy for the
prediction of the bass is complemented by an exceptionally
low cross-entropy for the prediction of alto/tenor (notice that
the error bars do not overlap with those of the other predic-
tion combinations). Once again, this can be explained by
the number of elements in the part domains: the sizes of the
cross-product domains are 460, 414 and 360 for tenor/bass,
alto/bass and alto/tenor respectively. Although we are not
using cross-product domains, it is likely that the seen do-
mains are in similar proportion. The lowest cross-entropies
occur at an h of 1.
Combining the two stages of prediction, we see in Fig-
Proceedings of the Fourth International Conference on Computational Creativity 2013 81
[Bar chart: cross-entropy (bits/prediction) against maximum N-gram order, with one bar series each for A given S, T given S and B given S.]
Figure 1: Bar chart showing how cross-entropy varies
with h for the version 2 prediction of alto given soprano,
tenor given soprano, and bass given soprano using the seen
Pitch domain. Duration, Cont and Pitch are pre-
dicted using a single multiple viewpoint system at each pre-
diction stage.
ure 3 that predicting bass first and then alto/tenor has the
lowest cross-entropy. Notice, however, that the error bars of
this model overlap with those of the other models. This is a
critical comparison, requiring a high degree of confidence in
the conclusions we are drawing. Let us look at the h = 1 and
h = 2 comparisons in more detail, as they are particularly
pertinent. In both cases, all ten cross-entropies produced by
ten-fold cross-validation are lower for B then AT than for A
then TB; and nine out of ten are lower for B then AT than for
T then AB. The single increase is 0.11 bits/chord for an h of
1 and 0.09 bits/chord for an h of 2 compared with a mean
decrease of 0.22 bits/chord for the other nine values in each
case. This demonstrates that we can have far greater confi-
dence in the comparisons than the error bars might suggest.
A likely reason for this is that there is a range of harmonic
complexity across the pieces in the corpus which is reflected
as a range of cross-entropies (ultimately due to composi-
tional choices). This inherent cross-entropy variation seems
to be greater than the true statistical variation applicable to
these comparisons.
We can be confident, then, that predicting bass first
and then alto/tenor is best, reflecting the usual human ap-
proach to harmonisation. The lowest cross-entropy is 4.98
bits/chord, occurring at an h of 1. Although having the same
cross-entropy to two decimal places, the very best model
combines the bass-predicting model using an h of 2 (opti-
mised bias and L-S bias are 1.9 and 53.2 respectively) with
the alto/tenor-predicting model using an h of 1 (optimised
bias and L-S bias are 1.3 and 99.6 respectively).
Table 1 gives some idea of the complexity of the multi-
ple viewpoint systems involved, listing as it does the first six
viewpoints automatically selected for the prediction of bass
given soprano (h = 2) and alto/tenor given soprano/bass
[Bar chart: cross-entropy (bits/prediction) against maximum N-gram order, with one bar series each for TB given SA, AB given ST and AT given SB.]
Figure 2: Bar chart showing how cross-entropy varies with h
for the version 2 prediction of tenor/bass given soprano/alto,
alto/bass given soprano/tenor and alto/tenor given so-
prano/bass using the seen Pitch domain. Duration,
Cont and Pitch are predicted using a single multiple
viewpoint system at each prediction stage.
(h = 1). Many of the primitive viewpoints involved have
already been defined or are intuitively obvious. LastIn-
Phrase and FirstInPiece are either true or false, and
Piece has three values: rst in piece, last in piece or other-
wise. Metre is more complicated, being an attempt to de-
fine metrical equivalence within and between bars of various
time signatures. Notice that only two of the viewpoints are
common to both systems. In fact, of the twenty-four view-
points in the B given S system and twelve in the AT given SB
system, only five are common. This demonstrates the degree
to which the systems have specialised in order to carry out
these rather different tasks. The difference in the size of the
systems suggests that the prediction of the bass part is more
complicated than that of the inner parts, as reflected in the
difference in cross-entropy.
The Effect of Model Order
Figure 1 indicates that, for example, there is only a small re-
duction in cross-entropy from h = 0 to h = 1. The degree
of error bar overlap means that even this small reduction is
questionable. Is it possible that there is no real difference
in performance between a model using unconditional proba-
bilities and one using the shortest of contexts? Let us, in the
first place, examine the individual ten-fold cross-validation
cross-entropy values. All ten of these values are lower for
an h of 1, giving us confidence that there is indeed a small
improvement. Having established that, however, it would be
useful to explain why the improvement is perhaps smaller
than we might have expected.
One important reason for the less than impressive im-
provement is that although the h = 0 model is nominally
unconditional, the viewpoints Interval, DurRatio and
Interval Tactus appear in the h = 0 multiple view-
Proceedings of the Fourth International Conference on Computational Creativity 2013 82
[Bar chart: cross-entropy (bits/chord) against maximum N-gram order, with one bar series each for A then TB, T then AB and B then AT.]
Figure 3: Bar chart showing how cross-entropy varies with
h for the version 2 prediction of alto then tenor/bass, tenor
then alto/bass and bass then alto/tenor given soprano using
the seen Pitch domain. Duration, Cont and Pitch
are predicted using a single multiple viewpoint system at
each prediction stage.
point system (linked with other viewpoints). These three
viewpoints make use of attributes of the preceding chord;
therefore with respect to predicted attributes Duration
and Pitch, this model is partially h = 1. This hidden con-
ditionality is certainly enough to substantially improve per-
formance compared with a completely unconditional model.
Another reason is quite simply that the corpus has failed
to provide sufficient conditional statistics; in other words,
the corpus is too small. This is the fundamental reason for
the performance dropping off above an h of 1 or 2. We
would expect peak performance to shift to higher values of
h as the quantity of statistics substantially increases. Sup-
porting evidence for this is provided by our modelling of
melody. Much better melodic statistics can be gathered from
the same corpus because the Pitch domain is very much
smaller than it is for harmony. A BOTH+ model shows a
large fall in cross-entropy from h = 0 to h = 1 (with error
bars not overlapping), while peak performance occurs at an
h of 3.

  Viewpoint                               B   AT
  Pitch
  Interval InScale
  Cont TactusPositionInBar
  Duration (ScaleDegree LastInPhrase)
  Interval (ScaleDegree Tactus)
  ScaleDegree Piece
  Cont Interval
  DurRatio TactusPositionInBar
  ScaleDegree FirstInPiece
  Cont Metre

Table 1: List of the first six viewpoints automatically se-
lected for the prediction of bass given soprano (B, h = 2)
and alto/tenor given soprano/bass (AT, h = 1).
Figure 2 reveals an even worse situation with respect to
performance differences across the range of h. For TB given
SA, for example, it is not clear that there is a real improve-
ment from h = 0 to h = 1. In this case, there is a reduction
in five of the ten-fold cross-validation cross-entropy values,
but an increase in the other five. This is almost certainly due
to the fact that, having fixed the soprano and alto notes, the
number of tenor/bass options is severely limited; so much
so, that conditional probabilities can rarely be found. This
situation should also improve with increasing corpus size.
Separate Prediction of Attributes
We now investigate the use of separately selected and op-
timised multiple viewpoint systems for the prediction of
Duration, Cont and Pitch. Firstly, however, let us con-
sider the utility of creating an augmented Pitch domain.
Approximately 400 vertical Pitch elements appear in cor-
pus B which are not present in corpus A, and there are
undoubtedly many more perfectly good chords which are
absent from both corpora. Such chords are unavailable for
use when the models generate harmony, and their absence
must surely skew probability distributions when predicting
existing data. One solution is to use a full Cartesian prod-
uct; but this is known to result in excessively long run times.
Our preferred solution is to transpose chords seen in the cor-
pus up and down, a semitone at a time, until one of the
parts goes out of the range seen in the data. Such elements
not previously seen are added to the augmented Pitch do-
main. Derived viewpoints such as ScaleDegree are able
to make use of the extra elements. We shall see shortly that
this change increases cross-entropies dramatically; but since
this is not a like-for-like comparison, it is not an indication
of an inferior model.
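A sketch of this augmentation (illustrative Python; chords are SATB MIDI-pitch tuples and the seen data are invented):

    def augment(seen_chords):
        """Transpose each seen chord up and down a semitone at a time,
        stopping once any part leaves the range seen in the data."""
        lows = [min(c[i] for c in seen_chords) for i in range(4)]
        highs = [max(c[i] for c in seen_chords) for i in range(4)]
        domain = set(seen_chords)
        for chord in seen_chords:
            for step in (+1, -1):
                shift = step
                while True:
                    t = tuple(p + shift for p in chord)
                    if any(t[i] < lows[i] or t[i] > highs[i] for i in range(4)):
                        break               # a part went out of the seen range
                    domain.add(t)
                    shift += step
        return domain

    chords = {(72, 67, 64, 48), (74, 69, 65, 50), (71, 67, 62, 43)}
    print(len(chords), "->", len(augment(chords)))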
Figure 4 shows that better models can be created by se-
lecting separate multiple viewpoint systems to predict indi-
vidual attributes, rather than a single system to predict all
of them. The difference in cross-entropy is quite marked,
although there is a substantial error bar overlap. An h of 1
is optimal in both cases. All ten cross-entropies produced
by ten-fold cross-validation are lower for the separate sys-
tem case, providing confidence that the improvement is real.
The lowest cross-entropy for separate prediction at h = 1
is 5.44 bits/chord, compared with 5.62 bits/chord for predic-
tion together. The very best model for separate prediction,
with a cross-entropy of 5.35 bits/chord, comprises the best
performing systems, whatever the value of h.
Comparison of Version 1 with Version 2
A comparison involving Duration, Cont and Pitch
would show that version 2 has a substantially higher cross-
entropy than version 1. This is due to the fact that whereas
the duration of an entire chord is predicted only once in ver-
sion 1, it is effectively predicted twice (or even three times)
Proceedings of the Fourth International Conference on Computational Creativity 2013 83
[Bar chart: cross-entropy (bits/chord) against maximum N-gram order, comparing separate and together prediction.]
Figure 4: Bar chart showing how cross-entropy varies with
h for the version 2 prediction of bass given soprano fol-
lowed by alto/tenor given soprano/bass using the augmented
Pitch domain. The prediction of Duration, Cont and
Pitch separately (i.e., using separately selected multiple
viewpoint systems) and together (i.e., using a single multi-
ple viewpoint system) are compared.
in version 2. Prediction of Duration is set up such that,
for example, a minim may be generated in the bass given
soprano generation stage, followed by a crotchet in the final
generation stage, whereby the whole of the chord becomes
a crotchet. This is different from the prediction and gen-
eration of Cont and Pitch, where elements generated in
the rst stage are not subject to change in the second. The
way in which the prediction of Duration is treated, then,
means that versions 1 and 2 are not directly comparable with
respect to that attribute.
By ignoring Duration prediction, and combining only
the directly comparable Cont and Pitch cross-entropies,
we can make a judgement on the overall relative perfor-
mance of these two versions. Figure 5 is strongly indicative
of version 2 performing better than version 1. Again, there is
an error bar overlap; but for an h of 1, nine out of ten cross-
entropies produced by ten-fold cross-validation are lower for
version 2; and for an h of 2, eight out of ten are lower for ver-
sion 2. The single increase for an h of 1 is 0.07 bits/chord,
compared with a mean decrease of 0.22 bits/chord for the
other nine values. The mean of the two increased values for
an h of 2 is 0.03 bits/chord, compared with a mean decrease
of 0.20 bits/chord for the other eight values.
As one might expect from experience of harmonisation,
predicting the bass rst followed by the alto and tenor is bet-
ter than predicting all of the lower parts at the same time. It
would appear that the selection of specialist multiple view-
point systems for the prediction of different parts is benefi-
cial in rather the same way as specialist systems for the
prediction of the various attributes. The optimal version 2
cross-entropy, using the best subtask models irrespective of
the value of h, is 0.19 bits/prediction lower than that of
version 1.
[Bar chart: cross-entropy (bits/prediction) against maximum N-gram order, comparing version 1 and version 2.]
Figure 5: Bar chart showing how cross-entropy varies with
h for the separate prediction of Cont and Pitch in the alto,
tenor and bass given soprano using the augmented Pitch
domain, comparing version 1 with version 2.
Finally, the systems selected using corpus A are used in
conjunction with corpus A+B. Compared with Figure 5,
Figure 6 shows a much larger drop in cross-entropy for ver-
sion 1 than for version 2: indeed, the bar chart shows the
minimum cross-entropies to be exactly the same. Allowing
for a true variation smaller than that suggested by the error
bars, as before, we can certainly say that the minimum cross-
entropies are approximately the same. The only saving grace
for version 2 is that the error bars are slightly smaller. We
can infer from this that version 1 creates more general mod-
els, better able to scale up to larger corpora which may de-
viate somewhat from the characteristics of the original cor-
pus. Conversely, version 2 is capable of constructing models
which are more specic to the corpus for which they are se-
lected. This hypothesis can easily be tested by carrying out
viewpoint selection in conjunction with corpus A+B (al-
though this would be a very time-consuming process).
Notice that there are larger reductions in cross-entropy
from h = 0 to h = 1 in Figure 6 than in Figure 5. The
only difference between the two sets of runs is the corpus
used; therefore this performance change must be due to the
increased quantity of statistics gathered from a larger corpus,
as predicted earlier in the paper.
Generated Harmony
Generation is achieved simply by random sampling of over-
all prediction probability distributions. Each prediction
probability has its place in the total probability mass; for ex-
ample, attribute value X having a probability of 0.4 could be
positioned in the range 0.5 to 0.9. A random number from 0
to 1 is generated, and if this number happens to fall between
0.5 and 0.9 then X is generated.
It was quickly very obvious, judging by the subjective
quality of generated harmonisations, that a modication
Proceedings of the Fourth International Conference on Computational Creativity 2013 84
[Bar chart: cross-entropy (bits/prediction) against maximum N-gram order, comparing version 1 and version 2.]
Figure 6: Bar chart showing how cross-entropy varies with
h for the separate prediction of Cont and Pitch in the alto,
tenor and bass given soprano using the augmented Pitch
domain and corpus A+B with systems selected using cor-
pus A, comparing versions 1 and 2.
to the generation procedure would be required to produce
something coherent and amenable to comparison. The prob-
lem was that random sampling sometimes generated a chord
of very low probability, which was bad in itself because it
was likely to be inappropriate in its context; but also bad be-
cause it then formed part of the next chord's context, which
had probably rarely or never been seen in the corpus. This
led to the generation of more low probability chords, re-
sulting in harmonisations of much higher cross-entropy than
those typically found in the corpus (quantitative evidence
supporting the subjective assessment). The solution was to
disallow the use of predictions below a chosen value, the
probability threshold, defined as a fraction of the highest
prediction probability in a given distribution. This definition
ensures that there is always at least one usable prediction
in the distribution, however high the fraction (probability
threshold parameter). Bearing in mind that an expert mu-
sician faced with the task of harmonising a melody would
consider only a limited number of the more likely options
for each chord position, the removal of low probability pre-
dictions was considered to be a reasonable solution to the
problem. Separate thresholds have been implemented for
Duration, Cont and Pitch, and these thresholds may
be different for different stages of generation. It is hoped
that as the models improve, the thresholds can be reduced.
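A sketch of sampling with such a threshold (illustrative Python; the distribution is invented):

    import random

    def sample_with_threshold(distribution, threshold=0.5):
        """Disallow predictions below threshold * max probability, then
        sample from the survivors by cumulative probability mass. Since
        the maximum always survives, at least one prediction is usable."""
        cutoff = threshold * max(distribution.values())
        survivors = {v: p for v, p in distribution.items() if p >= cutoff}
        r = random.uniform(0, sum(survivors.values()))
        cumulative = 0.0
        for value, p in survivors.items():
            cumulative += p
            if r <= cumulative:
                return value
        return value                       # guard against rounding error

    dist = {"C major": 0.4, "A minor": 0.3, "F major": 0.2, "B dim": 0.1}
    print(sample_with_threshold(dist, threshold=0.6))
    # With threshold 0.6, F major and B dim fall below the cutoff.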
The probability thresholds of models used for generat-
ing harmony are optimised such that the cross-entropy of
each subtask, averaged across twenty harmony generation
runs using the ten melodies from test dataset A+B, approx-
imately matches the corresponding prediction cross-entropy
obtained by ten-fold cross-validation of corpus A+B.
One of the more successful harmonisations of hymn tune
Das walt Gott Vater (Vaughan Williams 1933, hymn no.
36), automatically generated by the best version 1 model
with optimised probability threshold parameters, is shown
in Figure 7. It is far from perfect, with the second phrase
being particularly uncharacteristic of the corpus. There are
two parallel fifths in the second bar and another at the begin-
ning of the fourth bar. The bass line is not very smooth, due
to the many large ascending and descending leaps.
One of the more successful harmonisations of the same
hymn tune, automatically generated by the best version 2
model with optimised probability threshold parameters, is
shown in Figure 8. The first thing to notice is that the bass
line is more characteristic of the corpus than that of the ver-
sion 1 harmonisation. This could well be due to the fact that
this version employs specialist systems for the prediction of
bass given soprano. It is rather jumpy in the last phrase,
however, and in the final bar there is a parallel unison with
the tenor. The second chord of the second bar does not fit
in with its neighbouring chords, and there should be a root
position tonic chord on the third beat of the fourth bar. On
the positive side, there is a fine example of a passing note
at the beginning of the fifth bar; and the harmony at the end
of the third phrase, with the chromatic tenor movement, is
rather splendid.
Conclusion
The first set of version 2 viewpoint selection runs, for at-
tribute prediction together using the seen Pitch domain,
compare different combinations of two-stage prediction. By
far the best performance is obtained by predicting the bass
part first followed by the inner parts together, reflecting the
usual human approach to harmonisation. It is interesting to
note that this heuristic, almost universally followed during
harmonisation, therefore has an information theoretic expla-
nation for its success.
Having demonstrated the extent to which multiple view-
point systems have specialised in order to carry out these
two rather different prediction tasks, we use an even greater
number of specialist systems in a second set of runs. These
show that better models can be created by selecting separate
multiple viewpoint systems to predict individual musical at-
tributes, rather than a single system to predict them all.
In comparing version 1 with version 2, only Cont and
Pitch are taken into consideration, since the prediction of
Duration is not directly comparable. On this basis, ver-
sion 2 is better than version 1 when using corpus A, which
again tallies with human experience of harmonisation; but
when corpus A+B is used, their performance is identical.
We can infer from this that version 1 creates more gen-
eral models, better able to scale up to larger corpora which
may deviate somewhat from the characteristics of the origi-
nal corpus. Conversely, version 2 is capable of constructing
models which are more specic to the corpus for which they
are selected.
Acknowledgements
We wish to thank the three anonymous reviewers for their
constructive and insightful comments, which greatly im-
proved this paper.
[Musical score: four-part harmonisation.]
Figure 7: Relatively successful harmonisation of hymn tune Das walt Gott Vater (Vaughan Williams 1933, hymn no. 36)
automatically generated by the best version 1 model with optimised probability threshold parameters, using corpus A+B.
[Musical score: four-part harmonisation.]
Figure 8: Relatively successful harmonisation of hymn tune Das walt Gott Vater (Vaughan Williams 1933, hymn no. 36)
automatically generated by the best version 2 model with optimised probability threshold parameters, using corpus A+B.
References
Allan, M., and Williams, C. K. I. 2005. Harmonising
chorales by probabilistic inference. In L. K. Saul; Y. Weiss;
and L. Bottou., eds., Advances in Neural Information Pro-
cessing Systems, volume 17. MIT Press.
Cleary, J. G., and Witten, I. H. 1984. Data compression us-
ing adaptive coding and partial string matching. IEEE Trans
Communications COM-32(4):396-402.
Conklin, D., and Witten, I. H. 1995. Multiple viewpoint sys-
tems for music prediction. Journal of New Music Research
24(1):5173.
Conklin, D. 1990. Prediction and entropy of music. Mas-
ters thesis, Department of Computer Science, University of
Calgary, Canada.
Conklin, D. 2002. Representation and discovery of verti-
cal patterns in music. In C. Anagnostopoulou; M. Ferrand;
and A. Smaill., eds., Music and Articial Intelligence: Proc.
ICMAI 2002, LNAI 2445, 3242. Springer-Verlag.
Hild, H.; Feulner, J.; and Menzel, W. 1992. Harmonet:
A neural net for harmonizing chorales in the style of J. S.
Bach. In R. P. Lippmann; J. E. Moody; and D. S. Touretzky.,
eds., Advances in Neural Information Processing Systems,
volume 4, 267274. Morgan Kaufmann.
Jurafsky, D., and Martin, J. H. 2000. Speech and Language
Processing. New Jersey: Prentice-Hall.
Manning, C. D., and Sch utze, H. 1999. Foundations of
Statistical Natural Language Processing. MIT Press.
Pearce, M. T.; Conklin, D.; and Wiggins, G. A. 2004. Meth-
ods for combining statistical models of music. In Proceed-
ings of the Second International Symposium on Computer
Music Modelling and Retrieval.
Vaughan Williams, R., ed. 1933. The English Hymnal. Ox-
ford University Press.
Automatical Composition of Lyrical Songs
Jukka M. Toivanen and Hannu Toivonen and Alessandro Valitutti
HIIT and Department of Computer Science
University of Helsinki
Finland
Abstract
We address the challenging task of automatically composing lyrical songs with matching musical and lyrical features, and we present the first prototype, M.U. Sicus-Apparatus, to accomplish the task. The focus of this paper is especially on generation of art songs (lieds). The proposed approach writes lyrics first and then composes music to match the lyrics. The crux is that the music composition subprocess has access to the internals of the lyrics writing subprocess, so the music can be composed to match the intentions and choices of lyrics writing, rather than just the surface of the lyrics. We present some example songs composed by M.U. Sicus, and we outline first steps towards a general system combining both music composition and writing of lyrics.
Introduction
Creation of songs, combinations of music and lyrics, is a
challenging task for computational creativity. Obviously,
song writing requires creative skills in two different areas:
composition of music and writing of lyrics. However, these
two skills are not sufficient: independent creation of an excellent piece of music and a great text does not necessarily result in a good song. The combination of lyrics and music could sound poor (e.g., because the music and lyrics express conflicting features) or be downright impossible to perform (e.g., due to a gross mismatch between pronunciation of lyrics and rhythm of the melody).
A crucial challenge in computational song writing is to
produce a coherent, matching pair of music and lyrics.
Given that components exist for both individual creative
tasks, it is tempting to consider one of the two following
sequential approaches to song writing:
First write the lyrics (e.g. a poem). Then compose music
to match the generated lyrics.
Or:
First compose the music. Then write lyrics to match the
melody.
Obviously, each individual component of the process should produce results that are viable to be used in songs. In addition, to make music and lyrics match, the second step should be able to use the result from the first step as its guidance. Consider, for instance, the specific case where lyrics are written first. They need to be analyzed so that matching music can be composed.
Several issues arise here. The first challenge is to make such a modular approach work on a surface level. For instance, pronunciation, syllable lengths, lengths of pauses, and other phonetic features related to the rhythm can in many cases be analyzed by existing tools. The composition process should then be able to work under constraints set by these phonetic features, to produce notes and rhythmic patterns matching the phonetics. Identification of relevant types of features, their recognition in the output of the first step of the process, and eventually generation of matching features in the second step of the process are not trivial tasks.
The major creative bottleneck of the simple process out-
lined above is making music and lyrics match each other at a
deeper level, so that they jointly express the messages, emo-
tions, feelings, or whatever the intent of the creator is. The
pure sequential approach must rely on analysis of the lyrics
to infer the intended meaning of the author. Affective text
analysis may indicate emotions, and clever linguistic anal-
ysis may reveal words with more emphasis. However, text
analysis techniques face the great challenge of natural lan-
guage understanding. They try to work backwards from the
words to the meaning the author had in mind. In the case of composing music first and then writing corresponding lyrics, the task is equally challenging.
Fortunately, in an integrated computational song writing
system, the second step can have access to some information about the creative process of the first step, to obtain an internal understanding of its intentions and choices. Figuratively
speaking, instead of analyzing the lyrics to guess what was
in the mind of the lyricist, the composer looks directly in-
side the head of the lyricist. We call this approach informed
sequential song writing (Figure 1). In this model, informa-
tion for the music composition process comes directly from
the lyrics writing process, as well as from text analysis and
user-given input.
In this paper we study and propose an instance of the in-
formed sequential song writing approach. The presented
system, M.U. Sicus-Apparatus, writes lyrics first and then composes matching music. Since lyrics generation is in this
approach independent of music composition, our emphasis
will be on the latter. Empirical evaluation of the obtained
results is left for future work.
Figure 1: Schema of the informed sequential song generation
Art Songs
Songs can be divided into rough categories like art, folk, and pop songs. This paper concentrates on the genre of so-called art songs, which are often referred to as lieds in the German tradition or mélodies in the French tradition. Art songs are a particularly interesting category of compositions with strong interaction of musical and lyrical features. The finest examples of this class include the songs composed by F. Schubert. Art songs are composed for performance, usually with piano accompaniment, although the accompaniment may be written for an orchestra or a string quartet as well. (Songs with other instruments besides piano are sometimes referred to as vocal chamber music, and songs for voice and orchestra are called orchestral songs.)
Art songs are always notated, and the accompaniment, which is considered to be an important part of the composition, is carefully written to suit the overall structure of the song. The lyrics are often written by a poet or lyricist and the music separately by a composer. The lyrics of songs are typically of a poetic, rhyming nature, though they may be free prose as well. Quite often art songs are through-composed, which means that each section of the lyrics goes with fresh music. In contrast, folk songs and some art songs are strophic, which means that all the poem's verses are sung to the same melody, sometimes with slight variations. In this paper, we concentrate on through-composed art songs with vocal melody, lyrics, and piano accompaniment.
Related Work on Music and Poetry
Generation
Generation of music and poetry in their own right have been studied separately in the field of computational creativity, and there have been a few attempts to study the interaction of textual and musical features (Mihalcea and Strapparava 2012). Some attempts have also been made to compose musical accompaniments for text (Monteith et al. 2011; Monteith, Martinez, and Ventura 2012). Interestingly, however, generation of lyrical songs has received little attention in the past. Because of the lack of earlier work on combining music and lyrics in a single generative system, we next briefly review work done in the area of music and poetry/lyrics generation separately.
Song Generation
Composing music algorithmically is an old and much studied field. Several different approaches and method combinations have been used to accomplish this task (Roads 1996). One of the most well-known examples, usually known as Mozart's Musikalisches Würfelspiel, dates back to the year 1792, long before modern computers. Many musical procedures, such as voice-leading in Western counterpoint, can be reduced to algorithmic determinacy. Additionally, algorithms originally invented in fields other than music, such as L-systems, fractals, constraint-based methods, Hidden Markov Models, and conversion of arbitrary data like electromagnetic fields into music, have been used as the basis for music composition. A review of the approaches used in algorithmic music composition is outside the scope of this paper. For example, Roads (1996) presents a good overview of different methodologies.
Monteith et al. (2012) have proposed a model of generating melodic accompaniments for given lyrics. This approach concentrates on the extraction of linguistic stress patterns and composition of a melody with matching note lengths and fulfilment of certain aesthetic metrics for musical and linguistic match. Unlike this approach, our system composes all aspects of a song, including the lyrics, harmony, and melody, and thus it is not limited to the musicalization of existing lyrics. It also employs an informed-sequential architecture, and thus the integration of the lyrics writing and music composition subprocesses is tighter.
Poetry or Lyrics Generation
A number of approaches and techniques exist for automatic generation of poetry (Manurung, Ritchie, and Thompson 2000; Gervás 2001; Manurung 2003; Toivanen et al. 2012). Some systems have also been proposed to be used for generating song lyrics (Ramakrishnan, Kuppan, and Devi 2009) and not only pure poetry. Again, a review of the approaches used to produce poetry or lyrics automatically is outside the scope of this paper.
Informed Sequential Song Generation
The lyrics part of the song contains the denotational content
of the song and partly some connotational aspects like word
choices and associations. In the current implementation, the
lyrics are written about a user-specified theme (e.g. life) (Toivanen et al. 2012). The music composition module, on the other hand, conveys only connotational information: in the current implementation, the mood and intensity of the song. The mood is a user-specified input parameter, currently sad or happy, respectively corresponding to negative or positive
emotional valence. Intensity corresponds to the emotional arousal to be expressed in the song. It comes from the lyrics writing process and illustrates how internal information of creative processes can be passed between the subprocesses. It is also used as a way to direct the attention to the words expressing the input theme.

Figure 2: Detailed structure of M.U. Sicus-Apparatus
We employ the informed sequential song generation scheme with the overall flow of Figure 2. First, the user provides a theme (e.g., snow) and mood (e.g., happy) for the song. M.U. Sicus-Apparatus then generates lyrics for the song that tell about the given theme. The rhythm of the melody is composed by a stochastic process that takes into account the number of syllables, syllable lengths, and punctuation of the lyrics. The harmony of the song is generated either in a randomly selected major (for happy songs) or minor (for sad songs) key, according to the user's input. Next we discuss each of these phases and the overall structure of M.U. Sicus-Apparatus in more detail.
Lyrics Generation
The lyrics for a new song consist of a verse of automatically generated poetry. Typically a theme for the song is given by the user, and the method then aims to provide a new and grammatically well structured poem with content related to the theme. For lyrics generation, we use the method of Toivanen et al. (2012). We give a short overview of the methodology here.
The lyrics generation method is designed to avoid explicit specifications of grammar and semantics, in order to reduce the human effort in modeling natural language generation. Instead of explicit rule systems, the method uses existing corpora and statistical methods. One of the reasons behind this approach is also to keep the language-dependency of the methods small. The system automatically learns word associations to model semantic relations. An explicit grammar is avoided by using example instances of actual language use and replacing the words in these instances by words related to a given theme in suitable morphological forms.
As the lyrics writing module is writing lyrics for the song, it substitutes varying proportions of words in a randomly selected piece of text with new words (Toivanen et al. 2012). This proportion can vary between 0% and 100% for every individual line of lyrics, although we required the overall substitution rate to be over 50% in the experiments for this paper. The arousal level of the song in a particular place is determined by this substitution rate, as discussed in the Dynamics section.
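To make the substitution mechanism concrete, the following minimal Python sketch illustrates the idea. The word lists and the uniform random choice are invented stand-ins; the actual method of Toivanen et al. (2012) draws substitutes from corpus-derived word associations and inflects them into suitable morphological forms.

    import random

    def substitute_line(template_words, theme_words, rate):
        # Replace roughly `rate` of the words in a template line with
        # theme-related words; the per-line rate doubles as the arousal
        # signal later passed on to music composition.
        new_line = []
        for word in template_words:
            if random.random() < rate:
                new_line.append(random.choice(theme_words))
            else:
                new_line.append(word)
        return new_line

    template = ["the", "quiet", "river", "remembers", "the", "sky"]
    associations = ["snow", "winter", "frost", "white", "silence"]  # theme "snow"
    print(" ".join(substitute_line(template, associations, rate=0.6)))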
Music Generation
As an overview, M.U. Sicus-Apparatus works as follows. The system first generates a rhythm for the melody, based on the phonetics of the lyrics already written. A harmonic structure is then generated, followed by generation of a melody matching the underlying harmony. A piano accompaniment is generated directly from the harmony, with additional rules for voice leading and different accompaniment styles. Finally, the resulting song is transformed into a music sheet and a MIDI file. We next discuss each of the phases in some more detail.
Affective Content Affective connotation has a central role in the overall process. It is provided by the combination of two elements. The first one is the emotional valence, expressing the input mood via harmony and melody. The second element is intensity, expressing emergent information of the lyrics writing process (i.e. word replacement rates, see below).
Rhythm of the Melody The rhythm generation procedure takes into account the number of syllables in the text, the lengths of the syllables, and punctuation. Words in the lyrics are broken into syllables, and the procedure assigns to every word a rhythmic element with as many notes as there are syllables in the word. These rhythmic elements are randomly chosen from a set of rhythmic patterns usually found in art songs, so that, in addition to the number of syllables, the syllable lengths also constrain the set of possible candidates. Longer syllables usually get longer time values and shorter syllables usually get shorter time values. A punctuation mark is often stressed with a rest in the melody rhythm.
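As an illustration, here is a minimal Python sketch of this assignment step. The pattern bank and the syllable representation are invented for the example; the real system also handles rests at punctuation and longer words.

    import random

    # Hypothetical bank of rhythmic patterns (note lengths in quarter notes),
    # indexed by the number of syllables they can carry.
    PATTERNS = {
        1: [(1.0,), (2.0,)],
        2: [(1.0, 1.0), (1.5, 0.5)],
        3: [(1.0, 0.5, 0.5), (0.5, 0.5, 1.0)],
    }

    def rhythm_for_word(syllables):
        # One note per syllable; the longer syllables receive the longer
        # time values of the randomly chosen pattern.
        pattern = sorted(random.choice(PATTERNS[len(syllables)]), reverse=True)
        by_length = sorted(range(len(syllables)), key=lambda i: -len(syllables[i]))
        durations = [0.0] * len(syllables)
        for slot, duration in zip(by_length, pattern):
            durations[slot] = duration
        return durations

    print(rhythm_for_word(["win", "ter"]))  # e.g. [1.0, 1.0] or [1.5, 0.5]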
Harmony The harmony is composed according to the user-specified mood. If the valence polarity of the mood is positive, the key signature is constrained to major and then randomly selected from the set of possible major keys. In the opposite case the key is selected from the set of minor keys.
The system database contains different sets of harmonic patterns regularly found in diatonic Western classical music for major and minor keys. The construction of harmony is based on second-order Markov-chain selection of these harmonic patterns and the expression of these as chord sequences in a given key. A typical harmonic pattern is, for instance, the chord sequence I, IV, V. When dealing with minor keys, the harmonic minor scale is used. The harmony generation procedure also assigns time values to each of the chords in a probabilistic manner, so that the length of the generated harmonic structure matches the length of the melody rhythm generated earlier. Usually each chord is assigned a time value of either a half note or a whole note. After generating the sequence of chords, the method moves on to determine the pitches of the melody notes.
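As a minimal illustration of the Markov selection step just described, consider the following Python sketch. The transition table and its probabilities are invented for the example; the system's actual pattern sets are not reproduced here.

    import random

    # Hypothetical second-order transition table: the last two chords
    # select a distribution over continuations.
    TRANSITIONS = {
        ("I", "IV"): {"V": 0.7, "ii": 0.3},
        ("IV", "ii"): {"V": 1.0},
        ("ii", "V"): {"I": 1.0},
        ("IV", "V"): {"I": 0.8, "vi": 0.2},
        ("V", "I"): {"IV": 0.5, "vi": 0.5},
        ("V", "vi"): {"IV": 1.0},
        ("I", "vi"): {"IV": 1.0},
        ("vi", "IV"): {"V": 1.0},
    }

    def generate_harmony(length, seed=("I", "IV")):
        chords = list(seed)
        while len(chords) < length:
            options = TRANSITIONS[tuple(chords[-2:])]
            chords.append(random.choices(list(options), list(options.values()))[0])
        return chords

    print(generate_harmony(8))  # e.g. ['I', 'IV', 'V', 'I', 'IV', 'V', 'vi', 'IV']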
Melody The melody note pitches are generated on the basis of the underlying harmony and the pitch of the previous note by a random walk. Firstly, the underlying chord defines a probability distribution over the pitches which can be used. For example, if the underlying chord is C major, as is the key signature, the notes c, e, and g are quite probable candidates; a, f and d are less probable; and b is even less probable. Secondly, the pitch of the previous note affects the pitch of the next note in such a way that small intervals between the two notes are more probable than large intervals. Finally, the note pitch is generated according to a combined probability distribution that is the product of the probability distribution determined by the underlying chord and the probability distribution determined by the previous melody note.
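The product of the two distributions can be sketched as follows; the numerical weights are invented for illustration, since the paper does not give its exact values.

    import random

    RANGE = list(range(60, 80))          # candidate MIDI pitches
    C_MAJOR_SCALE = {0, 2, 4, 5, 7, 9, 11}

    def chord_weight(pitch, chord_tones):
        pc = pitch % 12
        if pc in chord_tones:            # chord tones, e.g. {0, 4, 7} for C major
            return 6.0
        if pc in C_MAJOR_SCALE:          # other scale tones
            return 2.0
        return 0.2                       # chromatic notes

    def interval_weight(pitch, previous):
        # Small intervals are more probable than large leaps.
        return 1.0 / (1 + abs(pitch - previous))

    def next_pitch(previous, chord_tones):
        weights = [chord_weight(p, chord_tones) * interval_weight(p, previous)
                   for p in RANGE]
        return random.choices(RANGE, weights)[0]

    melody = [64]                        # start on e'
    for _ in range(7):
        melody.append(next_pitch(melody[-1], {0, 4, 7}))
    print(melody)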
Accompaniment and Voice Leading The harmonic structure provides the basic building blocks of the accompaniment, but the chord sequence can be realised in many styles. Currently, we have implemented several different styles, like Alberti bass and other chord patterns.
In order to have smooth transitions between chords in the accompaniment, we apply a simple model of voice leading. For a given chord sequence, our current implementation chooses chord inversions that lead to minimal total movement, i.e. the smallest sum of intervals of simultaneous voices.
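A small sketch of this inversion choice follows; it is a greedy reading of the description, with chords given as pitch-class triples.

    def inversions(chord):
        # All rotations of the chord, voiced upward from each bass note.
        voicings = []
        for i in range(len(chord)):
            voiced, last = [], None
            for pc in chord[i:] + chord[:i]:
                pitch = pc
                while last is not None and pitch <= last:
                    pitch += 12
                voiced.append(pitch)
                last = pitch
            voicings.append(voiced)
        return voicings

    def movement(a, b):
        # Total movement: the sum of intervals of simultaneous voices.
        return sum(abs(x - y) for x, y in zip(a, b))

    def lead_voices(chords):
        voiced = [inversions(chords[0])[0]]
        for chord in chords[1:]:
            voiced.append(min(inversions(chord),
                              key=lambda v: movement(voiced[-1], v)))
        return voiced

    # C major, F major, G major as pitch classes:
    print(lead_voices([[0, 4, 7], [5, 9, 0], [7, 11, 2]]))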
Dynamics The arousal level of the song is expressed as dynamic marks in the music. Higher arousal is associated with higher loudness (e.g. forte) and lower arousal is associated with more peaceful songs (e.g. piano). For every line of lyrics, the proportion of substituted words (S) is expressed in the music either as piano (p, S < 25%), mezzo-piano (mp, 25% ≤ S < 50%), mezzo-forte (mf, 50% ≤ S < 75%), or forte (f, S ≥ 75%).
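In code form, this mapping is a simple threshold function:

    def dynamic_mark(substitution_rate):
        # Map a line's proportion of substituted words to a dynamic mark.
        if substitution_rate < 0.25:
            return "p"
        if substitution_rate < 0.50:
            return "mp"
        if substitution_rate < 0.75:
            return "mf"
        return "f"

    print(dynamic_mark(0.6))  # mf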
Output The system outputs both sheet music, to be performed by musicians, and a MIDI file, to be played through a synthesizer. The music engraving is produced with the LilyPond music score language (Nienhuys and Nieuwenhuizen 2003).
Examples
Figures 3 and 4 contain two example songs generated by M.U. Sicus-Apparatus. (These and other songs are also available in MIDI form at http://www.cs.helsinki.fi/discovery/mu-sicus.) The song in Figure 3 is a sad one about life, and the one in Figure 4 is a happy song about flower buds. The words that have been emphasised by the lyrics writing process are marked in bold in the lyrics.
The proposed methodology seems to provide relatively good combinations of text and music. As explained above, the transmission of information on song dynamics comes directly from the lyrics writing process. This is interesting because that particular information would be impossible to extract directly from the lyrics itself. For instance, in the
An initial blank white image of size 2000 × 2000 pixels is created, and the set of scaled icons are drawn onto the blank image at random locations, the only constraints being that no icons are allowed to overlap and no icons are allowed to extend beyond the border of the image. The result is a collage of icons that represents the original concept. DARCI then randomly selects an adjective from the set returned by the semantic memory model, weighted by each adjective's association strength. DARCI uses its adjective rendering component, described in prior work, to render the collage image according to the selected adjective (Norton, Heath, and Ventura 2011; 2013; Heath, Norton, and Ventura 2013). The final image will both be artistic and in some way communicate the concept to the viewer. Figure 1 shows how this process is incorporated into the full system.
Similarity Metric
To render an image, DARCI uses a genetic algorithm to discover a combination of filters that will render a source image (in this case, the collage) to match a specified adjective. The fitness function for this process combines an adjective metric and an interest metric. The former measures how effectively a potential rendering, or phenotype, communicates the adjective, and the latter measures the difference between the phenotype and the source image. Both metrics use only global image features and so fail to capture important local image properties correlated with image content.
In this paper we introduce a third metric, similarity, that borrows from the growing research on bag-of-visual-words models (Csurka et al. 2004; Sivic et al. 2005) to analyze local features, rather than global ones. Typically, these interest points are those points in an image that are the most surprising, or, said another way, the least predictable. After an interest point is identified, it is described with a vector of features obtained by analyzing the region surrounding the point. Visual words are quantized local image features. A dictionary of visual words is defined for a domain by extracting local interest points from a large number of representative images and then clustering them (typically with k-means) by their features into n clusters, where n is the desired dictionary size. With this dictionary, visual words can be extracted from any image by determining to which clusters the image's local interest points belong. A bag-of-visual-words for the image can then be created by organizing the visual word counts for the image into a fixed vector. This model is analogous to the bag-of-words construct for text documents in natural language processing.
For the new similarity metric, we first create a bag-of-visual-words for the source image and each phenotype, and then calculate the Euclidean distance between these two vectors. This metric has the effect of measuring the number of interest points that coincide between the two images.
We use the standard SURF (Speeded-Up Robust Features) detector and descriptor to extract interest points and their features from images (Bay et al. 2008). SURF quickly identifies interest points using an approximation of the difference of Gaussians function, which will often identify corners and distinct edges within images. To describe each interest point, SURF first assigns an orientation to the interest point based on surrounding gradients. Then, relative to this orientation, SURF creates a 64-element feature vector by summing both the values and magnitudes of Haar wavelet responses in the horizontal and vertical directions for each square of a four by four grid centered on the point.
We build our visual word dictionary by extracting these SURF features from the database of universal icons mentioned previously. The 6334 icons result in more than two hundred thousand interest points, which are then clustered into a dictionary of 1000 visual words using Elkan k-means (Elkan 2003). Once the Euclidean distance, d, between the source image's and the phenotype's bags-of-visual-words is calculated, the metric, S, is calculated to provide a value between 0 and 1 as follows:

S = MAX(d/100, 1)^(-1)

where the constant 100 was chosen empirically.
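The pipeline can be sketched roughly as follows in Python. OpenCV's ORB detector is used here as a freely available stand-in for SURF, and scikit-learn's standard k-means stands in for Elkan k-means; the final clamping follows our reading of the (extraction-damaged) formula above.

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    orb = cv2.ORB_create()  # stand-in for the SURF detector/descriptor

    def local_features(image):
        _, descriptors = orb.detectAndCompute(image, None)
        return descriptors if descriptors is not None else np.empty((0, 32))

    def build_dictionary(images, n_words=1000):
        # Cluster interest-point descriptors from many images into visual words.
        stacked = np.vstack([local_features(img) for img in images]).astype(np.float32)
        return KMeans(n_clusters=n_words, n_init=1).fit(stacked)

    def bag_of_visual_words(image, dictionary):
        histogram = np.zeros(dictionary.n_clusters)
        descriptors = local_features(image).astype(np.float32)
        if len(descriptors):
            for word in dictionary.predict(descriptors):
                histogram[word] += 1
        return histogram

    def similarity(source, phenotype, dictionary):
        d = np.linalg.norm(bag_of_visual_words(source, dictionary)
                           - bag_of_visual_words(phenotype, dictionary))
        return 1.0 / max(d / 100.0, 1.0)  # in (0, 1]; 100 is the paper's constant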
Online Survey
Since our ultimate goal is a system that can create images
that both communicate intention and are aesthetically inter-
esting, we have developed a survey to test our most recent
attempts at conveying concepts while rendering images that
are perceived as creative.
The survey asks users to evaluate images generated for ten concepts across three rendering techniques. The ten concepts were chosen to cover a variety of abstract and concrete topics. The abstract concepts are "adventure," "love," "music," "religion," and "war." The concrete concepts are "bear," "cheese," "computer," "fire," and "garden."
We refer to the three rendering techniques as unrendered, traditional, and advanced. For unrendered, no rendering is applied; these are the plain collages. For the other two techniques, the images are rendered using one of two fitness functions to govern the genetic algorithm. For traditional, the fitness function is the average of the adjective and interest metrics. For advanced rendering, the new similarity metric is added. Here the adjective metric is weighted by 0.5, while the interest and similarity metrics are each weighted by 0.25. For each rendering technique and image, DARCI returned the 40 highest ranking images discovered over a period of 90 generations. We then selected, from the pools of 40 for each concept and technique, the image that we felt best conveyed the intended concept while appearing aesthetically interesting. An example image that we selected from each rendering technique can be seen in Figure 2.
To query the users about each image, we followed the survey template that we developed previously to study the perceived creativity of images rendered with different adjectives (Norton, Heath, and Ventura 2013). In this study, we presented users with six five-point Likert items (Likert 1932) per image; volunteers were asked how strongly they agreed or disagreed (on a five-point scale) with each statement as it pertained to one of DARCI's images. The six statements we used were (abbreviation of item in parentheses):
I like the image. (like)
I think the image is novel. (novel)
I would use the image as a desktop wallpaper. (wallpaper)
Prior to this survey, I have never seen an image like this one. (never seen)
I think the image would be difficult to create. (difficult)
I think the image is creative. (creative)
(a) unrendered (b) traditional (c) advanced
Figure 2: Example images [1] for the three rendering techniques representing the concept "garden."
(a) unrendered (b) traditional (c) advanced
Figure 3: Example dummy images [2] for the concept "water" that appeared in the survey for the indicated rendering techniques.
In previous work, we showed that the first five statements correlated strongly with the sixth, "I think the image is creative" (Norton, Heath, and Ventura 2013), justifying this test as an accurate evaluation of an image's subjective creativity. In this paper, we use the same six Likert items and add a seventh to determine how effective the images are at conveying their intended concept:
I think the image represents the concept of ____. (concept)
To avoid fatigue, volunteers were only presented with images from one of the three rendering techniques mentioned previously. The technique was chosen randomly, and then the images were presented to the user in a random order. To help gauge the results, three dummy images were introduced into the survey for each technique. These dummy images were created for arbitrary concepts and then assigned different arbitrary concepts for the survey, so that the image contents would not match their label. Unfiltered dummy collages were added to the unrendered set of images, while traditionally rendered versions were added to the traditional and advanced sets of images. The three concepts used to generate the dummy images were "alien," "fruit," and "ice." The three concepts that were used to describe these images in the survey were, respectively, "restaurant," "water," and "freedom." To avoid confusion, from here on we will always refer to these dummy images by their description word. The
dummy images for the concept of "water" are shown in Figure 3. In total, each volunteer was presented with 13 images.

[1] The original icons used for the images in Figure 2 were designed by Adam Zubin, Birdie Brain, Evan Caughey, Rachel Fisher, Prerak Patel, Randall Barriga, dsathiyaraj, Jeremy Bristol, Andrew Fortnum, Markus Koltringer, Bryn MacKenzie, Hernan Schlosman, Maurizio Pedrazzoli, Mike Endale, George Agpoon, and Jacob Eckert of The Noun Project.
[2] The original icons used for the images in Figure 3 were designed by Alessandro Suraci, Anna Weiss, Riziki P.M.G. Nielsen, Stefano Bertoni, Paulo Volkova, James Pellizzi, Christian Michael Witternigg, Dan Christopher, Jayme Davis, Mathies Janssen, Pavel Nikandrov, and Luis Prado of The Noun Project.

(a) (b)
(c) (d)
Figure 4: The images [3] that were rated the highest on average for each statement. Image (a) is the advanced rendering of "adventure" and was rated highest for like, novel, difficult, and creative. Image (b) is the traditional rendering of "music" and was rated highest for wallpaper. Image (c) is the advanced rendering of "love" and was rated highest for never seen. Image (d) is the advanced rendering of "music" and was rated highest for concept.
Results
A total of 119 anonymous individuals participated in the online survey. Volunteers could quit the survey at any time, thus not evaluating all 13 images. Each person evaluated an average of 9 images, and each image was evaluated by an average of 27 people. The highest and lowest rated images for each question can be seen in Figures 4 and 5, respectively.
The three dummy images for each rendering technique are used as a baseline for the concept statement. The results of the dummy images versus the valid images are shown in Figure 6. The average concept rating for the valid images is significantly better than that of the dummy images, which shows that the intended meaning is successfully conveyed to human viewers more reliably than by an arbitrary image. These results confirm that the intelligent use of iconic concepts is beneficial for the visual communication of meaning. Further, it is suggestive that the ratings for the other statements are generally lower for the dummy images than for the valid
[3] The original icons used for the images in Figure 4 were designed by Oxana Devochkina, Kenneth Von Alt, Paul te Kortschot, Marvin Kutscha, James Fenton, Camilo Villegas, Gustavo Perez Rangel, and Anuar Zhumaev of The Noun Project.

(a) (b) (c)
(d) (e) (f)
Figure 5: The images [4] that were rated the lowest on average for each statement. Image (a) is the advanced rendering of "fire" and was rated lowest for difficult and creative. Images (b) and (c) are the unrendered and advanced versions of "religion" and were rated lowest for never seen and wallpaper, respectively. Images (d), (e), and (f) are the traditional renderings of "fire," "adventure," and "bear," respectively, and were rated lowest for like, novel, and concept, respectively.
images. Since the dummy images were created for a different concept than the one which they purport to convey in the survey, this may be taken as evidence that successful conceptual or intentional communication is an important factor for the attribution of creativity.
The results of the three rendering techniques (unrendered, traditional, and advanced) for all seven statements are shown in Figure 7. The unrendered images are generally the most successful at communicating the intended concepts. This is likely because the objects/icons in the unrendered images are left undisturbed and are therefore more clear and discernible, requiring the least perceptual effort by the viewer. The rendered images (traditional and advanced) often distort the icons in ways that make them less cohesive and less discernible, and can thus obfuscate the intended meaning. The trade-off, of course, is that the unrendered images are generally considered less likable, less novel, and less creative than the rendered images. The advanced images are generally considered more novel and creative than the traditional images, but the traditional images are liked slightly more. The advanced images also convey the intended meaning more reliably than the traditional images, which indicates that the similarity metric is finding a better balance between adding artistic elements and maintaining icon recognizability.
[4] The original icons used for the images in Figure 5 were designed by Melissa Little, Dan Codyre, Carson Wittenberg, Kenneth Von Alt, Nicole Kathryn Griffing, Jenifer Cabrera, Renee Ramsey-Passmore, Ben Rex Furneaux, Factorio.us collective, Anuar Zhumaev, Luis Prado, Ahmed Hamzawy, Michael Rowe, Matthias Schmidt, Jule Steffen, Monika Ciapala, Bru Rakoto, Patrick Trouvé, Adam Heller, Marco Acri, Mehmet Yavuz, Allison Dominguez, Dan Christopher, Nicholas Burroughs, Rodny Lobos, and Norman Ying of The Noun Project.

Figure 6: The average rating from the online survey for all seven statements, comparing the dummy images with the valid images. The valid images were more successful at conveying the intended concept than the dummy images by a significant margin. Results marked with an asterisk (*) indicate statistical significance using the two-tailed independent t-test. The lines at the top of each bar show the 95% confidence interval for each value. The sample sizes for dummy and valid images are 251 and 818, respectively.

The difference between the traditional and advanced rendering was minimized by the fact that we selected the image (out of DARCI's top 40) from each group that best conveyed the concept while also being aesthetically interesting. Out of all the traditional images, 39% had at least one recognizable icon, while 74% of the advanced images had at least one recognizable icon. This difference demonstrates that the new similarity metric helps to preserve the icons and provides a greater selection of good images from which to choose, which is consistent with the results of the survey. For comparison, Figure 8 shows some example images (both traditional and advanced) that were not chosen for the survey.
The results comparing the abstract concepts with the concrete concepts are shown in Figure 9. For all seven statements, the abstract concepts are, on average, rated higher than the concrete concepts. One possible reason for this is that concrete concepts are not easily decomposed into a collection of iconic concepts because, being concrete, they are more likely to be iconic themselves. For concrete concepts, the nouns returned by the semantic memory model are usually other related concrete concepts, and it becomes difficult to tell which object is the concept in question. For example, the concept "bear" returns nouns like "cave," "tiger," "forest," and "wolf," which are all related, but don't provide much indication that the intended concept is "bear." A person might be inclined to generalize to a concept such as "wildlife." Another possible reason why abstract concepts result in better survey results than do concrete concepts is that abstract concepts allow a wider range of interpretation and are generally more interesting. For example, the concept "cheese" would generally be considered straightforward by most people, while the concept "love" could have variable meanings to different people in different circumstances. Hence, the
[5] The original icons used for the images in Figure 8 are the same as those used in Figures 4 and 5, with attribution to the same designers.

Figure 7: The average rating from the online survey for all seven statements, comparing the three rendering techniques. The unrendered technique is most successful at representing the concept, while the advanced technique is generally considered more novel and creative. Statistical significance was calculated using the two-tailed independent t-test. The lines at the top of each bar show the 95% confidence interval for each value. The sample sizes for the unrendered, traditional, and advanced techniques are 256, 285, and 277, respectively.
images generated for abstract concepts are generally consid-
ered more likable, more novel, and more creative than the
concrete images.
Conclusions and Future Work
We have presented three additions to the computer system DARCI that enhance the system's ability to communicate specified concepts through the images it creates. The first addition is a model of semantic memory that provides conceptual knowledge necessary for determining how to compose and render an image, by allowing the system to make decisions and reason (in a limited manner) about common world knowledge. The second addition uses the word associations from a semantic memory model to retrieve conceptual icons and composes them into a single image, which is then rendered in the manner of an associated adjective. The third addition is a new similarity metric used during the adjective rendering phase that preserves the discernibility of the icons while allowing for the introduction of artistic elements.
We used an online survey to evaluate the system and show that DARCI is significantly better at expressing the meaning of concepts through the images it creates than an arbitrary image. We show that the new similarity metric allows DARCI to find a better balance between adding interesting artistic qualities and keeping the icons/objects recognizable. We show that using word associations and universal icons in an intelligent way is beneficial for conveying meaning to human viewers. Finally, we show that there is some degree of correlation between how well an image communicates the intended concept and how well liked, how novel, and how creative the image is considered to be. To further illustrate DARCI's potential, Figure 10 shows additional images encountered during various experiments with DARCI that we
thought were particularly interesting.

(a) (b) (c)
(d) (e) (f)
Figure 8: Sample images [5] that were not chosen for the online survey. Images (a), (b), and (c) are traditional renderings of "adventure," "love," and "war," respectively. Images (d), (e), and (f) are advanced renderings of "bear," "fire," and "music," respectively.
In future research we plan to do a direct comparison of the images created by DARCI with images created by human artists, and to further investigate how semantic memory contributes to the creative process. We plan to improve the semantic memory model by going beyond word-to-word associations and building associations between words and other objects (such as images). This will require expanding DARCI's image analysis capability to include some level of image noun annotation. The similarity metric presented in this paper is a step in that direction. An improved semantic memory model could also help enable DARCI to discover its own topics (i.e., find its own inspiration) and to compose icons together in more meaningful ways, by intentional choice of absolute and relative icon placement, for example.
Figure 9: The average rating from the online survey for all seven statements, comparing the abstract concepts with the concrete concepts. The abstract concepts generally received higher ratings for all seven statements. Results marked with an asterisk (*) indicate statistical significance using the two-tailed independent t-test. The lines at the top of each bar show the 95% confidence interval for each value. The sample sizes for abstract and concrete concepts are 410 and 408, respectively.

(a) bear (b) murder (c) war
Figure 10: Notable images [6] rendered by DARCI during various experiments and trials.

[6] The original icons used for the images in Figure 10 were designed by Alfredo Astort, Simon Child, Samuel Eidam, and Jonathan Keating of The Noun Project.

References
Bay, H.; Ess, A.; Tuytelaars, T.; and Gool, L. V. 2008. Speeded-up robust features (SURF). Computer Vision and Image Understanding 110:346-359.
Burgess, C. 1998. From simple associations to the building blocks of language: Modeling meaning in memory with the HAL model. Behavior Research Methods, Instruments, & Computers 30:188-198.
Colton, S. 2011. The Painting Fool: Stories from building an automated painter. In McCormack, J., and d'Inverno, M., eds., Computers and Creativity. Springer-Verlag.
Csikszentmihalyi, M., and Robinson, R. E. 1990. The Art of Seeing. The J. Paul Getty Trust Office of Publications.
Csurka, G.; Dance, C. R.; Fan, L.; Willamowski, J.; and Bray, C. 2004. Visual categorization with bags of keypoints. In Proceedings of the Workshop on Statistical Learning in Computer Vision, 1-22.
De Deyne, S., and Storms, G. 2008. Word associations: Norms for 1,424 Dutch words in a continuous task. Behavior Research Methods 40(1):198-205.
Deerwester, S.; Dumais, S. T.; Furnas, G. W.; Landauer, T. K.; and Harshman, R. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6):391-407.
Denoyer, L., and Gallinari, P. 2006. The Wikipedia XML corpus. In INEX Workshop Pre-Proceedings, 367-372.
Elkan, C. 2003. Using the triangle inequality to accelerate k-means. In Proceedings of the Twentieth International Conference on Machine Learning, 147-153.
Erk, K. 2010. What is word meaning, really? (And how can distributional models help us describe it?) In Proceedings of the 2010 Workshop on GEometrical Models of Natural Language Semantics, 17-26. Stroudsburg, PA, USA: Association for Computational Linguistics.
Fellbaum, C., ed. 1998. WordNet: An Electronic Lexical Database. The MIT Press.
Heath, D.; Norton, D.; and Ventura, D. 2013. Conveying semantics through visual metaphor. ACM Transactions on Intelligent Systems and Technology, to appear.
Kiss, G. R.; Armstrong, C.; Milroy, R.; and Piper, J. 1973. An associative thesaurus of English and its computer analysis. In Aitkin, A. J.; Bailey, R. W.; and Hamilton-Smith, N., eds., The Computer and Literary Studies. Edinburgh, UK: University Press.
Krzeczkowska, A.; El-Hage, J.; Colton, S.; and Clark, S. 2010. Automated collage generation with intent. In Proceedings of the 1st International Conference on Computational Creativity, 36-40.
Likert, R. 1932. A technique for the measurement of attitudes. Archives of Psychology 22(140):1-55.
Lund, K., and Burgess, C. 1996. Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers 28:203-208.
McCorduck, P. 1991. AARON's Code: Meta-Art, Artificial Intelligence, and the Work of Harold Cohen. W. H. Freeman & Co.
Nelson, D. L.; McEvoy, C. L.; and Schreiber, T. A. 1998. The University of South Florida word association, rhyme, and word fragment norms. http://www.usf.edu/FreeAssociation/.
Norton, D.; Heath, D.; and Ventura, D. 2010. Establishing appreciation in a creative system. In Proceedings of the 1st International Conference on Computational Creativity, 26-35.
Norton, D.; Heath, D.; and Ventura, D. 2011. Autonomously creating quality images. In Proceedings of the 2nd International Conference on Computational Creativity, 10-15.
Norton, D.; Heath, D.; and Ventura, D. 2013. Finding creativity in an artificial artist. Journal of Creative Behavior, to appear.
Sivic, J.; Russell, B. C.; Efros, A. A.; Zisserman, A.; and Freeman, W. T. 2005. Discovering objects and their location in images. International Journal of Computer Vision 1:370-377.
Sun, R. 2008. The Cambridge Handbook of Computational Psychology. New York, NY, USA: Cambridge University Press, 1st edition.
Thomas, S.; Boatman, E.; Polyakov, S.; Mumenthaler, J.; and Wolff, C. 2013. The Noun Project. http://thenounproject.com.
Turney, P. D., and Pantel, P. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37:141-188.
Wandmacher, T.; Ovchinnikova, E.; and Alexandrov, T. 2008. Does latent semantic analysis reflect human associations? In Proceedings of the ESSLLI Workshop on Distributional Lexical Semantics, 63-70.
A Computer Model for the Generation of Visual Compositions

Rafael Pérez y Pérez¹, María González de Cossío¹, Iván Guerrero²
¹ División de Ciencias de la Comunicación y Diseño, Universidad Autónoma Metropolitana, Cuajimalpa, México D. F.
² Posgrado en Ciencia e Ingeniería de la Computación, Universidad Nacional Autónoma de México
{rperez/mgonzalezc}@correo.cua.uam.mx; [email protected]
Abstract
This paper describes a computer model for visual compositions. It formalises a series of concepts that allow a computer agent to progress a visual work. We implemented a prototype to test the model; it employs letters from the alphabet to create its compositions. The knowledge base was built from examples provided by designers. From these examples the system obtained the necessary information to produce novel compositions. We asked a panel of experts to evaluate the material produced by our system. The results suggest that we are on the right track, although much more work needs to be done.
Introduction
This text reports a computer model for visual compositions. The following lines describe the motivation behind it. One of the most important topics that a student in design needs to master is that related to visual composition. By composition we refer to the way in which elements in a graphic work are organised on the canvas. The design process of a composition implies the selection, planning and conscious organisation of visual elements that aim to communicate (Myers 1989; Deepak 2010). Compositions can be very complex, with several elements interacting in diverse ways.
Unfortunately, an important number of design texts include what we call unclear explanations about composition and its characteristics; in many cases, they are based on personal appreciations rather than on more objective criteria. To illustrate our point, here are descriptions of the concept of visual balance found in some design texts: "Psychologically we cannot stand a state of imbalance for very long. As time passes, we become increasingly fearful, uncomfortable, and disoriented" (Myers 1989: 85); "The formal quality in symmetry imparts an immediate feeling of permanence, strength, and stability. Such qualities are important in public buildings to suggest the dignity and power of a government" (Lauer and Pentak 2012: 92); "exacting, noncasual and quiet, but can also be boring" (Brainard 1991: 96). Similar definitions can be found in Germani-Fabris (1973); Faimon and Weigand (2004); Fullmer (2012); and so on. As one can see, there is a need for clearer explanations that can guide designers, teachers and students on these topics.
We believe that computer models of creativity are very useful tools that can contribute to formalizing this type of concept and, hopefully, to making it more accessible and clearer to students and the general public. Therefore, the purpose of this project is to develop a computer model of visual composition and implement a prototype. Particularly, we are interested in representing the genesis of the visual composition process; cf. other computer models that represent more elaborated pieces of visual work, like ERI-Designer (Pérez y Pérez et al. 2010), The Painting Fool (Colton 2012), and DARSY (Norton et al. 2011). Related works also include shape grammars (Stiny 1972) and relational production systems (Vere 1977, 1978). Other interesting approaches are those based on evolutionary mechanisms (e.g. Goldberg 1991; Bentley 1999). However, we are interested in understanding each step in the composition process rather than looking for optimization processes.
This paper is organised as follows: section 2 describes some characteristics that we consider essential in visual composition; section 3 describes the core aspects of our model; section 4 describes the core characteristics of our prototype and how we used it to test our model; section 5 discusses the results we obtained.
Characteristics of a Composition
Composition is a very complex process that usually involves several features and multiple relations between them. It is beyond the scope of this project to attempt to represent all the elements involved in a composition.
A composition is integrated by design elements and by design principles. The design elements are dots, lines, colours, textures, shapes and planes that are placed on a canvas. The design principles are the way these elements relate to
each other and to the canvas. The principles that we employ in this project are rhythm, balance and symmetry.
Rhythm is the regular repetition of elements. By regular repetition we mean that the distance between adjacent elements is constant. Groups of repeated elements make patterns. The frequency of a pattern describes how many times the same element is repeated within a given area of the canvas. Thus, the frequency depends on the size of, and distance between, elements. A composition might include two or more patterns with the same or different frequencies.
Balance is related to the distribution of visual elements on the canvas. If there is an equal distribution on both sides of the canvas, there is a formal balance. If the elements are not placed with equal distribution, there is an informal balance. Myers describes informal balance as
Off-centre balance. It is best understood as the principle of the seesaw. Any large, 'heavy' figure must be placed closer to the fulcrum in order to balance a smaller, 'lighter' figure located on the opposite side. The fulcrum is the point of support for this balancing act. It is a physical principle transposed into a pictorial field. The fulcrum is never seen, but its presence must be strongly felt (1989: 90).
Symmetry (from the Greek συμμετρεῖν symmetrein, "with measure") means equal distribution of elements on both sides of the canvas. The canvas is divided into as many equal areas as needed. The basic divisions separate the canvas into four areas using a vertical axis and a horizontal axis. Diagonal divisions can also be included. Symmetry can be explained as follows: "Given plane A, a figure is symmetrical in relation to it, when it reflects in A, and goes back to its initial position" (Agostini 1987: 97). In other words, a "symmetry of a (planar) picture [is] a motion of the plane that leaves that picture unchanged" (Field 1995: 41). In this project we work with three types of symmetry:
1. Reflectional symmetry or mirror symmetry. It refers to the reflection of an element from a central axis or mirror line. "If one half of a figure is the mirror image of the other, we say that the figure has reflectional or mirror symmetry, and the line marking the division is called the line of reflection, the mirror line, or the line of symmetry" (Kinsey and Moore 2002: 129).
2. Rotational symmetry. The elements rotate around a central axis. It can be at any angle or frequency, whilst the elements share the same centre. For example, in nature, a sunflower shows each element rotating around a centre.
3. Bilateral symmetry or translational symmetry. Refers to equivalent elements that are placed in different locations but with the same direction. The element moves along a line to a position parallel to the original (Kinsey and Moore 2002: 148).
Description of the Model
For this work we assume that all compositions are generated on a white canvas with a fixed size. Compositions are comprised of the following elements: blank, simple elements and compound elements, the latter also referred to as groups. Blank is the space of the canvas that is not occupied by any element. A simple-element is the basic graphic unit employed to create a visual composition. A compound-element is a group formed by simple-elements (as will be explained later, all adjacent elements within a group must be separated by the same distance). A compound-element might also include other compound-elements. Once a simple-element is part of a group, it cannot participate in another group as a simple-element.
Every element has an associated set of attributes:
1. Blank has an area.
2. Simple-elements have a position (determined by the centre of the element), an orientation, an area and an inclination.
3. Compound-elements have a position, an area, a shape, a rhythm and a size. The position is calculated as the geometric centre of the element. Compound-elements can have four possible shapes: horizontal, vertical, diagonal and any other. The rhythm is defined as the constant repetition of elements. The size is defined by the number of elements (simple or compound) that comprise the group.
There are three basic primitive-actions that can be performed on simple and compound elements: insert into the canvas, eliminate from the canvas and modify its attributes.
Relations. All elements in a canvas have relations with the other elements. Our model represents three types of relations: distance, balance and symmetry.
Distance. We include four possible distances between elements:
Lying-on: one element is on top of another element.
Touch: the edge of one element is touching the edge of another element.
Close: none of the previous classifications apply and the distance between the centres of element 1 and element 2 is less than or equal to a distance known as the Distance of Closeness (DC). It represents that an element is close to another element. The appropriate value of DC depends on cultural aspects and might change between different societies (see Hall 1999).
Remote: the distance between the centres of element 1 and element 2 is greater than DC.
Balance. We employ two different axes to calculate balance: horizontal and vertical. Both cross the centre of the canvas. The balance between two elements is obtained as follows. The area of each element is calculated and then multiplied by its distance to the centre. If the results are alike, the elements are balanced. Unbalanced relations are not explicitly represented.
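A minimal Python sketch of this moment-style test follows; the tolerance used to decide when two products are "alike" is our assumption, since the model does not specify one.

    def balanced(area_1, distance_1, area_2, distance_2, tolerance=0.1):
        # Two elements balance about an axis when their moments
        # (area multiplied by distance to the canvas centre) are roughly equal.
        m1, m2 = area_1 * distance_1, area_2 * distance_2
        return abs(m1 - m2) <= tolerance * max(m1, m2)

    # A large figure close to the centre balances a small one far away:
    print(balanced(400, 2.0, 100, 8.0))  # True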
Symmetry. We work with three types of symmetry: reflectional (Rf), translational (Tr) and rotational (Rt). We employ two different axes to calculate it: horizontal (H) and vertical (V). So, two different elements in a canvas might have one of five different symmetric relations between them: horizontal-reflectional (H-Rf), vertical-reflectional (V-Rf), horizontal-translational (H-Tr), vertical-translational (V-Tr) and rotational (Rt). Asymmetrical relations are not explicitly represented.
Creation of Groups. In this work, inspired by Gestalt studies in perception (Wertheimer 2012), groups are created based on the distance between their elements. The minimum distance (MD) is the smallest distance between two elements (e.g. if the distance between element 1 and element 2 is 1 cm, the distance between element 2 and element 3 is 3 cm, and the distance between element 1 and element 3 is 4 cm, MD is equal to 1 cm). Its value ranges from zero (when the centre of element 1 is lying on top of the centre of element 2) to DC:

0 ≤ MD ≤ DC

That is, inspired by Gestalt studies indicating that the eye perceives elements that are close together as a unit, a group cannot include elements at a remote distance.
The process of grouping works as follows. All simple-elements that are separated from other simple-elements by the same distance are grouped together, as long as such a distance is less than the remote distance. If, as a result of this process, at least one group is created, the same process is performed again. The process is repeated until it is not possible to create more groups. Notice that this way of grouping ensures that all groups have an associated rhythm, i.e. all groups include the constant repetition of (at least one) element. We refer to the groups created during this process as Groups of Layer 1. Figure 1 layer 0 shows simple elements on a canvas before the system groups them; Figure 1 layer 1 shows the groups that emerge after performing this process: group 1 (the blue one), group 2 (the purple one) and group 3 (the yellow one); d1 represents the distance between elements in group 1; d2 represents the distance between elements in group 2; d3 represents the distance between elements in group 3. The following lines describe the algorithm:
First iteration, Layer 1
1. Considering only simple-elements, find the MD value.
2. If there are not at least two simple-elements whose MD is less than or equal to DC, then finish.
3. All simple-elements that are separated from other simple-elements by a distance MD form a new group.
4. Go to step 1.
Now, employing a similar mechanism, we can try to create new groups using the Groups of Layer 1 as inputs (see Figure 1 Layer 2). We refer to the groups created during this second process as Groups of Layer 2. Groups at layer 2 are comprised of simple-elements and/or compound-elements. The algorithm works as follows:
If at least one group was created during Layer 1, then perform Layer 2.
Second iteration, Layer 2
1. Considering simple and compound elements that have not formed a group in this layer yet, find the value of the MD.
2. If there are not at least two elements whose MD is less than or equal to DC, then finish.
3. All elements that are separated from other elements by a distance MD form a new group.
4. Go to step 1.
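To make the grouping procedure concrete, here is a minimal Python sketch of one grouping step. It assumes elements are represented only by their centre coordinates; the DC value, the tolerance for "the same distance" and the union-find grouping of equidistant pairs are our own choices, not details from the paper.

import math

DC = 1.0    # Distance of Closeness; assumed value, culturally dependent (Hall 1999)
TOL = 1e-6  # tolerance for treating two separations as "the same distance"

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def md_groups(points):
    """One grouping step: link every pair of elements separated by the
    minimum distance MD (provided MD <= DC) and return the resulting
    connected components as candidate groups."""
    n = len(points)
    ds = {(i, j): dist(points[i], points[j])
          for i in range(n) for j in range(i + 1, n)}
    if not ds:
        return []
    md = min(ds.values())
    if md > DC:                          # step 2: no sufficiently close pair
        return []
    parent = list(range(n))              # union-find over MD-linked elements
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for (i, j), d in ds.items():
        if abs(d - md) < TOL:            # step 3: same-distance elements group
            parent[find(j)] = find(i)
    comps = {}
    for i in range(n):
        comps.setdefault(find(i), []).append(i)
    return [c for c in comps.values() if len(c) > 1]

Repeating this step, with each new group replaced by a compound element located at its centre, would yield the Groups of Layer 1, Layer 2 and so on.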
Notice how the blue group and the purple group merge; the reason is that the distance between the purple group and the blue group (d21) is smaller than the distance between the blue group and the yellow group (d13), or the distance between the purple group and the yellow group (d23). Because there is no other group to merge, the yellow group has to wait
until the next cycle (next layer) to be integrated (see Figure 1 layer 3).

Figure 1. A composition represented by 3 layers. Layer 3: 1 group; Layer 2: 2 groups; Layer 1: 3 groups; Layer 0: simple elements. (R: rhythm; d: distance.)

This process is repeated until no more layers can
be created. All groups created during the first iteration are known as Groups at Layer 1; all groups created during the second iteration are known as Groups at Layer 2; all groups created during the nth iteration are known as Groups at Layer n. A composition that generates n layers is referred to as an n-Layer Composition.
Calculating rhythms. The process to calculate rhythms within a composition works as follows. Each group at layer 1 has its own rhythm (see Figure 1 layer 1). So, the blue group has a rhythm 1 (R1), the purple group has a rhythm 2 (R2) and the yellow group has a rhythm 3 (R3). When the system blends the blue and purple groups, the new group includes three different rhythms (see Figure 1 Layer 2): R1, R2 and a new rhythm R21. Rhythm R21 is the result of the distance between the centre of the blue group and the centre of the purple group. We can picture groups as accumulating the rhythms of their members. So, in Figure 1 Layer 2 we can observe four rhythms: R1, R2, R21 (inside the purple group) and R3 in the yellow group. A group that includes only one rhythm is classified as monotonous; a group that includes two or more rhythms is classified as varied. So, the purple group has a varied rhythm while the yellow group has a monotonous rhythm.
Analysis of the composition. Our model represents a composi-
tion in terms of all existing relations between its elements. This
representation is known as Context.
Because each layer within a composition includes different elements, and possibly different relations between them, the number of contexts associated with one composition depends on its number of layers. Thus, a 3-layer composition has three associated contexts: context-layer 1, context-layer 2 and context-layer 3.

Context of the composition = Context-layer 1 + Context-layer 2 + Context-layer 3
Besides relationships, a context-layer also includes informa-
tion about the attributes of each element, and what we refer
to as the attributes of the layer: Density of the layer, Balance
of the layer, Symmetry of the layer and Rhythm of the layer.
The Density of the Layer (DeL) is the relation between the blanks area and all elements area:

Density of the Layer = All elements area / Blanks area
The Balance of the layer and Symmetry of the layer indicate whether the layer as a whole is balanced and symmetrical. The Rhythm of the layer indicates the type of rhythm that the layer has as a whole. As in the case of the groups, it can have the following values: Monotonous or Varied (see Figure 2).
Figure 2. Components of a context-layer: relations between elements, attributes of the elements, and attributes of the layer.
Composition process
We can describe a composition as a process that consists of sequentially applying a set of actions, which generate several partial or incomplete works until the right composition arises or the process is abandoned (Pérez y Pérez et al. 2010).
Thus, if we have a blank canvas and perform an action on it,
we will produce an initial partial composition; if we modify
that partial composition by performing another action, then
we will produce a more elaborated partial composition; we
can keep on repeating this process until, with some luck, we
will end producing a whole composition. Thus, by perform-
ing actions we progress the composition (see Figure 3).
The model allows calculating, for each partial composition, all its context-layers. This information is crucial for generating novel compositions.
Producing new works
Our model includes two main processes: the generation of
knowledge structures and the generation of compositions.
Generation of knowledge structures
The model requires a set of examples that are provided by human experts; we refer to them as the previous designs. So, each previous design is comprised of one or more partial compositions; each of these partial compositions is more elaborated than the previous one. At the end we have the final composition.
Figure 3. A composition process: a blank canvas (empty context) is transformed by Action 1 into Partial Composition 1 (Context 1), by Action 2 into Partial Composition 2 (Context 2), and so on.
As explained earlier, we can picture a composition pro-
cess as a progression of contexts mediated by actions until
the last context is generated. In the same way, if we have the
sequence of actions that leads towards a composition (and
that is the type of information we can get from the set of
examples), we can analyse and register how the composition
process occurred. The goal is to create knowledge structures
that group together a context and an action to be performed.
In other words, the knowledge base is comprised of contexts (representing partial compositions) and actions to transform them in order to progress the composition.
Because the previous designs do not represent explicitly
their associated actions, it is necessary to obtain them. The
following lines explain how this process is done. We compare
two contexts and register the differences between them. Such
differences become the next action to perform. For example,
if Context 1 represents an asymmetrical composition and Context 2 represents a horizontally symmetrical one, we can associate the action make the current composition horizontally symmetrical to Context 1 as the next action to continue the work in progress.
Once this relation has been established, it is recorded in
the knowledge base as a new knowledge structure. We do the
same with all the contexts in all the layers of a given partial
composition. The actions that can be associated with a context are: make the current composition (reflectionally, rotationally or translationally) symmetrical; balance the current composition (horizontally or vertically); insert, delete or modify a simple or compound element; make the current composition (reflectionally, rotationally or translationally) asymmetrical; unbalance the current composition (horizontally or vertically); end the process of composition. The following lines describe the algorithm to process the previous designs (a code sketch follows the listing).
1. Obtain the number of all the partial compositions of a given example (NumberPC).
2. Calculate all the contexts for each partial composition.
3. For n := 1 to (NumberPC − 1):
3.1 Compare the differences between Context n and Context n+1.
3.2 Find the action that transforms Context n into Context n+1.
3.3 Create a new knowledge structure associating Context n and the new Action.
3.4 Record this new knowledge structure in the knowledge base.
4. The context of the last partial composition gets the action end of the process of composition.
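A minimal Python sketch of this extraction loop follows; contexts are treated as opaque comparable objects, and the diff_to_action helper that names the transforming action is our assumption, not something defined in the paper.

def build_knowledge_base(previous_designs, diff_to_action):
    """previous_designs: list of examples, each a list of contexts
    (one per partial composition, in order of elaboration)."""
    knowledge_base = []
    for contexts in previous_designs:
        for n in range(len(contexts) - 1):
            # The difference between consecutive contexts becomes
            # the action associated with the earlier context.
            action = diff_to_action(contexts[n], contexts[n + 1])
            knowledge_base.append((contexts[n], action))
        # The last partial composition gets the terminating action.
        knowledge_base.append((contexts[-1], "end of the process of composition"))
    return knowledge_base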
We repeat the same process for each example in the set of
previous designs. All the knowledge structures obtained in
this way are recorded in the knowledge base. The bigger the set of previous designs, the richer our knowledge base is.
Generation of compositions: The composition process follows the E-R model described in (Pérez y Pérez and Sharples 2001). The following lines describe how it works.
The E-R model has two main processes: engagement and reflection. During engagement the system generates material; during reflection such material is evaluated and, if necessary, modified. The composition is a constant cycle between engagement and reflection. The model requires an initial state, i.e. an initial partial composition to start; then, the process is triggered. The following lines describe how we defined engagement and reflection:
Engagement:
1. The system calculates all the Contexts that can be obtained from the current partial composition.
2. All these contexts are employed as cues to probe memory.
3. The system retrieves from memory all the knowledge structures that are equal or similar to the current contexts. If no structure is retrieved, an impasse is declared and the system switches to reflection.
4. The system selects one of them at random and performs its associated action. As a consequence the current partial composition is updated.
5. The cycle repeats again (step 1).
Reflection:
1. If there is an impasse the system attempts to break it and then returns to the generation phase.
2. The system checks that the current composition satisfies the requirements of coherence (e.g. the system verifies that all the elements are within the area of the canvas, that elements are not accidentally on top of each other, and so on).
3. The system verifies the novelty of the composition in progress. A composition is novel if it is not similar to any of the compositions in the set of previous designs.
The system starts in engagement; after three actions it switches to reflection and then goes back to engagement. If during engagement an impasse is declared, the system switches to reflection to try to break it and then switches back to engagement. The cycle ends when an unbreakable impasse is triggered or when the action end of the process of composition is performed. A sketch of this loop follows.
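The following Python sketch shows one plausible shape for this engagement-reflection loop; the callables probe_memory, apply_action and reflect are assumptions standing in for the processes described above, not the authors' implementation.

import random

def er_cycle(initial, probe_memory, apply_action, reflect, max_steps=100):
    """Sketch of an engagement-reflection loop.
    probe_memory(composition) -> list of (context, action) knowledge structures;
    apply_action(composition, action) -> updated composition;
    reflect(composition, impasse) -> (composition, keep_going)."""
    composition = initial
    for _ in range(max_steps):
        impasse = False
        for _ in range(3):                       # engagement: up to three actions
            matches = probe_memory(composition)
            if not matches:
                impasse = True                   # impasse: switch to reflection
                break
            _, action = random.choice(matches)   # pick one retrieved action
            if action == "end of the process of composition":
                return composition
            composition = apply_action(composition, action)
        composition, keep_going = reflect(composition, impasse)
        if not keep_going:                       # unbreakable impasse
            return composition
    return composition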
Example of a composition: For space reasons, it is impossible to describe in detail how the system creates a whole new design. Instead, in Figure 4 we show some partial compositions generated by our program and their associated contexts. To create the partial composition in Figure 4A, the system starts
with a blank canvas and then inserts three elements at random (the three elements on the top-left). This partial composition has two layers: the context of each layer is depicted on the right side of Figure 4A. For the sake of clarity the figure does not include the attributes of the elements. Then, during engagement, the system takes the current contexts as cues to probe memory and retrieves some actions to progress the work. Among the retrieved actions, one is selected at random. So, it inserts three new elements that produce a vertical translational symmetry (see Figure 4B). The context in each layer clearly shows the relation between all elements in the canvas. In this case, in Layer 1 we have two Vertical Translational Symmetry (VTS) relations and in Layer 2 we have one VTS relation.
The system switches to reflection and realises that some elements are on top of others. Employing some heuristics to analyse the composition, the program decides that it is better to separate them. The system switches back to engagement, takes the current contexts as cues to probe memory and retrieves actions to be performed. On this occasion, the system inserts in the third quadrant a new group with a horizontal mirrored symmetry (see Figure 4C). The right side of the figure shows the context at each layer. The process is repeated again, generating the partial composition in Figure 4D and its corresponding contexts.
Tests and Results
We implemented a prototype to test our model. Because of the technical complexity of implementing the whole model we decided to include some constraints. In our prototype all simple-elements have the same size, colour and shape: in this work, simple elements are letters of the alphabet. Because of the technical difficulty of implementing relationships, in this prototype we only use symmetry and balance.
Like the model, the prototype has two main parts: creation of knowledge structures and generation of new compositions. The prototype has an interface that allows the user to create her own compositions. She can insert, delete or modify letters in the canvas. By clicking one button she can also build new symmetrical or balanced elements, or generate random groups. The program automatically indicates all the existing groups in all layers; it also shows all the relationships that currently exist between the elements in the canvas. In the same way, the attributes of all elements are displayed, as well as their rhythms. So, the user only has to create her composition on the canvas (the program includes a partial-composition button that allows the user to indicate when a partial composition is ready). In this way, the system automatically creates the file of previous designs. Once the knowledge base is ready, the user can trigger the E-R cycle to generate novel compositions.
We provided our prototype with five previous designs; Figures 5 and 6 show two works generated by our program.
In order to obtain external feedback, we decided to ask a panel of experts their opinion about our program's work.
Figure 4. Partial compositions and their contexts. (A)-(D) show successive partial compositions, each with its Context Layer 1 and Context Layer 2. VTS: Vertical Translational Symmetry; HMS: Horizontal Mirrored Symmetry; HTS: Horizontal Translational Symmetry; RS: Radial Symmetry; B: Balanced.

Figure 5. A composition created by our agent. It is Composition 2 in the questionnaire.

Figure 6. A second composition created by our agent. It is Composition 3 in the questionnaire.
The panel consisted of twelve designers: four men and eight women. All of them had studied a bachelor's degree in design and half of them had a postgraduate degree. We developed a questionnaire that included four compositions: two were created by our system (compositions 2 and 3, Figures 5 and 6) and two were created by a designer (compositions 1 and 4, Figures 7 and 8). The human compositions had to follow constraints similar to those of our program's compositions: they had to be in black and white, the designer could only employ one letter to develop her work, and so on. The participants were not told that some works had been done by a computer program. Subjects were asked to assess, in a range from 1 (lowest) to 5 (highest), four characteristics for each composition: a) whether they liked the composition, b) whether they considered that the composition had symmetry, c) whether the composition had balance and, d) what kind of rhythm the composition had. They were also invited to comment freely on each composition regarding balance and symmetry. In the last part of the questionnaire, participants were asked to rank the compositions from the best to the worst. Figure 9 shows the results of the questionnaire.

Figure 7: Human generated composition. Corresponds to composition 1 in the questionnaire.

Figure 8: Human generated composition. Corresponds to composition 4 in the questionnaire.

Figure 9: Results of the questionnaire. Experts' opinions on the compositions: assessment (0-5) of Like, Balance, Symmetry, Rhythm and Preference for compositions 1-4.

Experts liked compositions 1 and 2. This was an interesting result because it suggested that our model was capable of generating designs of an acceptable quality. It was also clear that most experts disliked composition 3 (Figure 6), although it is fair to say that its evaluation was only one point lower than the highest evaluation.

Compositions 1 and 4 (made by the human designer) had a better evaluation regarding balance and symmetry than compositions 2 and 3 (made by our program). We could have forced our program to generate symmetrical or balanced designs, but that was exactly what we wanted to avoid. Our system had the capacity of detecting such characteristics and nevertheless attempted something different. Experts' assessment of symmetry was neither clear nor unanimous. We were surprised to find this out, since symmetry does not depend on subjective judgment. Something similar occurred with balance and, to some extent, with rhythm. These results seemed to suggest that experts had different ways of evaluating these characteristics. Experts considered that the rhythm in Composition 2 was the best.

Overall, subjects preferred composition 4; compositions 1 and 2 got similar results, with a slight preference for composition 1; composition 3 got the lowest rank.

Discussion and Conclusions

This project describes a computer model for visual composition. The model establishes:
- Clear criteria to define simple-elements and groups.
- A set of attributes for simple-elements, groups and layers.
- Relationships between elements and a mechanism to identify such relationships.
- A method to analyse a visual composition based on layers, relationships and attributes.
- A mechanism based on the E-R model to produce novel compositions.

As far as we know, there is no other similar model. Although we are aware that many important features of compositions are not considered yet, we claim that our model allows a computer agent to produce novel visual designs.

We tested our model by implementing a computer agent. The system was capable of producing compositions. None of them is similar to any of the previous designs, although some of their characteristics resemble the set of examples.

A panel of experts evaluated two compositions generated by our system and two compositions generated by a human designer. We decided to ask a small group of experts, who we believe share core concepts about design, to evaluate our prototype's compositions, rather than asking many people with different backgrounds. The results suggest two interesting points:
1. In most cases, the opinions of the experts were not unanimous. That is, some experts found some of the characteristics of the computer-generated compositions more interesting than those produced by humans.
2. Experts seem to have different ways of perceiving and
evaluating compositions.
Point 1 suggests that our model is capable of generating
interesting compositions. That is, it seems that we are moving
in the right direction.
Point 2 seems to confirm the necessity of clearer mechanisms to evaluate a composition. Of course, we are not suggesting that personal taste and intuition should be eliminated from design. We are only recommending the use of clearer definitions and mechanisms for evaluation. We are convinced that they will be very useful, especially in teaching and learning graphic composition.
One of the reviewers of this paper suggested comparing our work with shape grammars (Stiny 1972). Our proposal is far from being a grammar; it does not include features like terminal shape elements and non-terminal shape elements. In the same way, we do not work with shapes but with relations between the elements that comprise the composition. Those relations drive the generation of new compositions. We believe that our approach is much more flexible than the grammars approach. A second reviewer suggested comparing our work with relational productions (Vere 1977, 1978). It is true that our work also employs the before and after situations described by Vere. However, we are not interested in modelling inductive (or any other type of) learning; our purpose is to record the actions that the user performs in order to progress a composition. Later, the system employs this information to develop its own composition. Neither of these two approaches includes characteristics such as a flexible generation process intertwined with an evaluation process, or analysis by layers of the relations between the elements that comprise a composition, as our approach does. Thus, although some of the features that our model employs remind us of previous works, we claim that our approach introduces interesting novel features.
We hope this work encourages other researchers to work on visual composition generation.
References
Agostini, F. 1987. Juegos con la imagen. Madrid: Editorial Pirámide.
Bentley, P. 1999. An Introduction to Evolutionary Design by Computers. Morgan Kaufmann Publishers.
Brainard, S. 1991. A Design Manual. New Jersey: Prentice Hall.
Colton, S. 2012. The Painting Fool: Stories from Building an Automated Painter. In Computers and Creativity, edited by J. McCormack and M. d'Inverno, Springer-Verlag.
Deepak, J. M. 2010. Principles of Design through Photography. New Delhi: Wisdom Tree / Ahmedabad: National Institute of Design.
Faimon, P., and Weigand, J. 2004. The Nature of Design. USA: How Design Books.
Field, M., and Golubitsky, M. 1995. Symmetry in Chaos: A Search for Pattern in Mathematics, Art and Nature. Oxford: Oxford University Press.
Fullmer, D. L. 2012. Design Basics. USA: Fairchild Books.
Germani-Fabris. 1973. Fundamentos del proyecto gráfico. España: Ediciones Don Bosco.
Goldberg, D. E. 1991. Genetic Algorithms as a Computational Theory of Conceptual Design. In Proc. of Applications of Artificial Intelligence in Engineering 6, pp. 3-16.
Hall, E. T. 1999. La Dimensión Oculta. México D.F.: Siglo XXI. (Original title: The Hidden Dimension; translated by Félix Blanco.)
Kinsey, L. C., and Moore, T. E. 2002. Symmetry, Shape and Space: An Introduction to Mathematics through Geometry. USA: Key College Publishing/Springer.
Lauer, D. A., and Pentak, S. 2012. Design Basics. USA: Wadsworth, 8th Edition.
Myers, J. F. 1989. The Language of Visual Art: Perception as a Basis for Design. USA: Holt, Rinehart and Winston, Inc.
Norton, D.; Heath, D.; and Ventura, D. 2011. Autonomously Creating Quality Images. In Proceedings of the Second International Conference on Computational Creativity, Mexico City, Mexico, pp. 10-15.
Pérez y Pérez, R.; Aguilar, A.; and Negrete, S. 2010. The ERI-Designer: A Computer Model for the Arrangement of Furniture. Minds and Machines 20(4): 483-487.
Pérez y Pérez, R., and Sharples, M. 2001. MEXICA: A Computer Model of a Cognitive Account of Creative Writing. Journal of Experimental and Theoretical Artificial Intelligence 13(2): 119-139.
Stiny, G. 1972. Shape Grammars and the Generative Specification of Painting and Sculpture. Information Processing 71.
Vere, S. 1977. Relational Production Systems. Artificial Intelligence 8(1).
Vere, S. 1978. Inductive Learning of Relational Productions. Academic Press Inc., University of Illinois at Chicago Circle.
Wertheimer, M. 2012. On Perceived Motion and Figural Organization. Cambridge, Mass.: MIT Press.
Learning how to reinterpret creative problems
Kazjon Grace
College of Computing and Informatics
University of North Carolina at Charlotte
Charlotte, NC, USA
[email protected]
John Gero
Krasnow Institute
for Advanced Study
George Mason University
Fairfax, VA, US
[email protected]
Rob Saunders
Faculty of Architecture, Design and Planning
Sydney University
Sydney, NSW, Australia
[email protected]
Abstract
This paper discusses a method, implemented in the domain of computational association, by which computational creative systems could learn from their previous experiences and apply them to influence their future behaviour, even on creative problems that differ significantly from those encountered before. The approach is based on learning ways that problems can be reinterpreted. These interpretations may then be applicable to other problems in ways that specific solutions or object knowledge may not. We demonstrate a simple proof-of-concept of this approach in the domain of simple visual association, and discuss how and why this behaviour could be integrated into other creative systems.
Introduction
Learning to be creative is hard. Experience is known to be a significant influence in creative acts: cognitive studies of designers show significant differences in the ways novices and experts approach creative problems (Kavakli and Gero, 2002). Yet each creative act is potentially so different from every other act that it is complex to operationalise the experience gained and apply it to subsequent acts of creating.
Systems that can, through experience, improve their own capacity to be creative are an interesting goal for computational creativity research, as they are a rich avenue for improving system autonomy. While computational creativity research has coalesced over the last decade around quantified ways to evaluate creative output, there have been few attempts to imbue a system with methods of self-evaluation and processes by which it could learn to improve. This research presents one possible avenue for pursuing that goal.
A distinction should be drawn between learning about the various objects and concepts to be used in particular creative acts, which serves to aid those acts specifically, and learning about how to be a better creator more broadly. Knowledge about objects influences future creative acts with those objects, but the generalisability of that knowledge is suspect.
One example of where this learning challenge is particularly relevant is analogy-making, in which every mapping created between two objects is, by the definition of an analogy as a new relationship, in some way unique. Multiple analogies using the same object or objects are not guaranteed to be similar. This makes it very difficult to generalise knowledge about making analogies and apply it to any future analogy-making act.
We propose to tackle this problem of learning to be (computationally) creative by learning ways to interpret problems, rather than learning solutions to problems or learning about objects used in problems. These interpretations can be learnt, evaluated, recalled and reapplied to other problems, potentially producing useful representations. This process is based on the idea that perspectives that have been adopted in the past and have led to some valuable creative output may be useful to adopt again if a compatible problem arises. While even quite similar creative problems may require very different solutions, quite different problems may be able to be reinterpreted in similar ways. We discuss this approach specifically for association and analogy-making, but it may hypothetically apply to other components of computational creativity. We develop a proof-of-concept implementation in the domain of computational association, and outline some ways in which this learning of interpretations could be more useful than object- or solution-learning in creative contexts.
Models for how previous experiences can influence behaviour could be a valuable addition to learning in creative systems. A computational model able to learn ways to approach creative problems would behave in ways driven by its previous experiences, permitting kinds of autonomy of motivation and action currently missing from most models of computational creativity. For example, it would be possible to develop a creative system that could autonomously construct aesthetic preferences based on what it has (or has not) experienced, or learn styles by which it can categorise the work of itself and others, such as described in (Jennings, 2010). A creative system capable of using past experiences to influence its behaviour is a key step towards computationally creative systems that are embedded in the kind of rich historical and cultural contexts which are so valuable to human artists and scientists alike.
Learning interpretations in computational association
We have previously developed a model of computational association based on the reinterpretation of representations so as to render them able to be mapped. Our model, along with an implementation of it in the domain of ornamental design, is detailed in (Grace, Gero, and Saunders, 2012). We distinguish association from analogy by the absence of the transfer process which follows the construction of a new mapping: analogy is, in this view, association plus transfer. Interpretation-driven association uses a cyclical interaction of re-representation and mapping search processes to both construct compatible representations of two objects and produce a new mapping between them. An interpretation is considered to be a transformation that can be applied to the representations of the objects being associated. These transformations are constructed, evaluated and applied during the course of a search for a mapping, transforming the space of that search and influencing its trajectory while the search occurs. This differs from the theory of re-representation in analogy-making presented in (Yan, Forbus, and Gentner, 2003), in which representations are adapted only after mapping has failed; in our system representations are iteratively adapted in parallel with the search for mappings. This permits interpretation to influence the search for mappings, and mapping to influence the construction, evaluation and use of interpretations in turn.
The implementation of this model presented here explores the process of Interpretation Recollection, through which interpretations that have been instrumental in creating past associations can be recalled to influence a current association problem. This process occurs in conjunction with the construction of interpretations from observations made about the current problem.
In the model, interpretation recollection is a step in the iterative interpretation process in which the set of past, successful interpretations is checked for any interpretations appropriate to the current situation. These past interpretations will then be considered for application to the object representations alongside other interpretations that have previously been constructed or recalled. A successful interpretation (one that has previously led to an association) can thereby be reconstructed and reapplied to a new association problem. In this paper we demonstrate that this feature of the interpretation-driven model leads to previous experiences influencing acts of association-making, and claim that this is promising groundwork for future investigations into learning in creative contexts.
In the implementation described in this paper we use simplified approaches to determining the relevance of previously successful interpretations and reapplying them to the current context. The metric for determining appropriateness is straightforward: any previous interpretation which has a non-zero effect on a current object representation is determined to be capable of influencing the course of the current association problem and is included. This simplifies the notion of appropriateness for future use and leads to an obvious scalability issue, but we demonstrate that even this very simple approach influences behaviour. More sophisticated methods for determining when and how known interpretations should be reapplied are an area of future investigation.
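A sketch of this relevance test, assuming interpretations are functions that transform an object representation and that representations support an equality check (both assumptions on our part):

def recall_interpretations(past_interpretations, representation):
    """Return every stored interpretation that has a non-zero effect
    on the current object representation, i.e. that actually changes it."""
    recalled = []
    for interpretation in past_interpretations:
        transformed = interpretation(representation)
        if transformed != representation:   # non-zero effect on this problem
            recalled.append(interpretation)
    return recalled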
Experimenting with learnt interpretations
As a preliminary investigation into the potential of interpretation-based creative learning, we will demonstrate that the approach we have developed permits previous experience to influence the behaviour of an association system. To illustrate this we will prime the system to produce different results after having experienced different histories. In our system previously constructed associations can influence new association problems through interpretation learning; past associations can act to prime the system to produce particular results on future associations. By demonstrating that an association system's experience with one pair of objects can influence its behaviour when associating different objects, we show the advantage of the interpretation-based approach to learning. Comparatively, an object-based approach to learning would not have permitted generalisation to an unfamiliar pair of objects.
In our experiments the system is exposed to a particular stimulus (either a simple unambiguous association problem or nothing in the case of the control trial) and then attempts to solve an ambiguous association problem that is the same between all trials. Our association system produces many different mappings between any two objects, so changes in the distribution of mappings produced on the second problem are used as an indicator of priming effects.
Three trials were conducted. In the first trial no priming association was performed, in the second trial a priming association between Objects 1 and 2 of Figure 1 was performed, and in the third trial a priming association between Objects 1 and 3 of Figure 1 was performed. In each trial an association between Objects 4 and 5, depicted in Figure 2, followed the priming stage. Each trial was performed 100 times, with the system being re-initialised (and re-primed) between each one so that the histories are identical for every association. A distribution of the results of the association between Objects 4 and 5 was produced. All trials were conducted using three relationships: relationships of the relative orientation of shapes, such as 45° and 135°.

resolution was utilised, an isovist would be a 360-dimensional vector, I = [r_1, r_2, . . . , r_360], where the RBF activation associated with centre c_i is

φ_i(x) = exp(−(x − c_i)² / σ_i²)    (1)
The realisation of a memory element in our approach is done by saving the input as the centre c_i, and adjusting the value of the radius σ_i to incorporate values that lie close to each other. Mathematically, this memory element will have activation φ_i(x) > 0 for all values of x that fall in a σ_i-neighbourhood of the point c_i, defined in equation 2. Further, lim_{x→c_i} φ_i(x) = 1. This condition ensures that the activation unit with the centre c_i closest to the current input x activates the most.

B(c_i; σ_i) = { x ∈ X | d(x, c_i) < σ_i }    (2)
In a collection of multiple RBF units, each having a different centre c_i and radius σ_i, multiple values can be remembered. If an input x is presented to this collection, the unit with the highest activation will be the one that has the best matching centre c_i. In other words, for the presented input value, the memory block can be said to recall the nearest possible value c_i. For one input pattern, there will be one corresponding recall value. This collection of multiple RBF units can thus work as a memory unit. The Memory Blocks described previously comprise multiple RBF units. As an example, a memory block comprising n RBF units can be represented as in figure 3.
Figure 3: RBF Memory Block. Each RBF unit stores one data value in the form of a centre c_i; the range of values for which the unit has positive activation is defined by the value of σ_i according to equation 2. c_max is the value that the memory recalls as the best match to the input, and φ_max represents the confidence in the match.
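A minimal numpy sketch of such a scalar RBF memory unit follows; the class shape, parameter names and default radius are our own choices, not from the paper.

import numpy as np

class RBFUnit:
    """One memory element: a centre c and radius sigma (equations 1 and 2)."""
    def __init__(self, centre, sigma=0.1):
        self.c = centre
        self.sigma = sigma

    def activation(self, x):
        # phi(x) = exp(-(x - c)^2 / sigma^2); approaches 1 as x approaches c.
        return float(np.exp(-((x - self.c) ** 2) / (self.sigma ** 2)))

def recall(units, x):
    """A memory block recalls the centre with the highest activation."""
    scores = [u.activation(x) for u in units]
    best = int(np.argmax(scores))
    return units[best].c, scores[best]   # c_max and the confidence phi_max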
So far we have described the use of the RBF unit as a memory block having a scalar-valued centre c_i. In order to memorise a multi-dimensional pattern (in this application an isovist pattern, comprising 360 ray-lengths), we modify the traditional RBFs to handle a multi-dimensional input isovist vector x by replacing the scalar-valued centre with a 360-dimensional vector c_i. While the Euclidean distance and dot product of two multi-dimensional vectors are also scalar and do not disrupt the working of standard RBFs, their capacity to capture the difference in shape between two isovist patterns is minimal. Therefore, in order to account for difference in shape, we replace the Euclidean distance metric by the Procrustes distance (Kendall 1989). The Procrustes distance is a statistical measure of shape similarity that accounts for dissimilarity between two shapes while ignoring factors of scaling and transformation. For two isovist vectors x_m and x_n, the Procrustes distance ⟨x_m, x_n⟩_p first identifies the optimum translation, rotation, reflection and scaling required to align the two shapes, and finally provides a minimised, scaled value of the dissimilarity between them. An example of a similar and a non-similar isovist pair with their Procrustes-aligned isovists is shown in figure 4. Utilising the Procrustes distance with the multidimensional centre c_i, we term this the Multidimensional Procrustes RBF, which is defined as:

φ_i(x) = exp(−⟨x, c_i⟩_p² / σ_i²)    (3)
The Procrustes distance provides a dissimilarity measure ranging between 0 and 1. A zero Procrustes distance therefore leads to maximum activation, and vice versa. A multidimensional Procrustes RBF has the capacity to store a multi-dimensional vector in the form of its centre. It is important to note that for the application described in this paper, the difference between two multi-dimensional vectors, viz. the isovists, was recorded using the Procrustes distance. However, in general the memory model can be adapted for any suitable distance metric, or used with the simple Euclidean distance. The use of the Procrustes distance as a distance metric was adopted specifically for the purpose of the application of identifying surprising locations in an environment.

Figure 4: Two isovist pairs (illustrated in red and blue) and corresponding aligned isovists (black dashed), one with a high Procrustes distance (left) and the other with a low Procrustes distance (right).
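A sketch of the multidimensional Procrustes RBF activation (equation 3), using scipy's Procrustes disparity as the shape-dissimilarity term; converting ray lengths to 2D boundary points before alignment is our assumption about how the isovists would be compared, not a detail given in the paper.

import numpy as np
from scipy.spatial import procrustes

def rays_to_points(rays):
    """Convert an isovist given as ray lengths into 2D boundary points."""
    rays = np.asarray(rays, dtype=float)
    angles = np.deg2rad(np.arange(rays.size) * 360.0 / rays.size)
    return np.column_stack((rays * np.cos(angles), rays * np.sin(angles)))

def procrustes_rbf(x_rays, c_rays, sigma):
    """phi(x) = exp(-<x, c>_p^2 / sigma^2), with <., .>_p taken here as
    scipy's Procrustes disparity between the two aligned shapes."""
    _, _, disparity = procrustes(rays_to_points(x_rays), rays_to_points(c_rays))
    return float(np.exp(-(disparity ** 2) / (sigma ** 2)))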
IMB and AMB. IMB and AMB are in principle collections of one or more multidimensional-Procrustes RBF and multidimensional RBF units respectively, grouped together as a block (such as the one represented in figure 3). Each
block is initialised with a single unit that stores the first input vector (for IMB) or derived features (for AMB). The feature vector employed to associate two input patterns (in this application isovists) comprises (i) area, (ii) circularity, and (iii) eccentricity, together making up a 3-dimensional vector. Initially, each block is created with a single memory unit having a default radius of 0.1. Thereafter, the memory block adopts one of two behaviours. For new patterns that lie far from the centre, the memory block grows by incorporating a new RBF unit having its centre the same as the presented pattern. On the other hand, for patterns that lie close to existing patterns, the radii of the RBF units are adjusted in order to obtain positive activation. Adjustment of the radii is analogous to the adjustment of weights performed during the training of a neural network. The procedure followed to expand or adjust the radii can be understood from algorithms 1 and 2. Consider a memory block comprising k neural units, with centres c_1, c_2, . . . , c_k, radii σ_1, σ_2, . . . , σ_k, and the distance metric ⟨·⟩_d. Let the model be presented with a new input vector x. Algorithm 1 first computes the ⟨·⟩_d distance (the Procrustes distance in the case of an isovist block) between each central vector and the presented pattern, and compares the distance with pre-specified best and average match threshold values τ_best and τ_avg. If the distance value is found to satisfy d ≤ τ_best, the corresponding central vector is returned, as this signifies that a similar pattern already exists in memory. However, in the case where τ_avg ≤ d < τ_best, the radius of the corresponding best-match unit is updated. This updating ensures that the memory responds with a positive activation when next presented with a similar pattern.
Algorithm 1: Memory Block Updating
Require: x, [c_1, c_2, . . . , c_k], τ_best, τ_avg, λ
1: for all centre vectors c_i do
2:   d_i(x) ← ⟨x, c_i⟩_d
3: end for
4: bestScore ← min_i(d_i)
5: bestIndex ← argmin_i(d_i)
6: blockUpdated ← false
7: if (bestScore ≤ τ_best) then
8:   r ← c_bestIndex
9:   blockUpdated ← true
10: else if (τ_avg ≤ bestScore < τ_best) then
11:   if (σ_bestIndex < λ) then
12:     [c_bestIndex, σ_bestIndex] ← computeCenter()
13:     blockUpdated ← true
14:   end if
15: end if
16: if (blockUpdated == false) then
17:   add a new neural unit with
18:   c_{k+1} = x
19:   σ_{k+1} = 0.1
20: end if
Algorithm 2: Centre vector and radius calculation
Require: c_bestIndex, τ_best, x
c_old ← c_bestIndex
c_bestIndex ← (c_bestIndex + x) / 2
d_new ← −(⟨x, c_bestIndex⟩_d)² / (2 log(τ_best))
d_old ← −(⟨c_old, c_bestIndex⟩_d)² / (2 log(τ_best))
σ_bestIndex ← max(d_new, d_old)

The network expands on the presentation of patterns that cannot be incorporated by adjusting the weights/radii of the RBF units. This feature provides three advantages over the
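A compact Python sketch of the update logic in algorithms 1 and 2. Here dist is any ⟨·⟩_d metric, centres are numpy arrays so that averaging works, and tau_best, tau_avg and the radius cap lam follow the reconstruction above, so treat the details as our reading of a garbled original.

import math

def update_block(block, x, dist, tau_best, tau_avg, lam):
    """block: list of {'c': centre, 'sigma': radius} dicts.
    Returns the recalled (or newly stored) centre."""
    if not block:                                  # first pattern initialises the block
        block.append({'c': x, 'sigma': 0.1})
        return x
    d = [dist(x, unit['c']) for unit in block]
    best = min(range(len(d)), key=d.__getitem__)
    updated = False
    if d[best] <= tau_best:                        # a similar pattern already exists
        updated = True
    elif tau_avg <= d[best] < tau_best and block[best]['sigma'] < lam:
        c_old = block[best]['c']
        block[best]['c'] = (c_old + x) / 2         # averaged centre (algorithm 2)
        d_new = -dist(x, block[best]['c']) ** 2 / (2 * math.log(tau_best))
        d_old = -dist(c_old, block[best]['c']) ** 2 / (2 * math.log(tau_best))
        block[best]['sigma'] = max(d_new, d_old)   # widen radius to cover both
        updated = True
    if not updated:                                # no match: grow the block
        block.append({'c': x, 'sigma': 0.1})
        best = len(block) - 1
    return block[best]['c']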
traditional BAMs. The first is that there is no a-priori training required by the memory block. The memory is updated as new patterns are presented, and the training is online. Secondly, the adjustment of weights ensures that similar patterns are remembered through a common central vector, thereby reducing the number of neural units required to remember multiple patterns. Despite the averaging process, a high level of recall accuracy is guaranteed by maintaining all radii σ_i. The values of τ_best, τ_avg and λ are application-specific parameters that require adjustment. However, for the purpose of associating and remembering isovists, in our application we determined these using equations 4, 5 and 6, where D_ij is an n×n matrix containing the ⟨·⟩_p distances between all central vectors, S_d is the sum of these distances, and std(D_ij) stands for standard deviation.

D_ij = ⟨c_i, c_j⟩_p,    S_d = Σ_{i≠j} D_ij

τ_best = percentile(S_d, 95) / S_d    (4)

τ_avg = percentile(S_d, 50) / S_d    (5)

λ = min(std(D_ij)) / max(std(D_ij))    (6)
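Under the same reading, the thresholds could be derived from the pairwise centre distances roughly as follows (numpy sketch; the normalisation by the summed distances mirrors equations 4-6 as reconstructed, and at least two centres are assumed):

import numpy as np

def thresholds(centres, dist):
    """Derive tau_best, tau_avg and lambda from pairwise centre distances."""
    n = len(centres)
    D = np.array([[dist(centres[i], centres[j]) for j in range(n)]
                  for i in range(n)])
    off_diag = D[~np.eye(n, dtype=bool)]             # all distances with i != j
    s = off_diag.sum()
    tau_best = np.percentile(off_diag, 95) / s       # eq. 4
    tau_avg = np.percentile(off_diag, 50) / s        # eq. 5
    lam = D.std(axis=0).min() / D.std(axis=0).max()  # eq. 6
    return tau_best, tau_avg, lam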
Association Weights. Association weights act as a separate layer of the network architecture, and play the role of mapping the input patterns to their associated features. For a case of m isovist patterns and n associated feature vectors stored in IMB and AMB respectively, the association weights comprise an m × (n + 1) matrix. The first column of the matrix contains the indices of each central vector c_i and the remaining columns contain mapping weights. On initialisation, the mapping weights are set to zero. Once each memory block is updated, the corresponding best-match index obtained as an output of the memory block is used to configure the values of the matrix. Let q be the index returned from IMB, and r be the index obtained from AMB. The weight update simply increments the value at the q-th row and (r + 1)-th column of the weight matrix. If such a row or column does not exist (signifying a new addition to the memory block), a new row/column is added. During the use of the memory model to recall the associated vector from the presented input vector, assuming an index p was returned, the p-th row is selected, and the index of the column containing the highest score is obtained.
Let this index be k. If the highest score is in the k-th column, this implies that for AMB, the centre of the k-th activation unit is most strongly associated with the current input. This kind of mapping look-up can be performed vice versa as well, and provides an efficient bi-directional many-to-many mapping functionality, which is hard to implement in traditional memory models.
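A sketch of this association layer as a dictionary of co-occurrence counts (our own data-structure choice; the paper's matrix with an index column would behave equivalently):

from collections import defaultdict

class AssociationWeights:
    """Bi-directional co-occurrence counts between IMB and AMB unit indices."""
    def __init__(self):
        self.w = defaultdict(int)

    def reinforce(self, q, r):
        # q: best-match index from IMB; r: best-match index from AMB.
        self.w[(q, r)] += 1

    def recall_feature_index(self, p):
        # Given IMB index p, return the AMB index with the highest weight.
        candidates = {r: c for (q, r), c in self.w.items() if q == p}
        return max(candidates, key=candidates.get) if candidates else None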
Surprise Calculation. The Kullback-Leibler (KL) divergence (Kullback 1997) is a measure of the difference between two probabilistic models of current observations. To estimate KL divergence, an application-specific probabilistic model of the current data is required, and in most cases the design of such a model requires specific expertise. In our approach, each memory model computes the surprise without the need to train, estimate or design any probabilistic model. This is achieved by using the activation scores that each memory unit outputs on presentation of a pattern. These scores are obtained through RBF activation units. Each score is therefore, in principle, a probabilistic estimate of the similarity between the input vector and the centre of the corresponding memory unit. Exploiting this property, we measure the KL divergence on activation scores. On presentation of a new input vector x to a memory block, the activation scores are first computed. Since these scores are calculated before the block updates (using algorithms 1 and 2), they are termed a-priors, A = [a_1, a_2, . . . , a_n]. After the execution of algorithm 1, the memory block will either remain the same (in the case of a best match), change one of its radius values (for an average match), or have an additional neural unit (no match). Accordingly, the activation scores obtained after the updating might be different from the a-priors. Scores obtained after the updating of memory are termed posteriors, P = [p_1, p_2, . . . , p_m]. If n < m, the a-priors are extrapolated with the mean value of A to ensure m = n, and finally the KL-divergence, or the surprise encountered, is computed as:

S = Σ_{i=1}^{m} p_i ln(p_i / a_i)    (7)

Here a_i and p_i are a-prior and posterior activation scores respectively. IMB and AMB each provide an estimate of the surprise encountered by that block. The surprise value from IMB indicates the surprise in terms of the shape of the isovist (in the current application), and the one from AMB indicates the surprise encountered in terms of associated features. Overall surprise is an average of the two surprise values. An illustration of the surprise values returned from AMB, along with the values in the input vector, is presented in figure 5a. The calculation of surprise in the memory model has two advantages: first, the user does not need to meticulously design a probabilistic model, and second, the surprise calculation is independent of the number of dimensions of the input vector.
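The surprise computation then amounts to comparing activation vectors before and after the memory update (numpy sketch; the padding of the shorter a-prior vector with its mean follows the text above, and the epsilon guard is our addition):

import numpy as np

def surprise(a_priors, posteriors):
    """KL-style surprise between activations before/after updating (eq. 7)."""
    a = np.asarray(a_priors, dtype=float)
    p = np.asarray(posteriors, dtype=float)
    if a.size < p.size:                  # a new unit was added: pad with the mean
        a = np.concatenate([a, np.full(p.size - a.size, a.mean())])
    eps = 1e-12                          # guard against log(0)
    return float(np.sum(p * np.log((p + eps) / (a + eps))))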
Forgetfulness in memory
In order to imitate human memory more closely, one additional functionality that can be added to the presented memory model is the property of forgetting. The principle of "out of sight is out of mind" can be implemented in the presented memory model by the use of a bias value for each memory unit. Diverging from the traditional use of bias values, in our approach a bias value is used to adjust the activation score in such a way that the most recently perceived or activated memory unit attains a tendency to have a higher activation score, and vice versa. This is achieved by decrementing the bias values of the units that were not recalled. In this way, if a pattern is presented once to the memory and is never recalled, that pattern will have the lowest bias. The effect of a low bias will be low levels of activation, and therefore a low recall rate. This feature is an important consideration when evaluating what causes surprise and is therefore programmed as an optional configuration that can be used in the current memory model. However, for the current evaluation of surprising locations, it is assumed that the perceiver will not forget any location that was presented earlier.
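One way to realise this decay, assuming a multiplicative bias on each unit's activation and a fixed decrement for units that are not recalled (both assumptions; the paper does not give the exact scheme):

def apply_forgetting(block, recalled_index, decrement=0.01):
    """Decay the bias of every unit that was not just recalled."""
    for i, unit in enumerate(block):
        unit.setdefault('bias', 1.0)
        if i != recalled_index:
            unit['bias'] = max(0.0, unit['bias'] - decrement)
    # The effective activation elsewhere would then be unit['bias'] * phi(x).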
Experiments & Results
Deciphering surprising structures
The isovist patterns extracted from the Google Sketchup models, along with the feature vector (described earlier), were presented one at a time to the memory model. For the present application, the values of τ_best and τ_avg were appropriately selected to ensure that a change in the size of the location, viz. the value of area, does not contribute to the value of AMB surprise. This was deliberately designed to serve the purpose of the present application, viz. deciphering surprising locations. The aim in our application was to consider a location surprising largely based on the surprise caused by its shape (isovist) and, to a limited extent, by the associated features. Hence only regions that differ in shape as well as in the values of derived features tend to be most surprising. The plot in figure 5(a) illustrates the values of surprise (ordinate) obtained from IMB and AMB for each isovist index (abscissa). As is evident, the values of IMB surprise are initially very high, since the memory model has not been exposed to any isovist patterns. As the memory is presented with more isovist patterns (represented by an increasing isovist index), the surprise initially fluctuates, and then gradually decreases. On the other hand, AMB surprise always retains low values due to the low match thresholds τ_best and τ_avg chosen for AMB. However, despite the low match thresholds, the AMB surprise was highest at two locations where the associated feature values peaked (illustrated in figure 5(b)). Again, this sudden drift was surprising and was very well captured by the computed surprise shown in the same plot. The views corresponding to the locations with the highest and lowest surprise values are presented in figure 5(c). The views are recorded from the Google Sketchup model.
Forgetfulness demonstration
The behaviour of IMB and AMB surprise while the forgetting behaviour is enabled can be verified from figures 6(a) and (b). Figure 6(a) presents the IMB and AMB surprise values obtained from the same experiment comprising 300 isovists.
Figure 5: Results of the surprise evaluation of IMB and AMB, without forgetfulness behaviour. 5(a) presents scaled values of IMB and AMB surprise against isovist index; 5(b) presents scaled values of the associated features (area, circularity, eccentricity); 5(c) illustrates the view from identified high surprise (top row) and low surprise (bottom row) locations. It was discovered that surprise values were high at transitions between two locations, and low surprise was identified at locations with monotonous passages and rooms.
Unlike figure 5(a), this time the gradual decrease in the values of IMB surprise is not noticed. Regular peaks demonstrate that despite prior exposure to similar isovists or features, both IMB and AMB evaluate high surprise.
This is because each memory block is implementing the forgetting behaviour (described earlier). As a result, they forget what was previously remembered, and hence produce higher values of surprise. The general trend in the difference of surprise values with and without forgetting behaviour is illustrated in figure 6(b). The white region between the two curves is the difference between the overall surprise values. Remembering all patterns without forgetting causes the surprise values to gradually reduce. In comparison to the values of surprise with forgetting behaviour, these cause fewer peaks. Additionally, the thick red and green curves present smoothened values of overall surprise with and without forgetting behaviour respectively. These again provide the reader with the general trend each one follows.
Figure 6: Comparison of the results of the surprise evaluation with forgetfulness either enabled or disabled. 6(a) presents individual IMB and AMB surprise values against isovist index; 6(b) presents the difference between the overall surprise experienced in the two cases, shown by the two shaded regions, together with smoothened values of overall surprise with forgetting enabled (W F) and disabled (W/o F). Surprise values of IMB and AMB were found to attain more frequent peaks in the memory with forgetfulness, as it tends to forget previously presented patterns.
Conclusions & Discussion
In this paper, we presented a computational model of associative memory that is capable of remembering multi-dimensional real-valued patterns, performing bi-directional association, and, importantly, mimicking human memory by providing an account of the surprise stimulated. The memory model is constructed using collections of multi-dimensional RBF units with the Procrustes distance as the metric for comparison between input and centre. The unique feature of the presented memory model is that it masks the complex requirement of probabilistic modelling otherwise required in the current literature for computing surprise. Additionally, the presented memory model, while providing similar functionality to BAM, has the capacity to remember real-valued patterns without issues concerning stability. Furthermore, similar to the working of human memory, the presented memory model can be configured to forget patterns that are not recalled over long periods of time, thereby implementing the rule "out of sight is out of mind".
The use of the memory model is demonstrated by identifying locations within an architectural building model that have variations in structure, which stimulate surprise. An isovist (a way of representing the structural features of a location) is used to represent the shape of a surrounding environment. Experimental results reveal and confirm the expected behaviour of the surprise computation in two ways. First, from the application point of view, the identified high-surprise locations were found to exist near transitions between two smaller parts of the Villa Savoye house. This would be expected when the shape of the region that a person/agent enters changes drastically. Second, the expected difference between the surprise values obtained from two experiments with forgetfulness behaviour enabled and disabled was verified (figure 6(b)). While the values of overall surprise continued to spike in the memory with forgetfulness, a gradual decrease was observed in the memory without forgetfulness. These two results verify the surprise computation and the forgetfulness behaviour of the proposed memory model, and the technique employed for surprise computation.
Acknowledgments
The 3D model used for the test was obtained from the Google Sketchup Warehouse (Villa Savoye by Keka, 3D Warehouse 2007). The authors are grateful to R. Linich for the language review.
References
Albright, T. D. 2012. On the Perception of Probable Things:
Neural Substrates of Associative Memory, Imagery, and Per-
ception. Neuron 74(2):227245.
Baldi, P., and Ittii, L. 2010. Of bits and wows: A Bayesian
theory of surprise with applications to attention. Neural Net-
works 23(5):649666.
Bartlett, M. S. 1952. The statistical signicance of odd bits
of information. Biometrika 39:228237.
Benedikt, M. 1979. To take hold of space: isovists and
isovist elds. Environment and Planning B: Planning and
Design 6(1):4765.
Bhatia, S.; Chalup, S. K.; and Ostwald, M. J. 2012. Ana-
lyzing Architectural Space: Identifying Salient Regions by
Computing 3D Isovists. In Proceedings of 46th Annual Con-
ference of the Architectural Science Association, ASA 2012.
Brown, D. C. 2012. Creativity, Surprise &Design: An Intro-
duction and Investigation. The 2nd International Conference
on Design Creativity (ICDC2012) 1:7586.
Carlson, L. A.; H olscher, C.; Shipley, T. F.; and Conroy-
Dalton, R. 2010. Getting Lost in Buildings. Current Direc-
tions in Psychological Science 19(5):284289.
Cole, D., and Harrison, A. 2005. Using Naturally Salient
Regions for SLAM with 3D Laser Data. In In Proceedings
of International Conference on Robotics and Automation,
Workshop on SLAM.
Good, I. J. 1956. The surprise index for the multivariate normal distribution. The Annals of Mathematical Statistics 27(4):1130–1135.
Itti, L., and Baldi, P. 2009. Bayesian surprise attracts human attention. Vision Research 49(10):1295–1306.
Kendall, D. G. 1989. A survey of the statistical theory of shape. Statistical Science 4(2):87–89.
Kosko, B. 1988. Bidirectional associative memories. IEEE Transactions on Systems, Man, and Cybernetics 18(1):49–60.
Kullback, S. 1997. Information theory and statistics. Dover
Publications.
Palm, G. 2013. Neural associative memories and sparse
coding. Neural Networks 37:165–171.
Perttula, A.; Carter, S.; and Denoue, L. 2009. Kartta: ex-
tracting landmarks near personalized points-of-interest from
user generated content. Proceedings of the 11th Inter-
national Conference on Human-Computer Interaction with
Mobile Devices and Services 72.
Ranganathan, A., and Dellaert, F. 2009. Bayesian surprise
and landmark detection. In 2009 IEEE International Con-
ference on Robotics and Automation (ICRA), 2017–2023.
IEEE.
Redheffer, R. M. 1951. A note on the surprise index. The
Annals of Mathematical Statistics 22(1):128–130.
Shannon, C. E. 2001. A Mathematical Theory of Commu-
nication. ACM SIGMOBILE Mobile Computing and Com-
munications Review 5(1):3–55.
Siagian, C., and Itti, L. 2009. Biologically Inspired Mobile
Robot Vision Localization. IEEE Transactions on Robotics
25(4):861–873.
Trimble. 2013. Google Sketchup. Retrieved from
http://sketchup.google.com/intl/en/.
Weaver, W. 1966. Probability, rarity, interest, and surprise.
Pediatrics 38(4):667–670.
Xia, J. C.; Arrowsmith, C.; Jackson, M.; and Cartwright,
W. 2008. The wayfinding process: relationships between decision-making and landmark utility. Tourism Management 29(3):445–457.
Zhang, L.; Tong, M. H.; and Cottrell, G. W. 2009. SUN-
DAy: Saliency using natural statistics for dynamic analysis
of scenes. In Proceedings of the 31st Annual Cognitive Sci-
ence Conference, 2944–2949.
Computational Models of Surprise in Evaluating Creative Design
Mary Lou Maher¹, Katherine Brady², Douglas H. Fisher²
¹ Software Information Systems, University of North Carolina, Charlotte, NC
² Electrical Engineering & Computer Science, Vanderbilt University, Nashville, TN
[email protected], [email protected], [email protected]
Abstract
In this paper we consider how to evaluate whether a de-
sign or other artifact is creative. Creativity and its eval-
uation have been studied as a social process, a creative
arts practice, and as a design process with guidelines for
people to judge creativity. However, there are few ap-
proaches that seek to evaluate creativity computational-
ly. In prior work we presented novelty, value, and sur-
prise as a set of necessary conditions when identifying
creative designs. In this paper we focus on the least
studied of these: surprise. Surprise occurs when expec-
tations are violated, suggesting that there is a temporal
component when evaluating how surprising an artifact
is. This paper presents an approach to quantifying sur-
prise by projecting into the future. We illustrate this ap-
proach on a database of automobile designs, and we
point out several directions for future research in assessing surprise and creativity generally.
Evaluating Creativity and Surprise
As we develop partially and fully automated approaches to
computational creativity, the boundary between human
creativity and computer creativity blurs. We are interested
in approaches to evaluating creativity that make no as-
sumptions about whether the creative entity is a person, a
computer, or a collective intelligence of human and com-
putational entities. In short, we want a test for creativity
that is not biased by the form of the entity that is doing the
creating (Maher and Fisher 2012), but the test should be
flexible enough to allow for many forms of creative output.
Ultimately, such tests will imbue artificial agents with an
ability to assess their own designs and will inform compu-
tational models of creative reasoning. Such tests will also
inform the design of cognitive assistants that collaborate
with humans in sophisticated, socially-intelligent systems.
Evaluating creativity by the characteristics of its results
has a long history, including contributions from psycholo-
gy, engineering, education, and design. Most descriptions
of creative designs include novelty (sufficiently different
from all other designs) and value (utilitarian and/or aesthet-
ic) as essential characteristics of a creative artifact
(Csikszentmihalyi & Wolfe, 2000; Amabile, 1996; Runco, 2007; Boden, 2003; Wiggins, 2006; Cropley & Cropley, 2005; Besemer & O'Quin, 1987; Horn & Salvendy, 2003;
Goldenberg & Mazursky, 2002; Oman and Tumer, 2009;
Shah, Smith, & Vargas-Hernandez, 2003).
Surprise is an aspect of creative design that is rarely
given attention, even though we believe that it is distinct
from novelty and value: a design can be both novel and
valuable, but not be surprising. It may be tempting to think
that surprise simply stems directly from its novelty or
difference relative to the set of existing and known arti-
facts, but we believe that while surprise is related to novel-
ty, it is distinct from novelty as that term is generally con-
strued. In particular, surprise stems from a violation of
expectations, and thus surprise can be regarded as novel-
ty (or sufficient difference) in a space of projected or ex-
pected designs, rather than in a space of existing designs.
In earlier work, Maher and Fisher (2012) presented nov-
elty, value, and surprise as essential and distinct character-
istics of a creative design. They also forwarded computa-
tional models based on clustering algorithms, which were
nascent steps towards automating the recognition of crea-
tive designs. This paper takes a closer look at surprise, add-
ing an explicit temporal component to the identification of
surprising designs. This temporal component enables a
system to make projections about what designs will be
expected in the future, so that a system can subsequently
assess a new design's differences from expectations, and
therefore judge whether a new design deviates sufficiently
from expectations to be surprising.
AI Approaches for Assessing Surprise
There is little work on assessing surprise in computational
circles; but there has been some, which we survey here.
Horvitz et al. (2005) develop a computational model of
surprise for traffic forecasting. In this model, they generate
probabilistic dependencies among variables, for example
linking weather to traffic status. They assume that when an
event has less than 2% probability of occurring, it is
marked as surprising. They temporally organize the data,
grouping incidents into 15-minute intervals. Surprising
events in the past are collected in a case library of surprises
that is used to identify when a surprising event has oc-
curred. Though related, the concept of rarity as an identifi-
er of something surprising is not the same as difference
(novelty) as an interpretation of surprise: for example,
perhaps the rare event differs on only one or two dimen-
sions from other events, and it is these slight differences
that make the event rare, and thus surprising.
An important characteristic of the Horvitz et al. model is
that it makes time explicit, by grouping events into tem-
poral intervals.
A possible limitation of considering rarity as an interpre-
tation of surprise is that as rare events recur, as they are apt
to do, many observers would regard them as less surpris-
ing. So conditioning surprise by prior precedent might be a
very desirable addition to the model. Indeed, Rissland
(2009) advances a case-based approach to reasoning about
rare and transformative legal cases, where the first appear-
ance of a rare case is surprising and transformative, but
subsequent appearances of similar, but still rare events, are
neither transformative, nor surprising.
While Rissland's research is not concerned with compu-
tational assessment of surprise per se, it recognizes that
there are certain legal precedents that radically alter the
legal landscape. Rissland calls such precedents "black swans", which are rare, perhaps only differing from past
legal cases in small ways, but they are surprising none-
theless. Importantly, as cases that are similar to the black
swan surface, these "grey cygnets" (as she calls them) are
covered by the earlier black swan precedent; a grey cygnet
is not transformative and not surprising. The general lesson
for approaches to assessing surprise is that rarity may not
be enough, because over any sufficient time span the recur-
rence of rare events is quite likely! But of course, an ob-
server's memory may be limited to a horizon, so that when
time intervals are bounded by these horizons, rarity may in
fact be a sufficient basis for assessing surprise.
Itti and Baldi (2004) describe a model of surprising fea-
tures in image data using prior and posterior probabilities. Given a user-dependent model M of some data, P(M) describes the prior probability distribution over models, and P(M|D) is the distribution conditioned on data D. Surprise is modeled as the distance d between the prior, P(M), and
posterior P(M|D) probabilities. In this model, time is not an
explicit attribute or dimension of the data. There are only
two times: before and now.
Ranasinghe and Shen (2008) develop a model of sur-
prise as integral to developmental robots. In this model,
surprise is used to set goals for learning in an unknown
environment. The world is modeled as a set of rules, where each rule has the form: Condition → Action → Predictions. A condition is modeled as: Feature → Operator → Value. For example, a condition can be "feature1 > value1", where greater-than is the operator. A prediction is modeled as: Feature → Operator. For example, a prediction can be "feature1 >", where it is expected that feature1 will increase after the action is performed. Comparisons can detect the presence or absence of a feature, and the change in the size of a feature (<, ≤, =, ≥, >). If an observed feature does not match its predicted value, then the system recognizes surprise. This model does not make any explicit reference to
time and uses surprise as a flag to update the rule base.
Maher and Fisher (2012) have used clustering algo-
rithms to compare a new design to existing designs, to
identify when a design is novel, valuable, and surprising.
The clustering model uses distance (e.g., Euclidean dis-
tance) to assess novelty and value of product designs (e.g.,
laptops) that are represented by vectors of attributes (e.g.,
display area, amount of memory, cpu speed). In this ap-
proach, a design is considered surprising when it is so dif-
ferent from existing designs that it forms its own new clus-
ter. This typically happens when the new design makes
explicit an attribute that was not previously explicit, be-
cause all previous designs had the same value for that at-
tribute. Maher and Fisher use the example of the Bloom
laptop, which has a detachable keyboard (i.e., detachable
keyboard = TRUE), where all previous laptop designs had
value FALSE along what was a previously implicit, unrec-
ognized attribute. Thus, like one of Rissland's black swans,
the Bloom transformed the design space.
In Maher and Fisher, the established clusters of designs effectively represent the expectation that the next
new design will be associated with one of the clusters of
existing designs, and when a new design forms its own
cluster it is surprising and changes our expectations for the
next generation of new designs.
Maher and Fisher (2012) focused on evaluation of crea-
tivity on the part of an observer, not an active designer.
Brown (2012) investigates many aspects of surprise in cre-
ative design, such as who gets surprised: the designer or
the person experiencing or evaluating the design. Brown
(2012) also presents a framework for understanding sur-
prise in creative design by characterizing different types of expectations (active, active knowledge, and not active knowledge) as alternative situations in which expectations
can be violated in exploratory and transformative design.
To varying extents, many of the computational ap-
proaches above model surprise as a deviation from expec-
tation, where the expectation is an expected value that is
estimated from data distributions or a prediction made by
simulating a rule-based model. In these, however, there is
no explicit representation of time as a continuum, nor ex-
plicit concern with projecting into the future.
Recognizing Surprising Designs
Our approach to projecting designs into the future assumes
that each product design is represented by a vector of ordi-
nal attributes (aka variables). For each attribute, a mathe-
matical function of time can be fit to the attribute values of
existing (past) designs, showing how the attribute's values
have varied with time in the past. This best fitting function,
obtained through a process of regression, can be used to
predict how the attribute's values will change in the future
as well. Our approach to projecting into the future is in-
spired by earlier work by Frey and Fisher (1999) that was
concerned with projecting machine learning performance
curves into the future (thereby allowing cost-benefit anal-
yses of collecting more data for purposes of improving
prediction accuracy), and it was not concerned with crea-
tivity and surprise assessment per se. While Frey and Fish-
er used a variety of functional forms, most notably power
functions, as well as linear, logarithmic, and exponential,
we have thus far only used linear functions (i.e., univariate
linear regression) for projecting designs into the future for
purposes of surprise assessment.
In this paper we focus on regression models for recog-
nizing a surprising design: a regression analysis of the at-
tributes of existing designs against a temporal dimension is
used to predict the next value of the attributes. The dis-
tance from the observed value to the predicted value identi-
fies a surprising attribute-value pair.
We illustrate our use of regression models for identify-
ing surprising designs in an automobile design dataset,
which is composed of 572 cars that were produced be-
tween 1878 and 2009 (Dowlen, 2012). Each car is de-
scribed by manufacturer, model, type, year, and nine nu-
merically-valued attributes related to the mechanical de-
sign of the car. In this dataset only 190 entries contain val-
ues for all nine attributes. These complete entries all occur
after 1934 and are concentrated between 1966 and 1994. A
summary of the number of designs and the number of at-
tributes in our dataset is shown in Table 1.
Table 1: List of the mechanical design attributes and the
number of automobile design records with an entry for
each of the nine attributes in our dataset.
A variety of linear regression models are considered.
The first model uses linear regression over the entire time
period of the design data and fits a line to each attribute as
a function of time. The results for one attribute, maximum
speed, are shown in Figure 1. This analysis identifies the
outliers, and therefore potentially surprising designs. For
example, the Ferrari 250LM had a surprising maximum speed in 1964, and the Bugatti Type 41 Royale had a surprising engine size (another attribute, and another regression analysis) in 1995.
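A minimal sketch of this first strategy, assuming each design is reduced to a (year, attribute value) pair; the 2-sigma outlier cutoff is an illustrative choice, since the paper does not commit to a threshold.

import numpy as np

def surprising_over_all_time(years, values, z_cutoff=2.0):
    # Strategy 1: fit one line to the attribute over the whole
    # timeline and flag designs with outlying residuals.
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    slope, intercept = np.polyfit(years, values, 1)
    residuals = values - (slope * years + intercept)
    std = residuals.std()
    if std == 0:
        return np.zeros(len(years), dtype=bool)  # perfect fit: no outliers
    return np.abs(residuals / std) > z_cutoff  # mask of surprising designs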
This first model works well for identifying outliers
across a time period but does not identify trendsetters (or "black swans", as Rissland might call them), since data points that occurred later in the timeline were included in
the regression analysis when evaluating the surprise of a
design. A trendsetter is a surprising design that changes the
expectations for designs in the future, and is not simply an
outlier for all time. In other words, using the entire time
line to identify surprising automobile designs does not help
us identify those designs that influenced future designers.
A design that is an outlier in its own time, but that inspires future generations of designers to do something similar, can only be found if we don't include designs that came out after the design being measured in the training data.
Figure 1. Regression analysis for maximum speed over the
entire time period of car design data.
Thus, we considered a second strategy that performs a
linear regression only on previously created designs and
measures surprise of a new design as the distance from that
designs attribute value to the projection of the line at the
year of the design in question. This second regression
strategy, where the time period used to fit the line for a
single attribute was limited to the time before each design
was released (see Figure 2), found roughly the same sur-
prising designs as the first model (over the entire time pe-
riod) for most attributes, but there were two exceptions:
torque displacement and maximum speed. In these excep-
tions, outliers earlier in time were sufficiently extreme so
as to significantly move the entire regression line from
before the early outliers to after, whereas in other cases the
rough form of the regression lines created over time did not
change much.
Figure 2: Using strategy 2, linear models are constructed
using all previous-year designs. The circles show the pre-
dicted (or projected) values for EACH year from the indi-
vidual regression lines; the dots show actual values. We
show three sample regression lines, each ending at the year
(circle) it is intended to predict, but there is actually one
regression line for each year.
Attribute              Number of Designs
Engine Displacement    438
Bore Diameter          407
Stroke Length          407
Torque Force           236
Torque Displacement    235
Weight                 356
Frontal Area           337
Maximum Speed          345
Acceleration           290
When training this second model, designs from every
previous year were weighted equally for predicting future
designs. Thus, outliers in the beginning of the dataset per-
petually shifted the model and skewed the surprise meas-
urements for all subsequent designs. And why shouldn't they? These early designs correspond roughly to what Rissland called "black swans", which understandably diminish the surprise value of subsequent "grey cygnets". However, it is also the case that, when using model 2 and taking into account all past history, a large mass of bland designs earlier can exaggerate the perceived surprise of a design, even when that design is in the midst of a spurt of like designs.
These observations inspired a third linear regression
strategy that makes predictions (or sets expectations) by
only including designs within a specified time range before
the designs being measured. We use a sliding window,
rather than disjoint bins. In either case though, limited time
intervals can mimic perceptions of surprise when the ob-
server has a limited memory, only remembering up to a
myopic horizon into the past.
The window (aka interval) size used for the cars dataset
was ten years. This number was chosen because histo-
grams of the data revealed that all ten-year periods after
1934 contained at least one design with all nine attributes
while smaller periods were very sparsely populated in the
1950s. Larger window sizes converged to the second re-
gression model as window size increased.
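The second and third strategies differ from the first only in which designs are allowed to set the expectation. A hedged sketch of both follows, where a window of None recovers strategy 2 and a ten-year window gives strategy 3; the surprise score is simply the distance from the projected value.

import numpy as np

def projected_surprise(years, values, year, value, window=None):
    # Fit a line only to designs released before `year` (strategy 2),
    # optionally restricted to a sliding window of recent years
    # (strategy 3), then score a new design by its distance from the
    # projection of that line at `year`.
    years = np.asarray(years, dtype=float)
    values = np.asarray(values, dtype=float)
    past = years < year
    if window is not None:
        past &= years >= year - window  # myopic memory horizon
    if past.sum() < 2:
        return 0.0  # too little history to set an expectation
    slope, intercept = np.polyfit(years[past], values[past], 1)
    return abs(value - (slope * year + intercept))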
In general, the size of windows has a large influence on
the results. Though we won't delve into the results of this
final strategy here, its sensitivity has appeal. In fact, rela-
tive to our longer-term goal of modeling human surprise,
this sensitivity to window size may map nicely onto dif-
ferent perceptions by people with different experiences. An
older adult may have a very different surprise reaction than
a young person, depending on past experience. In general,
the selection of an appropriate range of years for the third
regression model can be correlated with typical periods of
time over which a person can remember. That is, if we
want to compare our computational model of surprise with
human expectations, we should use time intervals that are
meaningful to people rather than based on the distribution
of data. People will be surprised when expectations based
on a time period relevant to their personal knowledge and
experience of a series of designs are not met, rather than on
the entire time period for all designs.
Directions for Further Research
This paper presents an approach to evaluating whether a
design is surprising, and therefore creative, by including a
temporal analysis of the conceptual space of existing de-
signs and using regression analysis projected into the fu-
ture to identify surprising designs. There are a number of
directions we plan to follow.
1. We want to further develop the regression models,
and in particular move beyond linear regression, to include
other functional forms such as polynomial, power, and
logarithmic. After all, a design might be regarded as sur-
prising if we used linear regression to project into the fu-
ture, but not at all surprising if we used a higher-order pol-
ynomial regression into the future! Identifying means of
distinguishing when one functional form over another is
most appropriate for regression will be a key challenge.
2. We want to move beyond our current univariate as-
sessments of surprise through univariate regression, to ho-
listic, multivariate model assessments of surprise through
multivariate regression. We can apply multivariate regres-
sion methods to designs as a function of time, or combine
our earlier work on clustering approaches (Maher and
Fisher, 2012) with our regression approaches, perhaps by
performing multivariate regression over multivariate sum-
maries of design clusters (e.g., centroids).
3. We have thus far been investigating novelty and value
(Maher and Fisher, 2012) and surprise as decoupled char-
acteristics of creativity, but an important next step is to
consider how measures of these three characteristics can be
integrated into a single holistic measure of creativity, prob-
ably parameterized to account for individual differences
among observers.
4. Assessments of creativity are conditioned on individ-
ual experiences; such individual differences in measures of
surprise, novelty, and value are critical: surprise to one person is hardly so to another. We made the barest beginning of this study in Maher and Fisher (2012), where we viewed
clustering as the means by which an agent organized its
knowledge base, and against which creativity would be
judged. The methods for regression that we have presented in this paper will allow us to build an "imagining" capacity into an agent, adding expectations for designs that do not yet exist to the knowledge base of agents responsible
for assessing creativity.
5. In all the variants that we plan to explore, we want to
match the results of our models in identifying surprising
designs to human judgments of surprise, and of course to
assessments of creativity (novelty, value, surprise) of the
designs, generally.
6. Finally, our work to date assumes that designs are
represented as attribute-value vectors; these propositional representations are clustered in Maher and Fisher (2012), or subjected to time-based regression in this paper.
move to relational models, however, perhaps first-order
representations and richer representations still. Relational
representations would likely be required in Rissland's legal
domain, if in fact that domain were formalized.
A domain that we find very attractive for exploring rela-
tional representations is the domain of computer programs,
which follow a formal representation and for which a
number of well-established tools exist for evaluating novelty, value, and surprise. For example, consider that tools for identifying plagiarism in computer programs measure deep similarity between programs, and can be adapted as novelty detectors, and for assessing surprise as well.
An ability to measure creativity of generic computer
programs will allow us to move into virtually any (com-
putable) domain that we want. For example, consider
mathematical reasoning in students. In an elementary
course, we can imagine seeing a large number of programs
that are designed to compute the variance of data values, as
composed of two sequential loops the first to compute
the mean of the data, and the subsequent loop to compute
the variance given the mean. These programs will be very
similar at a deep level. Imagine then seeing a program that
computes the variance (and mean) with ONE loop, relying
on a mathematical simplification. These are the kinds of
assessments of creativity that we can expect in more so-
phisticated relational domains, all enabled by capabilities
to assess computer programs.
Acknowledgements: We thank our anonymous reviewers
for helpful comments, which guided our revision.
References
Amabile, T. 1982. Social psychology of creativity: A con-
sensual assessment technique. Journal of Personality and
Social Psychology 43:997–1013.
Amabile, T. 1996. Creativity in Context: Update to The
Social Psychology of Creativity. Boulder, CO: Westview
Press.
Besemer, S., and O'Quin, K. 1987. Creative product analysis: Testing a model by developing a judging instrument. Frontiers of Creativity Research: Beyond the Basics 367–389.
Besemer, S. P., and O'Quin, K. 1999. Confirming the three-factor creative product analysis matrix model in an American sample. Creativity Research Journal 12:287–296.
Boden, M. 2003. The Creative Mind: Myths and Mecha-
nisms, 2nd edition. Routledge.
Brown, D. C. 2012. Creativity, surprise and design: An
introduction and investigation. In The 2nd International
Conference on Design Creativity (ICDC2012), 75–84.
Cropley, D. H., and Cropley, A. J. 2005. Engineering crea-
tivity: A systems concept of functional creativity. In Crea-
tivity Across Domains: Faces of the Muse, 169–185. Hillsdale, NJ: Lawrence Erlbaum.
Cropley, D. H.; Kaufman, J. C.; and Cropley, A. J. 1991.
The assessment of creative products in programs for gifted
and talented students. Gifted Child Quarterly 35:128–134.
Cropley, D. H.; Kaufman, J. C.; and Cropley, A. J. 2011.
Measuring creativity for innovation management. Journal of Technology Management & Innovation.
Csikszentmihalyi, M., and Wolfe, R. 2000. New concep-
tions and research approaches to creativity: Implications of
a systems perspective for creativity in education. Interna-
tional handbook of giftedness and talent 2:81–91.
Dowlen, C. 2012. Creativity in Car Design: The Behavior at the Edges. In A. Duffy, Y. Nagai, T. Taura (eds), Proceedings of the 2nd International Conference on Design Creativity (ICDC2012), 253–262.
Forster, E., and Dunbar, K. 2009. Creativity evaluation
through latent semantic analysis. In Proceedings of the
Annual Conference of the Cognitive Science Society, 602–607.
Frey, L., and Fisher, D. 1999. Modeling decision tree per-
formance with the power law. In Proceedings of the Sev-
enth International Workshop on Artificial Intelligence and
Statistics, 59–65. Ft. Lauderdale, FL: Morgan Kaufmann.
Goldenberg, J., and Mazursky, D. 2002. Creativity In
Product Innovation. Cambridge University Press.
Horn, D., and Salvendy, G. 2003. Consumer-based as-
sessment of product creativity: A review and reappraisal.
Human Factors and Ergonomics in Manufacturing & Ser-
vice Industries 16:155–175.
Horvitz, E.; Apacible, J.; Sarin, R.; and Liao, L. 2005. Pre-
diction, expectation, and surprise: Methods, designs, and
study of a deployed traffic forecasting service. In Proceed-
ings of the 2005 Conference on Uncertainty and Artificial
Intelligence. AUAI Press.
Itti, L., and Baldi, P. 2004. A surprising theory of attention. In IEEE Workshop on Applied Imagery and Pattern Recognition.
Maher, M. L., and Fisher, D. 2012. Using AI to evaluate
creative designs. In A. Duffy, Y. Nagai, T. Taura (eds)
Proceedings of the 2nd International Conference on Design
Creativity (ICDC2012), 45-54.
Oman, S., and Tumer, I. 2009. The potential of creativity
metrics for mechanical engineering concept design. In
Bergendahl, M. N.; Grimheden, M.; Leifer, L.; Skogstad, P.; and Lindemann, U., eds., Proceedings of the 17th International Conference on Engineering Design (ICED'09), Vol. 2, 145–156.
Ranasinghe, N., and Shen, W.-M. 2004. A surprising theo-
ry of attention. In IEEE Workshop on Applied Imagery and
Pattern Recognition.
Ranasinghe, N., and Shen, W.-M. 2008. Surprise-based
learning for developmental robotics. In Proceedings of the
2008 ECSIS Symposium on Learning and Adaptive Be-
haviors for Robotic Systems.
Rissland, E. 2009. Black swans, gray cygnets and other rare birds. In McGinty, L., and Wilson, D. C., eds., ICCBR 2009, LNAI 5650, 6–13. Berlin Heidelberg: Springer-Verlag.
Runco, M. A. 2007. Creativity: Theories and Themes: Re-
search, Development and Practice. Amsterdam: Elsevier.
Shah, J.; Smith, S.; and Vargas-Hernandez, N. 2003. Met-
rics for measuring ideation effectiveness. Design Studies
24:111–134.
Wiggins, G. 2006. A preliminary framework for descrip-
tion, analysis and comparison of creative systems.
Knowledge-Based Systems 16:449–458.
Less Rhyme, More Reason:
Knowledge-based Poetry Generation with Feeling, Insight and Wit
Tony Veale
Web Science & Technology Division, KAIST / School of Computer Science and Informatics, UCD
Korea Advanced Institute of Science & Technology, South Korea / University College Dublin, Ireland.
[email protected]
Abstract
Linguistic creativity is a marriage of form and content
in which each works together to convey our meanings
with concision, resonance and wit. Though form clearly
influences and shapes our content, the most deft formal
trickery cannot compensate for a lack of real insight.
Before computers can be truly creative with language,
we must first imbue them with the ability to formulate
meanings that are worthy of creative expression. This is
especially true of computer-generated poetry. If readers
are to recognize a poetic turn-of-phrase as more than a
superficial manipulation of words, they must perceive
and connect with the meanings and the intent behind
the words. So it is not enough for a computer to merely
generate poem-shaped texts; poems must be driven by
conceits that build an affective worldview. This paper
describes a conceit-driven approach to computational
poetry, in which metaphors and blends are generated
for a given topic and affective slant. Subtle inferences
drawn from these metaphors and blends can then drive
the process of poetry generation. In the same vein, we
consider the problem of generating witty insights from
the banal truisms of common-sense knowledge bases.
Ode to a Keatsian Turn
Poetic licence is much more than a licence to frill. Indeed,
it is not so much a licence as a contract, one that allows a
speaker to subvert the norms of both language and nature
in exchange for communicating real insights about some
relevant state of affairs. Of course, poetry has norms and
conventions of its own, and these lend poems a range of
recognizably poetic formal characteristics. When used
effectively, formal devices such as alliteration, rhyme and
cadence can mold our meanings into resonant and incisive
forms. However, even the most poetic devices are just
empty frills when used only to disguise the absence of real
insight. Computer models of poem generation must model
more than the frills of poetry, and must instead make these
formal devices serve the larger goal of meaning creation.
Nonetheless, it is often said that we eat with our eyes, so
that the stylish presentation of food can subtly influence
our sense of taste. So it is with poetry: a pleasing form can
do more than enhance our recall and comprehension of a meaning: it can also suggest a lasting and profound truth.
Experiments by McGlone & Tofighbakhsh (1999, 2000)
lend empirical support to this so-called Keats heuristic: the intuitive belief, named for Keats' memorable line "Beauty is truth, truth beauty", that a meaning which is rendered in an aesthetically-pleasing form is much more likely to be
perceived as truthful than if it is rendered in a less poetic
form. McGlone & Tofighbakhsh demonstrated this effect
by searching a book of proverbs for uncommon aphorisms with internal rhyme, such as "woes unite foes", and by using synonym substitution to generate non-rhyming (and thus less poetic) variants, such as "troubles unite enemies".
While no significant differences were observed in subjects' ease of comprehension for rhyming/non-rhyming forms,
subjects did show a marked tendency to view the rhyming
variants as more truthful expressions of the human
condition than the corresponding non-rhyming forms.
So a well-polished poetic form can lend even a modestly
interesting observation the lustre of a profound insight. An
automated approach to poetry generation can exploit this
symbiosis of form and content in a number of useful ways.
It might harvest interesting perspectives on a given topic
from a text corpus, or it might search its stores of common-
sense knowledge for modest insights to render in immodest
poetic forms. We describe here a system that combines
both of these approaches for meaningful poetry generation.
As shown in the sections to follow, this system, named Stereotrope, uses corpus analysis to generate affective metaphors for a topic on which it is asked to wax poetic.
Stereotrope can be asked to view a topic from a particular
affective stance (e.g., view love negatively) or to elaborate
on a familiar metaphor (e.g. love is a prison). In doing so,
Stereotrope takes account of the feelings that different
metaphors are likely to engender in an audience. These
metaphors are further integrated to yield tight conceptual
blends, which may in turn highlight emergent nuances of a
viewpoint that are worthy of poetic expression (see Lakoff
and Turner, 1989). Stereotrope uses a knowledge-base of
conceptual norms to anchor its understanding of these
metaphors and blends. While these norms are the stuff of
banal clichés and stereotypes, such as that dogs chase cats and cops eat donuts, we also show how Stereotrope finds
and exploits corpus evidence to recast these banalities as
witty, incisive and poetic insights.
Mutual Knowledge: Norms and Stereotypes
Samuel Johnson opined that "Knowledge is of two kinds. We know a subject ourselves, or we know where we can find information upon it." Traditional approaches to the
modelling of metaphor and other figurative devices have
typically sought to imbue computers with the former (Fass,
1997). More recently, however, the latter kind has gained
traction, with the use of the Web and text corpora to source
large amounts of shallow knowledge as it is needed (e.g.,
Veale & Hao 2007a,b; Shutova 2010; Veale & Li, 2011).
But the kind of knowledge demanded by knowledge-
hungry phenomena such as metaphor and blending is very
different to the specialist book knowledge so beloved of
Johnson. These demand knowledge of the quotidian world
that we all tacitly share but rarely articulate in words, not
even in the thoughtful definitions of Johnson's dictionary.
Similes open a rare window onto our shared expectations
of the world. Thus, the as-as-similes "as hot as an oven", "as dry as sand" and "as tough as leather" illuminate the expected properties of these objects, while the like-similes "crying like a baby", "singing like an angel" and "swearing like a sailor" reflect intuitions of how these familiar entities are tacitly expected to behave. Veale & Hao (2007a,b) thus
harvest large numbers of as-as-similes from the Web to
build a rich stereotypical model of familiar ideas and their
salient properties, while Özbal & Stock (2012) apply a similar approach on a smaller scale using Google's query
completion service. Fishelov (1992) argues convincingly
that poetic and non-poetic similes are crafted from the
same words and ideas. Poetic conceits use familiar ideas in
non-obvious combinations, often with the aim of creating
semantic tension. The simile-based model used here thus
harvests almost 10,000 familiar stereotypes (drawing on a
range of ~8,000 features) from both as-as and like-similes.
Poems construct affective conceits, but as shown in Veale
(2012b), the features of a stereotype can be affectively
partitioned as needed into distinct pleasant and unpleasant
perspectives. We are thus confident that a stereotype-based
model of common-sense knowledge is equal to the task of
generating and elaborating affective conceits for a poem.
A stereotype-based model of common-sense knowledge
requires both features and relations, with the latter showing
how stereotypes relate to each other. It is not enough then
to know that cops are tough and gritty, or that donuts are
sweet and soft; our stereotypes of each should include the
clich that cops eat donuts, just as dogs chew bones and
cats cough up furballs. Following Veale & Li (2011), we
acquire inter-stereotype relationships from the Web, not by
mining similes but by mining questions. As in Özbal &
Stock (2012), we target query completions from a popular
search service (Google), which offers a smaller, public
proxy for a larger, zealously-guarded search query log. We
harvest questions of the form "Why do Xs <relation> Ys?", and assume that since each relationship is presupposed by the question (so "why do bikers wear leathers?" presupposes that everyone knows that bikers wear leathers), the triple of subject/relation/object captures a widely-held norm. In
this way we harvest over 40,000 such norms from the Web.
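A minimal sketch of this harvesting step, using a handful of hypothetical completions in place of the live query-completion service; plural stripping and multi-word relations are omitted for brevity.

import re

completions = [
    "why do bikers wear leathers",
    "why do cops eat donuts",
    "why do dogs chew bones",
]

NORM = re.compile(r"^why do (\w+) (\w+) (\w+)$", re.IGNORECASE)

def harvest_norms(lines):
    # Each presupposed question yields a subject/relation/object norm.
    norms = []
    for line in lines:
        m = NORM.match(line.strip())
        if m:
            norms.append(m.groups())  # e.g. ('bikers', 'wear', 'leathers')
    return norms

print(harvest_norms(completions))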
Generating Metaphors, N-Gram Style!
The Google n-grams (Brants & Franz, 2006) is a rich
source of popular metaphors of the form "Target is Source", such as "politicians are crooks", "Apple is a cult", "racism is a disease" and "Steve Jobs is a god". Let src(T) denote
the set of stereotypes that are commonly used to describe a
topic T, where commonality is defined as the presence of
the corresponding metaphor in the Google n-grams. To
find metaphors for proper-named entities, we also analyse
n-grams of the form "stereotype First [Middle] Last", such as "tyrant Adolf Hitler" and "boss Bill Gates". Thus, e.g.:
src(racism) = {problem, disease, joke, sin, poison,
crime, ideology, weapon}
src(Hitler) = {monster, criminal, tyrant, idiot, madman, vegetarian, racist, …}
Let typical(T) denote the set of properties and behaviors
harvested for T from Web similes (see previous section),
and let srcTypical(T) denote the aggregate set of properties
and behaviors ascribable to T via the metaphors in src(T):
(1) srcTypical(T) = ∪_{M ∈ src(T)} typical(M)
We can generate conceits for a topic T by considering not
just obvious metaphors for T, but metaphors of metaphors:
(2) conceits(T) = src(T) ∪ (∪_{M ∈ src(T)} src(M))
The features evoked by the conceit T as M are given by:
(3) salient(T, M) = [srcTypical(T) ∪ typical(T)] ∩ [srcTypical(M) ∪ typical(M)]
The degree to which a conceit M is apt for T is given by:
(4) aptness(T, M) = |salient(T, M) ∩ typical(M)| / |typical(M)|
We should focus on apt conceits M ∈ conceits(T) where:
(5) apt(T, M) = |salient(T, M) ∩ typical(M)| > 0
and rank the set of apt conceits by aptness, as given in (4).
The set salient(T, M) identifies the properties and behaviours that are evoked and projected onto T when T is viewed through the metaphoric lens of M. For affective conceits, this set can be partitioned on demand to highlight only the unpleasant aspects of the conceit ("you are such a baby!") or only the pleasant aspects ("you are my baby!"). Veale & Li (2011) further show how n-gram evidence can be used to selectively project the salient norms of M onto T.
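Definitions (1)-(5) reduce to simple set operations once src() and typical() are available as lookup tables. A sketch, assuming src and typical are dictionaries mapping a term to a set of stereotypes and a set of properties respectively:

def src_typical(T, src, typical):
    # (1): properties projected onto T by all of its n-gram metaphors.
    return set().union(*(typical.get(M, set()) for M in src.get(T, set())))

def conceits(T, src):
    # (2): metaphors for T, plus metaphors of those metaphors.
    direct = src.get(T, set())
    return direct | set().union(*(src.get(M, set()) for M in direct))

def salient(T, M, src, typical):
    # (3): features evoked when T is seen through the lens of M.
    return ((src_typical(T, src, typical) | typical.get(T, set())) &
            (src_typical(M, src, typical) | typical.get(M, set())))

def aptness(T, M, src, typical):
    # (4): the fraction of M's typical features evoked by the conceit;
    # a conceit is apt, per (5), when this value is non-zero.
    tm = typical.get(M, set())
    overlap = salient(T, M, src, typical) & tm
    return len(overlap) / len(tm) if tm else 0.0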
"
"
Once More With Feeling
Veale (2012b) shows that it is a simple matter to filter a set
of stereotypes by affect, to reliably identify the metaphors
that impart a mostly positive or negative spin. But poems
are emotion-stirring texts that exploit much more than a
crude two-tone polarity. A system like Stereotrope should
also model the emotions that a metaphorical conceit will
stir in a reader. Yet before Stereotrope can appreciate the
emotions stirred by the properties of a poetic conceit, it
must model how properties reinforce and imply each other.
A stereotype is a simplified but coherent representation
of a complex real-world phenomenon. So we cannot model
stereotypes as simple sets of discrete properties; we must also model how these properties cohere with each other.
For example, the property lush suggests the properties
green and fertile, while green suggests new and fresh. Let
cohere(p) denote the set of properties that suggest and
reinforce p-ness in a stereotype-based description. Thus, e.g., cohere(lush) = {green, fertile, humid, dense, …} while cohere(hot) = {humid, spicy, sultry, arid, sweaty, …}. The
set of properties that coherently reinforce another property
is easily acquired through corpus analysis: we need only look for similes where multiple properties are ascribed to a single topic, as in e.g. "as hot and humid as a jungle". To
this end, an automatic harvester trawls the Web for
instances of the pattern "as X and Y as", and assumes for each X and Y pair that Y ∈ cohere(X) and X ∈ cohere(Y).
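The harvester itself can be as simple as one regular expression over simile hits, sketched below on two hypothetical snippets standing in for Web results:

import re
from collections import defaultdict

snippets = ["as hot and humid as a jungle",
            "as green and fertile as a valley"]

PAIR = re.compile(r"\bas (\w+) and (\w+) as\b", re.IGNORECASE)

cohere = defaultdict(set)
for s in snippets:
    m = PAIR.search(s)
    if m:
        x, y = m.group(1).lower(), m.group(2).lower()
        cohere[x].add(y)  # Y reinforces X ...
        cohere[y].add(x)  # ... and X reinforces Y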
Many properties have an emotional resonance, though
some evoke more obvious feelings than others. The
linguistic mapping from properties to feelings is also more
transparent for some property / feeling pairs than others.
Consider the property appalling, which is stereotypical of
tyrants: the common linguistic usage "feel appalled by" suggests that an entity with this property is quite likely to make us feel appalled. Corpus analysis allows a system
to learn a mapping from properties to feelings for these
obvious cases, by mining instances of the n-gram pattern "feel P+ed by", where P can be mapped to the property of a stereotype via a simple morphology rule. Let feeling(p)
denote the set of feelings that is learnt in this way for the
property p. Thus, feeling(disgusting) = {feel_disgusted_by} while feeling(humid) = {}. Indeed, because this approach
can only find obvious mappings, feeling(p) = {} for most p.
However, cohere(p) can be used to interpolate a range of
feelings for almost any property p. Let evoke(p) denote the
set of feelings that are likely to be stirred by a property p.
We can now interpolate evoke(p) as follows:
(6) evoke(p) = feeling(p) ∪ (∪_{c ∈ cohere(p)} feeling(c))
So a property p also evokes a feeling f if p suggests
another property c that evokes f. We can predict the range
of emotional responses to a stereotype S in the same way:
(7) evoke(S) = ∪_{p ∈ typical(S)} evoke(p)
If M is chosen from conceits(T) to metaphorically describe
T, the metaphor M is likely to evoke these feelings for T:
(8) evoke(T, M) = ∪_{p ∈ salient(T, M)} evoke(p)
For purposes of gradation, evoke(p) and evoke(S) denote a
bag of feelings rather than a set of feelings. Thus, the more
properties of S that evoke f, the more times that evoke(S)
will contain f, and the more likely it is that the use of S as a
conceit will stir the feeling f in the reader. Stereotrope can
thus predict that both "feel disgusted by" and "feel thrilled by"
are two possible emotional responses to the property
bloody (or to the stereotype war), and also know that the
former is by far the more likely response of the two.
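Since evoke() denotes a bag, a multiset type makes (6)-(8) direct to implement. A sketch using Python's Counter, assuming feeling, cohere and typical are dictionaries built as described above:

from collections import Counter

def evoke_property(p, feeling, cohere):
    # (6): direct feelings for p, plus feelings interpolated from
    # the properties that cohere with p.
    bag = Counter(feeling.get(p, []))
    for c in cohere.get(p, ()):
        bag.update(feeling.get(c, []))
    return bag

def evoke_stereotype(S, typical, feeling, cohere):
    # (7): the more properties of S that evoke a feeling, the higher
    # its count, and the likelier that emotional response.
    bag = Counter()
    for p in typical.get(S, ()):
        bag.update(evoke_property(p, feeling, cohere))
    return bag

def evoke_conceit(salient_TM, feeling, cohere):
    # (8): feelings likely stirred when T is viewed through M,
    # given the salient(T, M) property set.
    bag = Counter()
    for p in salient_TM:
        bag.update(evoke_property(p, feeling, cohere))
    return bag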
The set evoke(T, M) for the metaphorical conceit T is M
can serve the goal of poetry generation in different ways.
Most obviously, it is a rich source of feelings that can be
explicitly mentioned in a poem about T (as viewed thru M).
Alternately, these feelings can be used in a meta-text to
motivate and explain the viewpoint of the poem. The act of
crafting an explanatory text to showcase a poetry system's creative intent is dubbed "framing" in Colton et al. (2012).
The current system puts the contents of evoke(T, M) to
both of these uses: in the poem itself, it expresses feelings
to show its reaction to certain metaphorical properties of T;
and in an accompanying framing text, it cites these feelings
as a rationale for choosing the conceit T is M. For example,
in a poem based on the conceit marriage is a prison, the
set evoke(marriage, prison) contains the feelings bored_by,
confined_in, oppressed_by, chilled_by and intimidated_by.
The meta-text that frames the resulting poem expresses the
following feelings (using simple NL generation schema):
"Gruesome marriage and its depressing divorces appall me. I often feel disturbed and shocked by marriage and its twisted rings. Does marriage revolt you?"
Atoms, Compounds and Conceptual Blends
If linguistic creativity is chemistry with words and ideas,
then stereotypes and their typical properties constitute the
periodic table of elements that novel reactions are made of.
These are the descriptive atoms that poems combine into
metaphorical mixtures, as modeled in (1)–(8) above. But
poems can also fuse these atoms into nuanced compounds
that may subtly suggest more than the sum of their parts.
Consider the poetry-friendly concept moon, for which
Web similes provide the following descriptive atoms:
typical(moon) = {lambent, white, round, pockmarked,
shimmering, airless, silver, bulging,
cratered, waning, waxing, spooky,
eerie, pale, pallid, deserted, glowing,
pretty, shining, expressionless, rising}
Corpus analysis reveals that authors combine atoms such
as these in a wide range of resonant compounds. Thus, the
Google 2-grams contain such compounds as "pallid glow",
"
"
"
"lambent beauty", "silver shine" and "eerie brightness", all
of which can be used to good effect in a poem about the
moon. Each compound denotes a compound property, and
each exhibits the same linguistic structure. So to harvest a
very large number of compound properties, we simply scan
the Google 2-grams for phrases of the form "ADJ NOUN",
where ADJ and NOUN must each denote a property of the
same stereotype. While ADJ maps directly to a property, a
combination of morphological analysis and dictionary
search is needed to map NOUN to its property (e.g. beauty → beautiful). What results is a large poetic lexicon, one
that captures the diverse and sometimes unexpected ways
in which the atomic properties of a stereotype can be fused
into nuanced carriers of meaning. Compound descriptions
denote compound properties, and those that are shared by
different stereotypes reflect the poetic ways in which those
concepts are alike. For example, "shining beauty" is shared
by over 20 stereotypes in our poetic lexicon, describing
such entries as moon, star, pearl, smile, goddess and sky.
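A sketch of this harvesting pass, assuming the noun-to-property mapping (e.g. beauty → beautiful) has been precomputed, standing in for the morphological analysis and dictionary search described above:

def harvest_compounds(bigrams, adj_prop, noun_prop, typical):
    # Keep only ADJ NOUN 2-grams whose two words denote properties
    # of at least one common stereotype; return each compound with
    # the stereotypes it can describe.
    compounds = {}
    for adj, noun in bigrams:
        p1, p2 = adj_prop.get(adj), noun_prop.get(noun)
        if p1 and p2:
            owners = {S for S, props in typical.items()
                      if p1 in props and p2 in props}
            if owners:
                compounds[(adj, noun)] = owners
    return compounds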
A stereotype suggests behaviors as well as properties,
and a fusion of both perspectives can yield a more nuanced
view. The patterns "VERB ADV" and "ADV VERB" are used to harvest all 2-grams where a property expressed as
an adverb qualifies a related property expressed as a verb.
For example, the Google 2-gram "glow palely" unites the properties glowing and pale of moon, which allows moon to be recognized as similar to candle and ghost because they too can be described by the compound "glow palely". A ghost, in turn, can "noiselessly glide", as can a butterfly, which may "sparkle radiantly" like a candle or a star or a sunbeam. Not every pairing of descriptive atoms will yield
a meaningful compound, and it takes common-sense or a
poetic imagination to sense which pairings will work in a
poem. Though an automatic poet is endowed with neither,
it can still harvest and re-use the many valid combinations
that humans have added to the language trove of the Web.
Poetic allusions anchor a phrase in a vivid stereotype
while shrouding its meaning in constructive ambiguity.
Why talk of the "pale glow" of the moon when you can allude to its "ghostly glow" instead? The latter does more than evoke the moon's paleness: it attributes this paleness to a supernatural root, and suggests a halo of other qualities
such as haunting, spooky, chilling and sinister. Stereotypes
are dense descriptors, and the use of one to convey a single
property like pale will subtly suggest other readings and
resonances. The phrase "ghostly glow" may thus allude to
any corpus-attested compound property that can be forged
from the property glowing and any other element of the set
typical(ghost). Many stereotype nouns have adjectival forms, such as ghostly for ghost, freakish for freak, and inky for ink, and these may be used in corpora to qualify the nominal form of a property of that very stereotype, such as gloom for gloomy, silence for silent, or pallor for pale. The 2-gram "inky gloom" can thus be understood as an allusion
either to the blackness or wetness of ink, so any stereotype
that combines the properties dark and wet (e.g. oil, swamp,
winter) or dark and black (e.g. crypt, cave, midnight) can
be poetically described as exhibiting an inky gloom.
Let compounds() denote a function that maps a set of atomic properties, such as shining and beautiful, to the set of compound descriptors, such as the compound property "shining beauty" or the compound allusion "ghostly glow", that can be harvested from the Google 2-grams. It follows
that compounds(typical(S)) denotes the set of corpus-
attested compounds that can describe a stereotype S, while
compounds(salient(T, M)) denotes the set of compound
descriptors that might be used in a poem about T to suggest
the poetic conceit T is M. Since these compounds will fuse
atomic elements from the stereotypical representations of
both T and M, compounds(salient(T, M)) can be viewed as
a blend of T and M. As described in Fauconnier & Turner
(2002), and computationally modeled in various ways in
Veale & ODonoghue (2000), Pereira (2007) and Veale &
Li (2011), a blend is a tight conceptual integration of two
or more mental spaces. This integration yields more than a
mixture of representational atoms: a conceptual blend often
creates emergent elements new molecules of meaning
that are present in neither of the input representations but
which only arise from the fusion of these representations.
How might the representations discussed here give rise
to emergent elements? We cannot expect new descriptive
atoms to be created by a poetic blend, but we can expect
new compounds to emerge from the re-combination of
descriptive atoms in the compound descriptors of T and M.
Just as we can expect compounds(typical(T) ∪ typical(M)) to suggest a wider range of descriptive possibilities than compounds(typical(T)) ∪ compounds(typical(M)), we say:
(9) emergent(T, M) = {p ∈ compounds(salient(T, M)) | p ∉ compounds(typical(T)) ∧ p ∉ compounds(typical(M))}
In other words, the compound descriptions that emerge
from the blend of T and M are those that could not have
emerged from the properties of T alone, or from M alone,
but can only emerge from the fusion of T and M together.
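A sketch of (9), treating compounds() as a callable that maps a set of atomic properties to the corpus-attested compound descriptors they license:

def emergent(T, M, salient_TM, typical, compounds):
    # (9): compounds available in the blend of T and M that could
    # not have emerged from T alone or from M alone.
    blend = compounds(salient_TM)
    return {p for p in blend
            if p not in compounds(typical.get(T, set()))
            and p not in compounds(typical.get(M, set()))}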
Consider the poetic conceit "love is the grave". The resulting blend, as captured by compounds(salient(T, M)), contains a wide variety of compound descriptors. Some of these compounds emerge solely from the concept grave, such as "sacred gloom", "dreary chill" and "blessed stillness".
Many others emerge only from a fusion of love and grave,
such as "romantic stillness", "sweet silence", "tender darkness", "cold embrace", "quiet passion" and "consecrated devotion". So
a poem that uses these phrases to construct an emotional
worldview will not only demonstrate an understanding of
its topic and its conceit, but will also demonstrate some
measure of insight into how one can complement and
resonate with the other (e.g., that darkness can be tender,
passion can be quiet and silence can be sweet). While the
system builds on second-hand insights, insofar as these are
ultimately derived from Web corpora, such insights are
fragmentary and low-level. It still falls to the system to
stitch these into its own emotionally coherent patchwork of
poetry. What use is poetry if we or our machines cannot
learn from it the wild possibilities of language and life?
Generating Witty Insights from Banal Facts
Insight requires depth. To derive original insights about the
topic of a poem, say, of a kind an unbiased audience might
consider witty or clever, a system needs more than shallow
corpus data; it needs deep knowledge of the real world. It
is perhaps ironic then that the last place one is likely to find
real insight is in the riches of a structured knowledge base.
Common-sense knowledge-bases are especially lacking in
insight, since these are designed to contain knowledge that
is common to all and questioned by none. Even domain-
specific knowledge-bases, rich in specialist knowledge, are
designed as repositories of axiomatic truths that will appear
self-evident to their intended audience of experts.
Insight is both a process and a product. While insight
undoubtedly requires knowledge, it also takes work to craft
surprising insights from the unsurprising generalizations
that make up the bulk of our conventional knowledge.
Though mathematicians occasionally derive surprising
theorems from the application of deductive techniques to
self-evident axioms, sound reasoning over unsurprising
facts will rarely yield surprising conclusions. Yet witty
insights are not typically the product of an entirely sound
reasoning process. Rather, such insights amuse and
provoke via a combination of over-statement, selective use
of facts, a mixing of distinct knowledge types, and a clever
packaging that makes maximal use of the Keats heuristic.
Indeed, as has long been understood by humor theorists,
the logic of humorous insight is deeply bound up with the
act of framing. The logical mechanism of a joke a kind of
pseudological syllogism for producing humorous effects
is responsible for framing a situation in such a way that it
gives rise to an unexpected but meaningful incongruity
(Attardo & Raskin, 1992; Attardo et al., 2002). To craft
witty insights from innocuous generalities, a system must
draw on an arsenal of such logical mechanisms to frame its
observations of the world in appealingly discordant ways.
Attardo and Raskin view the role of a logical mechanism
(LM) as the engine of a joke: each LM provides a different
way of bringing together two overlapping scripts that are
mutually opposed in some pivotal way. A joke narrative is
fully compatible with one of these scripts and only partly
compatible with the other, yet it is the partial match that
we, as listeners, jump to first to understand the narrative. In
a well-structured joke, we only recognize the inadequacy
of this partially-apt script when we reach the punchline, at
which point we switch our focus to its unlikely alternative.
The realization that we can be easily duped by appearances,
combined with the sense of relief and understanding that
this realization can bring, results in the AHA! feeling of
insight that often accompanies the HA-HA of a good joke.
LMs suited to narrative jokes tend to engineer oppositions
between narrative scripts, but for purposes of crafting witty
insights in one-line poetic forms, we will view a script as a
stereotypical representation of an entity or event. Armed
with an arsenal of stereotype scripts, Stereotrope will
seek to highlight the tacit opposition between different
stereotypes as they typically relate to each other, while also
engineering credible oppositions based on corpus evidence.
A sound logical system cannot brook contradictions.
Nonetheless, uncontroversial views can be cleverly framed
in such a way that they appear sound and contradictory, as
when the columnist David Brooks described the Olympics
as a peaceful celebration of our warlike nature. His form
has symmetry and cadence, and pithily exploits the Keats
heuristic to reconcile two polar opposites, war and peace.
Poetic insights do not aim to create real contradictions, but
aim to reveal (and reconcile) the unspoken tensions in
familiar ideas and relationships. We have discussed two
kinds of stereotypical knowledge in this paper: the property
view of a stereotype S, as captured in typical(S), and the
relational view, as captured by a set of question-derived
generalizations of the form Xs <relation> Ys. A blend of
both these sources of knowledge can yield emergent
oppositions that are not apparent in either source alone.
Consider the normative relation "bows fire arrows". Bows
are stereotypically curved, while arrows are stereotypically
straight, so lurking beneath the surface of this innocuous
norm is a semantic opposition that can be foregrounded to
poetic effect. The Keats heuristic can be used to package
this opposition in a pithy and thought-provoking form: thus
compare "curved bows fire straight arrows" (so what?)
with "straight arrows do curved bows fire" (more poetic)
and "the most curved bows fire the straightest arrows"
(most poetic). While this last form is an overly strong
claim that is not strictly supported by the stereotype model,
it has the sweeping form of a penetrating insight that grabs
one's attention. Its pragmatic effect (a key function of
poetic insight) is to reconcile two opposites by suggesting
that they fill complementary roles. In schematic terms,
such insights can be derived from any single norm of the
form Xs <relation> Ys, where X and Y denote stereotypes
with salient properties (such as soft and tough, or long and
short) that can be framed in striking opposition. For
instance, the combination of the norm "cops eat donuts" with
the clichéd views of cops as tough and donuts as soft yields
the insight "the toughest cops eat the softest donuts". As
the property tough is undermined by the property soft, this
may be viewed as a playful subversion of the tough-cop
stereotype. The property toughness can be further
subverted, with an added suggestion of hypocrisy, by
expressing the generalization as a rhetorical question:
"Why do the toughest cops eat the softest donuts?"
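To make the schema concrete, this single-norm framing step can be sketched in a few lines of Python; the norm list, property map, opposition set and superlative former below are invented placeholders rather than Stereotrope's actual implementation:

# Sketch of single-norm insight framing (illustrative data only).
NORMS = [("cops", "eat", "donuts")]                 # Xs <relation> Ys
TYPICAL = {"cops": ["tough"], "donuts": ["soft"]}   # typical(S)
OPPOSITES = {("tough", "soft"), ("soft", "tough")}

def superlative(prop):
    # Crude superlative former; real morphology needs more care.
    return "most " + prop if len(prop) > 6 else prop + "est"

def single_norm_insights(norms, typical, opposites):
    for x, rel, y in norms:
        for px in typical.get(x, []):
            for py in typical.get(y, []):
                if (px, py) in opposites:
                    # Frame the latent opposition as a sweeping claim,
                    # then as a hypocrisy-hinting rhetorical question.
                    claim = f"the {superlative(px)} {x} {rel} the {superlative(py)} {y}"
                    yield claim, f"Why do {claim}?"

for claim, question in single_norm_insights(NORMS, TYPICAL, OPPOSITES):
    print(claim)     # the toughest cops eat the softest donuts
    print(question)  # Why do the toughest cops eat the softest donuts?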
A single norm represents a highly simplified script, so a
framing of two norms together often allows for opposition
via a conflict of overlapping scripts. Activists, for example,
typically engage in tense struggles to achieve their goals.
But activists are also known for the slogans they coin and
the chants they sing. Most slogans, whether designed to
change the law or sell detergent, are catchy and uplifting.
These properties and norms can now be framed in poetic
opposition: "The activists that chant the most uplifting
slogans suffer through the most depressing struggles."
While the number of insights derivable from single norms
is a linear function of the size of the knowledge base, a
combinatorial opportunity exists to craft insights from
pairs of norms. Thus, "angels who fight the foulest demons
play the sweetest harps", "surgeons who wield the most
hardened blades wear the softest gloves", and "celebrities
who promote the most reputable charities suffer the
sleaziest scandals" all achieve conflict through norm
juxtaposition. Moreover, the order of a juxtaposition
(positive before negative, or vice versa) can also sway the
reader toward a cynical or an optimistic interpretation.
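The pairing step can be sketched in the same illustrative terms: two norms that share a subject are joined, and each is dressed with an affectively opposed salient property. The data structures are again invented placeholders:

# Sketch: joining two norms that share a subject, so that their
# salient properties clash. Illustrative data, not the real KB.
from itertools import combinations

NORMS = [("activists", "chant", "slogans"),
         ("activists", "suffer through", "struggles")]
SALIENT = {"slogans": "uplifting", "struggles": "depressing"}

def paired_insights(norms, salient):
    for (x1, r1, y1), (x2, r2, y2) in combinations(norms, 2):
        if x1 == x2 and y1 in salient and y2 in salient:
            # Putting the positive clause first yields the optimistic
            # spin; swapping the clauses yields the cynical reading.
            yield (f"The {x1} that {r1} the most {salient[y1]} {y1} "
                   f"{r2} the most {salient[y2]} {y2}")

print(next(paired_insights(NORMS, SALIENT)))
# The activists that chant the most uplifting slogans
# suffer through the most depressing struggles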
Wit portrays opposition as an inherent part of reality, yet
often creates the oppositions that it appears to reconcile. It
does so by elevating specifics into generalities, to suggest
that opposition is the norm rather than the exception. So
rather than rely wholly on stereotypes and their expected
properties, Stereotrope uses corpus evidence as a proxy
imagination to concoct new classes of individuals with
interesting and opposable qualities. Consider the Google
2-gram "short celebrities", whose frequency and plurality
suggest that shortness is a noteworthy (though not typical)
property of a significant class of celebrities. Stereotrope
property of a significant class of celebrities. Stereotrope
already possesses the norm that celebrities ride in
limousines, as well as a stereotypical expectation that
limousines are long. This juxtaposition of conventions
allows it to frame a provocatively sweeping generalization:
Why do the shortest celebrities ride in the longest
limousines? While Stereotrope has no evidence for this
speculative claim, and no real insight into the status-
anxiety of the rich but vertically-challenged, such an
understanding may follow in time, as deeper and subtler
knowledge-bases become available for poetry generation.
Poetic insight often takes the form of sweeping claims
that elevate vivid cases into powerful exemplars. Consider
how Stereotrope uses a mix of n-gram evidence and norms
to generate these maxims: "The most curious scientists
achieve the most notable breakthroughs" and "The most
impartial scientists use the most accurate instruments".
The causal seeds of these insights are mined from the
Google n-grams, in coordinations such as "hardest and
sharpest" and "most curious and most notable". These n-gram
relationships are then projected onto banal norms
such as "scientists achieve breakthroughs" and "scientists
use instruments", for whose participants these properties
are stereotypical (e.g. scientists are curious and impartial,
instruments are accurate, breakthroughs are notable, etc.).
Such claims can be taken literally, or viewed as vivid
allusions to important causal relationships. Indeed, when
framed as explicit analogies, the juxtaposition of two such
insights can yield unexpected resonances. For example,
"the most trusted celebrities ride in the longest limousines"
and "the most trusted preachers give the longest sermons"
are both inspired by the 4-gram "most trusted and longest".
This common allusion suggests an analogy: "Just as the
most trusted celebrities ride in the longest limousines, the
most trusted preachers give the longest sermons." Though
such analogies are driven by superficial similarity, they can
still evoke deep resonances for an audience. Perhaps a
sermon is a vehicle for a preacher's ego, just as a limousine
is an obvious vehicle for a celebrity's ego? Reversing the
order of the analogy significantly alters its larger import,
suggesting that ostentatious wealth bears a lesson for us all.
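This analogy step amounts to grouping generated insights by the seed n-gram that inspired them; a minimal sketch, with placeholder data standing in for Stereotrope's outputs:

# Sketch: pairing insights that share a seed n-gram into analogies.
from collections import defaultdict

insights = [("most trusted and longest",
             "the most trusted celebrities ride in the longest limousines"),
            ("most trusted and longest",
             "the most trusted preachers give the longest sermons")]

by_seed = defaultdict(list)
for seed, insight in insights:
    by_seed[seed].append(insight)

for seed, group in by_seed.items():
    if len(group) >= 2:
        # Reversing group order alters the larger import of the analogy.
        print(f"Just as {group[0]}, {group[1]}.")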
Tying it all together in Stereotrope
Having created the individual pieces of form and meaning
from which a poem might be crafted, it now falls to us to
put the pieces together in some coherent form. To recap,
we have shown how affective metaphors may be generated
for a given topic, by building on popular metaphors for that
topic in the Google n-grams; shown how a tight conceptual
blend, with emergent compound properties of its own, can
be crafted from each of these metaphors; shown how the
feelings evoked by these properties may be anticipated by
a system; and shown how novel insights can be crafted
from a fusion of stereotypical norms and corpus evidence.
We view a poem as a summarization and visualization
device that samples the set of properties and feelings that
are evoked when a topic T is viewed as M. Given T, an M
is chosen randomly from conceits(T). Each line of the text
renders one or more properties in poetic form, using tropes
such as simile and hyperbole. So if salient(T, M) contains
"hot" and compounds(salient(T, M)) contains "burn brightly"
for T=love and M=fire, say, this mix of elements may be
rendered as "No fire is hotter or burns more brightly". It
can also be rendered as an imperative, "Burn brightly with
your hot love", or a request, "Let your hot love burn
brightly". The range of tropes is best conveyed with
examples, such as this poetic view of marriage as a prison:
The legalized regime of this marriage
My marriage is an emotional prison
Barred visitors do marriages allow
The most unitary collective scarcely organizes so much
Intimidate me with the official regulation of your prison
Let your sexual degradation charm me
Did ever an offender go to a more oppressive prison?
You confine me as securely as any locked prison cell
Does any prison punish more harshly than this marriage?
You punish me with your harsh security
The most isolated prisons inflict the most difficult hardships
O Marriage, you disgust me with your undesirable security
Each poem obeys a semantic grammar, which minimally
indicates the trope that should be used for each line. Since
the second line of the grammar asks for an apt <simile>,
Stereotrope constructs one by comparing marriage to a
collective; as the second-last line asks for an apt <insight>,
one is duly constructed around the Google 4-gram "most
isolated and most difficult". The grammar may also dictate
whether a line is rendered as an assertion, an imperative, a
request or a question, and whether it is framed positively or
negatively. This grammar need not be a limiting factor, as
one can choose randomly from a pool of grammars, or
even evolve a new grammar by soliciting user feedback.
The key point is the pivotal role of a grammar of tropes in
mapping from the properties and feelings of a metaphorical
blend to a sequence of poetic renderings of these elements.
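As an illustrative sketch, such a grammar of tropes can be reduced to a list of trope labels, each mapped to a rendering template over the properties and compounds of the blend; the grammar and templates below are simplified inventions, not Stereotrope's own:

# Sketch: a semantic grammar as a sequence of trope labels, each
# rendered by a template. Grammar and templates are simplified.
import random

TROPES = {
    "<assertion>":  "My {t} is a {p} {m}",
    "<imperative>": "{c_cap} with your {p} {t}",
    "<request>":    "Let your {p} {t} {c}",
    "<question>":   "Does any {m} {c} more than this {t}?",
}
GRAMMAR = ["<assertion>", "<imperative>", "<request>", "<question>"]

def render_poem(t, m, salient_props, compounds):
    lines = []
    for trope in GRAMMAR:  # the grammar dictates one trope per line
        p, c = random.choice(salient_props), random.choice(compounds)
        lines.append(TROPES[trope].format(t=t, m=m, p=p, c=c,
                                          c_cap=c.capitalize()))
    return "\n".join(lines)

print(render_poem("love", "fire", ["hot"], ["burn brightly"]))
# e.g. My love is a hot fire / Burn brightly with your hot love / ...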
Consider this poem, from the metaphor "China is a rival":
No Rival Is More Bitterly Determined
Inspire me with your determined battle
The most dogged defender scarcely struggles so much
Stir me with your spirited challenge
Let your competitive threat reward me
Was ever a treaty negotiated by a more competitive rival?
You compete with me like a competitively determined athlete
Does any rival test more competitively than this China?
You oppose me with your bitter battle
Can a bitter rival suffer from such sweet jealousies?
O China, you oppress me with your hated fighting
Stereotypes are most eye-catching when subverted, as in
the second-last line above. The Google 2-gram "sweet
jealousies" catches Stereotrope's eye (and ours) because it
up-ends the belief that jealousy is a bitter emotion. This
subversion nicely complements the stereotype that rivals are
bitter, allowing Stereotrope to impose a thought-provoking
opposition onto the banal norm "rivals suffer from jealousy".
Stereotrope emphasises meaning and intent over sound
and form, and does not (yet) choose lines for their rhyme
or metre. However, given a choice of renderings, it does
choose the form that makes best use of the Keats heuristic,
by favoring lines with alliteration and internal symmetry.
Evaluation
Stereotrope is a knowledge-based approach to poetry, one
that crucially relies on three sources of inspiration: a large
roster of stereotypes, which maps a slew of familiar ideas
to their most salient properties; a large body of normative
relationships which relate these stereotypes to each other;
and the Google n-grams, a vast body of language snippets.
The first two are derived from attested language use on the
web, while the third is a reduced view of the linguistic web
itself. Stereotrope represents approx. 10,000 stereotypes in
terms of approx. 75,000 stereotype-to-property mappings,
where each of these is supported by a real web simile that
attests to the accepted salience of a given property. In
addition, Stereotrope represents over 50,000 norms, each
derived from a presupposition-laden question on the web.
The reliability of Stereotrope's knowledge has been
demonstrated in recent studies. Veale (2012a) shows that
Stereotrope's simile-derived representations are balanced
and unbiased, as the positive/negative affect of a stereotype
T can be reliably estimated as a function of the affect of the
contents of typical(T). Veale (2012b) further shows that
typical(T) can be reliably partitioned into sets of positive or
negative properties as needed, to reflect an affective spin
imposed by any given metaphor M. Moreover, Veale (ibid.)
shows that copula metaphors of the form "T is an M" in the
Google n-grams (the source of srcTypical(T)) are also
broadly consistent with the properties and affective profile
of each stereotype T. So in 87% of cases, one can correctly
assign the label positive or negative to a topic T using only
the contents of srcTypical(T), provided it is not empty.
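By way of illustration, the following minimal sketch estimates the affect of a stereotype as the mean affect of its typical properties; the affect lexicon and the example contents of typical() are invented placeholders:

# Sketch: label a stereotype via the mean affect of typical(T).
# The AFFECT lexicon (property -> score in [-1, 1]) is invented.
AFFECT = {"tough": -0.2, "brave": 0.8, "sleazy": -0.9}

def typical(t):
    return {"cop": ["tough", "brave"]}.get(t, [])

def stereotype_affect(t):
    scores = [AFFECT[p] for p in typical(t) if p in AFFECT]
    return sum(scores) / len(scores) if scores else 0.0

print("positive" if stereotype_affect("cop") > 0 else "negative")
# positive: (-0.2 + 0.8) / 2 = 0.3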
Stereotrope derives its appreciation of feelings from its
understanding of how one property presupposes another.
The intuition that two properties X and Y found in the
pattern "as X and Y as" evoke similar feelings is
supported by the strong correlation (0.7) observed between
the positivity of X and of Y over the many X/Y pairs that
are harvested from the web using this acquisition pattern.
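The acquisition pattern itself is simple enough to sketch with a regular expression; the sample text below is an illustrative stand-in for web-scale snippets:

# Sketch: harvesting property pairs with the "as X and Y as" pattern.
import re

text = "as cold and unwelcoming as a tomb; as warm and fuzzy as a kitten"
pairs = re.findall(r"\bas (\w+) and (\w+) as\b", text)
print(pairs)  # [('cold', 'unwelcoming'), ('warm', 'fuzzy')]
# Each (X, Y) pair is assumed to evoke similar feelings; over many
# pairs the positivity of X correlates strongly (~0.7) with that of Y.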
The fact that "bats lay eggs" can be found over 40,000
times on the web via Google. On closer examination, most
matches form part of a larger question, "do bats lay eggs?".
The question "why do bats lay eggs?" has zero matches. So
"Why do" questions provide an effective superstructure for
acquiring normative facts from the web: they identify facts
that are commonly presupposed, and thus stereotypical,
and clearly mark the start and end of each presupposition.
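A minimal sketch of this question-driven acquisition, with illustrative query strings standing in for web data:

# Sketch: mining presupposition-laden "why do" questions for norms.
import re

questions = ["Why do cops eat donuts?",
             "Why do bows fire arrows so far?",
             "Do bats lay eggs?"]        # no "why": nothing presupposed
NORM = re.compile(r"^why do (\w+) (\w+) (\w+)")

for q in questions:
    m = NORM.match(q.lower())
    if m:
        print("norm: {} {} {}".format(*m.groups()))
# norm: cops eat donuts
# norm: bows fire arrows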
Such questions also yield useful facts: Veale & Li (2011)
show that when these facts are treated as features of the
stereotypes for which they are presupposed, they provide
an excellent basis for classifying different stereotypes into
the same ontological categories, as would be predicted by
an ontology such as WordNet (Fellbaum, 1998). Moreover,
these features can be reliably distributed to close semantic
neighbors to overcome the problem of knowledge sparsity.
Veale & Li demonstrate that the likelihood that a feature of
stereotype A can also be assumed of stereotype B is a clear
function of the WordNet similarity of A and B. While this
is an intuitive finding, it would not hold at all if not for the
fact that these features are truly meaningful for A (and B).
The problem posed by "bats lay eggs" is one faced by
any system that does not perceive the whole context of an
utterance. As such, it is a problem that plagues the use of
n-gram models of web content, such as Google's n-grams.
Stereotrope uses n-grams to suggest insightful connections
between two properties or ideas, but if these n-grams are
mere noise, not even the Keats heuristic can disguise them
as meaningful signals. Our focus is on relational n-grams,
of a kind that suggests deep tacit relationships between two
concepts. These n-grams obey the pattern X <rel> Y,
where X and Y are adjectives or nouns and <rel> is a
linking phrase, such as a verb, a preposition, a coordinator,
etc. To determine the quality of these n-grams, and to
assess the likelihood of extracting genuine relational
insights from them, we use this large subset of the Google
n-grams as a corpus for estimating the relational similarity
of the 353 word pairs in the Finkelstein et al. (2002)
WordSim-353 data set. We estimate the relatedness of two
words X and Y as the PMI (pointwise mutual information
score) of X and Y, using the relational n-grams as a corpus
for occurrence and co-occurrence frequencies of X and Y.
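As a concrete sketch of this estimate, with toy counts standing in for frequencies mined from the relational n-grams:

# Sketch: PMI-based relatedness over a relational n-gram corpus.
import math

count = {"tiger": 1000, "cat": 2000}    # occurrence counts (toy values)
cooc = {("tiger", "cat"): 150}          # co-occurrence in "X <rel> Y"
TOTAL = 1_000_000                       # corpus size (toy value)

def pmi(x, y):
    p_x, p_y = count[x] / TOTAL, count[y] / TOTAL
    p_xy = cooc.get((x, y), cooc.get((y, x), 0)) / TOTAL
    return math.log2(p_xy / (p_x * p_y)) if p_xy else float("-inf")

print(round(pmi("tiger", "cat"), 2))    # 6.23: strongly related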
A correlation of 0.61 is observed between these PMI scores
and the human ratings reported by Finkelstein et al. (2002).
Though this is not the highest score achieved for this task,
it is considerably higher than any that has been reported
for approaches that use WordNet alone. The point here is
that this relational subset of the Google n-grams offers a
reasonably faithful mirror of human intuitions for purposes
of recognizing the relatedness of different ideas. We thus
believe these n-grams to be a valid source of real insights.
The final arbiters of Stereotrope's poetic insights are the
humans who use the system. We offer the various services
of Stereotrope as a public web service, via this URL:
http://boundinanutshell.com/metaphor-magnet
We hope these services will also allow other researchers to
reuse and extend Stereotrope's approaches to metaphor,
blending and poetry. Thus, for instance, poetry generators
such as that described in Colton et al. (2012), which
creates topical poems from fragments of newspapers and
tweets, can use Stereotrope's rich inventories of similes,
poetic compounds, feelings and allusions in their poetry.
Summary and Conclusions
Poets use the Keats heuristic to distil an amorphous space
of feelings and ideas into a concise and memorable form.
Poetry thus serves as an ideal tool for summarizing and
visualizing the large space of possibilities that is explored
whenever we view a familiar topic from a new perspective.
In this paper we have modelled poetry as both a product
and an expressive tool, one that harnesses the processes of
knowledge acquisition (via web similes and questions),
ideation (via metaphor and insight generation), emotion
(via a mapping of properties to feelings), integration (via
conceptual blending) and rendering (via tropes that map
properties and feelings to poetic forms). Each of these
processes has been made publicly available as part of a
comprehensive web service called Metaphor Magnet.
We want our automated poets to be able to formulate
real meanings that are worthy of poetic expression, but we
also want them to evoke much more than they actually say.
The pragmatic import of a creative formulation will always
be larger than the system's ability to model it accurately.
Yet the human reader has always been an essential part of
the poetic process, one that should not be downplayed or
overlooked in our desire to produce computational poets
that fully understand their own outputs. So for now, though
there is much scope, and indeed need, for improvement, it
is enough to know that an automated poem is anchored in
real meanings and intentional metaphors, and to leave
certain aspects of creative interpretation to the audience.
Acknowledgements
This research was supported by the WCU (World Class
University) program under the National Research
Foundation of Korea (Ministry of Education, Science and
Technology of Korea, Project no. R31-30007).
References
Attardo, S. and Raskin, V. 1991. Script theory revis(it)ed: joke
similarity and joke representational model. Humor: International
Journal of Humor Research, 4(3):293-347.
Attardo, S., Hempelmann, C.F. & Di Maio, S. 2002. Script
oppositions and logical mechanisms: Modeling incongruities and
their resolutions. Humor: Int. J. of Humor Research, 15(1):3-46.
Brants, T. & Franz, A. 2006. Web 1T 5-gram Version 1.
Linguistic Data Consortium.
Colton, S., Goodwin, J. and Veale, T. 2012. Full-FACE Poetry
Generation. In Proc. of ICCC 2012, the 3rd International
Conference on Computational Creativity. Dublin, Ireland.
Fass, D. 1997. Processing Metonymy & Metaphor. Contemporary
Studies in Cognitive Science & Technology. New York: Ablex.
Fauconnier, G. & Turner, M. 2002. The Way We Think. Concep-
tual Blending and the Mind's Hidden Complexities. Basic Books.
Fellbaum, C. (ed.) 1998. WordNet: An Electronic Lexical
Database. MIT Press, Cambridge.
Finkelstein, L., Gabrilovich, E., Matias, Y., Rivlin, E., Solan, Z.,
Wolfman, G. and Ruppin, E. 2002. Placing Search in Context:
The concept revisited. ACM Transactions on Information
Systems, 20(1):116-131.
Fishelov, D. 1992. Poetic and Non-Poetic Simile: Structure,
Semantics, Rhetoric. Poetics Today 14(1):1-23.
Lakoff, G. & Turner, M. 1989. More than cool reason: a field
guide to poetic metaphor. University of Chicago Press.
McGlone, M.S. & Tofighbakhsh, J. 1999. The Keats heuristic:
Rhyme as reason in aphorism interpretation. Poetics 26(4):235-244.
McGlone, M.S. & Tofighbakhsh, J. 2000. Birds of a feather flock
conjointly (?): rhyme as reason in aphorisms. Psychological
Science 11(5):424-428.
Özbal, G. and Strapparava, C. 2012. A computational approach to
automatize creative naming. In Proc. of the 50th Annual Meeting
of the Association for Computational Linguistics, Jeju, South Korea.
Pereira, F. C. 2007. Creativity and artificial intelligence: a
conceptual blending approach. Walter de Gruyter.
Shutova, E. 2010. Metaphor Identification Using Verb and Noun
Clustering. In Proceedings of the 23rd International Conference
on Computational Linguistics, 1001-1010.
Veale, T. & O'Donoghue, D. 2000. Computation and Blending.
Cognitive Linguistics, 11(3-4):253-281.
Veale, T. and Hao, Y. 2007a. Making Lexical Ontologies
Functional and Context-Sensitive. In Proceedings of the 46th
Annual Meeting of the Association for Computational Linguistics.
Veale, T. & Hao, Y. 2007b. Comprehending and generating apt
metaphors: a web-driven, case-based approach to figurative
language. In Proceedings of AAAI-2007, the 22nd National
Conference on Artificial Intelligence, 1471-1476.
Veale, T. & Li, G. 2011. Creative Introspection and Knowledge
Acquisition: Learning about the world thru introspective
questions and exploratory metaphors. In Proc. of AAAI-2011, the
25th Conference of the Association for the Advancement of AI.
Veale, T. 2012a. Exploding the Creativity Myth: Computational
Foundations of Linguistic Creativity. London: Bloomsbury.
Veale, T. 2012b. A Context-sensitive, Multi-faceted Model of
Lexico-Conceptual Affect. In Proc. of the 50th Annual Meeting
of the Association for Computational Linguistics. Jeju, South Korea.
Harnessing Constraint Programming for Poetry Composition
Jukka M. Toivanen and Matti Järvisalo and Hannu Toivonen
HIIT and Department of Computer Science
University of Helsinki
Finland
Abstract
Constraints are a major factor shaping the conceptual space
of many areas of creativity. We propose to use constraint
programming techniques and off-the-shelf constraint solvers
in the creative task of poetry writing. We show how many
aspects essential in different poetical forms, and partially
even at the level of language syntax and semantics, can be
represented as interacting constraints.
The proposed architecture has two main components. One
takes input or inspiration from the user or the environment,
and based on it generates a specification of the space and
aesthetic of a poem as a set of declarative constraints. The
other component explores the specified space using a
constraint solver.
We provide an elementary set of constraints for composition
of poetry, we illustrate their use, and we provide examples of
poems generated with different sets of constraints.
Introduction
Rules and constraints can be seen as an essential ingredient
of creativity. First, there typically are strong constraints
on the creative artefacts. For instance, consider traditional
western music. In order for a composition to be recognized
as (western) music in the first place, it must meet a number
of requirements concerning, e.g., timbre, scale, melody,
harmony, and rhythm. For any specific genre of western music,
the constraints usually become much tighter.
Similarly, the composition of many types of poetry is
governed by numerous rules specifying such things as strict
stress and syllable patterns, rhyming and alliteration
structures, and selection of words with certain associations,
in addition to the basic constraints of syntax and semantics
that are needed to make the expressions understandable and
meaningful.
However, constraints are not just a nuisance that creative
agents need to cope with in order to produce plausible
results. On the contrary, constraints are often considered to
be an essential source of creativity for humans. For instance,
composer Igor Stravinsky associated constraints with
creating freedom, not containment:
The more constraints one imposes, the more one frees
one's self of the chains that shackle the spirit.
(Stravinsky 1947)
Constraints can also be used as computational tools for
studies of creativity or creative artefacts. Artificial
intelligence researcher Marvin Minsky suggested that a good
way to learn about how music worked was to represent musical
compositions as interacting constraints, then modify these
constraints and study their effects on the musical structures
(Roads 1980). This essential idea has since been explored
extensively in the field of computer music research.
Our domain of interest in this paper is composition of
poetry. We envision a computational environment where
formally expressed constraints and constraint programming
methods are used to (1) specify a conceptual search space,
(2) define an aesthetic of concepts in the space, and (3)
explore the space to find the most aesthetic concepts in it.
Any given set of (hard) constraints on poems specifies a
space of possible poems. For instance, the number of lines
and the number of syllables per line could be such
constraints, contributing to the style of poetry. Soft
constraints, in turn, can be used to indicate (aesthetical)
preferences over poems and to rank poems that match the hard
constraints. For instance, rhyme could be a soft constraint,
giving preference to poems that follow a given rhyme
structure but not absolutely requiring it.
In this paper we study and illustrate the power of constraint
programming for creating poems. In our current set-up, the
creative system consists of two subcomponents. One takes
input from the user or from some other source of inspiration,
and based on it specifies the space and poetical aesthetic
(as a set of constraints). The other subcomponent explores
the specified space using the aesthetic, i.e., produces
optimally aesthetic poems in the space (using a constraint
solver).
We show how poems can be generated by applying different
kinds of constraints and constraint combinations using an
off-the-shelf constraint programming tool. The elegance of
this approach is that it is not based on specifying a step
sequence to produce a certain kind of a poem, but rather on
declaring the properties of a solution to be found using
mathematical constraints. An empirical evaluation of the
obtained poetry is left for future work.
We next briefly review some related work on constraint
programming in creative applications, and on poetry
generation. Then we provide a description of a constraint
model for composing poems, illustrating the ideas with
examples. We discuss the results and conclude by outlining
future work.
Related Work
Constraint-based methods have been applied in various fields
such as configuration and verification, planning, and
evolution of language, to name a few. In the area of
computational creativity, constraints have been used mostly
to describe the composition of various aspects of music. For
example, Boenn et al. (2011) have developed an extensive
music composition system called Anton which uses Answer Set
Programming to represent the musical knowledge and the rules
of the system. Anton describes a model of musical composition
as a collection of interacting constraints. The system can be
used to compose short pieces of music as well as to assist
the composer by making suggestions, completions, and
verifications to aid in the music composition process.
On the other hand, composition of poetry with constraint
programming techniques has received little if any attention.
Several different approaches have been used (Manurung,
Ritchie, and Thompson 2000; Gervás 2001; Manurung 2003;
Diaz-Agudo, Gervás, and González-Calero 2002; Wong and Chun
2008; Netzer et al. 2009; Colton, Goodwin, and Veale 2012;
Toivanen et al. 2012), many involving constraints in one
form or another, but we are not aware of any other work
systematically based on constraints and implemented using a
constraint solver.
The system developed by Manurung et al. (2003) uses a
grammar-driven formulation to generate metrically constrained
poetry out of a given topic. This approach performs
stochastic hillclimbing search within an explicit
state-space, moving from one solution to another. The
explicit representation is based on a hand-crafted transition
system. In contrast, we employ constraint-programming
methodology based on searching for optimal solutions over an
implicit representation of the conceptual space. Our approach
should scale better to large numbers of constraints and a
large input vocabulary than explicit state-space search.
The ASPERA poetry composition system (Gervás 2001), on the
other hand, uses a case-based reasoning approach. This system
generates poetry out of a given input text via composition of
poetic fragments retrieved from a case-base of existing
poetry. These fragments are then combined together by using
additional metrical rules.
The Full-FACE poetry generation system (Colton, Goodwin, and
Veale 2012) uses a corpus-based approach to generate poetry
according to given constraints on, for instance, meter and
stress. The system is also argued to invent its own
aesthetics and framings of its work. In contrast to our
system, this approach uses constraints to shape only some
aspects of the poetry composition procedure, whereas our
approach is fully based on expressing various aspects of
poetry as mutually interacting constraints and using a
constraint solver to efficiently search for solutions.
The approach of this paper extends and complements our
previous work (Toivanen et al. 2012), in which we proposed a
method where a template is extracted randomly from a given
corpus, and words in the template are substituted by words
related to a given topic. Here we show how such basic
functionality can be expressed with constraints, and, more
interestingly, how constraint programming can be used to add
control for rhyme, meter, and other effects.
Simpler poetry generation methods have been proposed as well.
In particular, Markov chains have been widely used to compose
poetry. They provide a clear and simple way to model some
syntactic and semantic characteristics of language (Langkilde
and Knight 1998). However, the resulting poetry tends to have
rather poor sentence and poem structures, since Markov chains
capture only local syntax and semantics.
Overview
The proposed poetry composition system has two subcomponents:
a conceptual space specifier and a conceptual space explorer.
The former determines what poems can be like and what kind of
poems are preferred, while the latter assumes the task of
producing such poems.
The modularity and the explicit specification of the
conceptual search space have great potential benefits.
Modularity allows one to (partially) separate the content and
form of poetry from the computation needed to produce
matching poetry. An explicit, declarative specification, in
turn, gives the creative system the potential to introspect
and modify its own goals and intermediate results (a topic to
which we will return in the conclusion).
A high-level view of the internal structure of the poetry
composition system considered in this work is shown in
Figure 1. In this paper, our focus is on the explorer
component and on the interface between the components. Our
specifier component is built on the principles of Toivanen
et al. (2012), but ideas from many other poetry generation
systems (Gervás 2001; Manurung 2003; Colton, Goodwin, and
Veale 2012) could be used in the specifier component as well.
The assumption in the model presented here is that the
specifier can generate a large number of mutually dependent
choices of words for different positions in the poem, as well
as dependencies between them. The specifier uses input from
the user and potentially other sources as its inspiration and
parameters, and automatically generates the input for the
explorer component, shielding the user from the details of
constraint programming.
The automatically generated "data" or "facts" are conveyed to
the explorer component, which consists of a constraint solver
and a static library of constraints. The library is provided
by the system designers, i.e., by us, and any constraints
that the specifier component wishes to use are triggered by
the data it generates. The user of the system does not need
to interact directly with the constraint library (but the
specifier component may offer the user options for choosing
which constraints to use).
Our focus in this paper is on the explorer component, and on
the constraint specifications that it receives from the
specifier component or from the static library:
The number of lines, and the number of words on each
line (we call this the skeleton of the poem).
Figure 1: Overview of the poetry composition workflow. The user provides some inspiration and parameters, based on which
the space specifier component generates a set of constraints, used as data by the constraint solver in the explorer component.
The explorer component additionally contains a static library of constraints that are dynamically triggered by the data. The
explorer component then outputs a poem that best fulfills the wishes of the user.
For each word position in the skeleton, a list of words that
potentially can be used in the position (collectively called
the candidates).
Possible additional requirements on the desired form of
the poem (e.g., rhyming structure).
Possible additional requirements on the syntax and con-
tents of the poem (e.g., interdependencies between words
to make valid expressions).
We will next describe these in more detail.
Poetry Composition via Answer Set
Programming
The explorer component takes as input specifications
dynamically generated by the specifier, affecting both the
search space and the aesthetic. In addition, it uses a static
constraint library. Together, the dynamic specifications and
the constraint library form a constraint satisfaction problem
(or, by extension, an optimization problem; see the end of
this section). The constraint satisfaction problem is built
so that its solutions are in one-to-one correspondence with
the poems that satisfy the requirements imposed by the
specifier component of the system (as potentially instructed
by the user). Highly optimized off-the-shelf constraint
satisfaction solvers can then be used to find the solutions,
i.e., to produce poems.
In this work, we employ answer set programming (ASP) (Gelfond
and Lifschitz 1988; Niemelä 1999; Simons, Niemelä, and
Soininen 2002) as the constraint programming paradigm, since
ASP allows for expressing the poem construction task in an
intuitively appealing way. At the same time, state-of-the-art
ASP solvers, such as Clasp (Gebser, Kaufmann, and Schaub
2012), provide an efficient way of finding solutions to the
poem construction task. Furthermore, ASP offers in-built
support for constraint optimization, which allows for
searching for a poem of high quality with respect to
different imposed quality measures.
We will not provide formal details on answer set programming
and its underlying semantics; the interested reader is
referred to other sources (Gelfond and Lifschitz 1988;
Niemelä 1999; Simons, Niemelä, and Soininen 2002) for a
detailed account. Instead, we will in the following provide a
step-by-step intuitive explanation of how the task of poetry
generation can be expressed in the language of ASP. For more
hands-on examples of how to express different computational
problems in ASP, we refer the interested reader to Gebser et
al. (2008).
Answer set programming can be viewed as a data-centric
constraint satisfaction paradigm, in which the input data,
represented via predicates, expresses the problem instance.
In our case, this dynamically generated data will express,
for example, basic information on the poem skeleton (such as
the length of lines), and the candidate words within the
input vocabulary that can be used in different positions
within the poem. The actual computational problem (in our
case poetry generation) is expressed via rule-based
constraints, which are used for inferring additional
knowledge based on the input data, as well as for imposing
constraints over the solutions of interest. The rule-based
constraints constitute the static constraint library: once
written, they can be reused in any instance of a poem
generator just by generating data that activates the
constraints. Elementary constraints are an integral part of
the system, comparable to program code. More rule-based
constraints can be added by the specifier component if
needed. The end-user does not need to write any constraints.
The Basic Model
We next describe a constraint library, starting with
elementary constraints. We also illustrate dynamically
generated specifications. While these are already sufficient
to generate poetry comparable to that of Toivanen et al.
(2012), we remind the reader that these constraints are
examples illustrating the flexibility of constraint
programming in compu-
Table 1: The predicates used in the basic ASP model
Predicate Interpretation
rows(X) the poem has X rows
positions(X,Y) the poem contains Y words on row X
candidate(W,I,J,S) the word W, containing S syllables, is a candidate for the Jth word of the Ith line
word(W,I,J,S) the word W, containing S syllables, is at position J on row I in the generated poem
% Generator part
{ word(W,I,J,S) } :- candidate(W,I,J,S). (G1)
% Testing part: the constraints
:- not 1 { word(W,I,J,S) } 1, rows(X), I = 1..X, positions(I,Y), J=1..Y. (T1)
Figure 2: Answer set program for generating poetry: the basic model
tational poetry composition, and different sets of constraints
can be used for different effects.
We will first give a two-line basic model of the constraint
library that takes the skeleton and candidates as input. This
model simply states that exactly one of the given candidate
words must be selected for each word position of the poem.
Predicates The predicates used in the basic answer set
program are listed in Table 1, together with their intuitive
interpretations.
The input predicates rows/1 and positions/2 characterize the
number of rows and the number of words allowed on the
individual rows of the generated poems. The input predicate
candidate/4 represents the input vocabulary, i.e., the words
that may be considered as candidates for words at specific
positions.
The output predicate word/4 represents the solutions to the
answer set program, i.e., the individual words and their
positions in the generated poem.
Example. The following is an example of the basic structure
of a data file representing a possible input to the basic
ASP model:
rows(6).
positions(1,6).
positions(2,8).
positions(3,8).
positions(4,5).
positions(5,6).
positions(6,6).
candidate("I",1,1,1).
candidate("melt",1,2,1).
candidate("weed",1,2,1).
candidate("teem",1,2,1).
candidate("kidnap",1,2,2).
candidate("perspire",1,2,2).
candidate("shut",1,2,1).
candidate("eclipse",1,2,1).
candidate("sea",1,2,1).
candidate("plan",1,2,1).
candidate("hang",1,2,1).
candidate("police",1,2,2).
candidate("revamp",1,2,2).
candidate("flip",1,2,1).
candidate("wring",1,2,1).
candidate("sting",2,2,2).
. . .
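To make the division of labour concrete, the following is a minimal Python sketch of how a specifier might emit such facts from a candidate vocabulary; the emit_facts helper and the crude syllable counter are our own illustrative inventions, not components of the system described here:

# Sketch: a specifier emitting data facts for the basic ASP model.
import re

def count_syllables(word):
    # Crude vowel-group count; a real specifier would consult a
    # pronunciation lexicon or the tools cited under Examples below.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def emit_facts(line_lengths, candidates):
    # candidates maps a (row, position) pair to its list of words.
    yield "rows({}).".format(len(line_lengths))
    for i, n in enumerate(line_lengths, start=1):
        yield "positions({},{}).".format(i, n)
    for (i, j), words in sorted(candidates.items()):
        for w in words:
            yield 'candidate("{}",{},{},{}).'.format(
                w, i, j, count_syllables(w))

candidates = {(1, 1): ["I"], (1, 2): ["melt", "kidnap", "perspire"]}
print("\n".join(emit_facts([6, 8, 8, 5, 6, 6], candidates)))

Given such a data file together with the rules of Figure 2, an off-the-shelf grounder and solver such as clingo can enumerate the answer sets, each of which corresponds to one poem.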
Rules The answer set program that serves as our basic model
for generating poetry is shown in Figure 2. The program can
be viewed in two parts: the generator part (Rule G1) and the
testing part (Rule T1). The testing part consists of
rule-based constraints that filter out poems that do not
satisfy the conditions for acceptable poems characterized by
the program.
In the generator part, Rule G1 states that each candidate
word for a specific position of the poem may be considered to
be chosen as the word at that position in the generated poem
(expressed using the so-called choice construct
{ word(W,I,J,S) }).
In the testing part, Rule T1 imposes the most fundamental
constraint that exactly one candidate word should be chosen
for each word position in the poem: the empty left-hand side
of the rule is interpreted as falsum, a contradiction. The
rule then states that, for each row and each position on the
row, it is a contradiction if it is not the case that exactly
one word is chosen for that position (expressed as the
cardinality construct 1 { word(W,I,J,S) } 1).
Example. Given the data presented above, these basic rules
are now grounded as follows. There are six lines in the poem,
as described by the rows predicate, and each of these lines
has a certain number of positions to be filled with words, as
described by the positions predicate. The candidate
predicates specify which words are suitable choices for these
positions. During grounding, the solver tries to find a
suitable candidate for each position, which is trivial in the
basic model that lacks any constraints between the words. We
consider more interesting models next.
Controlling the Form of Poems
We will now describe examples of how the form of the poems
being generated can be further controlled in a modular
fashion, by introducing additional predicates and rules over
these predicates to the basic ASP model. The additional
predicates introduced for these examples are listed in
Table 2: Predicates used in extending the basic ASP model
Predicate Interpretation
must_rhyme(I,J,K,L) the word at position J on row I and the word at position L on row K are required to rhyme
rhymes(X,Y) the words X and Y rhyme
nof_syllables(I,C) the Ith row of the poem is required to contain C syllables
min_occ(W,L) L is the lower bound on the number of occurrences of the word W
max_occ(W,U) U is the upper bound on the number of occurrences of the word W
% Generator part
{ word(W,I,J,S) } :- candidate(W,I,J,S). (G1)
rhymes(Y,X) :- rhymes(X,Y). (G2)
syllables(W,S) :- candidate(W,_,_,S). (G3)
% Testing part: the constraints
:- not 1 { word(W,I,J,S) } 1, rows(X), I = 1..X, positions(I,Y), J=1..Y. (T1)
:- word(W,I,J,S), word(V,K,L,Q), must_rhyme(I,J,K,L), not rhymes(W,V). (T2)
:- Sum = #sum [ word(W,I,J,S) = S ], Sum != C, nof_syllables(I,C), (T3)
I = 1..X, rows(X).
:- not L { word(W,_,_,_) } U, min_occ(W,L), max_occ(W,U). (T4)
Figure 3: Answer set program for generating poetry: extending the basic model
Table 2. Using these predicates, rules that refine the basic
model are shown in Figure 3 (Rules G2, G3, and T2-T4).
Rhyming The predicate must_rhyme/4 is used for providing
pairwise word positions that should rhyme. Knowledge of the
pairwise relations of the candidate words, namely, which
pairs of candidate words rhyme, is provided via the rhymes/2
predicate. Rule G2 enforces that rhyming of two words is a
symmetric relation. In the testing part, Rule T2 imposes the
constraint that, in case two words chosen for specific
positions in a poem must rhyme, but the chosen two words do
not rhyme, a contradiction is reached.
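The rhymes/2 facts themselves can be computed outside the solver. A naive sketch in Python, where a shared-suffix heuristic stands in for a real pronunciation lexicon:

# Sketch: emitting rhymes/2 facts for the rhyme constraint (T2).
from itertools import combinations

def naive_rhyme(a, b, k=3):
    # Shared-suffix test; real rhyme detection needs phonetic data.
    return a != b and a[-k:] == b[-k:]

words = ["hears", "disappears", "chord", "record"]
for a, b in combinations(words, 2):
    if naive_rhyme(a, b):
        # Rule G2 restores symmetry, so one direction suffices.
        print('rhymes("{}","{}").'.format(a, b))
# rhymes("hears","disappears").
# rhymes("chord","record").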
Numbers of Syllables The basic model can also be extended to
generate poetical structures with more specific constraints.
As an example, one can consider forms of poetry that have
strict constraints on the numbers of syllables in every line,
such as haikus, tankas, and sonnets.
We use the additional predicate nof_syllables/2 for providing
as input the required number of syllables on the individual
rows. At the same time, Rule G3 projects the information on
the number of syllables of each candidate word to the
syllables/2 predicate. Rule T3 can then be used to ensure
that the number of syllables on each row (line) of the poem
(computed and stored in the Sum variable using the counting
construct Sum = #sum [ word(W,I,J,S) = S ]) matches the
number of syllables specified for the row by the
nof_syllables/2 predicate.
Word Occurrences The simple model above does not control
possible repetitions of words at all. Such control can easily
be added by introducing the input predicates min_occ(W,L) and
max_occ(W,U), which are then used to state for each word W
the minimum L (respectively, maximum U) number of occurrences
allowed for the word. Using these additional predicates, Rule
T4 then constrains the number of occurrences to be within
these lower and upper bounds (expressed by the cardinality
constraint L { word(W,_,_,_) } U).
Further Possibilities for Controlling Form The possibilities
for controlling poetical forms are of course not limited to
simple requirements for the fulfilment of certain syllable
structures or rules for rhyming and alliteration. Besides
strict constraints on the numbers of syllables in a verse,
classical forms of poetry usually obey a specific stress
pattern as well. Stress can be handled with constraints
similar to the ones governing syllables. Metric feet like
iamb, anapest, and trochee can be used by specifying
constraints that describe the positions where the syllable
stress must lie in every line of verse.
Controlling poetical form also provides interesting
possibilities for using constraint optimization techniques
(to be described below). As an example, consider different
forms of lipograms, i.e., poems that avoid a particular
letter like "e", or univocal poems where the set of possible
vowels in the poem is restricted to only one vowel.
Similarly, more complex optimisations of the speech sound
structure can be handled, depending on whether the desired
poetry is required to have a soft or brutal sound, or to have
the characteristics of a tongue-twister.
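As one concrete sketch of such an optimization, a specifier could emit letter-occurrence facts together with a soft constraint penalizing words that contain a forbidden letter; the predicates contains_letter/1 and violates_lipogram/1 are our own illustrative inventions, not part of the constraint library described above:

# Sketch: data and a soft constraint for a lipogram avoiding "o".
def lipogram_program(vocabulary, letter="o"):
    # Facts: which candidate words contain the forbidden letter.
    for w in vocabulary:
        if letter in w.lower():
            yield 'contains_letter("{}").'.format(w)
    # Soft constraint: prefer poems whose words avoid the letter.
    yield "violates_lipogram(W) :- word(W,_,_,_), contains_letter(W)."
    yield "#minimize [ violates_lipogram(W) ]."

print("\n".join(lipogram_program(["music", "chord", "universe"])))
# contains_letter("chord").
# violates_lipogram(W) :- word(W,_,_,_), contains_letter(W).
# #minimize [ violates_lipogram(W) ].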
Controlling the Contents and Syntax of Poems
While the example constraints presented above focus on
controlling the form of poems, linguistic knowledge of
phonology, morphology, and syntax (as examples) can similarly
be controlled by introducing additional constraints in a
modular fashion. This includes rules of syntax that specify
failed_rhyme(I,J,K,L) :- word(W,I,J,S), word(V,K,L,Q),
must_rhyme(I,J,K,L), not rhymes(W,V). (T2')
failed_syllable_count(I) :- Sum = #sum [ word(W,I,J,S) = S ], Sum != C,
nof_syllables(I,C), I = 1..X, rows(X). (T3')
failed_occount(W) :- not L { word(W,_,_,_) } U, min_occ(W,L), max_occ(W,U). (T4')
#minimize [ failed_rhyme(I,J,K,L) @3 ]. (O2)
#minimize [ failed_syllable_count(I) : I=1..X : rows(X) @2 ]. (O3)
#minimize [ failed_occount(W) @1 ]. (O4)
Figure 4: Handling inconsistencies by relaxing the constraints and introducing optimization criteria
how linguistic elements are sequenced to form valid
statements, and rules of semantics which specify how valid
references are made to concepts.
Consider, for example, transitive and intransitive verbs,
i.e., verbs that either require or do not require an object
to be present in the same sentence. Here one can impose
additional constraints declaring which words can or cannot be
used in the same sentence where a transitive verb requiring a
certain preposition and an object has been used. Similarly,
other constraints not directly related to poetical forms but
rather to linguistic structures, like idioms where several
words are always bundled together, can be effectively
declared as constraints. The same holds for syntactic aspects
such as rules governing the constituent structure of
sentences (Lierler and Schüller 2012).
As a simple, more concrete example, consider the following.
In order to declare that the poems of interest start with the
word "I", the fact word("I",1,1,1). can be added to the
constraint model. In order to ensure that all verbs
associated with the first person should be in the past tense,
the additional predicate in_past_tense/1 can be introduced,
and specified for each past-tense verb in the data. Combining
the above, one can, as an example, declare that the word
following any "I" is in the past tense, using the following
two rules.
:- word("I",I,J,1), word(W,I,J+1,_),
   not in_past_tense(W).
:- word("I",I,J,1), positions(I,J),
   word(W,I+1,1,_), not in_past_tense(W).
Here the first rule handles the case where the occurrence of
"I" is not the last word on a row. The second rule handles
the case where "I" is the last word on a row, in which case
the first word on the following row should be in the past
tense.
More generally, one can pose constraints that ensure that two
(or more) words within a poem are compatible (in some
specified sense), even if the words are not next to each
other. For an example, consider the additional predicates
pronoun/1 and verb/1 that hold for words that are pronouns
and verbs, respectively, and the predicate person/2 that
specifies the grammatical person, expressed as an integer
value, of a given word: person(W,P) is true if and only if
the word W has person P. Using these predicates, one can
enforce that, for the first verb following any pronoun (not
necessarily immediately after the pronoun), the pronoun and
the verb have to have the same person. For instance, after
the pronoun "she" the first following verb has to be in the
third person singular form. This can be expressed as the
following rule:
:- word(W,I,J,_), pronoun(W), person(W,P),
   0 { word(U,I,L,_) : verb(U) : L>J : L<K } 0,
   word(V,I,K,_), verb(V), person(V,Q),
   K>J, P!=Q.
Similarly, by specifying the additional predicate verb/1 for
each verb in the input data, one can require that the whole
poem should be in the past tense:
:- word(W,_,_,_), verb(W),
   not in_past_tense(W).
Specifying an Aesthetic via Optimization
Up to now, we have only considered hard constraints, and did
not address how to assess the aesthetics of generated poems,
or how to generate poems that are maximally aesthetic by some
measures.
In the constraint programming framework, an aesthetic can be
specified using soft constraints. The constraint solver then
attempts to look for poems which maximally satisfy the soft
constraints. In ASP, this is achieved by using optimization
statements offered by the language.
As concrete examples, we will now explain how Rules T2-T4 can
be turned into soft constraints. The soft variants, Rules
T2'-T4', are shown in Fig. 4, together with the associated
optimization statements O2-O4. Taking Rule T3 as an example,
the idea is to introduce a new predicate
failed_syllable_count/1 with the following interpretation:
failed_syllable_count(I) is true for row I if and only if the
number of syllables on the row was not made to match the
required number. In contrast to Rule T3, which rules out all
solutions of the model immediately in such a case, Rule T3'
simply results in assigning failed_syllable_count(I) to true.
Thus the predicate failed_syllable_count/1 acts as an
indicator of failing to have the required number of syllables
on a specific row.
The optimization statement associated with Rule T3' is Rule
O3. This minimize statement declares that the number of rows
I for which failed_syllable_count(I) is assigned to true
should be minimized, or equivalently, that the numbers of
syllables should conform to the required numbers of syllables
for as many rows as possible. The optimization variants T2'
and T4' and the associated optimization statements follow a
similar scheme.
When multiple such optimization statements are introduced to
the model, the relative importance of the statements is
declared using the @i attached to each optimization
statement. In the example of Figure 4, the primary objective
is to minimize the number of rhyming failures (specified
using @3). The secondary objective is then to find, among the
set of poems that minimize the primary objective, a poem that
has a minimal number of lines with a wrong number of
syllables (using @2), and so forth.
Examples
We will now illustrate the results and effects of some
combinations of constraints.
In the data generation phase (the specifier component) we use
the methodology of Toivanen et al. (2012), including the
Stanford POS-tagger and the morpha & morphg inflectional
morphological analysis and generation tools (Toutanova et al.
2003; Minnen, Carroll, and Pearce 2001). The poem templates
are extracted automatically from a corpus of human-written
poetry. The only input by the user is a topic for the poem,
and some other parameters as described below.
As a test case for our current system, we study how the
approach manages to produce different types of quatrains. A
quatrain is a unit of four lines of poetry; it may either
stand alone or be used as a single stanza within a larger
poem. The quatrain is the most common type of stanza found in
traditional English poetry, and as such is fertile ground on
which to test theories of the rules governing poetry
patterns.
The specifier component randomly picks a quatrain from a
given corpus of existing poetry. It then automatically
analyses its structure, to generate a skeleton for a new
poem. The following poem skeleton is marked with the required
part of speech for every word position (PR = pronoun, VB =
verb, PR PS = possessive pronoun, ADJ = adjective, N SG =
singular noun, N PL = plural noun, C = conjunction, ADV =
adverb, DT = determiner, PRE = preposition):
N SG VB, N SG VB, N SG VB!
PR PS ADJ N PL ADJ PRE PR PS N SG:
C ADV, ADV ADV DT N SG PR VB!
DT N SG PRE DT N PL PRE N SG!
The specifier component then generates a list of candidate
words for each position. If we give "music" as the topic of
the poem, the specifier specifically uses words related to
music as candidates, where possible (Toivanen et al. 2012). A
large number of poems are possible in the absence of other
constraints, and the constraint solver in the explorer
component outputs this one (or any number of alternative
ones, if required):
Music swells, accent practises, traditionalism marches!
Her devote narrations bent in her improvisation:
And then, vivaciously directly a universe
she ventilates!
An anthem in the seasons of radio!
This example does not yet have any specific requirements for
the prosodic form. Traditional poetry often has its prosodic
structure advertised by one or more of several poetic
devices, with rhyming and alliteration being the best known
of these. Let the specifier component hence generate the
additional constraints that the first and the third line must
rhyme, as well as the second and the fourth. As a result of
this more constrained specification we now get a very similar
poem, but with some words changed to rhyme.
Music swells, accent practises, traditionalism hears!
Her devote narrations bent in her chord:
And then, vivaciously directly a universe
she disappears!
An anthem in the seasons of record!
Addition of this simple constraint adds rhyme to the poem,
which in turn draws attention to its prosodic structure. The
use of prosodic techniques to advertise the poetical nature
of a given text can also enhance the coherence of the poetry,
as the elements are linked together more tightly. For
example, a rhyme scheme of ABAB would give the listener a
strong sense that the first and third as well as the second
and fourth lines belong together as a group, heightening the
saliency of the alternating structure that may be present in
the content as well.
The constraint on rhyming reflects the intuition that rhyme
works by creating an expectation and then satisfying it. Upon
hearing one line of verse, the listener expects to hear
another line that rhymes with it. Once the second rhyme is
heard, the expectation is fulfilled, and a sense of closure
is achieved. Similarly, adding constraints that specify a
more sophisticated prosodic structure or content-related
aspects may lead to improved quality of the generated poetry.
Let us conclude this section with an example of an aesthetic:
an optimization task concerning the prosodic structure of
poetry. Consider the composition of lipograms, i.e., poems
avoiding a particular letter. (Univocalism, or more complex
optimizations of the occurrence of certain speech sounds, can
be handled in a similar fashion.) The following poem is an
example of a lipogram that avoids the letter "o". As a
result, all words that contained that letter in the previous
example are changed to match the strengthened constraints:
Music swells, accent practises, theatre hears!
Her delighted epiphanies bent in her universe:
And then, singing directly a universe she disappears!
An anthem in the judgements after verse!
Empirical results of Toivanen et al. (2012) indicate that in
Finnish, already the basic mechanism produces poems of
surprisingly high quality. The sequence of poems above
illustrates how their quality can be substantially improved
by the relatively simple addition of new, declarative
constraints.
Discussion and Conclusions
We have proposed harnessing constraint programming for
composing poetry automatically and flexibly in different
styles and forms. We believe constraint programming also has
high potential for describing other creative phenomena. A key
benefit is the declarativity of this approach: the conceptual
space is explicitly specified, and so is the aesthetic, and
both are decoupled from the algorithm for exploring the
search space (an off-the-shelf constraint solver). Due to its
modular nature, the presented approach can be an effective
building block of more sophisticated poetry generation
systems.
An interesting next step for this work is to build an
interactive poetry composition system which makes use of
constraint programming in an iterative way. In this approach
the constraint model is refined and re-solved based on user
feedback. This can be seen as an iterative
abstraction-refinement process, in which the first
abstraction specifies a very large search space that is
iteratively pruned by refining the constraint model with more
intricate rules that focus the search on the most interesting
parts of the conceptual space.
Another promising research direction is to consider a self-reflective creative system. Since the search space and aesthetic are expressed in an explicit manner as constraints, they can also be observed and manipulated. We can envision a creative system that controls its own constraints. For instance, after observing that a large number of good results are obtained with the current constraints, it may decide to add new constraints to manipulate its own internal objectives. Modification of the set of constraints may lead to different conceptual spaces and eventually to transformational creativity (Boden 1992). Development of metaheuristics and learning mechanisms that enable such self-supported behavior is a great challenge indeed.
Acknowledgements
This work has been supported by the Academy of Finland under grants 118653 (JT, HT), and 132812 and 251170 (MJ).
References
Boden, M. 1992. The Creative Mind. London: Abacus.

Boenn, G.; Brain, M.; De Vos, M.; and Ffitch, J. 2011. Automatic music composition using answer set programming. Theory and Practice of Logic Programming 11(2-3):397-427.

Colton, S.; Goodwin, J.; and Veale, T. 2012. Full-face poetry generation. In International Conference on Computational Creativity, 95-102.

Díaz-Agudo, B.; Gervás, P.; and González-Calero, P. A. 2002. Poetry generation in COLIBRI. In ECCBR 2002, Advances in Case Based Reasoning, 73-102.

Gebser, M.; Kaminski, R.; Kaufmann, B.; Ostrowski, M.; Schaub, T.; and Thiele, S. 2008. A user's guide to gringo, clasp, clingo, and iclingo. http://downloads.sourceforge.net/potassco/guide.pdf?use_mirror=.

Gebser, M.; Kaufmann, B.; and Schaub, T. 2012. Conflict-driven answer set solving: From theory to practice. Artificial Intelligence 187:52-89.

Gelfond, M., and Lifschitz, V. 1988. The stable model semantics for logic programming. In Logic Programming, Proceedings of the Fifth International Conference and Symposium, 1070-1080.

Gervás, P. 2001. An expert system for the composition of formal Spanish poetry. Journal of Knowledge-Based Systems 14(3-4):181-188.

Langkilde, I., and Knight, K. 1998. The practical value of n-grams in generation. In Proceedings of the International Natural Language Generation Workshop, 248-255.

Lierler, Y., and Schüller, P. 2012. Parsing combinatory categorial grammar via planning in answer set programming. In Erdem, E.; Lee, J.; Lierler, Y.; and Pearce, D., eds., Correct Reasoning, volume 7265 of Lecture Notes in Computer Science, 436-453. Springer.

Manurung, H. M.; Ritchie, G.; and Thompson, H. 2000. Towards a computational model of poetry generation. In Proceedings of the AISB Symposium on Creative and Cultural Aspects and Applications of AI and Cognitive Science, 79-86.

Manurung, H. 2003. An evolutionary algorithm approach to poetry generation. Ph.D. Dissertation, University of Edinburgh, Edinburgh, United Kingdom.

Minnen, G.; Carroll, J.; and Pearce, D. 2001. Applied morphological processing of English. Natural Language Engineering 7(3):207-223.

Netzer, Y.; Gabay, D.; Goldberg, Y.; and Elhadad, M. 2009. Gaiku: Generating haiku with word associations norms. In Proceedings of the NAACL Workshop on Computational Approaches to Linguistic Creativity, 32-39.

Niemelä, I. 1999. Logic programs with stable model semantics as a constraint programming paradigm. Annals of Mathematics and Artificial Intelligence 25(3-4):241-273.

Roads, C. 1980. Interview with Marvin Minsky. Computer Music Journal 4.

Simons, P.; Niemelä, I.; and Soininen, T. 2002. Extending and implementing the stable model semantics. Artificial Intelligence 138(1-2):181-234.

Stravinsky, I. 1947. Poetics of Music. Cambridge, MA: Harvard University Press.

Toivanen, J. M.; Toivonen, H.; Valitutti, A.; and Gross, O. 2012. Corpus-based generation of content and form in poetry. In International Conference on Computational Creativity, 175-179.

Toutanova, K.; Klein, D.; Manning, C.; and Singer, Y. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of HLT-NAACL, Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 252-259.

Wong, M. T., and Chun, A. H. W. 2008. Automatic haiku generation using VSM. In Proceedings of ACACOS, The 7th WSEAS International Conference on Applied Computer and Applied Computational Science, 318-323.
Slant: A Blackboard System to Generate
Plot, Figuration, and Narrative Discourse Aspects of Stories

Nick Montfort
The Trope Tank, MIT
77 Mass Ave, 14N-233
Cambridge, MA 02139 USA
nickm@nickm.com

Rafael Pérez y Pérez
División de Ciencias de la Comunicación y Diseño
Universidad Autónoma Metropolitana, Cuajimalpa, México D. F.
rperez@correo.cua.uam.mx

D. Fox Harrell
Imagination, Computation, & Expression Laboratory, MIT
77 Mass Ave, 14N-207
Cambridge, MA 02139 USA
fox.harrell@mit.edu

Andrew Campana
Department of East Asian Languages & Civilizations
Harvard University
Cambridge, MA 02138 USA
campana@fas.harvard.edu
Abstract
We introduce Slant, a system that integrates more than a decade of research into computational creativity, and specifically story generation, by connecting subsystems that deal with plot, figuration, and the narrative discourse using a blackboard. The process of integrating these systems highlights differences in the representation of story and has led to a better understanding of how story can be usefully abstracted. The plot generator MEXICA and a component of Curveship are used with little modification in Slant, while the figuration subsystem Fig-S and the template generator GRIOT-Gen, inspired by GRIOT, are also components. The development of the new subsystem Verso, which deals with genre, shows how different genres can be computationally modeled and applied to in-development stories to generate results that are surprising in terms of their connections and valuable in terms of their relationship to cultural questions. Example stories are discussed, as is the potential of the system to allow for broader collaboration, the empirical testing of how subsystems interrelate, and possible contributions in literary and artistic contexts.
Introduction
Slant is a system for creative story generation that integrates different types of expertise and creativity; the framework it provides also means that other systems, implementing other approaches to story generation, can be integrated into it in the future. The development of Slant has involved formalizing, reworking, and testing ideas about creative storytelling and what is important to writing stories: specifically, the poetics of figuration, the poetics of plot development, and the poetics of narrating. The system incorporates a new perspective on genre and integrates components from three existing systems: D. Fox Harrell's GRIOT, Rafael Pérez y Pérez's MEXICA, and Nick Montfort's Curveship.

Story generation systems have not yet used an architecture of this sort to encapsulate different expertise and different aspects of creativity; nor have they incorporated major components that are based on existing, proven systems by different researchers.

Slant is a blackboard system in which different subsystems, each of them informed by and modeling humanistic theories, collaborate, working incrementally to fully specify a story. An alternative, simpler process involves making decisions in a "pipeline," in which one system offers, for instance, a plot and another system determines how the narrative discourse will be arranged. Although this system seems to be a poor model of human creativity, it is a reasonable first step toward a "blackboard" system. Two of the Slant collaborators previously developed such a pipelined system with two stages (Montfort and Pérez y Pérez 2008). The current project involves five major subsystems rather than two and uses a blackboard architecture, allowing any of the subsystems that work during the main phase of generation to augment the story representation at any point.

The generation of stories in Slant begins with minimal, partial proposals from a simple unit, the Seeder. In turn, the subsystems MEXICA, Verso, and Fig-S read and add to this set of proposals, each according to its focus. When the proposals are complete, the finished story specification is sent to GRIOT-Gen so conceptual blending can be applied to the relevant templates and then to the three-stage pipelined text generation component of Curveship. Curveship-Gen realizes a finished story in the form of a text file that can be read and considered by human readers.
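The following Python sketch outlines this control flow. The subsystem names follow the paper, but the story dictionary and all subsystem bodies are invented stand-ins, not Slant's actual interfaces.

# Schematic sketch of the generation flow; all interfaces are
# hypothetical simplifications.
def seeder():
    return {"existents": [], "actions": [], "complete": False}

def mexica(story):   # plot: contributes actions (stub)
    story["actions"].append({"verb": "travel", "agent": "princess"})
    return story

def verso(story):    # genre and narrative discourse (stub)
    story["spin"] = {"genre": "confession"}
    return story

def fig_s(story):    # figuration (stub); declares the story complete here
    story["figuration"] = {"domain": "fire"}
    story["complete"] = True
    return story

def generate_story():
    # Blackboard loop: subsystems read and augment a shared story
    # representation until it is complete; the finished specification
    # would then go to GRIOT-Gen and Curveship-Gen for realization.
    story = seeder()
    while not story["complete"]:
        for subsystem in (mexica, verso, fig_s):
            story = subsystem(story)
    return story

print(generate_story())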
This paper introduces the architecture of the system and describes the subsystems that build and realize stories together. It includes a discussion of what was learned by integrating three different lines of research on story generation. Reflections are also offered on the first set of stories produced by the system, and some discussion of the potential of the system is included as well. Slant will undergo more refinement and development, but the work that has been done so far is of relevance to those working to implement large-scale computational creativity systems that integrate heterogeneous subsystems, to those developing representations of story and other creative representations, and to those working specifically in story generation.
Creativity and the Architecture of Slant
Boden holds that creativity involves the production of new, surprising, and valuable results (Boden 2004). In the case of story generation and other literary endeavors, being new involves not repeating what has been done before (by the system or in the wider culture); surprise often manifests itself in unusual juxtapositions that are effective, though one would not have guessed it; and value, rather than indicating that the story is of didactic or economic value, means that a story accomplishes some imaginative or poetic purpose: it connects in some way to cultural or psychological issues or questions and allows the reader to think about them in new ways. Stories that surprise readers by bringing unusual elements together and which provide for this sort of reflection, but which do so in the same way as existing stories, are not new. Stories that are innovative and could allow for reflection, but which do not involve unusual juxtapositions or connections, are not surprising. Stories that are fresh and involve unusual combinations of elements, but do not ultimately seem to have a point of any sort, are not of value.

Taking value to indicate relevance within culture means that the value of a story is similar to what has been called, with regard to conversational stories of the sort that are uttered all the time by people, its "point" (Polanyi 1989). While the point of a story is understood in the context of a specific conversation, the ability of a story to have a point at all can be understood within the context of culture. Valuable stories are those that have a point to at least some readers when they encounter them in some context.

Beyond Boden's three components of creativity, we also consider a higher level of creativity. Namely, the various cognitive processes for conceptualization that enable people to recognize and generate new, surprising, and valuable cultural content are forms of everyday creativity. Cognitive scientist Gilles Fauconnier has referred to these processes of meaning construction as backstage cognition and asserts that backstage cognition includes specific phenomena such as "viewpoints and reference points, figure-ground/profile-bases/landmark-trajector organization, metaphorical, analogical, and other mappings, idealized models, framing, construal, mental spaces, counterpart connections, roles, prototypes, metonymy, polysemy, conceptual blending, fictive motion, [and] force dynamics" (Fauconnier 1999). These cognitive processes are especially important to note here because the notion of creativity informing Fig-S and GRIOT-Gen is based on a model of the creative backstage cognition phenomenon of metaphorical mapping, most prominently, but also mental spaces, counterpart connections, metaphor, analogy, and metonymy in the case of the GRIOT system that inspired them.

To succeed repeatedly and reliably at creativity, a storytelling system must have mechanisms relevant to each of these aspects of creativity. It must have some model of what has happened before to prompt novelty, somehow provide for stories that join aspects together in unusual and effective ways, and somehow provide for stories that relate to culture and have a point. The means of accomplishing these aspects of creativity do not have to be abstracted into separate components of a system, but they do need to be somehow realized by a creative system.

A simple way that systems can connect and to some extent collaborate involves organizing them in a pipeline. This can model a regimented assembly-line process or "waterfall" model in which each subsystem participates in one phase and interfaces only with the systems before and after it. For certain processes, this may be adequate, but for the nuanced process of creativity, which involves making interesting connections, the components of a system probably need to interact in a less constrained and unidirectional manner. This was the rationale for the blackboard architecture used in Slant.
The Blackboard and Subsystems
In Slant, the three major story-building subsystems can write to and read from a blackboard representation of the story in progress. Currently, the systems function in practice much as a pipeline does, with each of the three subsystems augmenting the story representation once. The systems can influence each other "backwards" only via Verso examining the current plot and proposing a new action (not just a specification of narrative discourse, which is always proposed). MEXICA can then incorporate that expanded plot into the next engagement-reflection (ER) cycle that it uses to elaborate the plot. Although the interactions between subsystems are not intricate at this point, the framework is in place for more elaborate blackboard interaction in future versions of Slant.

Figure 1: The architecture of Slant.
Currently, MEXICA contributes an initial, partial plot (a minimal, random one will eventually be provided at the first step by the Seeder). Then, Verso assigns a genre and a specification of the narrative discourse, and MEXICA further elaborates the plot until it is complete. Verso may specify constraints on how the story is to be developed. For instance, it may specify that a particular character, who has been designated as the narrator of the story, should not die. MEXICA will respect these in elaborating the story. Finally, Fig-S determines what figuration will be used. Eventually, another system, the Harvester, will check to see if all aspects of the story are complete, allowing the subsystems to augment the story in several different orders. After the story representation is complete, it is realized. GRIOT-Gen determines how to realize figurative representations and Curveship-Gen does content selection, microplanning, and surface realization to produce the final text.

The MEXICA subsystem has the most explicit model of an aspect of creativity; it explicitly evaluates the novelty and interestingness of the component of story that it develops, the plot. Verso and Fig-S both aim to add surprise by combining conventional genres and metaphors in unusual ways. They do not currently measure how surprising their results are, but they embody techniques for choosing appropriate combinations that may be seen as creative by readers.
Foundational Systems
MEXICA. This system generates plots or frameworks for short stories about the Mexicas, the old inhabitants of what today is Mexico City, also known as the Aztecs. MEXICA's process is based on the engagement/reflection cycle, a cognitive account of writing by Mike Sharples (Pérez y Pérez and Sharples 1999, 2001, 2004). During engagement the system focuses on generating sequences of actions driven by content and rhetorical constraints and avoids the use of explicit goals or predefined story-structures. During reflection MEXICA evaluates the novelty and interestingness of the material produced so far and verifies the coherence of the story (see also Pérez y Pérez et al. 2011).

The design of the system is based on structures known as Linguistic Representations of Actions (LIRAs), which are sets of actions that any character can perform in the story and whose consequences produce some change in the story-world context. There are two types of possible preconditions and postconditions in MEXICA: emotional links between characters and dramatic tensions in the story.

MEXICA is incorporated as the generator of plot. It generates plot in stages, allowing other systems to interact with the story representation as it does so. In the current system, it can be influenced by actions added to the story by Verso.
GRIOT. This is a system that is the basis for interactive and generative text and multimedia works using Harrell's Alloy algorithm for conceptual blending. These works include poetic, animated, and documentary systems that themselves produce different output each time they are run. While GRIOT allows authors to implement narrative and poetic structures (e.g., plots), a major contribution of GRIOT is its orientation toward the dynamic generation of content resulting from modeling aspects of figurative thought that can be described formally. That is, GRIOT allows authors to fix elements such as narrative structure while varying output in terms of theme, metaphor, emotional tone, and related types of what is here called "figuration" (results of figurative thought).

Rather than being based on a single knowledge base or ontology, as is the case with many classic AI systems, GRIOT creates blends between different ontologies (Harrell 2006, 2007). Indeed, a key feature of GRIOT is the ability of authors to construct subjective ontologies based in specific authorial worldviews, elements of which are then blended in a manner that maintains coherence based on several formal optimality principles inspired by a subset of those proposed by Gilles Fauconnier and Mark Turner (1999). This approach allows for novel, surprising, and valuable content to be generated that retains conceptual coherence. GRIOT, like MEXICA, has also been used to implement cultural forms of narrative that are not often privileged in computer science, in this case oral traditions of narrative from the African diaspora (Harrell 2007a). This is important because some forms of oral narrative have more in common with narratives in virtual worlds than the graphocentric (text-biased) forms of narrative privileged in most research in the field of narratology in literary studies.

The implemented GRIOT system, and experience with it, have informed the development of Fig-S, a component of Slant that proposes what types of figuration, mainly metaphor, will be used in telling the story. GRIOT also inspires GRIOT-Gen, the component that generates natural language representations for figuratively enriched versions of particular actions after the story representation is completely developed (see also Goguen and Harrell 2008).
Curveship. This is an interactive fiction system that provides a world model (of characters, objects, locations, and things that happen) while also modeling the narrative discourse, so that the narration and description of the simulated world can change (Montfort 2009, 2011). Curveship can tell events out of order, using flashback and other techniques, and can tell the story from the standpoint of particular characters and their perceptions and understandings. It is based on Genette's theories (Genette 1983) and incorporates other ideas from narratology. The architecture of Curveship draws on well-established techniques for simulating an IF world, separating these from the subsystem for narrating, which includes a standard three-stage natural language generation pipeline. To make use of the system, either for interactive fiction authoring or story generation, one specifies high-level narrative aspects; the system does appropriate content selection, works out grammatical specifics, and realizes the text with, for instance, proper verb formation.

Some world simulation abilities and the narrative text generation capabilities of Curveship are used directly in Slant in Curveship-Gen, the component that outputs the finished, realized story.
The Slantstory XML Format
Connecting different systems so that they can work together means establishing shared representations. For Slant, that representation is an XML format called Slantstory. It contains all of the information that is needed in the final steps to represent each action and realize the story, meaning that it must contain sufficiently granular information about the plot, the narrative discourse, and the types of conceptual blending that are to be done. This information is not only needed at the last stage, where the generation of text is done. It can also be read by the different subsystems during story generation, when the story is not yet complete, and can influence the next stage of story augmentation. Because of this, Slantstory is a format not only for representing entire, complete stories but also for representing partial stories, the composition of which is in progress. In the current implementation, subsystems can augment a story and declare it complete, but cannot revise or remove what has already been contributed.

To declare a common representation for (both partial and complete) stories, an agreement had to be reached between different perspectives on what the elements of a story are, what is to be represented about each, and how granular the representation of each element is. The Slantstory DTD specifies five elements that occur within the root:
<!ELEMENT slantstory
(existents, actions, spin?, genre?, figuration?)>
A story cannot be complete without all five of these present, but only existents and actions are required at every stage of story development. The existents are of three types: locations, characters, and things. Actions each have a verb (which might be a phrase such as "try to flee") and may have any or all of agent, direct object, and indirect object specified. The "instantaneous tension level," or change in the tension associated with an action, is also represented there. The actions also have a unique ID number which indicates their chronological order in the story world, as in:
<action verb="cure" agent="virgin" direct="enemy"
indirect="curative plant" location="Texcoco Lake"
tension="0" id="42" />
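As an illustration of how a subsystem might read this representation from the blackboard, here is a short Python sketch; the element and attribute names follow the examples in this section, while the parsing code itself is our assumption.

# Minimal sketch of reading actions from a Slantstory file.
import xml.etree.ElementTree as ET

SAMPLE = """<slantstory>
  <existents/>
  <actions>
    <action verb="cure" agent="virgin" direct="enemy"
            indirect="curative plant" location="Texcoco Lake"
            tension="0" id="42"/>
  </actions>
</slantstory>"""

def read_actions(xml_text):
    # Return actions in story-world chronological order (by id).
    root = ET.fromstring(xml_text)
    actions = [a.attrib for a in root.find("actions")]
    return sorted(actions, key=lambda a: int(a["id"]))

for action in read_actions(SAMPLE):
    print(action["id"], action["verb"], action.get("agent"))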
One challenge in developing and using this blackboard representation involves the different models of existents and actions that the three foundational systems use. Characters and locations, but nothing like props or "things," are represented in MEXICA, while Curveship represents all three sorts of existents to provide the type of simulation that is typical in interactive fiction, where objects can typically be acquired, given to other characters, placed on surfaces and in containers, and so on. MEXICA was modified for use in Slant to produce appropriate representations of whatever things were mentioned in actions.

The representation of action was also not consistent between the foundational systems. Curveship has a typology of four actions: Configure (move some existent into, onto, out of, off, or to a different location), Modify (change the state of some existent), Sense (gain information about the world from sensing), and Behave (any other action, not resulting in any change of state in the world). Although they may be quite different, all actions are meant to correspond to a sentence with a single verb phrase when realized. MEXICA's actions, on the other hand, are not categorized in this way and include many different sorts of representations. There are, for instance, complex actions such as FAKED_STAB_INSTEAD_HURT_HIMSELF, indications that an action was not taken such as NOT_CURE, and indications that a state is to be described at a certain point such as WAS_BROTHER_OF.

The first of these issues, the granularity of action, was handled by developing a mapping between MEXICA actions and Slantstory actions. A limitation of this approach is that actions cannot be inserted into the middle of a series of Slantstory actions that correspond to a single MEXICA action; this is enforced by giving the actions consecutive IDs, so that there is no room to add further actions. Ideally, however, other subsystems would be able to modify the Slantstory representation of actions in any way. The second of these issues brings up the interesting issue of disnarration (Prince 1988): it is possible in a story not only to tell what has happened but also to tell what did not happen, and doing so can have an interesting effect on the reader. Disnarration is not the representation of action, however, so it cannot be represented in a straightforward way in a list of actions, and should be handled elsewhere, in the spin element, for instance. Resolving the final issue, related to stative information, also requires further work, since the system should both represent facts about the story world (probably in the existents element) and when to mention them (probably in the spin element).
GRIOT transforms, for instance, the "agent" and "direct" attributes of an action into conceptual categories. While Slantstory uses a grammatical-sounding model of actions, with direct and indirect objects, Curveship can in fact realize sentences out of these where the agent is the direct object and the "direct" existent is the subject, when it realizes a sentence in the passive, for instance. So, both GRIOT and Curveship treat the seemingly grammatical attributes of action in slightly different ways.
Furthermore, the templates that are used to represent sentences in Curveship, which is designed for narrative variation, are not well-designed for the generation of figurative text. Curveship's templates are set up to allow a slot for an agent, for example, which might eventually be filled with "the jaguar knight," "I," "he," or "you" depending upon how narrator and narratee are set and whether the noun phrase is pronominalized. Fig-S, however, may determine that the adjective "enflamed" should be used with this noun phrase because it will participate in the conventional metaphor LOVE IS FIRE. In this case, Curveship-Gen should generate either "the enflamed jaguar knight," "I, enflamed," "he, enflamed," or "you, enflamed." All the possibilities for combinations of figuration (not just the use of an adjective) and all the existing ways that Curveship can generate noun phrases need to be implemented in the next version of Slant.
Verso: Augmenting a Story Based on Genre
Verso, like MEXICA and Fig-S, reads a Slantstory XML file from the blackboard and outputs an updated one. While MEXICA is focused on plot and Fig-S selects an appropriate domain for blending particular representations of action, Verso's operation is based on a model of genre. This subsystem operates by the following steps (sketched in code after this list):

1. Detecting particular aspects of the in-progress story (typically actions with particular verbs, although possibly series of actions or sets of characters) that indicate the story's suitability to a particular genre, for all known genres.

2. Selecting the genre that is most appropriate.

3. Updating the story using rules specific to that genre. The narrative discourse is always updated by specifying attributes of and elements within "spin." This determines elements such as the focalizer, narrator, time of narrating, rhetorical style, and beginning and/or ending phrases to frame the story. The update can also contribute new actions to the story, which can influence the way that MEXICA continues to develop the plot.
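A hypothetical Python sketch of steps 1 and 2, with an invented list of sinful verbs and a single detector standing in for Verso's genre knowledge:

# Invented stand-ins for Verso's detection and selection steps.
SINS = {"rob", "kill", "kidnap", "attack"}

def score_confession(actions):
    # Step 1 for one genre: each sinful action raises its suitability.
    return sum(1 for a in actions if a["verb"] in SINS)

DETECTORS = {"confession": score_confession}   # one detector per known genre

def choose_genre(actions):
    # Step 2: pick the genre whose detector scores highest.
    return max(DETECTORS, key=lambda g: DETECTORS[g](actions))

actions = [{"verb": "travel"}, {"verb": "kill"}, {"verb": "rob"}]
print(choose_genre(actions), score_confession(actions))  # confession 2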
This procedure brings a model of genre awareness into Slant, but it is an unusual process from the standpoint of conventional human creativity. More often than not, an author chooses a genre and then writes or tells something within it, rather than beginning with a partial story and finding a genre that suits it. The overall effect, however, is to introduce sensitivity to an important aspect of human creativity.

Verso's model does not seem completely aligned with the direction of genre studies in recent decades. This field has moved from a formalist definitional framework of genre to one that is semiotic, focusing in particular on the rhetorical study of the generic actions of everyday readers and writers (Devitt 2008). Recently, genre studies has deemphasized and argued against the idea of genres as distinct categories with characteristic elements that identify them. Scholars now dispute the idea that characteristics can be identified and summed up to indicate the likelihood that a text is part of a certain genre. They note that few genres have true fundamental elements. Particularly in the case of literary genres (e.g., detective fiction, science fiction, horror, fantasy), even when there seem to be some core characteristics that all works within a category share, almost any "defining" characteristic could be countered by an example work which lacks that element but is still undeniably of that genre. Furthermore, a fundamental dilemma arises in the act of classification itself, the problem of "whether these units exist independently of the taxonomical scheme, or arise as a result of the attempt to classify" (Ryan 1981).
However, these recent concerns pertain most directly to scholarly and critical work; they do not bear upon the way genre is used in literary creativity. Sharp definitions of genre that are developed through writing practice have served many authors well, including Raymond Queneau, who used 99 different genres, modes, or styles to retell the same simple story in Exercises in Style. The problem of whether classification compels texts into categories is a problem for analysis, but it is a productive idea for literary creativity. Additionally, as Steve Neale has pointed out, "genres are instances of repetition and difference"; it is precisely through the differentiation from the established norms of a genre that a work can become part of it (Neale 1980). Verso, while making use of those "instances of repetition," also aims to effectively model the production of this necessary difference.

The genres that have been implemented so far are not literary, either in the sense of broad differentiations such as "prose" and "poetry," or in the sense of categories such as "romance," "cyberpunk," "noir," and so on. Instead, Verso uses a broader definition of what constitutes genre, one which includes categories that may very well be alternatively thought of as styles, modes, or even distinct media, and which relate to both fiction and non-fiction as well as to oral and written communication. In the introduction to Writing Genres, Devitt provides many examples of the influence of genre in our daily lives, including such wide-ranging categories as the joke, lecture, mystery novel, travel brochure, small talk, sales letter, and, most appropriately, the research paper (Devitt 2008). It is this broader conception of genre, rather than a strictly literary one, that Verso aims to model.

The genres implemented in Verso tend towards the stylistic rather than the thematic. In part due to the pre-existing capabilities of Curveship, and in part because of the domain in which MEXICA operates, the genres used are those that can be identified and produced through changes in the narrative discourse (focalization, time of narrating, order of events in the telling, etc.) rather than the story world domain (which could incorporate dragons, spaceships, magic, etc.).
A concrete example is provided by the "confession" genre, which casts a story so that it sounds like it is being told to a priest at confession. To determine if this genre is applicable, the system checks to see if one or more actions are likely "sins" (robbing, killing, etc.) based on a list of these. Each "sin" raises the suitability of this genre. If "confession" is selected as the genre to use, the Slantstory XML representation is updated. A "sinner" is located: the agent of the last sinful action. This sinner is specified as the narrator (the "I" of the story). There is no narratee (or "you"), since we presume that the priest was not part of the events that were being told. The time of narrating is set to "after," which results in past-tense narration, and the "hesitant" style is used, injecting "um" and "er" into the story as if the speaker were nervous and reticent. Finally, a conventional opening ("Forgive me, Father, for I have sinned. It has been a month since my last confession.") and a conventional conclusion ("Ten Hail Marys? Thank you, Father.") are added.
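Continuing the sketch above (and reusing its SINS list), step 3 for this genre might look as follows; the spin keys are invented, and only the narrator, style, opening, and conclusion rules are taken from the description in the text.

def apply_confession(story):
    # Step 3 for "confession": update the narrative discourse ("spin").
    sins = [a for a in story["actions"] if a["verb"] in SINS]
    sinner = sins[-1]["agent"]            # agent of the last sinful action
    story["spin"] = {
        "narrator": sinner,               # the "I" of the story
        "narratee": None,                 # the priest was not a participant
        "time_of_narrating": "after",     # yields past-tense narration
        "style": "hesitant",              # injects "um" and "er"
        "opening": ("Forgive me, Father, for I have sinned. "
                    "It has been a month since my last confession."),
        "conclusion": "Ten Hail Marys? Thank you, Father.",
    }
    return story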
The "confession" genre produces plausible and amusing results. Some of this has to do with the formulaic nature of the genre. As one reads additional confessions, the rigid, repetitive opening and conclusion can be amusing, because they model the ritualized interaction of confession. Read in this light, it is only more amusing that ten Hail Marys are always given for penance, whether the penitent tried to swipe something or committed a murder. Finally, because Spanish conquerors came to the Americas and imposed Catholicism on the natives, MEXICA-generated plots that are told in this genre can be read as a comment upon, or at least a provocation about, the colonial history of Mexico. Importantly, these two subsystems did not invent this juxtaposition of the Mexica and Catholic ritual; rather, humans decided many years ago to develop a story generator about the Mexica and decided recently to develop a "confession" genre template. However, the subsystems' collaboration as part of Slant involves automatically finding occasions when the juxtaposition of these two is particularly effective. Verso's work and MEXICA's work combine in Slant to provide more cultural resonance, to be more surprising and also to be more valuable by virtue of being thought-provoking.
In the current system twelve genres have been implemented: confession, diary, dream, fragments, hangover, joke, letter, memento, memoir, play-by-play, prophecy, and the default "standard" story. These take advantage of only a limited range of Curveship's narrative variation capabilities. For instance, the focalization of a story can be varied, but we have not yet implemented genres that focalize stories based on particular characters; similarly, Curveship is already capable of narrating with flashbacks and making other more elaborate changes in order. There are now only two prose styles that are used, "excited" for play-by-play and "hesitant" for confession. It would also be straightforward to elaborate the Slantstory representation and to modify Curveship-Gen to allow for expression that better relates to a wider variety of genres. In discussions so far we have already listed more than 100 genres, most of which we believe will be to some extent recognizable and applicable to the short stories produced by Slant.
Fig-S and GRIOT-Gen for Figuration
Fig-S reads a Slantstory XML file from the blackboard and updates it to include metaphorical content. Metaphor here can be understood as an asymmetrical conceptual blend in which all content from one domain, called the "target space," is integrated with a subset of content from another, called the "source space" (Grady, Oakley, and Coulson 1999). Fig-S currently implements ontologies representing several domains empirically identified as important in poetry, such as "death" and "love" (Lakoff and Turner 1989), that can be used to generate metaphors such as REJECTION IS DEATH or ADMIRATION IS LOVE.

Fig-S begins by processing each of the actions from the Slantstory XML file to assess whether they will be replaced by metaphorical versions of the same action. Currently, there are two modes in which this processing can be done. If ONE-METAPHOR is set to true, then the Slantstory is analyzed to find which single source domain is appropriate to map onto the greatest number of actions in order to produce metaphors. Otherwise, each action will be analyzed individually in order to find an appropriate source domain to map onto it. The first mode typically results in more coherent output; the second mode typically results in a greater degree and variety of metaphorical output. As an example of an action that has been mapped onto by the source domain FIRE in order to produce a metaphorical action, the Slantstory action:
<action agent="virgin" direct="princess" id="61"
location="Texcoco Lake" tension="40" verb="get
jealous of" />
could be processed by Fig-S and added to the Slantstory as:
<figuration domain="fire">
<blend id="61" verb="get jealous of/burn for"
agent="virgin/burning one*agent"
direct="princess/hot*direct"/>
</figuration>
While Fig-S currently implements a simple, metaphorical form of blending as a first step, it could be extended to use a more robust blending algorithm such as Alloy, or even to extend Alloy to produce even more novel, surprising, and/or culturally valued blends using an extended set of optimality principles.
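The difference between the two modes can be sketched as follows; the verb-to-domain table is an invented stand-in for Fig-S's ontologies.

from collections import Counter

# Hypothetical table: which source domains can map onto which verbs.
APPLICABLE = {
    "get jealous of": ["fire", "heat"],
    "despise": ["ice", "fire"],
    "cure": ["gardening"],
}

def pick_domains(actions, one_metaphor=True):
    # ONE-METAPHOR mode picks the single source domain applicable to the
    # most actions (more coherent); otherwise each action gets its own
    # domain (more varied).
    if one_metaphor:
        counts = Counter(d for a in actions
                         for d in APPLICABLE.get(a["verb"], []))
        best = counts.most_common(1)[0][0]
        return {a["id"]: best for a in actions
                if best in APPLICABLE.get(a["verb"], [])}
    return {a["id"]: APPLICABLE[a["verb"]][0]
            for a in actions if a["verb"] in APPLICABLE}

acts = [{"id": "61", "verb": "get jealous of"}, {"id": "62", "verb": "despise"}]
print(pick_domains(acts))            # {'61': 'fire', '62': 'fire'}
print(pick_domains(acts, False))     # {'61': 'fire', '62': 'ice'}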
GRIOT-Gen is used to produce specific output templates from metaphorical actions in a Curveship-Gen format. For example, the metaphorical action above could be realized in a number of ways. The default produced by GRIOT-Gen, for a story in which neither virgin nor princess is narrator or narratee, would be structured as:
'61': 'the burning virgin [become/v] jealous-of
the incendiary princess',
however, it can alternatively be structured as:
'61': '[@virgin/s] like burning [get/v] jealous
of the incendiary [princess/o]',
if there is a preference for a simile-oriented style for the subject. It is also possible to use a "source-element/target-element" structure, as in:
'61': 'the burning/virgin [get/v] jealous of and
[burn/v] for the incendiary/princess'
to be very explicit about every element that has been integrated. GRIOT-Gen currently has multiple such exposition forms implemented and is easily extensible.
Slant's First Stories
In the current system some spin (narrative discourse specification) is necessary, although it may simply involve the default settings, while figurative action representations are optional. To begin with, this amusing but flawed story was generated without figuration, but with contributions from MEXICA and Verso:

Forgive me, Father, for I have sinned. It has been a month since my last confession. An enemy slid. The enemy fell. The enemy injured himself. I located a curative plant. I cured the enemy with the curative plant. The tlatoani kidnapped me. The enemy sought the tlatoani. The enemy travelled. The enemy, um, looked. The enemy found the tlatoani. The enemy observed, uh, the tlatoani. The enemy drew a weapon. The enemy attacked the tlatoani. The enemy killed the tlatoani with a dagger. The enemy rescued me. The enemy entranced, uh, me. I became jealous of the enemy. I killed the enemy with the dagger. I killed myself, uh, with the dagger. Ten Hail Marys? Thank you, Father.
The "sinner" who narrates the story dies, a problem which can also crop up when the "diary" genre is used. Since Verso can assign the genre of the story before the plot is complete, there was initially no way for Verso to be sure that the character it selects as narrator would not die. This requires an interaction between the genre-selecting system, Verso, and the plot-generating system, MEXICA. We implemented an additional set of constraints on how the plotting could be done, which either require or prohibit that a certain tension, as defined in MEXICA, arise. One of these tensions is "actor dead," letting Verso prohibit a narrator's death.
A story with figuration follows. This one is generated without the constraint for a single conventional metaphor to be used (ONE-METAPHOR is false), so there is a colorful diversity of less consistent metaphors. The genre chosen is "play-by-play," based on sports commentary, which may be a suitable one for the range of metaphor that is used:

This is Ehecatl, live from the scene. The cold-wind eagle knight is despising the icy jaguar knight! The cold-wind jaguar knight is despising the chilling eagle knight! Yes, an eagle knight is fighting a jaguar knight! Look at this, the eagle knight is drawing a weapon! Look at this, the eagle knight is closing on the jaguar knight! The gardener eagle knight is wounding the weed jaguar knight! And now, the jaguar knight is bleeding! Yes, the consumed eagle knight is panicking! And, eagle knight is hiding! Holy -- the snowflake slave is despising the chilling jaguar knight! The freezing-wind jaguar knight is despising the cold slave! And, yes, the cold-wind slave is detesting the chilling jaguar knight! A slave is curing the jaguar knight! And, the slave is returning to the city! And, the jaguar knight is suffering! The frozen jaguar knight is dying! Back to you!

MEXICA's stative descriptions of characters could probably be mentioned more rapidly, or perhaps not at all, to keep the action going. This could be done with an existing facility in Slantstory for omitting actions when narrating. This story would also benefit from pronominalization, which Curveship-Gen is capable of but which would need to be either turned on for all stories or specified at an earlier stage.
Slant's Research Potential
We plan to further develop the system we have initiated to explore new ways that computational creativity researchers can collaborate, new models of storytelling that abstract different sorts of expertise and emphasis, and new ways to compare the importance of and interaction between different aspects of story. We intend that the system will be used for empirical studies of how people receive generated stories and will also be brought into literary and artistic contexts.

Using the Slantstory XML blackboard, many different subsystems can be developed for Slant, which will allow Slant to be run with any subset of them. For instance, if Verso is turned off so that the specification of the narrative discourse is not done by that subsystem, either a default narrative discourse specification could be used (as would be the case now, since Verso is the only subsystem that updates this aspect) or that specification can be built up by one or more other subsystems. This allows the effect of each subsystem, in the context of Slant overall, to be carefully examined. Readers of stories generated under different conditions could be asked not only to rank the outputs in terms of quality, but also to comment on what they thought about particular elements (such as characters) and high-level qualities (whether the story was funny, for instance, or whether it seemed plausible).

The project can also facilitate a broader collaboration between researchers of story generation. As long as researchers find the Slantstory XML representation adequate for their purpose, they can develop new subsystems that help to build stories based on other theories or concerns. For instance, a researcher interested in how creativity occurs in social contexts could model the process in a unit that reads from and writes to the blackboard and models social influence and awareness. As just discussed, this new system could be tried in many combinations with existing systems and the outputs could be compared. This would help to show not only the importance of social creativity as modeled in this particular subsystem, but also how creativity of this sort interacts with plot generation using the engagement-reflection cycle, figuration based on conventional metaphors, and awareness of genre.
We also anticipate that Slant will supply stories for exhibition and publication in arts contexts, and the functional system itself could be part of a digital media, electronic literature, or e-poetry exhibit. In this way, Slant can contribute to creative practice, and reactions and discussion in this context can help us further develop a system that relates to contemporary literary concerns.
Acknowledgements
Thanks to Clara Fernández-Vara and Ayse Gursoy for their discussions of genre and of early ideas about Slant.
References
Boden, M. A. 2004. The Creative Mind: Myths and Mechanisms. 2nd Ed. London and New York: Routledge.

Devitt, A. J. 2008. Writing Genres. Carbondale: Southern Illinois University Press.

Fauconnier, G. 1999. "Methods and Generalizations." In Cognitive Linguistics: Foundations, Scope, and Methodology, ed. T. Janssen and G. Redeker, 95-127. The Hague: Mouton De Gruyter: 96.

Fauconnier, G., and Turner, M. 2002. The Way We Think: Conceptual Blending and the Mind's Hidden Complexities. New York: Basic Books.

Goguen, J., and Harrell, D. F. 2008. Style, computation, and conceptual blending. In Argamon, S., and Dubnov, S., eds., The Structure of Style: Algorithmic Approaches to Understanding Manner and Meaning. Berlin: Springer-Verlag. 291-316.

Grady, J. E.; Oakley, T.; and Coulson, S. 1999. Blending and metaphor. In Metaphor in Cognitive Linguistics, ed. Gerard Steen and Ray Gibbs, 101-124. Amsterdam: John Benjamins.

Genette, G. 1983. Narrative Discourse: An Essay in Method. Cornell University Press.

Harrell, D. F. 2006. Walking blues changes undersea: Imaginative narrative in interactive poetry generation with the GRIOT system. In Proceedings of the AAAI 2006 Workshop in Computational Aesthetics: Artificial Intelligence Approaches to Happiness and Beauty, 61-69. AAAI Press.

Harrell, D. F. 2007. GRIOT's tales of haints and seraphs: A computational narrative generation system. In Wardrip-Fruin, N., and Harrigan, P., eds., Second Person: Role-Playing and Story in Games and Playable Media. Cambridge, MA: MIT Press. 177-182.

Harrell, D. F. 2007a. "Cultural Roots for Computing: The Case of African Diasporic Orature and Computational Narrative in the GRIOT System." Fibreculture Journal, Vol. 11, http://journal.fibreculture.org/issue11/issue11harrell.html

Lakoff, G., and Turner, M. 1989. More than Cool Reason: A Field Guide to Poetic Metaphor. Chicago: University of Chicago Press.

Montfort, N. 2009. Curveship: An interactive fiction system for interactive narrating. In Proceedings of the NAACL HLT Workshop on Computational Approaches to Linguistic Creativity, 55-62.

Montfort, N. 2011. Curveship: Adding control of narrative style. In Proceedings of the Second International Conference on Computational Creativity, 163.

Montfort, N., and Pérez y Pérez, R. 2008. Integrating a plot generator and an automatic narrator to create and tell stories. In Proceedings of the 5th International Joint Workshop on Computational Creativity. http://nickm.com/if/mexica-nnijwcc08.pdf

Neale, S. 1980. Genre. London: British Film Institute.

Pérez y Pérez, R.; Ortiz, O.; Luna, W. A.; Negrete, S.; Peñaloza, E.; Castellanos, V.; and Avila, R. 2011. A system for evaluating novelty in computer generated narratives. In Proceedings of the Second International Conference on Computational Creativity, 63-68.

Pérez y Pérez, R., and Sharples, M. 1999. MEXICA: A computational model of the process of creative writing. In Proceedings of the AISB Symposium on Creative Language: Humour and Stories, 46-51.

Pérez y Pérez, R., and Sharples, M. 2001. MEXICA: A computer model of a cognitive account of creative writing. Journal of Experimental and Theoretical Artificial Intelligence 13(2): 119-139.

Pérez y Pérez, R., and Sharples, M. 2004. Three computer-based models of storytelling: BRUTUS, MINSTREL and MEXICA. Knowledge Based Systems Journal 17(1): 15-29.

Polanyi, L. 1989. Telling the American Story: A Structural and Cultural Analysis of Conversational Storytelling. Cambridge, MA: The MIT Press.

Prince, G. 1988. The disnarrated. Style 22(1): 1-8.

Ryan, M.-L. 1981. The why, what and how of generic taxonomy. Poetics 10: 109-126.

Ryan, M.-L. 1991. Possible Worlds, Artificial Intelligence, and Narrative Theory. Bloomington: Indiana University Press.
Using Theory Formation Techniques
for the Invention of Fictional Concepts
Flaminia Cavallo, Alison Pease, Jeremy Gow, Simon Colton
Computational Creativity Group
Department of Computing
Imperial College, London
ccg.doc.ic.ac.uk
Abstract
We introduce a novel method for the formation of fictional concepts based on the non-existence conjectures made by the HR automated theory formation system. We further introduce the notion of the typicality of an example with respect to a concept into HR, which leads to methods for ordering fictional concepts with respect to novelty, vagueness and stimulation. To test whether these measures are correlated with the way in which people similarly assess the value of fictional concepts, we ran an experiment to produce thousands of definitions of fictional animals. We then compared the software's evaluations of the fictional concepts with those obtained through a survey consulting sixty people. The results show that two of the three measures have a correlation with human notions. We report on the experiment, and we compare our system with the well established method of conceptual blending, which leads to a discussion of automated ideation in future Computational Creativity projects.
Introduction
Research in Artificial Intelligence has always been largely focused on reasoning about data and concepts which have a basis in reality. As a consequence, concepts and conjectures are generated and evaluated primarily in terms of their truth with respect to a given knowledge base. For instance, in machine learning, learned concepts are tested for predictive accuracy against a test set of real world examples. In Computational Creativity research, much progress has been made towards the automated generation of artefacts (paintings, poems, stories, music and so on). When this task is performed by people, it might start with the conception of an idea, upon which the artefact is then based. Often these ideas consist of concepts which have no evidence in reality. For example, a novelist could write a book centered on the question "What if horses could fly?" (e.g., Pegasus), or a singer could write a song starting from the question "What if there were no countries?" (e.g., John Lennon's "Imagine"). However, in Computational Creativity, the automated generation and evaluation of such fictional concepts for creative purposes is still largely unexplored.
The importance of evaluating concepts independently of their truth value has been highlighted by some cognitive science research. Some of the notions that often appear in the cognitive science and psychology literature are those of novelty, actionability, unexpectedness and vagueness. Novelty is used to calculate the distance between a concept and a knowledge base. In (Saunders 2002), interestingness is evaluated through the use of the Wundt Curve (Berlyne 1960), a function that plots hedonistic values with respect to novelty. The maximum value of the Wundt curve is located in a region close to the y-axis, meaning, as Saunders points out, that the most interesting concepts are those that are "similar-yet-different" to the ones that have already been explored (Saunders 2002). The notions of actionability and unexpectedness were first introduced in (Silberschatz and Tuzhilin 1996) as measurements of subjective interestingness. Actionability evaluates the number of actions or thoughts that an agent could undertake as a consequence of a discovery. Unexpectedness is a measurement inversely proportional to the predictability of a result or event. Finally, vagueness refers to the difficulty of making a precise decision. Several measurements have been proposed in the literature for the calculation of this value, particularly using fuzzy sets (Klir 1987).
The importance of generating concepts which describe contexts outside of reality was underlined by Boden when she proposed her classification of creative activity. In particular, Boden identifies three types of creativity (Boden 2003): combinational creativity, exploratory creativity and transformational creativity. Transformational creativity involves the modification of a search space by breaking its boundaries. One reading of this could therefore be the creation of concepts that are not supported by a given knowledge base; we refer to these as fictional concepts herein. Conceptual blending (Fauconnier and Turner 2002) offers clear methods for generating fictional concepts, and we return to this later, specifically with reference to the Divago system which implemented aspects of conceptual blending theory (Pereira 2007).

We propose a new approach to the formation and evaluation of fictional concepts. Our method is based on the use of the HR automated theory formation system (Colton 2002b) (reviewed below), and on cognitive science notions of concept representation. In particular, we explore how the notion of typicality can improve and extend HR's concept formation techniques. In the field of cognitive psychology, typicality is thought of as one of the key notions behind concept representation. Its importance was one of the main factors that led to the first criticisms of the classical view (Rosch 1973), which argues that concepts can be represented by a set of necessary and sufficient conditions. Current cognitive theories therefore take into account the fact that exemplars can belong to a concept with a different degree of membership, and the typicality of an exemplar with respect to a concept can be assessed.
In the following sections, we discuss the methods and results obtained by introducing typicality values into HR. We argue that such typicality measures can be used to evaluate and understand fictional concepts. In particular, we propose calculations for three measures which might sensibly be linked to the level of novelty, vagueness and stimulation associated with a fictional concept. We generated definitions of fictional animals by applying our method to a knowledge base of animals and we report the results. We then compare the software's estimates of novelty, vagueness and stimulation with data obtained through a questionnaire asking sixty people to evaluate some concepts with the same measures in mind. The results were then used to test whether there is a correlation between our measurements and the usual (human) understanding of the terms novelty, vagueness and stimulation. We then compare our approach and the well established methods of conceptual blending. Finally, we draw some conclusions and discuss some future work.
Automated Theory Formation
Automated theory formation concerns the formation of interesting theories, starting with some initial knowledge and then enriching it by performing inventive, inductive and deductive reasoning. For our purposes, we have employed the HR theory formation system, which has had some success inventing and investigating novel mathematical concepts, as described in (Colton and Muggleton 2006). HR performs concept formation and conjecture making by applying a concise set of production rules and empirical pattern matching techniques, respectively. The production rules take as input the definitions of one or two concepts and manipulate them in order to output the definition of a new concept. For example, the compose production rule can be used to merge the clauses of the definitions of two concepts into a new definition. It could, therefore, be given the concept of the number of divisors of an integer and the concept of even numbers, and be used to invent the concept of integers with an even number of divisors. The success set, i.e., the collection of all the tuples of objects which satisfy the definition of the newly defined concept, is then calculated. Once this is obtained, it is compared with all the previously generated success sets and used to formulate conjectures about the new concept. These conjectures take the form of equivalence conjectures (when two success sets match), implication conjectures (when one success set is a subset of another), or non-existence conjectures (when a success set is empty).
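To make this pipeline concrete, the following minimal Python sketch is our own illustration, not HR's actual implementation (HR is a substantially more elaborate system); here concepts are simplified to sets of unary predicate names, and the helper names are hypothetical:

    # Illustrative sketch: a concept is a set of predicate names over one variable;
    # its success set is the set of constants satisfying all of its predicates.
    def compose(concept_a, concept_b):
        """Merge the clauses of two concept definitions (compose-style rule)."""
        return concept_a | concept_b

    def success_set(concept, constants, holds):
        """All constants for which every predicate in the definition holds."""
        return {c for c in constants if all(holds(p, c) for p in concept)}

    def classify(ss_new, previous_success_sets):
        """Suggest a conjecture type by comparing success sets."""
        if not ss_new:
            return "non-existence"
        for ss_old in previous_success_sets:
            if ss_new == ss_old:
                return "equivalence"
            if ss_new < ss_old:          # proper subset
                return "implication"
        return "no conjecture"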
In domains where the user can supply some axioms, HR appeals to third-party theorem provers and model generators to check whether a conjecture follows from the axioms or not. HR follows a best-first, non-goal-oriented search, dictated by an ordered agenda and a set of heuristic rules used to evaluate the interestingness of each concept. Each item in the agenda represents a theory formation step, which is an instruction about which production rule to apply to which existing concept(s) and with which parameters. The agenda is ordered with respect to the interestingness of the concepts in the theory, and the most interesting concepts are developed first. Overall interestingness is calculated as a weighted sum (where the weights are provided by the user) of a set of measurements, described in (Colton 2002b) and (Colton, Bundy, and Walsh 2000). These were developed to evaluate non-fictional concepts, but some of them could be modified to evaluate fictional concepts for our system, and we plan to do this in future work. HR was developed to work in mathematical domains, but various projects have demonstrated the suitability of the system for other domains such as games (Baumgarten et al. 2009), puzzles (Colton 2002a), HR's own theories (Colton 2001) and visual art (Colton 2008).
Using HR to Generate Fictional Concepts
We are interested in the generation and evaluation of concepts for which it is not possible to find an exemplar in the knowledge base that completely meets the concept's definition. Throughout this paper we use the term fictional concepts to refer to this kind of concept. We use the HR system for the generation of such fictional concepts. To do so, after it has formed a theory of concepts and conjectures in a domain, we look at all the non-existence conjectures that it has generated. These are based on the concepts that HR constructs which have an empty success set. Hence, the concepts that lie at the base of these conjectures are fictional with respect to the knowledge base given to HR as background information. For example, from the non-existence conjecture:

$\nexists x \, (Reptile(x) \wedge HasWings(x))$

we extract the fictional concept:

$C_0(x) = Reptile(x) \wedge HasWings(x)$
To see whether typicality values can be used for the evaluation of these fictional concepts, we have introduced this notion into HR. Typicality values are obtained by calculating the degree of membership of each user-given constant (i.e., animals in the above example) with respect to every fictional concept which specialises the concept of the type of object under investigation (which is the concept of being an animal in this case). This is done by looking at the proportion of predicates in a concept definition that are satisfied by each constant. Hence, for each constant $a_j$ and for each fictional concept $C_i$ in the theory, we will have $Typicality(a_j, C_i) = t$, where $0 \le t < 1$. For example, for the concept definition:

$C_1(x) = Mammal(x) \wedge HasWings(x) \wedge LivesIn(x, Water)$

the typicality values for the constants in the set {Lizard, Dog, Dolphin, Bat} are as follows:

$Typicality(Lizard, C_1) = 0$
$Typicality(Dog, C_1) = 0.\overline{3}$
$Typicality(Dolphin, C_1) = 0.\overline{6}$
$Typicality(Bat, C_1) = 0.\overline{6}$

We see that the constant Dolphin has a typicality of $0.\overline{6}$ with respect to $C_1$ because a dolphin is a mammal which lives in water but which doesn't have wings; hence it satisfies two of the three predicates ($\approx 66.6\%$) in the definition of $C_1$.
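As an illustration, this definition transcribes directly into Python (a sketch under our own reading of the paper; the knowledge-base encoding and function names are hypothetical):

    # Hypothetical encoding: each constant maps to the ground facts it satisfies.
    FACTS = {
        "Lizard":  {"Reptile"},
        "Dog":     {"Mammal"},
        "Dolphin": {"Mammal", "LivesInWater"},
        "Bat":     {"Mammal", "HasWings"},
    }

    def typicality(constant, concept):
        """Proportion of a concept's predicates satisfied by the constant."""
        return sum(p in FACTS[constant] for p in concept) / len(concept)

    C1 = {"Mammal", "HasWings", "LivesInWater"}
    print(typicality("Dolphin", C1))  # 2 of 3 predicates -> 0.666...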
It is important to note that for each fictional concept C there are at least n constants $a_1, \ldots, a_n$ such that $\forall j, \ 0 < Typicality(a_j, C) < 1$, where n is the number of predicates in the concept definition. We refer to these as the atypical exemplars of fictional concept C, and we denote this set of constants as atyp(C). The atypical exemplars of C have typicality greater than zero because they partly belong to C, and less than one because the concept is fictional and hence, by definition, it doesn't have any real life examples. The number of atypical exemplars of a fictional concept is always greater than or equal to the number of predicates in the concept definition, because fictional concepts originate from the manipulation of non-fictional concepts and hence, given a well formed knowledge base, each predicate in a fictional concept definition will correspond to a non-fictional concept with at least one element in its success set.
Evaluating Concepts Based on Typicality
We explain here how typicality can be used to evaluate fictional concepts along three axes which, we claim, can be sensibly used to estimate how people will assess such concepts in terms of vagueness, novelty and stimulation, respectively. This claim is tested experimentally in the next section. To define the measures for a fictional concept C produced as above, we use E to represent the set of constants (examples) in the theory, e.g., animals, and we use NF to denote the set of non-fictional concepts produced alongside the fictional ones. We use |C| to denote the number of conjunct predicates in the clausal definition of concept C. We further re-use atyp(C) to denote the set of atypical exemplars of C and the Typicality measure we introduced above. It should be noted that the proposed methods of evaluation of fictional concepts have not been included in the HR program to guide concept formation. It is, however, our ambition to turn these measurements into measures of interest for ordering HR's agenda.
Using Atypical Exemplars
Our first measure, $M_V$, of fictional concept C, is suggested as an estimate of the vagueness of C. It calculates the proportion of constants which are atypical exemplars of C, factored by the size of the clausal definition of C, as follows:

$M_V(C) = \frac{|atyp(C)|}{|E| \cdot |C|}$

As previously discussed, vagueness is a measurement that has been widely studied in the context of fuzzy sets. Klir (1987) emphasises the difference between this measurement and that of ambiguity, and underlines how vagueness should be used to refer to the difficulty of making a precise decision. While several more sophisticated measurements have been proposed in the literature, as explained in (Klir 1987), we chose the above straightforward counting method, as this is consistent with the requirement that if concept $C_a$ is intuitively perceived as more vague than concept $C_b$, then $M_V(C_a) > M_V(C_b)$. To see this, suppose we have the following two concepts:

$C_1(x) = Animal(x) \wedge has(x, Wings)$
$C_2(x) = Reptile(x) \wedge has(x, Wings)$

In this case, we can intuitively say that an animal with wings is more vague than a reptile with wings, because for the first concept we have a larger choice of animals than for the second. In terms of typicality, this can be interpreted as the fact that $C_1$ has a larger number of atypical exemplars than $C_2$, and it follows that $M_V(C_1) > M_V(C_2)$.
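Read this way, the measure is a one-liner; the sketch below (again our own illustration, reusing the hypothetical typicality helper from the earlier sketch) computes it directly:

    def atyp(concept, constants):
        """Constants that partially, but not fully, satisfy the concept."""
        return {c for c in constants if 0 < typicality(c, concept) < 1}

    def vagueness(concept, constants):
        """M_V: proportion of atypical exemplars, factored by definition size."""
        return len(atyp(concept, constants)) / (len(constants) * len(concept))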
Using Average Typicality
Our second measure, $M_N$, of fictional concept C, is suggested as an estimate of the novelty of C. It calculates the complement of the average typicality of the atypical exemplars of C, as follows:

$M_N(C) = 1 - \left( \frac{1}{|atyp(C)|} \sum_{a \in E} Typicality(a, C) \right)$

Novelty is a term widely discussed in the literature, and can be attached to several meanings and perspectives. In our case, we interpret novelty as a measurement of distance from the real world, as inferred in previous work in computational creativity research, such as (Saunders 2002). As an example of this measure, given the concepts:

$C_1(x) = Bear(x) \wedge Furniture(x) \wedge Has(x, Wings)$
$C_2(x) = Bear(x) \wedge Furniture(x) \wedge Brown(x)$

then, in a domain where all the constants are either exclusively bears or furniture (but not both), and assuming that all the bears and all the furniture are brown, we calculate:

$M_N(C_1) = 0.\overline{6}$
$M_N(C_2) = 0.\overline{3}$

This is because for $C_1$, all exemplars will satisfy just one of the three clauses ($\frac{1}{3}$) in the definition, hence this will be their average typicality, and $C_1$ will score $1 - \frac{1}{3} = 0.\overline{6}$ for $M_N$. In contrast, all exemplars will satisfy two out of the three clauses in $C_2$, and hence it scores $0.\overline{3}$ for $M_N$. Hence we can say that $C_1$ is more distant from reality, and hence more novel, than $C_2$. Consistent with the literature, and in particular with the Wundt curve (which relates novelty to hedonic value), we assume that the most interesting concepts have an average typicality close to 0.5. Note that this implies that fictional concepts whose definition contains two conjuncts are always moderately interesting in terms of novelty, as their average typicality is always equal to 0.5.
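A direct transcription (an illustrative sketch building on the helpers above) follows; note that summing over all constants is equivalent to summing over atyp(C), since the remaining constants contribute typicality 0 to a fictional concept:

    def novelty(concept, constants):
        """M_N: complement of the average typicality of the atypical exemplars."""
        atypical = atyp(concept, constants)
        if not atypical:
            return None  # undefined when there are no atypical exemplars
        return 1 - sum(typicality(a, concept) for a in constants) / len(atypical)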
Using Non-Fictional Concepts
Our final measure, $M_S$, of fictional concept C is suggested as an estimate of the stimulation that C might elicit when audiences are exposed to it (i.e., the amount of thought it provokes). It is calculated as a weighted sum over all the non-fictional concepts, r, in NF that HR formulates for which the success set, denoted ss(r), has a non-empty intersection with atyp(C). The weights are calculated as the sum of the typicalities over atyp(C) with respect to C. $M_S(C)$ is calculated as follows:

$M_S(C) = \sum_{r \in NF} \ \sum_{a \in atyp(C) \cap ss(r)} Typicality(a, C)$
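Under this reading (which we hedge, since the printed formula is truncated in the source), a sketch continuing the helpers above could be:

    def stimulation(concept, constants, non_fictional, success_sets):
        """M_S: sum of typicalities of the atypical exemplars of the concept
        that are covered by each non-fictional concept r in NF; success_sets
        is assumed to map each r to its success set ss(r)."""
        atypical = atyp(concept, constants)
        return sum(typicality(a, concept)
                   for r in non_fictional
                   for a in atypical & success_sets[r])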
Table 2 refers to the first author only, due to space limitations. Some papers are rather comprehensive, such as Indurkhya (2012) and Maher (2012), which span five MLCC levels each, but the overall average is 2.18, indicating a reasonable distinction among types of CC models. Although these results require systematic validation, they suggest a focus on generative processes in ICCC12 (60 entries, including 43 existing and 17 target processes). Evaluation processes constitute a minority (14 total entries, half of them referring to target processes). These results are consistent with the preceding finding that only a third of systems presented as creative were actually evaluated on how creative they are (Jordanous 2011).
MLCC level 8 is the most prevalent: 40% of all papers discuss existing CC processes, and an additional 11% discuss target CC processes. Level 8 refers to methods and techniques aimed at solving problems or generating creative solutions, with no direct claims to model, or to be inspired by, the other MLCC levels. Examples include association-based computational creative systems (Grace et al. 2012); small-scale creative text generators (Montfort and Fedorova 2012); and a music generator inspired by non-musical audio signals (Smith et al. 2012).

MLCC level 4 is present in 30% of the papers; these present, or discuss approaches to generate, concrete artifacts identified as creative. They include Visual Narrator, which constructs short visual narratives (Pérez y Pérez et al. 2012); machine-composed music (Eigenfeldt et al. 2012); and PIERRE, which produces new crockpot recipes (Morris et al. 2012).
Table 2. Classification of the ICCC12 papers in MLCC levels
More than 30% of all papers address MLCC level 6, cognition. Most of these refer to the cognitive processes involved in the generation of creative artifacts, but a few do suggest the study of cognitive processes related to the evaluation of creativity (Ogawa et al. 2012; Linson et al. 2012; Indurkhya 2012).

MLCC level 1 is captured in 35% of all papers. In most, culture is used as a source in the creation of creative artifacts (as corpora or as evolutionary models at the cultural level). The remaining entries deal with culture as part of the evaluation of creativity. These include the application of literary criticism and communication theory [...] to develop evaluation methods (Zhu 2012) and conceptual mash-ups evaluated against semantic structures seeking to replicate the semantic categories (Veale 2012). Notably, MLCC level 7 (neural models of creativity) is not represented in ICCC12, although progress is being made elsewhere (Iyer et al. 2009).
Evaluation processes are scarce and gravitate mainly around MLCC levels 1 and 3 (Culture and Groups). 11% of papers report assessment by small groups (audiences, experts) and the same number use culture as a metric for validating the results of a CC system (by comparison against, or recreation of, concrete cultural achievements). Only a couple of papers present potential ways of using societal factors or cognitive studies to understand how an artifact is ascribed creative value.
From an evaluation viewpoint, the ICCC12 papers do not address the following MLCC levels: products (level 4), personality (level 5), neural processes (level 7) and CC processes (level 8). In this way, the MLCC model helps suggest future research approaches, including:

Models that incorporate explicit CC processes for the evaluation of creativity, for example automated critics or automated audiences capable of replicating the assessment patterns of human judges (at different scales and levels of domain expertise), as well as ultimately predicting the creativeness of computer-generated artifacts (Maher and Fisher 2012). Sample research question: How may a computational system distinguish a masterpiece from mediocre artworks?

Models of the neuro-mechanisms behind the creation as well as the evaluation of creativity; systems that capture the connections between neural and cognitive processes. Sample research question: How do basic functions such as short-term memory or cognitive load moderate the evaluation of creative artifacts?

Models of the role of personality and motivation in the creation as well as the evaluation of creativity, for example systems that create or evaluate artifacts based on emotional predispositions, gender distinctions, and other personality dimensions; models where creative behavior is moderated by environmental cues. Sample research question: How do extraversion traits such as assertiveness moderate the assessment of creativity?

Models of intrinsic artifact properties identified in the evaluation of creativity according to intra- and cross-domain characteristics. Sample research question: What common assessment criteria do people apply when ascribing creativity to music, literature and architectural works?
Beyond these "missing" levels (or ICCC gaps), this analysis leads to interesting new possibilities and distinctions in CC research:

Culture can be approached in several ways in both generative and evaluative models: as the source of knowledge and generative techniques; as the standards against which new artifacts are evaluated by the creator and by the evaluators; as the status quo that prevents or constrains acceptance of new artifacts; as factors exogenous to the domain from which creators can draw to introduce novelty into their creative process; as rules and regulations that incentivize or inhibit creative processes; as market or cultural outlets and vehicles of promotion of creative value; etc.

Societal and group levels can equally be considered in several ways: as large collectives or small groups (teams) collaborating in creative endeavors; as opinion leaders that influence both creators and evaluators; as cliques that provide support but may also polarize types of creators; as aggregate structures of behavior that lead to segmentation, migration and institutionalization; as temporal and spatial trends; etc.

As noted before, cognitive modeling may apply both to the generation and to the evaluation of creativity. Likewise, although current computational tools are conceived for the creation of creative artifacts, computational tools could also support the individual and collective evaluation of artificial and human-produced artifacts, for example through the automated extraction of evaluation functions from customer needs and requirements, which can then be used to guide either a computational system or human designers.
Discussion
How do works such as the Mona Lisa by Leonardo become icons of creativity? Elements to consider range from its intrinsic aesthetic and artistic qualities all the way to its distinctive history, including its theft from the Louvre in 1911 and the ensuing two-year international media notoriety (Scotti 2010). This illustrative case exemplifies the entangled art-market complex (Joy and Sherry 2003). Two CC scenarios are compared here where MLCC modeling is demonstrated:

1) The Next Mona Lisa CC model: a computational generative system is pursued that captures MLCC levels 6, 7 and/or 8, implementing symbolic or neural techniques (inspired or not by human capabilities), which aims to create a work of art comparable to the Mona Lisa, i.e., one that receives the kind of appreciation and recognition gaining the status of a global cultural icon. The problem is that not only does this approach seem rather implausible based on the current state of CC, it would also require a vast number of exogenous factors outside the reach of the system's authors and would probably require very long time periods, considering that even La Gioconda's path to prominence took more than four centuries (Scotti 2010).
2) The Mona Lisa System CC model: a multilevel computational system is based on the MLCC levels of choice (two or more from 1 to 8), which aims to capture the creation of a large number of artifacts, most of which fall into complete oblivion, some of which (very few) make it to the equivalent of mediocre galleries, local museums and the living rooms of elite audiences, and some of which (an absolute minority) are preserved, disseminated and capture broad attention and consensus. Some works in this last category may gradually become part of the cultural heritage, may be used as exemplars in specialized domain training and in general education, may fetch high prices in auctions or be considered invaluable in monetary terms, and may ultimately play an influential role in shaping public taste as well as future artifacts within and beyond the domain of origin.

The latter approach opens interesting intellectual paths: What types of processes are capable of generating such diversity of artifacts? What commissioning, distribution and exchange mechanisms are sufficient to account for the observed skewed distributions of evaluation? What connections are possible, in principle, between intrinsic characteristics of artifacts and contextual conditions? What cross-level dynamics apply to creative systems from different domains and times?

Such an MLCC model can include a large number of elements, possibly derived from published studies, for example of art-market dynamics in this case (Debenedetti 2006; Joy and Sherry 2003). The output of such models may not be (only or necessarily) the creative artifact itself, but a deeper understanding of the principles that underlie creative generation and evaluation. This may include two or more MLCC levels and, over time, historical trajectories that are likely to be context- and time-dependent. Hence the high relevance of CC approaches for the study of systems based on stochastic processes, which can be re-run over sets of initial conditions in order to inspect causal relationships and long-term effects.
Lastly, the following guidelines are provided for building MLCC models, broadly extending the evaluation guidelines proposed by Jordanous (2011); a schematic sketch follows the list.
1) Identify the levels to be modeled
a) Define primary and complementary levels: realistically, empirical validation or data may be relevant only for one or two levels, whilst computational explorations can target other levels of interest.
b) Identify level variables (experimental and dependent) that represent target factors and observable behaviors or patterns of interest.
c) Define inputs and outputs at target levels, establishing the bootstrapping strategies of the model.
2) Define relationships of interest between levels
a) Establish explicit connections above/below the primary levels in the model.
b) Define irreducible factors and causal links, and whether the model is being used for holistic or reductionistic purposes.
c) Identify factors internal and exogenous to the system.
3) Depending on modeling aims, define outputs
a) Define the type and range of outputs, identifying extreme points such as non-creative to creative artifacts.
b) Capture and analyze aggregate data for model tuning and refinement.
4) Evaluation of an MLCC system
a) Validity may be achievable in some models where relevant empirical data exists at the primary level(s) of interest, but this may be inaccessible and even undesirable for exploratory models.
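As a purely illustrative sketch (all class and field names here are our own invention, not the paper's), guideline steps 1 and 2 could be captured as a declarative model specification in Python:

    # Minimal skeleton mirroring guideline steps 1 and 2: declare the MLCC
    # levels to be modeled, their variables, and explicit cross-level links.
    from dataclasses import dataclass, field

    @dataclass
    class Level:
        name: str                       # e.g. "culture" (MLCC level 1)
        primary: bool                   # primary vs. complementary level
        variables: list[str] = field(default_factory=list)

    @dataclass
    class MLCCModel:
        levels: list[Level]
        relationships: list[tuple[str, str, str]]  # (from_level, to_level, link)

    model = MLCCModel(
        levels=[
            Level("culture", primary=True, variables=["corpus", "standards"]),
            Level("cognition", primary=False, variables=["memory_load"]),
        ],
        relationships=[("culture", "cognition", "provides evaluation standards")],
    )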
Acknowledgements
This work was supported in part by the US National Sci-
ence Foundation under grant number SBE-0915482. Any
opinions, findings, and conclusions or recommendations
expressed in this material are those of the authors and do
not necessarily reflect the views of the National Science
Foundation.
References
Adesope, O., Lavin, T., Thompson, T., and Ungerleider, C. 2010. A systematic review and meta-analysis of the cognitive correlates of bilingualism. Review of Educational Research 80(2):207-245.
Anderson, C. 2012. Makers: The New Industrial Revolution. London: Random House Business Books.
Anderson, C., and Kilduff, G. J. 2009. Why do dominant personalities attain influence in face-to-face groups? The competence-signaling effects of trait dominance. Journal of Personality and Social Psychology 96(2):491-503.
Bassett-Jones, N. 2005. The paradox of diversity management, creativity and innovation. Creativity and Innovation Management 14(2):169-175.
Cohen, H. 1999. Colouring without seeing: a problem in machine creativity. AISB Quarterly 102:26-35.
Debenedetti, S. 2006. The role of media critics in the cultural industries. International Journal of Arts Management 8:30-42.
Dollinger, S. J., Urban, K. K., and James, T. A. 2004. Creativity and openness: Further validation of two creative product measures. Creativity Research Journal 16(1):35-47.
Duflou, J. R., and Verhaegen, P. A. 2011. Systematic innovation through patent based product aspect analysis. CIRP Annals - Manufacturing Technology 60(1):203-206.
Finke, R. A., Ward, T. B., and Smith, S. M. 1996. Creative Cognition: Theory, Research, and Applications. Cambridge: MIT Press.
Fischer, G., Scharff, E., and Ye, Y. 2004. Fostering social creativity by increasing social capital. In Huysman, M., and Wulf, V., eds., Social Capital and Information. Cambridge: MIT Press. 55399.
Fontaine, R. G. 2006. Applying systems principles to models of social information processing and aggressive behavior in youth. Aggression and Violent Behavior 11:64-76.
Gero, J. S. 1990. Design prototypes: a knowledge representation schema for design. AI Magazine 11(4):26-36.
Glăveanu, V. P. 2010. Paradigms in the study of creativity: Introducing the perspective of cultural psychology. New Ideas in Psychology 28(1):79-93.
Hansen, H. K., and Niedomysl, T. 2009. Migration of the creative class: evidence from Sweden. Journal of Economic Geography 9(2):191-206.
Hennessey, B. A. 2003. Is the social psychology of creativity really social? Moving beyond a focus on the individual. In Paulus, P. B., and Nijstad, B. A., eds., Group Creativity: Innovation through Collaboration. Oxford University Press. 181-201.
Indurkhya, B. 2012. Whence is Creativity? In International Conference on Computational Creativity, 62-66.
Isaksen, S. G., Puccio, G. J., and Treffinger, D. J. 1993. An ecological approach to creativity research: Profiling for creative problem solving. The Journal of Creative Behavior 27(3):149-170.
Iyer, L. R., Doboli, S., Minai, A. A., Brown, V. R., Levine, D. S., and Paulus, P. B. 2009. Neural dynamics of idea generation and the effects of priming. Neural Networks 22(5):674-686.
Jordanous, A. 2011. Evaluating evaluation: Assessing progress in computational creativity research. In Proceedings of the Second International Conference on Computational Creativity, 102-107.
Joy, A., and Sherry Jr, J. F. 2003. Disentangling the paradoxical alliances between art market and art world. Consumption, Markets and Culture 6(3):155-181.
Jung, R. E., Segall, J. M., Jeremy Bockholt, H., Flores, R. A., Smith, S. M., Chavez, R. S., and Haier, R. J. 2010. Neuroanatomy of creativity. Human Brain Mapping 31(3):398-409.
Kaufman, J. C., and Beghetto, R. A. 2009. Beyond big and little: The four c model of creativity. Review of General Psychology 13(1):1-12.
Lessig, L. 2008. Remix: Making Art and Commerce Thrive in the Hybrid Economy. Penguin Press HC.
Liu, H., Tang, M., and Frazer, J. H. 2004. Supporting creative design in a visual evolutionary computing environment. Advances in Engineering Software 35(5):261-271.
Lubart, T. 2010. Cross-cultural perspectives on creativity. In Kaufman, J. C., and Sternberg, R. J., eds., The Cambridge Handbook of Creativity. Cambridge University Press. 265-276.
Maher, M.L. 2010. Design creativity research: from the individual to the crowd. In Design Creativity 2010, 41-47. London: Springer-Verlag.
Maher, M.L. 2012. Computational and collective creativity: Who's being creative? In International Conference on Computational Creativity, 67-71.
Maher, M.L., and Fisher, D.H. 2012. Using AI to evaluate creative designs. In Proceedings of the International Conference on Creative Design, 45-54.
Maher, M.L., Hammond, K., Pease, A., Perez y Perez, R., Ventura, D., and Wiggins, G., eds. 2012. Proceedings of the Third International Conference on Computational Creativity. http://computationalcreativity.net/iccc2012
McCoy, J. M., and Evans, G. W. 2002. The potential role of the physical environment in fostering creativity. Creativity Research Journal 14(3-4):409-426.
McGrew, S. 2012. Creativity in nature. In Swan, L., Gordon, R., and Seckbach, J., eds., Origin(s) of Design in Nature. Springer Netherlands. 43-55.
Montfort, N., and Fedorova, N. 2012. Small-scale systems and computational creativity. In International Conference on Computational Creativity, 82-86.
Moran, S., and John-Steiner, V. 2003. Creativity in the making. In Sawyer, R. K., John-Steiner, V., Moran, S., Sternberg, R. J., Feldman, D. H., Nakamura, J., and Csikszentmihalyi, M., eds., Creativity and Development. Oxford: Oxford University Press. 61-90.
Rogers, E. M. 1995. Diffusion of Innovations. Simon and Schuster.
Russell, S. J., and Norvig, P. 1995. Artificial Intelligence: A Modern Approach. Prentice Hall.
Scotti, R. A. 2010. Vanished Smile: The Mysterious Theft of the Mona Lisa. Vintage.
Sosa, R., and Dong, A. 2013. The creative assessment of rich ideas. In Proceedings of the Ninth ACM Conference on Creativity and Cognition. ACM Press.
Sosa, R., and Gero, J. S. 2004. A computational framework for the study of creativity and innovation in design: Effects of social ties. In Design Computing and Cognition '04, 499-517.
Sosa, R., and Gero, J. S. 2005a. A computational study of creativity in design: the role of society. AIEDAM: Artificial Intelligence for Engineering Design, Analysis and Manufacturing 19(4):229-244.
Sosa, R., and Gero, J. S. 2005b. Innovation and design: computational simulations. In ICED 05: 15th International Conference on Engineering Design: Engineering Design and the Global Economy, 1522-1528.
Sosa, R., Gero, J. S., and Jennings, K. 2009. Growing and destroying the worth of ideas. In Proceedings of the Seventh ACM Conference on Creativity and Cognition, 295-304. ACM Press.
Westmeyer, H. 2009. Kreativität als relationales Konstrukt. In Witte, E.H., and Kahl, C.H., eds., Sozialpsychologie der Kreativität und Innovation, 11-26. Lengerich: Pabst Science Publishers.
Young, H. P. 2009. Innovation diffusion in heterogeneous populations: Contagion, social influence, and social learning. The American Economic Review 99(5):1899-1924.
Evaluating Human-Robot Interaction with Embodied Creative Systems
Rob Saunders
Design Lab
University of Sydney
NSW 2006 Australia
[email protected]
Emma Chee
Small Multiples
Surry Hills, Sydney
NSW 2010 Australia
[email protected]
Petra Gemeinboeck
College of Fine Art
University of NSW
NSW 2021 Australia
[email protected]
Abstract
As we develop interactive systems involving computational models of creativity, issues around our interaction with these systems will become increasingly important. In particular, the interaction between human and computational creators presents an unusual and ambiguous power relation for those familiar with typical human-computer interaction. These issues may be particularly pronounced with embodied artificial creative systems, e.g., involving groups of mobile robots, where humans and computational creators share the same physical environment and enter into social and cultural exchanges. This paper presents a first attempt to examine these issues of human-robot interaction through a series of controlled experiments with a small group of mobile robots capable of composing, performing and listening to simple songs produced either by other robots or by humans.
Introduction
Creativity is often defined as the generation of novel and valuable ideas, whether expressed as concepts, theories, literature, music, dance, sculpture, painting or any other medium of expression (Boden 2010). But creativity, whether or not it is computational, doesn't occur in a vacuum; it is a situated activity that is connected with cultural, social, personal and physical contexts that determine the nature of the novelty and value against which creativity is assessed. The world offers opportunities as well as presenting constraints: human creativity has evolved to exploit the former and overcome the latter, and in doing both, the structure of creative processes emerges (Pickering 2005).
There are three major motivations underlying research on computational creativity: (1) to construct artificial entities capable of human-level creativity; (2) to better understand and formulate an understanding of creativity; and (3) to develop tools to support human creative acts (Pease and Colton 2011). The development of artificial creative systems is driven by a desire to understand creativity as interacting systems of individuals, social groups and cultures (Saunders and Gero 2002).

The implementation of artificial creative systems using autonomous robots imposes constraints upon the hardware and software used. These constraints focus the development process on the most important aspects of the computational model needed to support an embodied and situated form of creativity. At the same time, embodiment provides opportunities for agents to experience the emergence of effects beyond the computational limits that they must work within. Following an embodied cognition stance, the environment may be used to offload internal representation (Clark 1996) and allow agents to take advantage of properties of the physical environment that would be difficult or impossible to simulate computationally, thereby expanding the behavioural range of the agents (Brooks 1990).
Interaction between human and artificial creators within a shared context places constraints on the design of the human-robot interaction but provides opportunities for the transfer of cultural knowledge through the sharing of artefacts. Embodiment allows computational agents to be creative in environments that humans can intuitively understand. As Penny (1997) describes, embodied cultural agents, whose function is self-reflexive, engage the public in a consideration of the nature of agency itself. In the context of the study of computational creativity, this provides an opportunity for engaging a broad audience in the questions raised by models of artificial creative systems.

The Curious Whispers project (Saunders et al. 2010) investigates the interaction between human and artificial agents within creative systems. This paper focuses on the challenge of designing one-to-one and one-to-many interactions within a creative system consisting of humans and robots, and provides a suitable method for examining these interactions. In particular, the research presented in this paper explores how humans interacting with an artificial creative system construe the agency of the robots, and how the embodiment of simple creative agents may prolong the production of potentially interesting artefacts through the interaction of human and artificial agents. The research adopts methods from interaction design to study the interactions between participants and the robots in open-ended sessions.
Background
Gordon Pask's early experiments with electromechanical cybernetic systems provide an interesting historical precedent for the development of computational creativity (Haque 2007). Through the development of conversational machines, Pask explored the emergence of unique interaction protocols between the machine and musicians. MusiColour, seen in Figure 1, was constructed by Gordon Pask and Robin McKinnon-Wood in 1953. It was a performance system comprising coloured lights that illuminated in conjunction with audio input from a human performer.

But MusiColour did more than transcode sound into light; it manipulated its coloured light outputs such that it became a co-performer with the musician, creating a unique (though non-random) output with every iteration (Glanville 1996). The sequence of the outputs depended not only on the frequencies and rhythms but also on repetition: if a rhythm became too predictable then MusiColour would enter a state of boredom and seek more stimulating rhythms, provoking and stimulating improvisation. As such, it has been argued that MusiColour acted more like a jazz co-performer might when jamming with other band members (Haque 2007).

The area of musical improvisation has since provided a number of examples of creative systems that model social interactions within creative activities, e.g., GenJam (Biles 1994) and MahaDeviBot (Kapur et al. 2009). The recent development of Shimon (Hoffman and Weinberg 2010) provides a nice example of the importance of modelling social interactions alongside the musical performance.
Figure 1: MusiColour: light display (left) and processing
unit (right) (Glanville 1996).
Performative Ecologies: Dancers by Ruairi Glynn is a conversational environment involving human and robotic agents in a dialogue using simple gestural forms (Glynn 2008). The Dancers in the installation are robots suspended in space by threads and capable of performing gestures through twisting movements. The fitness of gestures is evaluated as a function of audience attention, independently determined by each robot through face tracking. Audience members can directly participate in the evolution by manipulating the robots, twisting them to record a new gesture. Successful gestures, i.e., those observed to attract an audience, are shared between the robots over a wireless network.
The robotic installation Zwischenräume employs embodied curious agents that transform their environment through playful exploration and intervention (Gemeinboeck and Saunders 2011). A small group of robots is embedded in the walls of a gallery space; they investigate their wall habitat and, motivated to learn, use their motorised hammers to introduce changes to the wall and thus novel elements to study. As the wall is increasingly fragmented and broken down, the embodied agents discover, study and respond to human audiences in the gallery space. Unlike the social models embodied in MusiColour and Performative Ecologies, the social interactions in Zwischenräume focus on those between the robots. Audience members still play a significant role in the robots' exploration of the world, but in Zwischenräume visitors are considered complex elements of the environment.

Figure 2: Performative Ecologies: Dancers (Glynn 2008)
In The New Artist, Straschnoy (2008) explored the issue of what robots making art for robots could be like. In a series of interviews, the engineers involved in the development of The New Artist expressed different interpretations of the meaning and purpose of such a system. Some questioned the validity of the enterprise, arguing that there is no reason to construct robots to make art for other robots, while others considered it to be part of a natural progression in creative development: "We started out with human art for humans, then we can think about machine art for humans, or human art for machines. But will we reach a point where there's machine art for machines, and humans don't even understand what they are doing or why they even like it?" (Interview with Jeff Schneider, Associate Research Professor, Robotics Institute, Carnegie Mellon; Straschnoy 2008).

The following section describes the current implementation of Curious Whispers, an embodied artificial creative system. The implemented system is much simpler than those described above, i.e., the robots employ a very simple generative system to produce short note sequences, but it provides a useful platform for the exploration of interaction design issues that arise with the development of autonomous creative systems involving multiple artificial agents.
Implementation
The current implementation of Curious Whispers (version 2.0) uses a small group of mobile robots equipped with speakers, microphones and a movable plastic hood, see Figure 3. Each robot is capable of generating simple songs, evaluating the novelty and value of a song, and performing those songs that it determines to be interesting to other members of the society, including human participants. Each robot listens to the performances of others and, if it values a song, attempts to compose a variation. Closing their plastic hoods allows the robots to rehearse songs using the same hardware and software that they use to analyse the songs of other robots, removing the need for simulation.

Figure 3: The implemented mobile robots and 3-button synthesiser.

A simple 3-button synthesiser allows participants to play songs that the robots can recognise, and if a robot considers a participant's songs to be interesting it will adopt them. Using this simple interface, humans are free to introduce domain knowledge, e.g., fragments of well-known songs, into the collective memory of the robot society. For more information on the technical details of the implementation see Chee (2011).
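The paper does not give pseudocode, but the behaviour described above suggests a loop of roughly the following shape (a speculative Python sketch; the function names and the 8-note, 3-button song encoding are our assumptions drawn from the paper's descriptions, not the actual Curious Whispers code):

    import random

    def mutate(song, n_notes=8):
        """Vary one note of an 8-note song; notes are ints 0-2 (three buttons)."""
        i = random.randrange(n_notes)
        return song[:i] + (random.randrange(3),) + song[i + 1:]

    def agent_step(repertoire, heard_song, interesting):
        """One robot update: listen, evaluate, possibly adopt and vary a song,
        then perform a song the robot itself judges interesting (if any)."""
        if heard_song is not None and interesting(heard_song):
            repertoire.append(mutate(heard_song))  # compose a variation
        candidates = [s for s in repertoire if interesting(s)]
        return random.choice(candidates) if candidates else None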
Methodology
To investigate the interactions between robots and human participants we adopted a methodology from interaction design and employed a technology probe. Technology probes combine methods for collecting qualitative information about user interaction, the field-testing of technology, and the exploration of design requirements. A well-designed technology probe should balance these different disciplinary influences (Hutchinson et al. 2003). A probe should be technically simple and flexible with respect to possible use: it is not a prototype but a tool to explore design possibilities and, as such, should be open-ended and explicitly co-adaptive (Mackay 1990). The probe used in this research involved three observational studies exploring different aspects of the human-robot interaction with the embodied creative system.

The observational studies were conducted with different arrangements of robots and human participants, allowing us to observe how interaction patterns and user assessments of the system changed in each configuration. Each session was video recorded, and at the end of each session the participants were interviewed using a series of open-ended questions. The interview was based on a similar one developed by Bernsen and Dybkjær (2005) in their study of conversational agents. Employing a post-think-aloud method at the end of each session, the participants were first asked to describe their experiences interacting with the robot. A similar method was used in the evaluation of the Sonic City project (Gaye, Mazé, and Holmquist 2003). The video recordings were transcribed and interaction events noted on a timeline. The post-think-aloud reports were correlated with events in the video recordings where possible.
Six participants were observed in the studies. The participants came from a variety of backgrounds and included 2 interaction designers, 2 engineers, 1 linguist, and 1 animator. All participants were involved in the 1:1 (1 human, 1 robot) observation study. Two participants (Participants 5 and 8) went on to be part of the 1:3 (1 human, 3 robots) observation study; the other four (Participants 6, 7, 9 and 10) were involved in the 2:3 (2 humans, 3 robots) observation study.

1:1 Interaction Observation Study The purpose of the first study was to observe the participants' behaviour whilst interacting with a single robot. Each participant was given a 3-button synthesiser to communicate with the robot and allowed to interact for as long as they wished, i.e., no time limit was given.

1:3 Interaction Observation Study The second observational study involved each participant interacting with the group of 3 robots, to examine how participants interacted with multiple creative agents at the same time and how the participants were influenced by the interactions between robots. This study involved 2 participants, both of whom had previously completed the first observation study.

2:3 Interaction Observation Study The third observational study involved pairs of participants interacting with the system of 3 working robots. This study allowed the participants not only to interact with and observe the working system but also to interact with each other and share their experiences. This study involved 4 participants working in two groups of two. The 4 participants were chosen from those who completed the 1:1 study but were not involved in the 1:3 observation study.
Results
This section presents a brief summary of the observational studies; a more detailed account can be found in Chee (2011).

1:1 Interaction The 1:1 interaction task allowed the participants to form individual theories on how single robots reacted to them; most learned that the robots did not respond to individual notes but to sequences of them. Participants spent between 2 and 4 minutes interacting with the robot, much of that time spent experimenting to determine how the robot reacted to different inputs: "[I] first tried to see how it would react, pressed a single button and then tried a sequence of notes" (Participant 6). Several of the participants learned to adopt a turn-taking behaviour with the robots, e.g., "when it started to play I stopped to watch, I only tried to play when it stopped" (Participant 5). Some of the participants interpreted the opening and closing of the hood as a cue for when they could play a song for the robot to learn, as Participant 9 commented: "I played a noise and it took that song and closed up and was like alright I'm gonna think of something better. It sounded like it was repeating what I did but like a bit different. Like it was working out what I'd done." Most of the participants assumed the role of teacher and attempted to get the robot to repeat a simple song. But in the case of Participant 8 the roles were reversed, as the participant began copying the songs played by the robot.

1:3 Interaction For the 1:3 interaction studies the group of robots was placed on a table in a quiet location, as shown in Figure 4. The participants interacted with the group of robots for approximately 5 minutes. Both participants already knew from the 1:1 study that the robots were responsive to them, but they found it difficult to determine which robot they were interacting with: "you knew you could interact but you were not really aware of the reaction as a group" (Participant 5). The participants noticed that the robots were different: "the green robot's song was slightly different to blue and purple" (Participant 5); and that they exhibited social behaviour amongst themselves: "Noticed they didn't rely just on the [synthesiser], the 3 of them were communicating. I thought they sang in a certain order as one started and the others would reply" (Participant 8). Both participants came to realise that the system would continue to evolve new songs without their input and spent time towards the end of their sessions observing the group behaviour.
Figure 4: An example of the interaction in the 1:3 study.
2:3 Interaction Working together, the participants in the third study quickly arrived at the conclusion that they needed to take turns in order to interact with the robots. Participant 6 saw that the robots moved towards Participant 7 and asked to be given one of the robots; Participant 7 replied "No, they have to go to you on their own", suggesting that Participant 7 recognised that the robots could not be commanded. Later, the participants became competitive in their attempts to attract the robots away from each other. As the participants shared observations about the system, they explored the transference of songs. By observing the interactions between Participant 7 and the robots, Participant 6 was able to determine that the robots responded to songs of exactly 8 notes and that a robot would repeat a song 3 times while it learned. At one point Participant 9 commented: "...when I pressed it like this beep beep beep beep it went beep beep boop beep so it was like changing what I played." These observations suggest that over time the participants were able to build relatively accurate mental models of the processes of the robotic agents.
Figure 5: An example of the interaction in the 2:3 study.
Discussion
Unlike traditional interactive systems that react to human participants (Dezeuze 2010), the individual agents within artificial creative systems are continuously engaged in social interactions: the robots in our study would continue to interact and share songs without the intervention of the participants. While initially confusing, participants discovered through extended observation and interaction that they could inject songs into the society by teaching them to a single robot. Participants sometimes also assumed the role of learner and copied the songs of the robots, consequently adopting an interaction strategy more like that of a peer.

The autonomous nature of the embodied creative system runs counter to typical expectations of human-robot interaction, making interacting with a group of robots significantly more difficult than interacting with one. The preliminary results presented here suggest that simple social policies in artificial creative systems, e.g., the turn-taking behaviour, coupled with cues that indicate state, e.g., closing the hood while practicing and composing songs, allow conversational interactions to emerge over time.
Conclusion
The development of embodied creative systems offers significant opportunities and challenges for researchers in computational creativity. This paper has presented a possible approach for the study of interaction design issues surrounding the development of artificial creative systems.

The Curious Whispers project explores the possibility of developing artificial creative systems that are open to these types of peer-to-peer interactions through the construction of a common ground based on the expression and perception of artefacts. The research presented has shown that even a simple robotic platform can be designed to exploit its physical embodiment as well as its social situation, using easily obtained components.

The implemented system, while simple in terms of the computational ability of the agents, has provided a useful platform for studying interactions between humans and artificial creative systems. The technical limitations of the robotic platform place an emphasis on the important role that communication plays in the evolution of creative systems, even with the restricted notion of what constitutes a song in this initial exploration. Above all, the technology probe methodology used in our observational studies has illustrated the usefulness of implementing simple policies in artificial creative systems to allow human participants to adapt to the unusual interaction model.
Acknowledgements
The research reported in this paper was supported as part
of the Bachelor of Design Computing Honours programme
in the Faculty of Architecture, Design and Planning at the
University of Sydney.
References
Bernsen, N., and Dybkjær, L. 2005. User interview-based progress evaluation of two successive conversational agent prototypes. In Maybury, M.; Stock, O.; and Wahlster, W., eds., Intelligent Technologies for Interactive Entertainment, volume 3814. Springer Berlin / Heidelberg. 220-224.
Biles, J. A. 1994. GenJam: A genetic algorithm for generating jazz solos. In Proceedings of the International Computer Music Conference, 131-137.
Boden, M. A. 2010. Creativity and Art: Three Roads to Surprise. Oxford: Oxford University Press.
Brooks, R. 1990. Elephants don't play chess. Robotics and Autonomous Systems 6:3-15.
Chee, E. 2011. Curious Whispers 2.0: Human-robot interaction with an embodied creative system. Honours Thesis, University of Sydney, Australia. Available online at http://emmachee.com/Thesis/Emma_Chee_Thesis_2011.pdf.
Clark, A. 1996. Being There: Putting Brain, Body, and World Together Again. Cambridge, MA, USA: MIT Press.
Dezeuze, A. 2010. The Do-It-Yourself Artwork: Participation from Fluxus to New Media. Manchester: Manchester University Press.
Gaye, L.; Mazé, R.; and Holmquist, L. E. 2003. Sonic City: the urban environment as a musical interface. In Proceedings of the 2003 Conference on New Interfaces for Musical Expression, 109-115.
Gemeinboeck, P., and Saunders, R. 2011. Zwischenräume: The machine as voyeur. In Proceedings of the First International Conference on Transdisciplinary Imaging at the Intersections between Art, Science and Culture, 62-70.
Glanville, R. 1996. Robin McKinnon-Wood and Gordon Pask: A lifelong conversation. Journal of Cybernetics and Human Learning 3(4).
Glynn, R. 2008. Performative Ecologies: Dancers. http://www.ruairiglynn.co.uk/portfolio/performative-ecologies/.
Haque, U. 2007. The architectural relevance of Gordon Pask. In 4d Social: Interactive Design Environments. Wiley & Sons.
Hoffman, G., and Weinberg, G. 2010. Shimon: an interactive improvisational robotic marimba player. In CHI '10 Extended Abstracts on Human Factors in Computing Systems, CHI EA '10, 3097-3102. New York, NY, USA: ACM.
Hutchinson, H.; Mackay, W.; Westerlund, B.; Bederson, B. B.; Druin, A.; Plaisant, C.; Beaudouin-Lafon, M.; Conversy, S.; Evans, H.; Hansen, H.; Roussel, N.; and Eiderbäck, B. 2003. Technology probes: inspiring design for and with families. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '03, 17-24. New York, NY, USA: ACM.
Kapur, A.; Eigenfeldt, A.; Bahn, C.; and Schloss, W. A. 2009. Collaborative composition for musical robots. Journal of Science and Technology of the Arts 1(1):48-52.
Mackay, W. 1990. Users and Customizable Software: A Co-Adaptive Phenomenon. Ph.D. Dissertation, Massachusetts Institute of Technology.
Pease, A., and Colton, S. 2011. On impact and evaluation in computational creativity: A discussion of the Turing test and an alternative proposal. In Proceedings of the AISB Symposium on AI and Philosophy 2011.
Penny, S. 1997. Embodied cultural agents: At the intersection of art, robotics, and cognitive science. In Socially Intelligent Agents: Papers from the AAAI Fall Symposium, 103-105. AAAI Press.
Pickering, J. 2005. Embodiment, constraint and the creative use of technology. In Freedom and Constraint in the Creative Process in Digital Fine Art.
Saunders, R., and Gero, J. S. 2002. How to study artificial creativity. In Proceedings of Creativity and Cognition 4.
Saunders, R.; Gemeinboeck, P.; Lombard, A.; Bourke, D.; and Kocabali, B. 2010. Curious Whispers: An embodied artificial creative system. In International Conference on Computational Creativity 2010, 7-9 January 2010.
Straschnoy, A. 2008. The New Artist. http://www.the-new-artist.info/.
The role of motion dynamics in abstract painting
Alexander Schubert and Katja Mombaur
Interdisciplinary Center for Scientific Computing
University of Heidelberg
{alexander.schubert, katja.mombaur}@iwr.uni-heidelberg.de
Abstract
We investigate the role of dynamic motions performed by artists during the creative process of art generation. We are especially interested in modern artworks inspired by the Action Painting style of Jackson Pollock. Our aim is to evaluate and model the role of these motions in the process of art creation. We use mathematical approaches from optimization and optimal control to capture the essence (cost functions of an optimal control problem) of these movements, study it and transfer it to feasible motions for a robot arm. Additionally, we performed studies of human responses to paintings, assisted by an image analysis framework which computes several image characteristics. We asked people to sort and cluster different action-painting images and performed PCA and cluster analysis in order to determine image traits that cause certain aesthetic experiences in contemplators.
By combining these approaches, we can develop a model that allows our robotic platform to monitor its painting process using a camera system and, based on an evaluation of its current status, to change its movement to create human-like paintings. This way, we enable the robot to paint in a human-like way without any further control from an operator.
Introduction
The cognitive processes of generating and perceiving abstract art are, in contrast to figurative art, widely unknown. When processing representational art works, the effect of meaning is highly dominant. In abstract art, which lacks this factor, the processes of perception are much more ambiguous, relying on a variety of more subtle qualities. In this work, we focus on the role of dynamic motions performed during the creation of an art work as one specific trait that influences our perception and aesthetic experience.
Action Paintings - Modern art works created by dynamic motions
The term action painting was first used in the essay "The American Action Painters" (Rosenberg 1952). While the term action painting is commonly used in public, art historians sometimes also use the term Gestural Abstraction. Both terms emphasize the process of creating art, rather than the resulting art work, which reflects the key innovation that arose with this new form of painting in the 1940s to the 1960s. The style of painting includes dripping, dabbing and splashing paint onto a canvas rather than applying it carefully and in a controlled way. Art encyclopedias describe these techniques as depending on broad actions directed by the artist's sense of control interacting with chance or random occurrences. The artists often consider the physical act of painting itself to be the essential aspect of the finished work. Regarding the contemplators, action paintings intend to connect with them on a subconscious level. In 1950, Pollock said "The unconscious is a very important side of modern art and I think the unconscious drives do mean a lot in looking at paintings" (Ross 1990), and later he stated "We're all of us influenced by Freud, I guess. I've been a Jungian for a long time" (Rodman 1961). Clearly, artists like Pollock do not think actively about the dynamic motions performed by their bodies the way mathematicians from the area of modeling and optimal control do. But for us it is very exciting that one of the main changes they applied to their painting style, in order to achieve their aim of addressing the subconscious mind, was a shift in the manner in which they carry out their motions during the creational process.

Figure 1: An action painting in the style of Jackson Pollock, painted by JacksonBot
Understanding the perception and generation of
action paintings
Since a human possesses many more degrees of freedom than needed to move, human motions can often be seen as a superposition of goal-directed motions and implicit, unconscious motions. The assumption that elements of human motions can be described in this manner has been widely applied and verified, particularly for walking and running motions (Felis and Mombaur 2012; Schultz and Mombaur 2010), but also (very recently) regarding emotional body language during human walking (Felis, Mombaur, and Berthoz 2012). If we transfer this approach to an artist, the goal-directed motions are those carried out to direct his hand (or rather a brush or tool) to the desired position; the implicit, unconscious motions are the result of an implicitly solved optimal control problem with a certain cost function, like maximizing stability or minimizing energy costs.
When looking at action paintings, we note that this form of art generation is a very extreme form of this superposition model, with a widely negligible goal-directed part. Therefore, it is a perfect basis for studying the role of (unconscious) motion dynamics on a resulting art work. Jackson Pollock himself expressed similar thoughts when he said, "The modern artist... is working and expressing an inner world in other words expressing the energy, the motion, and other inner forces", or, "When you're working out of your unconscious, figures are bound to emerge... Painting is a state of being" (Rodman 1961).
However, the role of motion dynamics in the embodied expression of artists has been poorly described so far, presumably due to the lack of an adequate method for the acquisition of quantitative data. The goal of our project is to use state-of-the-art tools from scientific computing to analyze the impact of motion dynamics on both the creational and the perceptual side of action-painting art works. Therefore, we perform perception studies with contemplators and experimental studies concerning motion generation, which are linked by a robotic platform as a tool that can precisely reproduce different motion dynamics. Using this approach, we want to determine the key motion types influencing a painting's perception.
Models of art perception
The perception of art, especially abstract art, is still an area of ongoing investigation. Therefore, no generally accepted theory covering all facets of art perception exists. There are, however, different theories that can explain different aspects of art perception; one example is the model presented in (Leder et al. 2004) (see figure 2).

Figure 2: Overview of the aesthetic judgment model by (Leder et al. 2004)

In the past, resulting from an increasing interest in embodied cognition and embodied perception, there has been a stronger focus on the nature of human motion and its dynamics in neuroscience, or rather neuroaesthetics, as well as in psychology and the history of art. There are several results showing that we perceive motion and actions with a strong involvement of those brain regions that are responsible for motion and action generation (Buccino et al. 2001). The mirror neurons located in these brain regions fire both when an action is actively performed and when the same action is being observed. These findings support the theory that the neural representations for action perception and action production are identical (Buxbaum, Kyle, and Menon 2005). The relation between perception and embodied action simulation also exists for static scenes (Urgesi et al. 2006), and it extends even to the degree where the motion is implied only by a static result of this very motion. For example, (Knoblich et al. 2002) showed that the observation of a static graph sign evokes in the brain a motor simulation of the gesture required to produce it. Finally, in (Freedberg and Gallese 2007), it was proposed that this effect of reconstructing motions by embodied simulation mechanisms will also be found when looking at art works that are characterized by the particular gestural traces of the artist, as in the works of Fontana and Pollock.
Mathematical background
To perform mathematical computations on motion dynamics, we first need to create models of the human and the robot arm. Both can be considered as systems of rigid bodies connected by different types of joints (prismatic or revolute). By model, we mean a mathematical description, in terms of differential equations, of the physical characteristics of the human arm and of the robot, respectively. Depending on the number of bodies and joints, we end up with a certain number of degrees of freedom. For each body, we get a set of generalized coordinates $q$, velocities $\dot{q}$, accelerations $\ddot{q}$, and joint torques $\tau$. Given such a model, we can fully describe its dynamics by means of

$M(q)\,\ddot{q} + N(q, \dot{q}) = \tau$   (1)

where $M(q)$ is the joint space inertia matrix and $N(q, \dot{q})$ contains the generalized non-linear effects. Once we have such a model, we can formulate our optimal control problem using $x = [q, \dot{q}]^T$ as states and $u = \tau$ as controls. The OCP
can be written in its general form as:

$\min_{x,u,T} \; \int_0^T L(t, x(t), u(t), p)\,dt \;+\; \Phi_M(T, x(T))$   (2)

subject to:

$\dot{x} = f(t, x(t), u(t), p)$
$g(x(t), p) = 0$
$h(t, x(t), u(t), p) \ge 0$

Figure 3: Interface for web-based similarity ratings
Note that all the dynamic computation from our model is included in the right-hand side of the differential equation $\dot{x} = f(t, x(t), u(t), p)$. The first part of our objective function, $\int_0^T L(t, x(t), u(t), p)\,dt$, is called the Lagrange term; $\Phi_M(T, x(T))$ is called the Mayer term. The former is used to address objectives that have to be evaluated over the whole time horizon (such as minimizing jerk), the latter is used to address objectives that only need to be evaluated at the end of the time horizon (such as overall time). In our case, we will often only use the Lagrange term. To solve such a problem numerically, we apply a direct multiple shooting method, which is implemented in the software package MUSCOD-II. For a more detailed description of the algorithm, see (Bock and Plitt 1984; Leineweber et al. 2003).
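To make the structure of such a problem concrete, the following minimal sketch transcribes a toy optimal control problem onto a time grid and hands it to a general-purpose NLP solver. It illustrates the direct approach only and is not the MUSCOD-II multiple shooting code used in the paper; the double-integrator model, grid size and torque bounds are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

N, T = 40, 2.0                       # grid intervals and fixed horizon
h = T / N

def simulate(u):
    """Explicit-Euler rollout of the toy dynamics x_dot = [velocity, u]."""
    x = np.zeros((N + 1, 2))         # states: [position, velocity]
    for k in range(N):
        pos, vel = x[k]
        x[k + 1] = [pos + h * vel, vel + h * u[k]]
    return x

def lagrange_objective(u):
    # Lagrange term only: the integral of u(t)^2, approximated on the grid
    return h * np.sum(u ** 2)

def terminal_constraint(u):
    # boundary condition: reach position 1 with zero velocity at time T
    return simulate(u)[-1] - np.array([1.0, 0.0])

res = minimize(lagrange_objective, np.zeros(N),
               constraints={"type": "eq", "fun": terminal_constraint},
               bounds=[(-5.0, 5.0)] * N)      # simple torque limits
print("optimal cost:", res.fun)
```

Multiple shooting additionally introduces the states at the grid points as optimization variables with matching conditions, which greatly improves robustness for the stiff, constrained problems considered here.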
Experimental Data
Perception experiments
We performed two pre-studies to find out whether human contemplators can distinguish robot paintings from human-made paintings and how they evaluate robot paintings created with different mathematical objective functions.
In the first study, we showed nine paintings to 29 participants, most of whom were laymen in the arts and only vaguely familiar with Jackson Pollock. Seven paintings were original art works by Jackson Pollock and two paintings were generated by the robot platform JacksonBot. We asked the participants to judge which of the paintings were original paintings by Pollock and which were not, but we intentionally did not inform them about the robotic background of the fake paintings. As might be expected, the original works by Pollock had a higher acceptance rate, but, very surprisingly, the difference between Pollock's and JacksonBot's paintings was not very high (2.74 ± 0.09 vs. 2.85 ± 0.76, on a scale of 1–5).

Figure 4: Interface for web-based sorting studies
In the second study, the participants were shown 10 paint-
ings created solely by the robot platform, but with two oppo-
site objective functions (maximum and minimum overall an-
gular velocity in the robot arm) in the optimal control prob-
lem. The participants easily distinguished the two different
painting styles.
Since the pre-studies were only conducted to get a rather rough idea of this aspect, we developed a more sophisticated web-based platform for further, more detailed investigations of this subject. The data obtained from this tool can be used to enhance the robot's ability to monitor its painting process. The set of stimuli used for our studies consists of original action-art paintings by Pollock and other artists, and of images that were painted by our robot platform.
In the first task, contemplators are presented with three randomly chosen paintings and asked to arrange them on the screen according to their similarity (see figure 3). (More precisely, the paintings are not chosen purely at random: the probability of each painting being presented is slightly adjusted in order to obtain many different correlations even when participants complete only a few repetitions.) If they want, they are free to add a commentary to indicate their thoughts while arranging the paintings. As a result, we obtain for every pair of paintings a measure of their similarity in comparison with any other pair of paintings. (Note that we do not use the absolute values of similarity but quotients of these, in order to avoid offset problems.) Using standard procedures from statistics like cluster analysis, we can determine which paintings are overall rated as more similar than others.
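As an illustration of this step, the sketch below clusters a handful of paintings from pairwise similarity scores using off-the-shelf hierarchical clustering; the painting names and scores are invented for the example and are not the study's data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

paintings = ["pollock_1", "pollock_2", "robot_min_vel", "robot_max_vel"]
# hypothetical relative similarity in [0, 1], higher = more similar
sim = {(0, 1): 0.9, (0, 2): 0.4, (0, 3): 0.3,
       (1, 2): 0.5, (1, 3): 0.2, (2, 3): 0.6}

n = len(paintings)
dist = np.zeros((n, n))
for (i, j), s in sim.items():
    dist[i, j] = dist[j, i] = 1.0 - s      # turn similarity into distance

# average-linkage hierarchical clustering on the condensed distance matrix
labels = fcluster(linkage(squareform(dist), method="average"),
                  t=2, criterion="maxclust")
for name, lab in zip(paintings, labels):
    print(name, "-> cluster", lab)
```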
In the second task, people are asked to perform a standard sorting study, i.e. they are asked to combine similar paintings into groups and to give some information on why they formed specific groups. The results of this task are used to validate the information obtained by the previous one and, additionally, to gain more information about the attributes and traits people seem to use while grouping. Therefore, the set of possible tags for the formed groups is limited and chosen by us. It includes very basic image characteristics like colour as well as more interesting characteristics like associated emotions.

Figure 5: Recorded acceleration data for a 3-second motion
Motion capture experiments
In order to study the way real human artists move during action painting, we chose to do motion-capture studies with our collaborating artist. As a first approach, we used three inertial sensors to record dynamic data $D_{capture}$. For each of the three segments of the artist's arm (hand, lower arm, upper arm), we recorded accelerations, angular velocities and the rotation matrix (recording only the Euler angles is not sufficient, due to potential singularities in the reconstruction process) using three Xsens MTw inertial motion trackers. The sensors were placed directly above the calculated center of mass of each arm segment. Figure 5 shows an example of the raw data output obtained from the sensors.
We asked the artist to create different paintings and to describe her creative ideas as well as her thoughts and emotions during the process in her own words. That way, we can correlate identified objective functions with specific emotions or creative ideas.
Robot painting experiments
For the first experiments, we created paintings with our robot platform. In order to compute the robot joint trajectories necessary to move along a desired end-effector path, we use an optimal-control-based approach to solve the inverse kinematics problem. Using our first robotic platform, we created several paintings using different cost functions in the optimal control problem. Two of them, maximizing and minimizing the angular velocities in the robot joints, resulted in significantly different paintings. These paintings were used in the pre-study mentioned earlier.
Data Analysis
Motion reconstruction
To fit the recorded dynamic data $D_{capture}$ to our 9-DOF model of a human arm, which is based on data from (De Leva 1996), we formulated an optimal control problem that generates the motion $x(t) = [q(t), \dot{q}(t)]^T$ and the controls $u(t) = \tau(t)$ that best fit the captured data with respect to the model dynamics $f$:

$\min_{x,u} \; \frac{1}{2}\,\| D_{capture}(t) - D_{simulated}(t) \|_2^2$   (3)

subject to:

$\dot{x}(t) = f(t, x(t), u(t), p)$
$g(x, p) = 0$
$h(x, p) \ge 0$
Figure 6: Computed trajectories for joint angles (left) and
comparison of computed (lines) and measured (dots) accel-
erations (right).
The constraints in this case are given by the limited angles of the human arm joints and the torque limitations of the arm muscles. The computed states and the fit quality of the acceleration data can be seen in figure 6. Note that the way the angles approach the joint limits is plausible for this type of motion.
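The following toy sketch illustrates the data-fitting idea behind equation (3) in a drastically reduced setting: a single-joint arm whose piecewise-constant torques are chosen so that simulated accelerations match a measured signal. The 1-DOF pendulum dynamics, sampling rate and torque bounds are assumptions for illustration, not the authors' 9-DOF model or solver.

```python
import numpy as np
from scipy.optimize import least_squares

h, N = 0.01, 300                              # 3 s of data at 100 Hz
t = np.arange(N) * h
measured_acc = 2.0 * np.sin(2 * np.pi * t)    # stand-in for D_capture

def simulate_acc(tau):
    """Roll out q_ddot = tau - 9.81*sin(q) and return the accelerations."""
    q, qd, acc = 0.0, 0.0, np.empty(N)
    for k in range(N):
        acc[k] = tau[k] - 9.81 * np.sin(q)    # simple pendulum dynamics
        qd += h * acc[k]
        q += h * qd
    return acc

def residual(tau):
    # least-squares mismatch between captured and simulated data
    return simulate_acc(tau) - measured_acc

fit = least_squares(residual, np.zeros(N), bounds=(-50, 50))  # torque limits
print("residual norm:", np.linalg.norm(fit.fun))
```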
In the next step, we will use the motion capture data obtained from experiments with our collaborating artist not only to reconstruct the motion, but also to retrieve, via an inverse optimal control approach (as used successfully in a similar case in (Mombaur, Truong, and Laumond 2010)), the underlying objective functions of these motions. To do so, we will use an approach developed by K. Hatz (Hatz, Schlöder, and Bock 2012). This process is illustrated in figure 7.
Conclusion and Outlook
We introduced a new way to analyze the creative process of action painting by investigating the dynamic motions of artists. We developed a mathematical model, which we used to successfully reconstruct an artist's action-painting motions from inertial measurements. We used state-of-the-art optimal control techniques to create new action-painting motions for a robotic platform and evaluated the resulting paintings. Even with artificial objective functions, we were able to create action paintings that are indistinguishable from human-made action paintings for a human contemplator.
In the next step, we will use an inverse optimal control approach to go one step further, from reconstructing an artist's motions to identifying the underlying objective functions of the motion dynamics. That way, we will be able to generate specific painting motions corresponding to specific intentions as formulated by the artist.
Since several studies, e.g. (Haak et al. 2008), have shown that aesthetic experiences and judgments can, up to a certain degree, be explained by analyzing low-level image features, we chose to develop an image analysis software tool, based on OpenCV, that uses a variety of different filters and image processing tools related to aesthetic experience. Amongst other features, our tool analyzes a painting's power spectrum, different symmetries, and colours, and performs fractal analysis (Taylor, Micolich, and Jonas
1999).

Figure 7: Transfer of human motion objectives to a robot platform (schematic overview)

We will include the information obtained from our online perception studies in this tool and use it as feedback for the robot platform. That way, we will enable the robot to paint autonomously, with feedback only from an integrated camera monitoring the process.
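By way of illustration, the sketch below computes two low-level features of the kind such a tool might use: an azimuthally averaged power spectrum and a box-counting estimate of fractal dimension, in the spirit of (Taylor, Micolich, and Jonas 1999). The function names, thresholds and input file are assumptions; this is not the authors' OpenCV tool.

```python
import cv2
import numpy as np

def radial_power_spectrum(gray):
    """1-D azimuthally averaged power spectrum of a grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(gray.astype(float)))
    power = np.abs(f) ** 2
    cy, cx = np.array(power.shape) // 2
    y, x = np.indices(power.shape)
    r = np.hypot(y - cy, x - cx).astype(int)
    return np.bincount(r.ravel(), power.ravel()) / np.bincount(r.ravel())

def box_counting_dimension(gray, threshold=128):
    """Estimate the fractal (box-counting) dimension of the dark strokes."""
    binary = gray < threshold
    sizes, counts = [], []
    k = min(binary.shape) // 2
    while k >= 2:
        count = 0
        for i in range(0, binary.shape[0], k):
            for j in range(0, binary.shape[1], k):
                if binary[i:i + k, j:j + k].any():
                    count += 1
        sizes.append(k)
        counts.append(max(count, 1))       # guard against log(0)
        k //= 2
    # dimension is the negative slope of log N(k) versus log k
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope

img = cv2.imread("painting.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file
print(radial_power_spectrum(img)[:10])
print("fractal dimension:", box_counting_dimension(img))
```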
The presented approach of capturing the essence of dy-
namic motions using inverse optimal control theory is not
limited to the investigation of action paintings but can be
used to analyze human motions in other art forms like dance
or even in daily life by analyzing human gestures or full-
body motions.
References
Bock, H. G., and Plitt, K. J. 1984. A multiple shooting algorithm for direct solution of optimal control problems. In Proceedings of the 9th IFAC World Congress, Budapest.
Buccino, G.; Binkofski, F.; Fink, G. R.; Fadiga, L.; Fogassi, L.; Gallese, V.; Seitz, R. J.; Zilles, K.; Rizzolatti, G.; and Freund, H.-J. 2001. Action observation activates premotor and parietal areas in a somatotopic manner: an fMRI study. European Journal of Neuroscience 13:400–404.
Buxbaum, L. J.; Kyle, K. M.; and Menon, R. 2005. On beyond mirror neurons: internal representations subserving imitation and recognition of skilled object-related actions in humans. Brain Research. Cognitive Brain Research 25(1):226–239.
De Leva, P. 1996. Adjustments to Zatsiorsky-Seluyanov's segment inertia parameters. Journal of Biomechanics 29(9):1223–1230.
Felis, M., and Mombaur, K. 2012. Modeling and optimization of human walking. To appear in Springer LNEE.
Felis, M.; Mombaur, K.; and Berthoz, A. 2012. Mathematical modeling of emotional body language during human walking. Submitted to Proceedings of HPSC 2012.
Freedberg, D., and Gallese, V. 2007. Motion, emotion and empathy in esthetic experience. Trends in Cognitive Sciences 11(5):197–203.
Haak, K.; Jacobs, R.; Thumfart, S.; Henson, B.; and Cor-
nelissen, F. 2008. Aesthetics by numbers: computationally
derived features of visual textures explain their aesthetics
judgment. Perception 37.
Hatz, K.; Schlöder, J.; and Bock, H. G. 2012. Estimating parameters in optimal control problems. SIAM J. Sci. Comput. 34(3):A1707–A1728.
Knoblich, G.; Seigerschmidt, E.; Flach, R.; and Prinz, W. 2002. Authorship effects in the prediction of handwriting strokes: evidence for action simulation during action perception. Q J Exp Psychol A 55(3):1027–1046.
Leder, H.; Belke, B.; Oeberst, A.; and Augustin, D. 2004. A model of aesthetic appreciation and aesthetic judgments. British Journal of Psychology 95(4):489–508.
Leineweber, D. B.; Bauer, I.; Bock, H.-G.; and Schlöder, J. P. 2003. An efficient multiple shooting based reduced SQP strategy for large-scale dynamic process optimization. Part 1: theoretical aspects. Computers & Chemical Engineering 27(2):157–166.
Mombaur, K.; Truong, A.; and Laumond, J.-P. 2010. From human to humanoid locomotion: an inverse optimal control approach. Autonomous Robots 28(3):369–383.
Rodman, S. 1961. Conversations with Artists. Capricorn
Books.
Rosenberg, H. 1952. The American action painters. Art News 51(8).
Ross, C. 1990. Abstract expressionism: creators and critics:
an anthology. Abrams.
Schultz, G., and Mombaur, K. 2010. Modeling and optimal control of human-like running. IEEE/ASME Transactions on Mechatronics 15(5):783–792.
Taylor, R. P.; Micolich, A. P.; and Jonas, D. 1999. Fractal analysis of Pollock's drip paintings. Nature 399(6735):422.
Urgesi, C.; Moro, V.; Candidi, M.; and Aglioti, S. 2006. Mapping implied body actions in the human motor system. J Neurosci 26(30):7942–7949.
Creative Machine Performance: Computational Creativity and Robotic Art
Petra Gemeinboeck
College of Fine Art
University of NSW
NSW 2021 Australia
[email protected]
Rob Saunders
Design Lab
University of Sydney
NSW 2006 Australia
[email protected]
Abstract
The invention of machine performers has a long tradition as a method of philosophically probing the nature of creativity. Robotic art practices in the 20th Century have continued in this tradition, playfully engaging the public in questions of autonomy and agency. In this position paper, we explore the potential synergies between robotic art practice and computational creativity research through the development of robotic performances. This interdisciplinary approach permits the development of significantly new modes of interaction for robotic artworks, and potentially opens up computational models of creativity to rich social and cultural environments through interaction with audiences. We present our exploration of this potential with the development of Zwischenräume (In-between Spaces), an artwork that embeds curious robots into the walls of a gallery. The installation extends the traditional relationship between the audience and artwork such that visitors to the space become performers for the machine.
Introduction
This paper looks at potential synergies between the practice of robotic art and the study of computational creativity. Starting from the position that creativity and embodiment are critically linked, we argue that robotic art provides a rich experimental ground for applying models of creative agency within a public forum. From the robotic art perspective, a computational creativity approach expands the performative capacity of a robotic artwork by enhancing its potential to interact with its Umwelt (von Uexküll 1957).
In the 18th century, the Industrial Age brought with it a fascination with mechanical performers: Jacques de Vaucanson's Flute Player automaton and Baron Wolfgang von Kempelen's infamous chess-playing Mechanical Turk clearly demonstrate a desire to create apparently creative automata. Through their work, both Vaucanson and von Kempelen engaged the public in philosophical questions about the nature of creativity, the possibilities of automation and, crucially, perfection.
Moving from mechanical to robotic machine performers, artists have deployed robotics to create apparently living and behaving creatures for over 40 years. The two dominant motivations for this creative practice have been to question our premises in conceiving, building, and employing these electronic creatures (Kac 2001), and to develop enhanced forms of interaction between machine actors and humans via open, non-determined modes (Reichle 2009).
The pioneering cybernetic work Senster by Edward Ihnatowicz, for example, exhibited life-like movements and was programmed to shy away from loud noises. In contrast to the aforementioned automata, Ihnatowicz did not aim to conceal the Senster's inner workings, and yet the public's response was to treat it as if it were a wild animal (Rieser 2002). Norman White's Helpless Robot (1987–96) was a public sculpture, which asked for help to be moved and, when assisted, continued to make demands and increasingly abused its helpers (Kac 1997). Petit Mal by Simon Penny resembled a strange kind of bicycle and reacted to and pursued gallery visitors. With this work Penny aimed to explore the aesthetics of machines and their interactive behaviour in real-world settings; Petit Mal was, in Penny's words, "an actor in social space" (Penny 2000). Ken Rinaldo's Autopoiesis consisted of 15 robotic sculptures and evolved collective behavior based on their capability to sense each other's and the audience's presence (Huhtamo 2004). The installation Fish-Bird by Mari Velonaki comprised two robotic actors in the form of wheelchairs whose movements and written notes created a sense of persona. The relationship between the robot characters and the audience evolved based on autonomous movement, coordinated by a central controller, and what appeared to be personal, handwritten messages, printed by the robots (Rye et al. 2005).
Our fascination with producing artefacts that appear to be creative has created a rich history for researchers of computational creativity to draw upon. What we learn from these interdisciplinary artistic approaches is that, as performers, the artificial agents are embodied and situated in ways that can be socially accessed, shared and experienced by audiences. Likewise, embodied artificial agents gain access to shared social spaces with other creative agents, e.g., audience members.
The ability of robotic performers to interact with the audience relies not only on the robots' behaviours and responsiveness but also on the embodiment and enactment of these behaviours. It can be argued that the performer is most successful if both embodiment and enactment reflect its perception of the world, that is, if it is capable of expressing and communicating its disposition. Looking at robotic artworks that explore notions of autonomy and artificial creativity may thus offer starting points for thinking about social settings that involve humans interacting and collaborating with creative agents.
Our exploration revolves around the authors' collaboration to develop the robotic artwork Zwischenräume (In-between Spaces), a machine-augmented environment, for which we developed a practice of embedding embodied curious agents into the walls of a gallery, turning them into a playground for open-ended exploration and transformation.
Zwischenräume
The installation Zwischenräume embeds autonomous robots into the architectural fabric of a gallery. The machine agents are encapsulated in the wall, sandwiched between the existing wall and a temporary wall that resembles it. At the beginning of an exhibition, the gallery space appears empty, presenting an apparently untouched, familiar space. From the start, however, the robots' movements and persistent knockings suggest comprehensive machinery at work inside the wall. Over the course of the exhibition, the wall increasingly breaks open, and configurations of cracks and hole patterns mark the robots' ongoing sculpting activity (Figure 1).
Figure 1: Zwischenräume: curious robots transform our familiar environment.
The work uses robotics as a medium for intervention: it is not the spectacle of the robots that we are interested in, but rather the spectacle of the transformation of their environment. The starting point for this interdisciplinary collaboration was our common interest in the open-ended potential of creative machines to autonomously act within the human environment. From the computational creativity researcher's point of view, the embodied nature of the agents allowed for situating and studying the creative process within a complex material context. For the artist, this collaboration opened up the affective potential to materially intervene in our familiar environment, bringing about a strange force, seemingly with an agenda and beyond our control.
Each machine agent is equipped with a motorised hammer, chisel or punch, and a camera, to interact and network with the other machines by re-sculpting its environment (Figure 2). The embodied agents are programmed to be curious, and as such are intrinsically motivated to explore the environment. Once they have created large openings in the wall, the robots may study the audience members as part of their environment. In the first version of this work, the robots used their hammer both to punch holes and to communicate amongst the collective. In a later version, we experimented with a more formal sculptural approach that used heuristic compositions of graffiti glyphs to perforate walls. Using the more stealthy movements of a chisel, the work responded to the specific urban setting of the gallery by adapting graffiti that covered the exterior of the building to become an inscription, pierced into the pristine interior walls of the gallery space (Figure 3). The final version of Zwischenräume used a punch to combine the force of the hammer and the precision of the chisel.
Figure 2: Robot gantries are attached to walls.
Similar to Jean Tinguely's kinetic sculptures (Hultén 1975), Zwischenräume's performance, and what it produces, may easily evoke a sense of dysfunctionality. As the machines' adaptive capability is driven by seemingly non-rational intentions rather than optimisation, the work, in some sense, subverts standard objectives for machine intelligence and notions of machine agency. Rather, it opens up the potential for imagining a machine that is free, a machine that is creative (see Hultén 1987).
Machine Creativity
This section focuses on the development of the first version of Zwischenräume as depicted in Figures 1 and 2. Each robotic unit consisted of a carriage, mounted on a vertical gantry, equipped with a camera mounted on an articulated arm, a motorised hammer, and a contact microphone. The control system for the robots combined machine vision to detect features from the camera, audio processing to detect the knocking of other robots, and computational models of intrinsic motivation based on unsupervised and reinforcement machine learning, to produce an adaptive, autonomous and self-directed agency.

Figure 3: Inscription of adapted graffiti glyphs.
The robots' vision system was developed to construct multiple models of the scene in front of the camera, using colour histograms to differentiate contexts, blob detection to detect individual shapes, and frame differencing to detect motion. Motion detection was only used to direct the attention of the vision system towards areas of possible interest within the field of view. Face detection was also used to recognise the presence of people, to direct the attention of the robots towards visitors. While limited, these perceptual abilities provide sufficient richness for the learning algorithms to build models of the environment and to determine what is different enough to be interesting.
Movements, shapes, sounds and colours are processed, learned and memorised, allowing each robotic agent to develop expectations of events in its surrounds. The machine learning techniques used in Zwischenräume combine unsupervised and reinforcement learning techniques (Russell and Norvig 2003): a self-organising map (Kohonen 1984) is used to determine the similarity between images captured by the camera; Q-learning (Watkins 1989) is used to allow the robots to discover strategies for moving about the wall, using the hammer and positioning the camera.
Separate models are constructed for colours and shapes in images. To determine the novelty of a context, sparse histograms are constructed from captured images based on only 32 colour bins with a high threshold, so that only the most significant colours are represented; these are compared using a self-organising map. Blob detection in low-resolution (32x32 pixel) images, relative to a typical model image of the wall, is used to discover novel shapes, encoded in a self-organising map as binary vectors. In both cases, the difference from known prototypes in the self-organising map provides a measure of novelty (Saunders 2001).
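A minimal sketch of this novelty machinery, under stated assumptions (a winner-take-all map without the usual neighbourhood update, an intensity rather than colour histogram, and invented parameters), might look as follows.

```python
import numpy as np

rng = np.random.default_rng(0)

class TinySOM:
    """Winner-take-all map; a full SOM would also update BMU neighbours."""
    def __init__(self, n_units=16, dim=32, lr=0.2):
        self.w = rng.random((n_units, dim))
        self.lr = lr

    def novelty(self, v):
        # distance from v to its best-matching prototype
        return np.linalg.norm(self.w - v, axis=1).min()

    def train(self, v):
        bmu = np.linalg.norm(self.w - v, axis=1).argmin()
        self.w[bmu] += self.lr * (v - self.w[bmu])   # pull winner towards v

def sparse_histogram(image, bins=32, threshold=0.05):
    """32-bin histogram keeping only the most significant entries."""
    hist, _ = np.histogram(image.ravel(), bins=bins, range=(0, 256))
    hist = hist / hist.sum()
    hist[hist < threshold] = 0.0          # high threshold -> sparse vector
    return hist

som = TinySOM()
for _ in range(200):                      # habituate on "wall-like" images
    som.train(sparse_histogram(rng.normal(128, 10, (32, 32))))
colourful = sparse_histogram(rng.integers(0, 256, (32, 32)))
print("novelty of an unfamiliar scene:", som.novelty(colourful))
```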
Reinforcement learning is used to learn the consequences of movements within the visual field of the camera. The error in prediction between learned models of consequences and observed results is used as a measure of surprise. The result is a system that is able to learn a small repertoire of skills and appreciate the novelty of their results, e.g., that knocking on wood does not produce dents. This ability is limited to the immediate consequences of actions and does not currently extend to sequences of actions.
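As an illustration of prediction-error surprise, the following sketch maintains a simple learned model of action consequences and reports the mismatch with each new observation; the states, actions and learning rate are toy assumptions, not the installation's code.

```python
from collections import defaultdict

predicted = defaultdict(float)   # (state, action) -> expected amount of change
ALPHA = 0.3                      # learning rate for the consequence model

def observe(state, action, observed_change):
    """Update the consequence model; return the surprise (prediction error)."""
    key = (state, action)
    surprise = abs(observed_change - predicted[key])
    predicted[key] += ALPHA * (observed_change - predicted[key])
    return surprise

# hammering plasterboard dents it; repetition makes the dent unsurprising
for step in range(5):
    s = observe("plasterboard", "hammer", observed_change=1.0)
    print(f"plasterboard, step {step}: surprise = {s:.2f}")
# knocking on wood produces no dent, matching the (empty) expectation
print(f"wood: surprise = {observe('wood', 'hammer', 0.0):.2f}")
```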
The goal of the learning system is to maximise an internally generated reward for capturing interesting images and to develop a policy for generating rewards through action. Interest is calculated based on a computational model that captures intuitive notions of novelty and surprise (Saunders 2001): novelty is defined as a difference between an image and all previous images taken by the robot, e.g., the discovery of significant new colours or shapes; and surprise is defined as the unexpectedness of an image within a known situation, e.g., relative to a learned landmark or after having taken an action with an expected outcome (Berlyne 1960). Learning plays a critical role in the assessment of both novelty and surprise. For novelty, the robots have to learn suitably general prototypes for the different types of images that they encounter. For surprise, the situation against which images are judged includes a learned model of the consequences of actions (Clancey 1997).
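One simple way to turn a raw novelty score into interest, in the hedonic spirit of Saunders (2001) where moderately novel stimuli are preferred over boring or bewildering ones, is sketched below; the particular curve and parameters are assumptions for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def interest(novelty, peak=0.5, steepness=20.0):
    """High for moderately novel stimuli, low for boring or bewildering ones."""
    reward = sigmoid(steepness * (novelty - (peak - 0.25)))
    punish = sigmoid(steepness * (novelty - (peak + 0.25)))
    return reward - punish

for n in (0.1, 0.5, 0.9):
    print(f"novelty {n:.1f} -> interest {interest(n):.2f}")
```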
Consequently, the intrinsic motivation to learn directs both the robot's gaze and its actions, resulting in a feedback process that increases the complexity of the environment, through the robot's knocking, relative to the perceptual abilities of the agent. Sequences of knocking actions are developed, such that the robots acquire a repertoire of actions that produce significant perceived changes in terms of colour, shapes and motion. In this way, the robots explore their creative potential in re-sculpting their environment. Figure 4 presents a collage of images taken by a single robot when it discovered something interesting, illustrating how the evaluation of "interesting" evolved for this robot; it shows how the agent's interest is affected by: (a) the positioning of the camera, e.g., the discovery of lettering on the plasterboard wall; (b) the use of the hammer, e.g., the production of dents and holes; and (c) the interaction of visitors.
Figure 4: Robot captures, showing the evolution of interest-
ing changes in the environment.
Discussion
The robots' creative process turns the wall into a playful environment for learning, similar to a sandpit, while from the audience's point of view the wall is turned into a performance stage. This opens up a scenario of encounter for studying the potential of computational creativity and the role of embodiment. Following Pickering (2005), we argue that creativity cannot be properly understood, or modelled, without an account of how it emerges from the encounter between the world and intrinsically active, exploratory and productively playful agents.
Embodiment and Creativity
The agents' embodiment provides opportunities to expand their behavioural range by taking advantage of properties of the physical environment that would be difficult or impossible to simulate computationally (Brooks 1990). In Zwischenräume the machines' creative agency is not predetermined but evolves based on what happens in the environment they examine and manipulate. As the agents' embodiment evolves based on their interaction with the environment, the robots' creative agency affects the very processes out of which it is itself emergent.
This resonates with Barad's argument that agency is a matter of intra-acting: it is an enactment, not something that someone or something has (Barad 2007). It also evokes Maturana and Varela's notion of enaction, where the act of bringing about a world occurs through the structural coupling between the dynamical environment and the autonomous agents (Maturana and Varela 1987). While the machines perturb and eventually threaten the wall's structural integrity, they adapt to their changing environment, the destruction of the wall, and how it changes their perception of the world outside.
The connection to creativity is two-fold. Firstly, the robots' intrinsic motivation to explore, discover and constantly produce novel changes to their environment demonstrates a simple form of the creative process itself, akin to the act of doodling, where the motivation is a reflective exploration of possibilities rather than purposeful communication with others. Secondly, audiences interpret the machines' interactions based on their own context, producing a number of possible meaningful relations and associations. The agents' embodiment and situatedness become a portal for entering the human world, creating meaning. The agents' enacted perception also provides a window on their viewpoint, thus possibly changing the perspective of the audience.
Furthermore, an enactive approach (Barad 2003; Clark
1998; Thompson 2005) opens up alternative ways of think-
ing about creative human-machine collaborations. It makes
possible a re-thinking of human-machine creativity beyond
the polarisation of human and non-human, one that pro-
motes shared or distributed agency within the creative act.
Audience Participation
Autonomous, creative machine performances challenge the most common interaction paradigm of primarily reacting to what is sensed, often according to a pre-mapped narrative. Zwischenräume's curious agents proactively seek interaction, rather than purely responding to changes in the surrounds. Once the robots have opened up the wall, the appearance and behaviours of audience members are perceived by the system as changes in their environment and become an integral part of the agents' intrinsic motivation system.
The agents' behaviours adapt based on their perception and evaluation of their environment, including the audience, as either interesting or boring. A curious machine performer whose behaviours are motivated by what it perceives and expects can be thought of as an audience to the audience's performance. Thus, in Zwischenräume it is not only the robots that perform, but also the audience that provokes, entertains and rewards the machines' curiosity. This notion of audience participation expands common interaction paradigms in interactive art and media environments (Paul 2003). The robots don't only respond or adapt to the audience's presence and behaviours, but also have the capacity to perceive the audience with a curious disposition.
By turning around the traditional relationship between audiences and machinic performers, the use of curious robotic performers permits a re-examination of the machine spectacle. Lazardig (2008) argues that spectacle, as a performance aimed at an audience, was central to the conception of the machine in the 17th century as a means of projecting a perception of utility, allowing the machine to become an object of admiration and therefore guaranteed to function. Kinetic sculptures and robotic artworks exploit and promote the power of the spectacle in their relationship with the audience. This is also the case in Zwischenräume; however, it is not only the machines that are the spectacle for the audience but also the audience that becomes an object of curiosity for the machines (Figure 5). Thus the relationship with a curious robot extends the notion of the spectacle and, in a way, brings it full circle.

Figure 5: Gallery visitor captured by one of the robots' cameras as he performs for the robotic wall.
Concluding Remarks
A significant aspect of Zwischenräume's specific embodiment is that it embeds the creative agents in our familiar (human) environment. This allowed us to direct both our and the audience's attention to the autonomous process and creative agency, rather than to the spectacle of the machine. The integration of computational models of creativity into this artwork extended the range of open-ended, non-determined modes of interaction with the existing environment, as well as between the artwork and the audience.
We argue that it is both the embodied nature of the agents and their autonomous creative capacity that allows for novel, meaningful interactions and relationships between the artwork and the audience. The importance of embodiment for computational creativity can also be seen in the improvising robotic marimba player Shimon, which uses a physical gesture framework to enhance synchronised musical improvisation between human and nonhuman musicians (Hoffmann and Weinberg 2011). The robot player's movements not only produce sounds but also play a significant role in performing visually and communicatively with the other (human) band members as well as the audience.
Embodying creative agents and embedding them in our everyday or public environment is often messier and more ambiguous than purely computational simulation. What we gain, however, is not only a new shared embodied space for audience experience but also a new experimentation space for shared (human and non-human) creativity.
Acknowledgements
This research has been supported by an Australian Research Council grant, a New Work grant from the Austrian Federal Ministry for Education, Arts and Culture, and a Faculty Research Grant from COFA (University of NSW).
References
Barad, K. 2003. Posthumanist performativity: Toward an understanding of how matter comes to matter. Signs: Journal of Women in Culture and Society 28(3):801–831.
Barad, K. 2007. Meeting the Universe Halfway: Quantum Physics and the Entanglement of Matter and Meaning. Durham, NC: Duke University Press.
Berlyne, D. E. 1960. Conflict, Arousal, and Curiosity. New York, NY: McGraw Hill.
Brooks, R. 1990. Elephants don't play chess. Robotics and Autonomous Systems 6:3–15.
Clancey, W. J. 1997. Situated Cognition: On Human Knowl-
edge and Computer Representations. Cambridge, England:
Cambridge University Press.
Clark, A. 1998. Where brain, body and world collide. Daedalus: Journal of the American Academy of Arts and Sciences 127(2):257–280.
Hoffmann, G., and Weinberg, G. 2011. Interactive improvisation with a robotic marimba player. Autonomous Robots 31:133–153.
Huhtamo, E. 2004. Trouble at the interface, or the identity crisis of interactive art. Framework: The Finnish Art Review (2):38–41.
Hultén, P. 1975. Tinguely: Méta. London, UK: Thames & Hudson.
Hultén, P. 1987. Jean Tinguely: A Magic Stronger than Death. New York, NY: Abbeville Press.
Kac, E. 1997. Foundation and development of robotic art. Art Journal 56(3):60–67.
Kac, E. 2001. Towards a chronology of robotic art. Conver-
gence: The Journal of Research into New Media Technolo-
gies 7(1).
Kohonen, T. 1984. Self-Organization and Associative Mem-
ory. Berlin: Springer.
Lazardig, J. 2008. The machine as spectacle: Function and admiration in seventeenth-century perspectives on machines. In de Gryter, W., ed., Instruments in Art and Science: On the Architectonics of Cultural Boundaries in the 17th Century. 152–175.
Maturana, H., and Varela, F. 1987. The Tree of Knowledge:
The biological roots of human understanding. Boston, MA:
Shambhala Publications.
Paul, C. 2003. Digital Art. London, UK: Thames & Hudson.
Penny, S. 2000. Agents as artworks and agent design as artistic practice. In Dautenhahn, K., ed., Human Cognition and Social Agent Technology. John Benjamins Publishing Co. 395–414.
Pickering, J. 2005. Embodiment, constraint and the cre-
ative use of technology. In Freedom and Constraint in the
Creative Process in Digital Fine Art.
Reichle, I. 2009. Art in the Age of Technoscience: Genetic Engineering, Robotics, and Artificial Life in Contemporary Art. Wien: Springer.
Rieser, M. 2002. The art of interactivity: from gallery to street. In Mealing, S., ed., Computers and Art. Bristol, UK: Intellect. 81–96.
Russell, S. J., and Norvig, P. 2003. Artificial Intelligence: A Modern Approach. New Jersey: Prentice Hall.
Rye, D.; Velonaki, M.; Williams, S.; and Scheding, S. 2005.
Fish-bird: Human-robot interaction in a contemporary arts
setting. In Proceedings of the 2005 Australasian Conference
on Robotics and Automation.
Saunders, R. 2001. Curious Design Agents and Artificial Creativity. PhD thesis, Faculty of Architecture, The University of Sydney.
Thompson, E. 2005. Sensorimotor subjectivity and the enactive approach to experience. Phenomenology and the Cognitive Sciences 4(4):407–427.
von Uexküll, J. 1957. A stroll through the worlds of animals and men: A picture book of invisible worlds. In Schiller, C., ed., Instinctive Behavior: The Development of a Modern Concept. New York, NY: International Universities Press. 5–80.
Watkins, C. 1989. Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.
An Artificial Intelligence System to Mediate the Creation of Sound and Light
Environments
Claudio Benghi
Northumbria University,
Ellison Building,
Newcastle upon Tyne, NE1 8ST, England
[email protected]
Gloria Ronchi
Aether & Hemera,
Kingsland Studios, Priory Green,
Newcastle upon Tyne, NE6 2DW, England
[email protected]
Introduction
This demonstration presents the IT elements of an art installation that exhibits intelligent reactive behaviours in response to participant input, employing Artificial Intelligence (AI) techniques to create unique aesthetic interactions.
The audience is invited to speak into a set of microphones; the system captures all the sounds performed and uses them to seed an AI engine that creates a new soundscape in real time, on the basis of a custom music knowledge repository. The composition is played back to the users through surrounding speakers and accompanied by synchronised light events on an array of coloured LEDs.
This art work allows viewers to become active participants in creating multisensory computer-mediated experiences, with the aim of investigating the potential for creative forms of inter-authorship.
Software Application
The installation's software has been built as a custom event manager, developed under the .Net framework, that can respond to events from the users, timers, and the UI, cascading them through the required algorithms and libraries as a function of specified interaction settings; this solution allowed swift changes to the behaviour of the artwork in response to the observation of audience interaction patterns.

Figure 1: Scheme of the modular architecture of the system

Different portions of the data flow have been externalised to custom hardware to reduce the computational load on the controlling computer: a configurable number of real-time converter devices transform the sounds of the required number of microphones into MIDI messages and channel them to the event manager; a cascade of Arduino devices controls the custom multi-channel lighting controllers; and the sound output stage relies on MIDI standards.
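As a rough illustration of this event-manager pattern (in Python rather than the installation's .Net implementation), the sketch below routes events from hypothetical sources through registered handler stages; all handler names and payloads are invented for the example.

```python
from collections import defaultdict

class EventManager:
    """Cascade incoming events through the handlers registered for them."""
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self.handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        for handler in self.handlers[event_type]:
            handler(payload)           # each registered stage sees the event

bus = EventManager()
bus.subscribe("midi_note", lambda note: print("seed AI engine with", note))
bus.subscribe("midi_note", lambda note: print("trigger LED event for", note))
bus.subscribe("timer", lambda t: print("advance soundscape at", t, "s"))

bus.publish("midi_note", 60)           # a microphone pitch converted to MIDI
bus.publish("timer", 12.5)
```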
A substantial amount of work has been put into the optimisation of the UI console controlling the behaviour of the installation; this turned out to be crucial for the success of the project, as it allowed us to make use of the important feedback gathered during the first implementation of this participatory art work.

Figure 2: GUI of the controlling system

The work was first displayed as part of a public event over three weeks and allowed the co-generation of unpredictable soundscapes with varying levels of user appreciation. The evaluation of any public co-creation environment is itself a challenging research area, and our future work will investigate and evaluate methodologies to do so; further developments to the AI are also planned, to include feedback from past visitors.
More information about this project can be found at:
http://www.aether-hemera.com/s/aib
Controlling Interactive Music Performance (CIM)
Andrew Brown, Toby Gifford and Bradley Voltz
Queensland Conservatorium of Music, Griffith University
[email protected], [email protected], [email protected]
Abstract
Controlling Interactive Music (CIM) is an interactive music system for human-computer duets. Designed as a creativity support system, it explores the metaphor of human-machine symbiosis, where the phenomenological experience of interacting with CIM has both a degree of instrumentality and a sense of partnership. Building on Pachet's (2006) notion of reflexivity, Young's (2009) explorations of conversational interaction protocols, and Whalley's (2012) experiments in networked human-computer music interaction, as well as our own previous work in interactive music systems (Gifford & Brown 2011), CIM applies an activity/relationality/prominence based model of musical duet interaction. Evaluation of the system from both audience and performer perspectives yielded consensus views that interacting with CIM evokes a sense of agency, stimulates creativity, and is engaging.
Description
The CIM system is an interactive music system for use in human-machine creative partnerships. It is designed to sit at a mid-point of the autonomy spectrum, according to Rowe's instrument paradigm vs. player paradigm continuum. CIM accepts MIDI input from a human performer, and improvises musical accompaniment.
CIM's behaviour is directed by our model of duet interaction, which utilises various conversational, contrapuntal and accompaniment metaphors to determine appropriate musical behaviour. An important facet of this duet model is the notion of turn-taking, where the system and the human swap roles as the musical initiator.
To facilitate turn-taking, the system includes some mechanisms for detecting musical phrases and their completion. This way the system can change roles at musically appropriate times. Our early implementation of this system simply listened for periods of silence as a cue that the human performer had finished a phrase. Whilst this method is efficient and robust, it limits duet interaction and leads to a discontinuous musical result.
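A minimal sketch of such silence-based phrase detection follows; the representation of note events and the threshold value are assumptions for illustration, not CIM's actual code.

```python
SILENCE_THRESHOLD = 0.8   # seconds of silence taken to mean "phrase over"

def segment_phrases(note_events):
    """note_events: time-ordered list of (onset_seconds, duration_seconds)."""
    phrases, current, last_offset = [], [], None
    for onset, duration in note_events:
        if last_offset is not None and onset - last_offset > SILENCE_THRESHOLD:
            phrases.append(current)    # gap long enough: close the phrase
            current = []
        current.append((onset, duration))
        last_offset = max(last_offset or 0.0, onset + duration)
    if current:
        phrases.append(current)
    return phrases

notes = [(0.0, 0.4), (0.5, 0.4), (1.0, 0.5), (2.8, 0.3), (3.2, 0.3)]
print(len(segment_phrases(notes)), "phrases detected")   # -> 2
```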
This behaviour, whilst imbuing CIM with a sense of autonomy and independence, detracts from ensemble unity and interrupts musical flow. To address this deficiency, we implemented some enchronic segmentation measures, allowing for inter-part elision. Inter-part elision is where a phrase-end in one voice coincides with (or is anticipated by) a phrase-start in a second voice.
In order to allow for inter-part elision, opportunistic decision making, and other synchronous devices for enhancing musical flow, we have implemented some measures of musical closure as secondary segmentation indicators. Additionally, these measures guide CIM's own output, facilitating the generation of coherent phrase structure.
The evaluation procedure
Our evaluation process involved six expert musicians, including staff and senior students at a university music school and professional musicians from the state orchestra, who performed with the system under various conditions. The setup of MIDI keyboard and computer used for these sessions is shown in Figure 1.
Participants first played a notated score. Next they engaged in free play with the system, giving them an opportunity to explore the behaviour of the system. Finally, they performed a short improvised duet with the system. The interactive sessions were video recorded. Following the interactive session each performer completed a written questionnaire.

Figure 1: A musician interacting with the CIM system
References
Gifford T & Brown A R (2011). Beyond Reflexivity: Mediating between imitative and intelligent action in an interactive music system. In: Proceedings of the British Computer Society Human-Computer Interaction Conference 2011, Newcastle Upon Tyne.
Pachet F (2006). Enhancing individual creativity with interactive musical reflective systems. In Musical Creativity: Current Research in Theory and Practice. Wiggins G & Deliege I (eds) London: Psychology Press
Whalley I (2012). Internet2 and global electroacoustic mu-
sic: Navigating a decision space of production, relationships
and languages. Organised Sound 17(01):4-15
Young M (2009). Creative Computers, Improvisation and
Intimacy. In Boden M et al (eds) Dagstuhl Seminar Proceed-
ings 09291: Computational Creativity: An Interdisciplinary
Approach
Towards a Flowcharting System for Automated Process Invention
Simon Colton and John Charnley
Computational Creativity Group, Department of Computing, Goldsmiths, University of London
www.doc.gold.ac.uk/ccg
Figure 1: User-defined flowchart for poetry generation.
Flowcharts
Ironically, while automated programming has had a long and varied history in Artificial Intelligence research, automating the creative art of programming has rarely been studied within Computational Creativity research. In many senses, software writing software represents a very exciting potential avenue for research, as it directly addresses issues related to novelty, surprise, innovation at the process level and the framing of activities. One reason for the lack of research in this area is the difficulty inherent in getting software to generate code. Therefore, it seems sensible to start investigating how software can innovate at the process level with an approach short of full programming, and we have chosen the classic approach to process design afforded by flowcharts. Our aim is to provide a system simple enough to be used by non-experts to craft generative flowcharts; indeed, simple enough for the software itself to create flowcharts which represent novel, and hopefully interesting, new processes.
We are currently in the fourth iteration of development, having found various difficulties with three previous approaches, ranging from the flexibility and expressiveness of the flowcharts to the mismatching of inputs with outputs, the storage of data between runs, and the ability to handle programmatic constructs such as conditionals and loops. In our current approach, we represent a process as a script, onto which a flowchart can be grafted. We believe this offers the best balance of flexibility, expressiveness and usability, and will pave the way to the automatic generation of scripts in the next development stage. We have so far implemented the natural language processing flowchart nodes required to model aspects of a previous poetry generation approach and a previous concept formation approach.
The Flow System
In figure 1 we present a screenshot of the system, which is tentatively called Flow. The flowchart shown uses 18 sub-processes which, in overview, do the following: a negative valence adjective is chosen, and used to retrieve tweets from Twitter; these are then filtered to remove various types, and pairs are matched by syllable count and rhyme; finally the lines are split where possible and combined via a template into poems of four stanzas; multiple poems are produced and the one with the most negative overall valency is saved. A stanza from a poem generated using "malevolent" is given in figure 2. Note in figure 1 that the node bordered in red (WordList Categoriser) contains the sub-process currently running, and the node bordered in grey (Twitter) has been clicked by the user, which brings up the parameters for that sub-process in the first black-bordered box and the output from it in the second black-bordered box. We see that the 332nd of 1024 tweets containing the word "cold" is on view. Note also that the user is able to put a thumb-pin into any node, which indicates that the previous output from that node should be used in the next run, rather than being calculated again. A sketch of one such sub-process, the rhyme-matching stage, is given below.
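The sketch pairs candidate lines by syllable count and rhyme using the CMU pronouncing dictionary (via the pronouncing package); treating the final word as the rhyme carrier, and the example lines themselves, are simplifying assumptions rather than details of Flow's nodes.

```python
import itertools
import pronouncing   # CMU pronouncing dictionary wrapper (pip install pronouncing)

def line_stats(line):
    """Return (syllable count, last word) or None if a word is unknown."""
    words = [w.strip(".,!?;:").lower() for w in line.split()]
    phones = [pronouncing.phones_for_word(w) for w in words]
    if not all(phones):
        return None                     # skip lines with unknown words
    syllables = sum(pronouncing.syllable_count(p[0]) for p in phones)
    return syllables, words[-1]

def rhyming_pairs(lines):
    stats = {l: line_stats(l) for l in lines}
    for a, b in itertools.combinations(lines, 2):
        if stats[a] and stats[b] and stats[a][0] == stats[b][0]:
            if stats[b][1] in pronouncing.rhymes(stats[a][1]):
                yield a, b

lines = ["the winter wind is cold", "a story left untold",
         "my hands are numb"]
for pair in rhyming_pairs(lines):
    print(pair)
```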
It's our ambition to build a community of open-source developers and users around the Flow approach, so that the system can mimic the capabilities of existing generative systems in various domains but, more importantly, can invent new processes in those domains. Moreover, we plan to install the system on various servers worldwide, constantly reacting in creative ways to new nodes which are uploaded by developers, and to new flowcharts developed by users with a variety of cultural backgrounds. We hope to show that, in addition to creating at the artefact level, software can innovate at the process level, test the value of new processes, and intelligently frame how they work and what they produce.

Figure 2: A stanza from the poem "On Being Malevolent".
A Rogue Dream:
Web-Driven Theme Generation for Games
Michael Cook
Computational Creativity Group
Imperial College, London
[email protected]
ABSTRACT
A Rogue Dream is an experimental videogame developed in seven days for a roguelike development challenge. It uses techniques from computational creativity research to attempt to theme a game dynamically using a source noun from the player, including generating images and theme information. The game is part of exploratory research into bridging the gap between generating rules-based content and theme content for videogames.
1. DOWNLOAD
While A Rogue Dream is not available to download directly,
its code can be found at:
https://github.com/cutgarnetgames/roguedream
Spritely, a tool used in A Rogue Dream, can also be down-
loaded from:
https://github.com/gamesbyangelina/spritely
2. BACKGROUND
Procedural content generation systems mostly focus on gen-
erating structural details of a game, or arranging pre-existing
contextual information (such as choosing a noun from a list
of pre-approved words). This is because the relationship
between the mechanics of a game and its theme is hard to
define and has not been approached from a computational
perspective.
For instance, in Super Mario eating a mushroom increases
the player's power. We understand that food makes people
stronger; therefore, a mushroom is contextually appropriate.
In order to procedurally replace that with another object,
the system must understand the real-world concepts of food,
strength, size and change. Most content generation systems
for games are designed to understand games, not the real
world. How can we overcome that?
3. A ROGUE DREAM
In [1] Tony Veale proposes mining Google Autocomplete using
leading phrases such as "why do <keyword>s..." and using
the autocompletions as a source of general knowledge
or stereotypes. We refer to this as "cold reading" the Internet,
and use it extensively in A Rogue Dream. We also
employ Spritely, a tool for automatically generating sprite-based
artwork by mining the web for images.
Figure 1: A screenshot from A Rogue Dream. The input was
"cow"; enemies were red, resulting in a red shoe being the
enemy sprite. Abilities included mooing and giving milk.
The game begins by asking the player to complete the sentence
"Last night, I dreamt I was a...". The noun used to
complete the sentence becomes a parameter for the search
systems in A Rogue Dream, such as Spritely and the various
text retrieval systems based on Veale's cold reading. These
are subject to further filtering - queries matching "why do
<keyword>s hate..." are used to label enemies, for example.
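As an illustration of the general technique, a cold-reading query can be
issued against Google's unofficial suggest endpoint. This is a hedged
sketch only, not A Rogue Dream's actual retrieval code; the endpoint is
undocumented and its response format may change.

import json
import urllib.parse
import urllib.request

def cold_read(noun, template="why do {}s "):
    # Ask the (unofficial, undocumented) suggest endpoint to complete
    # a leading phrase such as "why do cows ...".
    query = template.format(noun)
    url = ("http://suggestqueries.google.com/complete/search"
           "?client=firefox&q=" + urllib.parse.quote(query))
    with urllib.request.urlopen(url) as resp:
        suggestions = json.loads(resp.read().decode("utf-8", "replace"))[1]
    # Keep only matching completions, stripping the leading phrase.
    return [s[len(query):] for s in suggestions if s.startswith(query)]

# cold_read("cow") might return completions such as "moo" or "have horns".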
This work connects to other research currently being conducted by
the author on direct code modification for content generation
[?]. We hope to combine these two research tracks in
order to build technology that can understand and situate
abstract game concepts in a real-world context, and provide
labels and fiction that describe and illustrate the game world
accurately and in a thematically appropriate way.
4. REFERENCES
[1] Tony Veale. From conceptual mash-ups to bad-ass
blends: A robust computational model of conceptual
blending. In Proceedings of the 3rd International
Conference on Computational Creativity, 2012.
A Puzzling Present:
Code Modification for Game Mechanic Design
Michael Cook and Simon Colton
Computational Creativity Group
Imperial College, London
{mtc06,sgc}@doc.ic.ac.uk
Figure 1: A screenshot from A Puzzling Present.
ABSTRACT
A Puzzling Present is an Android and desktop game released
in December 2012. The game mechanics (that is, the
player's abilities) as well as the level designs were generated
using Mechanic Miner, a procedural content generator that
is capable of exploring, modifying and executing codebases
to create game content. It is the first game developed using
direct code modification as a means of procedural mechanic
generation.
1. DOWNLOAD
A Puzzling Present is available on Android and for all desk-
top operating systems, for free, here:
http://www.gamesbyangelina.org/downloads/app.html
The source code is also available on gamesbyangelina.org.
2. BACKGROUND
Mechanic Miner was developed as part of PhD research into
automating the game design process, through a piece of software
called ANGELINA. ANGELINA's ability to develop
small games autonomously, including theming the games'
content using social and web media, was demonstrated at
ICCC 2012 [1]. Mechanic Miner represents a large step forward
for ANGELINA, as the system becomes able to inspect
and modify code directly, instead of using grammars or other
intermediate representations.
ANGELINA's research has always aimed to produce playable
games for general release. Space Station Invaders was released
in early 2012 as a commission for New Scientist,
and a series of newsgames were released to coincide with
several conferences in mid-2012. A Puzzling Present was
the largest release to date, garnering over 6000 downloads
and entering the Android New Game charts in December,
as well as receiving coverage on Ars Technica, New Scientist,
and Phys.org.
3. A PUZZLING PRESENT
The game itself contains thirty levels split into three sets
of ten. Each set of levels, or world, has a unique power
available to the player, such as inverting gravity or becoming
bouncy. These powers can be switched on and off, and must
be used to complete each level. Each power was discovered
by Mechanic Miner through iterative modification of code and
simulation of gameplay to test the code modifications. For
more information on the system, see [2].
Levels were designed using the same system - mechanics are
tested against designed levels to evaluate whether the level is
appropriate. This means the system is capable of designing
novel levels with mechanics it has never seen before - there
is no human intervention to add heuristics or evaluations for
specific mechanics.
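The following is a schematic Python sketch of this generate-and-test
idea. The real Mechanic Miner inspects and mutates an actual game
codebase; here a "mechanic" is reduced to a toy field-toggling tuple and
the gameplay simulation is a stub, so the names are purely illustrative.

import random

def random_mechanic():
    # Toy stand-in for a code modification: when a button is pressed,
    # multiply a named player field by some factor.
    field = random.choice(["gravity", "bounciness", "jump_height"])
    factor = random.choice([-1.0, 0.5, 2.0])
    return (field, factor)

def playable(level, mechanic):
    # Stub for simulating an automated playthrough with the modified
    # code; here a level simply names the field it needs changed.
    field, factor = mechanic
    return level.get("requires") == field and factor != 1.0

def mine_mechanic(levels, attempts=1000):
    # Generate candidate mechanics until one makes every level playable.
    for _ in range(attempts):
        mechanic = random_mechanic()
        if all(playable(level, mechanic) for level in levels):
            return mechanic
    return None

# e.g. mine_mechanic([{"requires": "gravity"}]) finds a gravity mechanic.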
We are currently working on integrating Mechanic Miner
into the newsgame generation module of ANGELINA, so
that the two systems can work together to collaboratively
build larger games. This initial work on code modification
has also opened up major questions about the relationship
between code and meaning in videogames, which we plan to
explore in future work.
4. REFERENCES
[1] Michael Cook and Simon Colton. ANGELINA -
coevolution in automated game design. In Proceedings
of the 3rd International Conference on Computational
Creativity, 2012.
[2] Michael Cook, Simon Colton, and Jeremy Gow.
Nobody's a critic: On the evaluation of creative code
generators. In Proceedings of the 4th International
Conference on Computational Creativity, 2013.
Demonstration: A meta-pianist serial music comproviser
Roger T. Dean
austraLYSIS, Sydney; and MARCS Institute,
University of Western Sydney, Australia
[email protected]
Computational processes which produce meta-human
as well as seemingly-human outputs are
of interest. Such outputs may become apparently
human as they become familiar. So I write
algorithmic interfaces (often in Max/MSP/Jitter)
for real-time performative generation of complex
musical/visual features, to be part of
compositions or improvisations. Here I
demonstrate a musical system that generates serial
12-tone rows and their standard transforms, and then
assembles them into melodic sequences, or into
two-part meta-pianistic performances.
Serial rigour of pitch construction is
maintained throughout. Here this means that 12-note
motives are made, each of which comprises
all the pitches within an octave on the piano (an
octave spans a doubling of the frequency of the
sound, and the notes at the start and end of this
sequence are given the same note name:
C D E F G A B C, etc.). Then a generative system
creates a rigorous set of transforms of the chosen
note sequences. But as in serial composition at
large, when these are disposed amongst multiple
voices, and used to create harmonies (simultaneous
notes) as well as melodies (successions of
separated notes), the serial chronology is
modified. Furthermore, the system allows
asynchronous processing of several versions of
the original series, or of several different series.
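For readers unfamiliar with serial technique, here is a minimal Python
sketch of a 12-tone row and its standard transforms (transposition,
retrograde, inversion, retrograde inversion). It illustrates the standard
theory only, not the demonstrated Max/MSP patches.

import random

def random_row():
    # A tone row: each of the 12 pitch classes appears exactly once.
    row = list(range(12))
    random.shuffle(row)
    return row

def transpose(row, n):
    return [(p + n) % 12 for p in row]

def retrograde(row):
    return row[::-1]

def inversion(row):
    # Mirror every interval around the row's first pitch.
    return [(2 * row[0] - p) % 12 for p in row]

def retrograde_inversion(row):
    return retrograde(inversion(row))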
A range of complexity can result, and to
enhance this I also made a companion system
which uses tonal major-scale melodies in a
similar way. Here the original (Prime) version
consists only of 12 notes taken from within an
octave of the major scale (which includes only 7
rather than 12 pitches), thus permitting some
repetitions. Chromatic inversion is used, so that,
for example, the scale of C major ascending from
C becomes the scale of Ab major descending
from C, and major tonality with a change of key
centre is preserved.
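This chromatic inversion claim can be checked in a few lines of Python:
inverting pitch classes about C (pitch class 0) maps the ascending C
major scale onto the pitches of Ab major, descending from C.

# C major ascending from C: C D E F G A B = pitch classes [0,2,4,5,7,9,11].
c_major = [0, 2, 4, 5, 7, 9, 11]
# Chromatic inversion about C: p -> (-p) mod 12.
inverted = [(-p) % 12 for p in c_major]
# C Bb Ab G F Eb Db: exactly the Ab major scale, descending from C.
assert inverted == [0, 10, 8, 7, 5, 3, 1]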
The performance patch within the system
provides a default stochastic rhythmic, chordal
and intensity control process, all of whose
features are open to real-time control by the
user. The patches are used for generating
components of electroacoustic or notated
composition, normally with equal-tempered or
alternative tuning systems performed on a
physical-synthesis virtual piano (PianoTeq), and
also within live solo MultiPiano performances
involving acoustic piano and electronics.
The outputs are meta-human in at least two
senses. First, as with many computer patches, the
physical limitations of playing an instrument do
not apply, and Xenakian performance
complexities can be realised. Second, no human
improviser could achieve this precision of pitch
transformation; rather, we have evidence that they
tend to take a simplified approach to atonality,
usually focusing on controlling intervals of 1, 2,
6, and 11 semitones. The products of these
patches are also in use in experiments on the
psychology of expectation (a collaboration with
Freya Bailes, Marcus Pearce and Geraint
Wiggins, UK).
References
Dean, R. MultiPiano. Tall Poppies TP225, double CD, 2012.
assimilate - collaborative narrative construction
Damian Hills
Creativity and Cognition Studio
University of Technology, Sydney
Sydney, Australia
[email protected]
Abstract
This demonstration presents the 'assimilate - collaborative
narrative construction' project, which aims for a holistic
system design with support for the creative possibilities
of collaborative narrative construction.
Introduction
This demonstration presents the 'assimilate - collaborative
narrative construction' project (Hills 2011), which aims for a
holistic system design with support for the creative possibilities
of collaborative narrative construction. By incorporating
interface mechanics with a flexible model of narrative
template representation, the system design emphasises
how mental models and intentions are understood by participants,
and represents its creative knowledge outcomes
based on these metaphorical and conversational exchanges.
Using a touch-table interface, participants collaboratively
narrate and visualise narrative sequences using online media
obtained through a keyword search, or by words obtained
from narrative templates. The search results are styled
into generative behaviours that visually self-organise while
participants make aesthetic choices about the narrative
outcomes and their associated behaviours.
The playful interface supports collaboration through embedded
mechanics that extend gestural actions commonly performed
during casual conversations. By embedding metaphorical
schemes associated with narrative comprehension, such as
pointing, exchanging, enlarging or merging views, gestural
action drives the experience and supports the conversational
aspects associated with narrative exchange.
System Architecture
The system architecture models the narrative template
events to allow a particular narrative perspective, globally
or locally, within the generated story world. This is done by
modeling conversation relationships with the aim of self-organising
and negotiating an agreement surrounding several
themes. The system extends Conversation Theory (CT)
(Pask 1976), a theory of learning and social interaction
that outlines a formal method of conversation as a sense-making
network. Based on CT entailment meshes with an
added fitness metric, the system develops a negotiated
agreement surrounding several interrelated themes, leading
to eventual narrative coherence.
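As a purely speculative Python illustration of this idea (not the
project's actual data model), an entailment mesh with a fitness metric
might look as follows, with participant choices reinforcing a concept
directly and the concepts it entails more weakly:

from collections import defaultdict

class EntailmentMesh:
    def __init__(self):
        self.entails = defaultdict(set)    # concept -> entailed concepts
        self.fitness = defaultdict(float)  # concept -> agreement score

    def add(self, concept, *entailed):
        self.entails[concept].update(entailed)

    def reinforce(self, concept, amount=1.0):
        # A participant's choice strengthens the chosen concept,
        # and everything it entails at half strength.
        self.fitness[concept] += amount
        for other in self.entails[concept]:
            self.fitness[other] += amount / 2

    def dominant_themes(self, n=3):
        # The themes the group has most strongly negotiated towards.
        return sorted(self.fitness, key=self.fitness.get, reverse=True)[:n]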
References
Hills, D. 2011. assimilate: An Interface for Collaborative
Narrative Construction. In Proceedings of ICIDS 2011, 294-299.
Pask, G. 1976. Conversation Theory: Applications in Education
and Epistemology. Amsterdam: Elsevier.
Breeding on site
Tatsuo Unemi
Department of Information Systems Science
Soka University
Tangi-machi 1-236, Hachioji, Tokyo 192-8577 Japan
[email protected]
Figure 1: System setup (SBArt4 on one computer sends genotypes
over Ethernet to the player on a second computer).
This is a live performance of improvisational production
and playback of a type of evolutionary art, using the breeding
tool SBArt4 version 3 (Unemi 2010). The performer
breeds a variety of individual animations using SBArt4, on
a machine in front of him/her, in the manner of interactive
evolutionary computation, and sends the genotype of his/her
favorite individual to SBArt4Player through a network connection.
Figure 1 is a schematic illustration of the system setup.
Each individual animation that reaches the remote machine
is played back repeatedly, with synchronized sound effects,
until another one arrives. Assisted by a mechanism of
automated evolution based on computational aesthetic measures
as the fitness function, it is relatively easy to produce
interesting animations and sound effects efficiently on site
(Unemi 2011).
The player component has a functionality to composite
another animation, of feathery particles, that reacts to the
original image rendered by a genotype. Each particle moves
guided by a force calculated from the HSB color value
under the particle: the brightness is mapped to the strength,
the hue value is mapped to the orientation, and the saturation
is mapped to the fluctuation. These additional effects provide
another impression for viewers.
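As an illustration of the mapping just described, the following Python
sketch converts the colour under a particle into a steering force. The
names and exact scaling are illustrative assumptions; SBArt4's player is
not written in Python.

import colorsys
import math

def force_from_pixel(r, g, b):
    # r, g, b in [0, 1]; convert to hue, saturation, brightness (value).
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    strength = v                # brightness -> force magnitude
    angle = 2 * math.pi * h     # hue -> force orientation
    jitter = s                  # saturation -> fluctuation amplitude
    fx = strength * math.cos(angle)
    fy = strength * math.sin(angle)
    return fx, fy, jitter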
The performance will start from a simple pattern selected
from a randomly generated initial population, and then
gradually shift to complex patterns. The parameters of
sound synthesis are fundamentally determined from statistical
features of the frame image, so that the sound fits the
impression of the visuals, but some of them are also subject
to real-time tuning. The performer is allowed to adjust several
parameters such as scale, tempo, rhythm, noise, and other
modulation parameters (Unemi 2012) according to his/her preference.
Because the breeding process includes spontaneous transformation
by mutation and combination, the animations
shown in a performance are always different from those on
another occasion. This means each performance is unique.
Figure 2: Live performance in Rome, December 2011.
References
Unemi, T. 2010. SBArt4 - breeding abstract animations in
realtime. In Proceedings of the IEEE World Congress on
Computational Intelligence, 4004-4009.
Unemi, T. 2011. SBArt4 as automatic art and live performance
tool. In Soddu, C., ed., Proceedings of the 14th Generative
Art Conference, 436-447.
Unemi, T. 2012. Synthesis of sound effects for generative
animation. In Soddu, C., ed., Proceedings of the 15th Generative
Art Conference, 364-376.
The project's website is:
http://www.intlab.soka.ac.jp/~unemi/sbart/4/breedingOnSite.html
Demo video:
http://www.youtube.com/watch?v=1kKpWntUd8M
A Fully Automatic Evolutionary Art
Tatsuo Unemi
Department of Information Systems Science
Soka University
Tangi-machi 1-236, Hachioji, Tokyo 192-8577 Japan
[email protected]
Figure 1: Sample image.
This is an automatic art project in which the computer
autonomously produces animations of a type of abstract imagery.
Figure 1 is a typical frame image of an animation.
Custom software developed by the author, SBArt4 version 3,
takes the main role in the work; it is based on a genetic
algorithm utilizing computational aesthetic measures
as the fitness function (Unemi 2012a). The fitness value is a
weighted geometric mean of measures including complexity,
global contrast factor, distribution of color values, distribution
of edge angles, difference of color values between
consecutive frame images, and so on.
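For clarity, a weighted geometric mean of positive measures m_i with
weights w_i is the product of m_i raised to w_i divided by the sum of all
weights. A minimal Python version follows; the measure names and weights
in the example call are chosen purely for illustration, not taken from
SBArt4.

def fitness(measures, weights):
    # Weighted geometric mean: all measures must be positive.
    total = sum(weights.values())
    value = 1.0
    for name, m in measures.items():
        value *= m ** (weights[name] / total)
    return value

# Illustrative call; SBArt4's actual measures and weights differ.
f = fitness({"complexity": 0.8, "contrast": 0.6, "color_spread": 0.9},
            {"complexity": 2.0, "contrast": 1.0, "color_spread": 1.0})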
Figure 2 illustrates the system configuration, using two
personal computers connected by Ethernet. The left side
is for the evolutionary process, and the right side is for rendering
and sound synthesis. Starting from a population randomly
initialized with mathematical expressions that determine
the color value of each pixel in a rectangular area, a
never-ending series of abstract animations is continuously
displayed on the screen in turn, with synchronized sound
effects (Unemi 2012b). Each 20-second animation corresponds
to an individual of relatively high fitness chosen
from the population in the evolutionary process.
Figure 2: System setup.
The evolutionary part uses the Minimal Generation Gap
model (Satoh, Ono, and Kobayashi 1997) for generational
alternation, to guarantee that the time for each computation
step is minimal. After 120 steps of generational alternation,
the genotypes of the best ten individuals are sent to the
player side in turn. To avoid convergence leading to a
narrower variation of individuals in the population, the
lowest-fitness quarter of the population is replaced with
random genotypes every 600 steps.
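The following is a simplified Python sketch of a Minimal Generation Gap
step in the spirit of Satoh, Ono, and Kobayashi (1997): only two
population slots change per step, which keeps each step cheap. The
replacement rule here (best of the family plus one other family member)
is a common simplification and not necessarily SBArt4's exact
implementation.

import random

def mgg_step(population, crossover, mutate, fitness, n_children=10):
    # Pick two parents at random, without replacement.
    i, j = random.sample(range(len(population)), 2)
    parents = [population[i], population[j]]
    # The "family" consists of the parents plus their children.
    children = [mutate(crossover(parents[0], parents[1]))
                for _ in range(n_children)]
    family = sorted(parents + children, key=fitness, reverse=True)
    # Return the best family member and one other to the parents' slots.
    population[i] = family[0]
    population[j] = random.choice(family[1:])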
Visitors will notice not only the recent progress in the
power of computer technology, but may also
be given an occasion to think about what artistic creativity is.
These technologies are useful not only for building a system
that produces unpredictable, interesting phenomena, but also
for providing an occasion for people to reconsider how we should
relate to the artifacts around us. We know that nature is complex
and often unpredictable, but we, people in modern
democratic societies, tend to assume that artificial systems
should be under our control and that there must be some person
who takes responsibility for their effects. The author hopes
that visitors will notice that it is difficult to keep some
complex artifacts under our control, and will learn how we
can enjoy them.
References
Satoh, H.; Ono, I.; and Kobayashi, S. 1997. A new generation
alternation model of genetic algorithms and its assessment.
Journal of Japanese Society for Artificial Intelligence
12(5):734-744.
Unemi, T. 2012a. SBArt4 for an automatic evolutionary art.
In Proceedings of the IEEE World Congress on Computational
Intelligence, 2014-2021.
Unemi, T. 2012b. Synthesis of sound effects for generative
animation. In Soddu, C., ed., Proceedings of the 15th
Generative Art Conference, 364-376.
Demo video:
http://www.youtube.com/watch?v=XBej_nlu-Hg
The Fourth International Conference on
Computational Creativity
ICCC 2013
Sydney, Australia
12-14 June 2013
ISBN: 978-1-74210-317-4
// Processing sketch: layered random curling strokes on a faintly lined ground.
int i, W = 480, H = 640, Z = 64;
float x, y, X, Y, s, D, d, c, A = PI / 18;

void setup() {
  size(W, H);
  background(255);
  // Faint, jittered horizontal lines as a background texture.
  strokeWeight(Z / 2);
  for (i = -Z; i < H + Z; i += 4) {
    stroke(0, 4);
    line(0, i + R(Z / 2), W, i + R(Z / 2));
  }
  strokeWeight(1);
}

void draw() {
  // Start each stroke off-canvas to the left or right, heading inward.
  D = r(2) * PI + R(PI / 4);
  d = R(A / 10);
  c = R(A);
  x = W / 2 * (1 - 2 * cos(D));
  y = H / 2 * (1 - 2 * sin(D));
  s = 15 + R(5);
  // Draw Z short segments, fading out and curling as they go.
  for (i = Z; i > 0; --i) {
    X = x + s * cos(D);
    Y = y + s * sin(D);
    stroke(0, i);
    line(x, y, X, Y);
    x = X;
    y = Y;
    D += d + R(d) + c;
    d += R(A);
    if (R(Z) > Z - 4) c *= 5;  // occasionally tighten the curl
  }
}

int r(int n) { return int(random(n)); }      // random int in [0, n)
float R(float v) { return random(-v, v); }   // random float in [-v, v]