Novice Chart Color Tool: NL2Color
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2023.3326522
Abstract— Choice of color is critical to creating effective charts with an engaging, enjoyable, and informative reading experience.
However, designing a good color palette for a chart is a challenging task for novice users who lack related design expertise. For
example, they often find it difficult to articulate their abstract intentions and translate these intentions into effective editing actions to
achieve a desired outcome. In this work, we present NL2Color, a tool that allows novice users to refine chart color palettes using natural
language expressions of their desired outcomes. We first collected and categorized a dataset of 131 triplets, each consisting of an
original color palette of a chart, an editing intent, and a new color palette designed by human experts according to the intent. Our tool
employs a large language model (LLM) to substitute the colors in original palettes and produce new color palettes by selecting some of
the triplets as few-shot prompts. To evaluate our tool, we conducted a comprehensive two-stage evaluation, including a crowd-sourcing
study (N=71) and a within-subjects user study (N=12). The results indicate that the quality of the color palettes revised by NL2Color
does not differ significantly from that of the palettes designed by human experts. The participants who used NL2Color obtained
satisfactory revised color palettes in less time and with less effort.
Index Terms—chart, color palette, natural language, large language model
Authorized licensed use limited to: SHENZHEN UNIVERSITY. Downloaded on October 26,2023 at 10:02:53 UTC from IEEE Xplore. Restrictions apply.
© 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.See https://www.ieee.org/publications/rights/index.html for more information.
This article has been accepted for publication in IEEE Transactions on Visualization and Computer Graphics. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/TVCG.2023.3326522
mainly focus on facilitating visual analytics and visualization creation. The NLIs for visual analytics aim to assist users in the tasks of information discovery, search, and query [47]. For example, FlowSense [54] applied a semantic parser to understand user queries and accordingly manipulate the visualizations produced by a dataflow diagram to help users with visual data exploration within a dataflow system. Liu et al. [27] proposed ADVISor, a pipeline to automatically generate charts for tabular data to answer users’ natural-language questions. Luo et al. [29] developed an end-to-end deep learning model, ncNet, which translates natural language queries raised by users to Vega-Lite to generate visualizations. In such research, how to address vague and underspecified natural language expressions is a key challenge, and different solutions have been proposed. For instance, Hearst et al. [19] carried out an empirical study and proposed design guidelines for how an NLI should respond to vague modifiers in natural language queries. Setlur et al. [39] presented a system called Sentifiers to infer the data attributes involved in vague utterances.

Another line of research explores visualization creation based on natural language descriptions. For example, Cui et al. [15] designed an approach that automatically generated infographics according to natural language statements containing proportion facts. Rashid et al. [36] focused on chart production and explored an approach to generate bar, line, or pie charts for user-input natural language text.

However, there is still limited research on how NLIs can enable users to author and modify visualizations by expressing their desired outcomes in natural language. While some studies (e.g., [49]) have focused on authoring-oriented tasks, they are only applicable to natural language requests containing specific editing intents, such as “make the United States bar red”. Hence, these systems do not fully support refinement requests that are vague and abstract.

2.2 Color Palette Design Tools

A variety of works have proposed different methods to facilitate color palette design. As AI-driven approaches gained attention, Peng and Chou [33] utilized sentiment analysis to help designers understand stakeholders’ color palette requirements, while Bahng et al. [10] developed Text2Colors, a tool that employs input text semantics for grayscale image colorization. Qiu et al. [34] proposed a masked color model for recommending colors for different components in landing pages. These studies demonstrate that AI techniques can greatly enrich color palette generation tools.

In the field of visualization, there is a long history of studying tools for color palette design. Early works include ColorBrewer [17] for map coloring and the techniques proposed by Wijffelaars et al. [50] for generating univariate palettes based on easily understood perceptual-property parameters. Later research, such as Lin et al. [26] as well as Setlur and Stone [40], delved into color assignment based on concept-color associations. Shugrina et al. [41] introduced Color Builder, an innovative interface that integrates operations like swatches and smooth for enhanced visualization coloring. Yuan et al. [55] began to incorporate AI for color palette design by developing InfoColorizer, an interactive tool to recommend color palettes for infographics using a deep learning model trained on a large-scale infographics dataset. Wang et al.’s visualization authoring pipeline allows users to change chart colors using natural language, although explicit instructions like “Set the color of the Ford bar to red” are required [49].

Despite these advancements, existing research still exhibits limitations, such as a lack of consideration for the relationship between recommended palettes and users’ initial palettes. Additionally, users may still struggle to effectively apply general color palettes to their charts. Our work addresses these gaps by offering a more practical and user-friendly tool for color palette design in visualization authoring.

3 FORMATIVE STUDY

3.1 Participants and Procedure

We invited 6 participants (3 female, 3 male; FP1-6) by word-of-mouth. None of them has a background in design, but all of them need to use charts in their daily lives (Table 1).

Table 1: Demographics of all participants in the formative study, including each participant’s ID, gender, and chart usage scenarios.
ID   Gender  Chart Usage Scenarios
FP1  Male    Research Paper, Technical Report, Presentation Slides
FP2  Male    Research Paper, Presentation Slides, Storyboard, Web UI Design
FP3  Female  Research Paper, Presentation Slides, Product Analysis Report, User Behavior Analysis Report
FP4  Female  Research Paper, Presentation Slides
FP5  Female  Presentation Slides, Course Report
FP6  Male    Research Paper, Work Report

We conducted semi-structured interviews with these participants. After signing the consent form, they were first asked to recall and describe their latest experience of refining the color palettes of charts, including but not limited to whether there were some refinement requests that are vague or abstract, what vague or abstract requests they have, and how they modify the charts to satisfy such requests. Then we asked about the difficulties they faced in the refinement process and the need for facilitating the color palette refinement. Finally, we invited the participants to envision what services they would like to have for the color palette refinement and what expectations and concerns they had for such services.

3.2 Results

3.2.1 Refinement Requests

All participants reported that they had vague or abstract requests for chart color palette refinement during the chart design process. Based on the interview results, we identified two common types of refinement requests: descriptive-word-based and chart-topic-based (Table 2).

Table 2: Refinement requests for chart color palettes.
Request type                                              Brand-new  Fine-tuned
Descriptive-word-based, without-reference (e.g., styles)
Descriptive-word-based, with-reference
Chart-topic-based

Descriptive-word-based. We observed that all participants utilized descriptive words or phrases to specify their desired color palettes in the refinement requests. Specifically, such requests could be further divided into those with detailed references and those without (Table 2). For without-reference requests, participants only expressed their vague feelings, such as desired styles (e.g., “cyberpunk style”) and change directions (e.g., “more professional”), about the original charts but did not know what specific colors can achieve their desired outcomes. For example, FP2 stated that sometimes he felt the color palette of a chart was not professional enough, yet he could not imagine in his mind what kind of color palette was professional. An interesting finding is that with the same without-reference requests, participants may desire different refinements in different usage scenarios, either brand-new or fine-tuned color palettes. Four participants pointed out that if the original color palettes largely meet their needs, they just need to fine-tune the palettes, which means that the hues of the colors in the original palettes do not need to be revised and only other small changes (e.g., increasing the lightness of the colors) are required. In other cases, all participants reported that they desire brand-new color palettes when they have without-reference refinement requests.

Furthermore, two participants mentioned that sometimes they remember a reference chart in their minds but cannot find it. Therefore, they can only describe the reference in natural language (e.g., “I would like a fresh and lovely color scheme, specifically a yellow and green palette.”). When having such requests, both of them sought to obtain a brand-new set of color palettes.

Chart-topic-based. Three participants reported that they may have refinement requests about making the charts’ color palettes align with specific topics, such as “environmentally friendly”. For instance, FP3
said that she always makes presentation slides about different topics and she would like the color palettes of the charts to match her topics when she creates the charts for the slides. When having chart-topic-based requests, the three participants all expressed that they hope to obtain brand-new color palettes.

3.2.2 Challenges and Needs for Color Palette Refinement

Four participants indicated that while they would like the color palettes of charts to be revised, they request that the direction of the color scales embedded in the original palettes be preserved in the refined palettes. For example, FP5 said that “if some components of a chart are denoted by sequential/diverging colors, the revised color palettes should also contain corresponding sequential/diverging colors to represent these components”. However, they were concerned that maintaining the existing representations of colors makes the palette update rather difficult. FP4 complained that it is challenging to identify complex color scales when the original color palettes contain many colors. “Sometimes I cannot distinguish whether the several colors are a set of sequential colors or categorical colors with similar hues” (FP4). Even when such color scales of the original palettes can be identified properly, a tedious and time-consuming manual color mapping is commonly required because the direction of the color scales is rarely consistent between the refined palettes and the original ones. For instance, FP1 shared a personal experience in which he wanted to revise a color palette containing a set of sequential colors, but he could only find a satisfactory new categorical palette and needed to manually extend one of its colors into a set of sequential colors based on the original sequential colors.

In addition, we found that individuals may have different preferences when refining the color palettes of charts. When different participants have the same refinement request, their expected new palettes may be different. Even for the same request from a single user, different palettes may be selected in different scenarios. Therefore, all participants suggested that it would be helpful if our tool could provide multiple possible options that fulfill the refinement requests so that they could select one based on their preferences.

3.2.3 Design Requirements

Based on the qualitative results from the participants, we derived three design requirements for our system design.

• DR1: Support both brand-new and fine-tuned color palette refinement requests. From the formative study, we observed that users often desire brand-new or fine-tuned color palettes in various usage scenarios. Therefore, our tool should enable the production of both brand-new and fine-tuned color palettes in response to users’ refinement requests.

• DR2: Accommodate users’ refinement intents while preserving the direction of color scales in the original color palettes. When refining the color palettes of charts, users typically do not want to break the well-designed color scales in the original palettes. Thus, it is important to maintain the direction of color scales of the original color palettes while fulfilling users’ refinement requests.

• DR3: Provide multiple options for a single input refinement request to cater to various user preferences. According to our formative study, individual users may have different preferences and needs for color palettes despite having the same color palette refinement requests. Hence, for each input refinement request, the tool should provide multiple potential revised color palettes to cater to diverse user preferences.

4 NL2COLOR

Based on the design requirements, we present a tool, NL2Color, that enables novice users to refine the color palettes of charts by expressing their vague or abstract requests and intents in natural language. In this section, we introduce the implementation details of NL2Color, including data collection and a pipeline that automatically refines chart color palettes based on users’ requests.

4.1 Data Collection

The data for NL2Color were collected through two methods. First, we invited eight novice users who do not have backgrounds in design and introduced to them the types of vague or abstract chart palette refinement requests identified in our formative study. For each type of refinement request, we presented one or two examples mentioned by the participants in the formative study, helping the novice users understand the concept of vague or abstract refinement requests. After the introduction, we asked the eight novices to write down potential vague or abstract requests they may make in their day-to-day life when wanting to change the color palettes of charts. We removed duplicated requests and finally obtained 41 unique requests. Then we randomly collected 84 SVG-based charts of various common types from the Plotly Chart Studio¹, a widely used website that allows users to manually create and share charts [20]. For the subsequent model training in the color palette refinement module (Section 4.2.2), we only kept the charts whose color palettes are categorical colors and discarded the others, resulting in 60 charts in our dataset (Table 3). We showed these 60 charts to the eight novices and asked them to match the requests they provided to each chart. For the without-reference requests, we also asked them whether they wanted to get a brand-new or fine-tuned color palette. We found that each chart corresponded to at least four requests. Subsequently, for each chart, we randomly selected two matched requests, resulting in 120 pairs of a chart and a corresponding abstract refinement request for the chart color palette. Among them, 80 pairs require brand-new color palettes, and 40 pairs require fine-tuned ones. As the second method of data collection, we asked the participants in the formative study (Section 3) to provide the charts they mentioned in the interview that they were not satisfied with, as well as their corresponding refinement requests. Through these two methods of data collection, we obtained a total of 131 pairs of original charts and requests in our dataset, 85 pairs for brand-new requests and 46 pairs for fine-tuned requests (DR1). We then invited nine design experts to design a new color palette for the chart in each pair based on the refinement request. These experts all have at least five years of design experience and often create charts in their daily design work (Table 4).

¹ https://chart-studio.plotly.com/feed/

Table 3: The chart types contained in the dataset.
Chart Type                     N   Percent
Line chart                     5   8.3%
Bar chart: Grouped bar chart   12  20.0%
Bar chart: Stacked bar chart   9   15.0%
Pie chart                      8   13.3%
Area chart                     8   13.3%
Scatter chart                  7   11.7%
Box chart                      11  18.3%

Table 4: Demographics of the nine design experts, including each expert’s ID, gender, design experience (in years), and previous design activities.
ID  Gender             Design Experience (years)  Previous Design Activities
1   Female             5                          Graphic Design, Interaction Design, Industrial Design, Service Design
2   Prefer not to say  5                          Mobile App UI Design, Activity Poster Design, Industrial Design
3   Female             6                          Web UI Design, Activity Poster Design, Product Advertisement Promotion Design
4   Male               6                          Visualization Design, Product Design, Industrial Design
5   Female             5                          Mobile App UI Design, Activity Poster Design, Industrial Design
6   Female             7                          Mobile App UI Design, Illustration Design, Product Advertisement Promotion Design, Game Design, Service Design, Pavilion Design
7   Female             5                          Mobile App UI Design, Activity Poster Design, Game Design
8   Male               5                          Mobile App UI Design, Interaction Design, Interior Design
9   Male               5                          Mobile App UI Design, Industrial Design

We extracted the color palettes of the charts in our dataset. Specifically, for each chart, we identified all the unique colors in the SVG document while excluding those employed for texts, background, and axes. After this, we finalized our dataset of 131 triplets of (1) the original color palette of a chart, (2) a vague or abstract color palette refinement request, and (3) a new color palette designed by human experts according to the request.

(a) Fine-tuned request: “Please use more vibrant colors that create contrast and add energy to the chart.”
(b) Brand-new request: “I think this chart is too fancy.”
(c) Brand-new request: “I want the colors to show a sense of silence.”
(d) Brand-new request: “I would like a professional chart. I remember that many professional charts would use navy blue and maroon colors.”
(e) (1) Fine-tuned request: “I think the chart should have a more cartoon style but not industrial.”; (2) Brand-new request: “I think the chart should have a more cartoon style but not industrial.”; (3) Brand-new request: “Please use our company’s (Google) colors and ensure that the chart design is consistent with our overall visual identity.”
Fig. 1: Examples of color palette refinement by NL2Color. (a)-(d) show four pairs of an original chart (left) and a new chart (right) refined by NL2Color according to the request. (e) shows an original chart (left) and three new charts (right) NL2Color generated according to three refinement requests. The color palette of each chart is displayed above the chart. The original charts are collected from Vega-Lite [9].

4.2 Color Palette Refinement Pipeline

We developed a pipeline that refines the color palette of the input chart based on the user’s vague or abstract refinement requests. The pipeline consists of two modules: an original color palette extraction module and a color palette refinement module.

4.2.1 Original Color Palette Extraction Module

When a user uploads an SVG-based chart, our module automatically extracts its color palette. For this purpose, we first identified all the
unique colors in the SVG document in the same way as we extracted the color palettes of the charts in our dataset (Section 4.1).

Color scales, i.e., categorical, sequential, and diverging colors, are commonly used by chart users to effectively communicate data [42]. To preserve the color scales in the original palette, we identified all sets of sequential colors (i.e., a gradation of colors that go from light to dark or dark to light [38]) and diverging colors (i.e., two sets of sequential colors based on two different hues that meet in a neutral midpoint [13]) from the set of unique colors we extracted (DR2) and only kept their primary colors in the original color palette for the subsequent model training (Section 4.2.2). Specifically, we first obtained all sets of sequential colors by identifying the groups of colors that have the same hue and have a linearly monotonic sequence of increasing (or decreasing) lightness [13]. Based on [57], the luminance trajectory should be balanced between the two sets of sequential colors in a set of diverging colors. Thus, we converted all these sets of sequential colors into the HCL (Hue-Chroma-Luminance) color space. If there are two sets of sequential colors with equal intervals in luminance, they would
combine to form a set of diverging colors. Then we identified the primary color(s) of each set of sequential and diverging colors. We defined the primary color of sequential colors as the color closest to the center of the gradient and the primary colors of diverging colors as the two primary colors of the pair of sequential colors they contain. Therefore, we arranged the colors in each sequential color group from lightest to darkest and regarded the color located in the middle or, if the sequential color group contains an even number of colors, the lighter of the two middle colors as its primary color. Finally, we combined these primary colors with the remaining categorical colors in the set of unique colors (i.e., those outside the sequential and diverging color groups) as our extracted original color palette.

4.2.2 Color Palette Refinement Module

This module employed OpenAI’s GPT-3 model [14] to refine the original chart color palettes based on users’ requests. The GPT-3 model has been demonstrated to have high performance on various tasks using a small number of examples and a well-crafted prompt [43, 46, 53]. We followed the principles and techniques proposed by [31, 37] to design prompts for GPT-3. Specifically, we crafted two few-shot prompts, one for brand-new and one for fine-tuned color palette refinement requests (DR1). In each prompt, we first described our task, including the task goal, the input, and the output, as well as the definition of brand-new or fine-tuned color palettes (please see the supplementary material). Then we applied the text embedding API² of OpenAI to get the embeddings of the natural language request input by the user and the vague or abstract requests in the triplets in our dataset. If the user requires a brand-new (fine-tuned) color palette, we would solely consider brand-new (fine-tuned) requests in our dataset. We calculated the cosine similarity, a commonly used and effective measure of text similarity [22, 48], between the user-input request and each request in the dataset based on their embeddings. The five requests in our dataset with the highest similarity were selected, and the corresponding triplets were added to the prompt for the GPT-3 model to perform few-shot learning. Finally, we appended the original color palette extracted from the user-uploaded chart (Section 4.2.1) and the user-input refinement request to the prompt and passed it to the model to refresh the color palette. The model’s output was controlled to present ten alternative palettes to provide multiple options to users (DR3).

² https://platform.openai.com/docs/guides/embeddings

Once the updated color palettes were produced, we extended the new primary colors in them into new sequential or diverging colors to match the original palette (DR2). Specifically, for each color in a set of sequential colors we identified through the original color palette extraction module (Section 4.2.1), we computed the difference between its luminance and that of the primary color. Then the new color corresponding to this color can be obtained by adding this difference to the luminance value of the new primary color in the new palette. In the same way, the two groups of sequential colors are obtained respectively, thereby extending a new set of diverging colors.

Note that our pipeline only supports SVG-based charts for easy color extraction. We acknowledge that this may limit users’ flexibility in the chart design process. Our main goal in this work is not to develop a full-fledged system but to propose a basic method for automatically refining the color palette of a chart according to the user’s intent expressed in natural language. Future work could improve our methods for original color palette extraction to make it applicable to diverse chart formats.

5 EXAMPLE OUTPUT

Fig. 1 shows several examples of color palette refinement by NL2Color. As shown in Fig. 1(a), we used our tool to fine-tune the color palette of the original chart (left) to make it more vibrant and energetic. The modified color palette keeps the hues of the colors in the original palette (i.e., blue, brown, and yellow) but makes the colors brighter. The contrast between the colors is also more pronounced following the refinement request. For the second pair of examples (Fig. 1(b)), the new palette does not maintain the same hues as the original palette since the input request indicates a desire for a brand-new palette. Even if we do not directly state what kind of palette we want in the refinement request but simply describe the problem with the palette (i.e., “I think the chart is too fancy”), NL2Color still successfully understands our needs and returns a palette with plain colors. To satisfy the requirement in Fig. 1(c), NL2Color changes the original colors to a cooler tone to create a calm and serene feeling. Fig. 1(d) showcases an example of NL2Color processing with-reference requests. It not only applies the colors from the description of the reference palette but also makes sure that the other colors, which NL2Color freely adds, are harmonious with these colors and create the professional feel the user wants. Fig. 1(e) displays the new charts that NL2Color recommends based on the same original chart but according to different refinement requests. The colors of (1) are more vivid and playful while maintaining the same hues as the original palette. In (2), NL2Color directly uses some candy colors to generate a brand-new chart in cartoon style. As for (3), which is requested to keep consistent with Google’s overall visual identity, NL2Color directly applies the four main theme colors of Google in the new palette and adds an extra grey color to ensure that the number of colors in the new palette is consistent with the original one without affecting the expression of Google’s visual identity. These examples showcase how NL2Color handles the different types of color palette refinement requests (Table 2) and confirm that NL2Color meets our DR1.

6 EVALUATION

To assess the effectiveness and usefulness of NL2Color, we conducted a two-stage evaluation, including a crowd-sourcing study and a within-subjects user study. In this section, we present our evaluation study design and the findings regarding the performance of our tool and whether and how it would influence novice chart users’ color palette revision process.

(a) Expert-designed. Rating: 4.4
(b) NL2Color-refined. Rating: 4.0
(c) Zero-shot-model-refined. Rating: 3.3
Fig. 2: An example of the new charts refined in the three conditions of our crowd-sourcing study, along with the ratings they received from the participants. The original chart is Fig. 5(a) and the refinement request is “The chart should have a cultural or historical theme”.

6.1 Study 1: Crowd-sourcing Study

In this study, we evaluated how well NL2Color refines color palettes from the perspective of chart readers. Since we designed two models
respectively for the fine-tuned and brand-new requests, we evaluated their performance separately. On the one hand, we compared the quality of the new color palettes refined by NL2Color with those designed by human experts. On the other hand, to validate our prompt design (Section 4.2.2), we further crafted two zero-shot prompts (i.e., the prompt only contains task description) for GPT-3 corresponding to the two few-shot prompts we designed for NL2Color to respectively generate fine-tuned and brand-new color palettes. Overall, for each type of request (i.e., brand-new or fine-tuned), we compared three conditions: (1) expert-designed, (2) NL2Color-refined, and (3) zero-shot-model-refined color palettes (Fig. 2).
6.1.1 Study Setup
Using the first way of data collection we mentioned in Section 4.1, we collected 60 pairs of (1) a chart with an original color palette and (2) a vague or abstract refinement request for the crowd-sourcing study. Among them, 40 pairs consist of requests for brand-new color palettes and 20 pairs for fine-tuned ones. To acquire expert-designed palettes, we invited eight professional designers to revise the color palette for each chart according to the corresponding request. For the NL2Color-refined and zero-shot-model-refined conditions, we leveraged the results returned by NL2Color and the models with zero-shot prompts. As each of these models would provide ten alternative color palettes for a given pair of data, we randomly selected a palette from the options as the model-refined palette.
We developed crowdsourcing questionnaires on Prolific (https://www.prolific.co/). In each questionnaire, six problem sets (two about fine-tuned requests and four about brand-new requests) were randomly assigned to each participant and evaluated one by one. Each problem set contains a pair of an original chart and a refinement intent, as well as three new charts colored with the three color palettes refined in the three conditions. We referred to these three new charts as “Chart 1”, “Chart 2”, and “Chart 3” to eliminate potential bias toward human designers or the models. For each problem set, we asked participants to rate how well each new chart meets the refinement request on a 7-point Likert scale (1 - absolutely not, 7 - absolutely meet).
We filtered out unreliable responses that met either of the following criteria: 1) the questionnaire was completed unreasonably quickly, or 2) the ratings showed consistent patterns. We also made sure that each pair of data in our dataset for the crowdsourcing study received ratings from at least five participants. For pairs with fewer than five valid responses, we repeated the aforementioned crowdsourcing steps until five valid responses were obtained. Finally, we recruited 71 participants in total and averaged the ratings from different users per new chart. Each of these participants was given 0.76 USD as a reward for a valid questionnaire completion, and the average duration of a questionnaire completion was about five minutes.
6.1.2 Data Analysis
For each type of request (i.e., brand-new or fine-tuned), we first performed the Shapiro-Wilk test on the ratings of the refined color palette in the three conditions. The results show that they all followed the normal distribution (p > .05). Therefore, we ran the Friedman test with post-hoc Wilcoxon signed-rank tests with Bonferroni correction [25] to assess the difference in the participants’ ratings on the quality of the revised color palettes across the three conditions.
6.1.3 Results
The results indicate significant differences (χ²(2) = 16.30, p < .01) between the quality of the fine-tuned charts in the three conditions (Fig. 3 (left)). The pairwise comparisons showed that the charts refined by human designers (4.21, [3.78, 4.65] 95% CI) and NL2Color (4.31, [3.87, 4.74] 95% CI) received significantly higher ratings (expert-designed: Z = −2.02, p < .05; NL2Color-refined: Z = −3.92, p < .01) than those revised by the zero-shot model (3.49, [3.14, 3.84] 95% CI). However, the difference between the expert-designed condition and the NL2Color-refined condition is not significant.
We also found a significant difference between the participants’ ratings on the charts with brand-new color palettes in the three conditions (χ²(2) = 6.87, p < .05). As shown in Fig. 3 (right), participants gave significantly higher ratings (expert-designed: Z = −2.93, p < .01; NL2Color-refined: Z = −2.40, p < .05) to the charts modified by human experts (4.37, [4.13, 4.61] 95% CI) and NL2Color (4.29, [4.03, 4.56] 95% CI) compared to those revised by the zero-shot model (3.84, [3.63, 4.05] 95% CI); no statistical difference is found between the brand-new color palettes designed by human experts and NL2Color.
These results indicate that the color palettes refined by NL2Color, regardless of whether they are fine-tuned or brand-new, have no significantly large difference from those designed by human experts. Furthermore, the prompts we designed for NL2Color are demonstrated to be effective in facilitating the GPT-3 model to handle the color palette refinement tasks.
Fig. 3: Means and standard errors of the participants’ ratings on the revision quality respectively for fine-tuned requests (left) and brand-new requests (right) on a 7-point Likert scale (1 - absolutely not meet the refinement request, 7 - absolutely meet the refinement request; *: p < .05, **: p < .01).
6.2 Study 2: User Study
In this study, we evaluated NL2Color with real users and explored its influence on users’ color palette refinement process. We conducted a within-subjects study with 12 participants, where the participants completed chart palette refinement under two conditions. In the control condition, the participants were allowed to use any tools and websites they commonly use in their routine practices (e.g., Adobe Color [1], Color Hunt [4]) to revise the color palettes. In the experiment condition, participants were allowed to use NL2Color only. We did not choose any specific tool, such as Adobe Illustrator [2], as the baseline in the user study since we found from the formative interviews (Section 3) that each novice has his/her own way of exploring new palettes and we did not reach a conclusion regarding widely-used tools.
6.2.1 Experimental Website Design
We developed an experimental website for the experiment condition based on NL2Color (Fig. 4), which involves five panels. Specifically, the Original Chart Panel (Fig. 4C) allows a user to upload an original chart for refinement and displays this chart. Once the chart is uploaded, NL2Color would automatically extract its color palette. All extracted sets of sequential colors, diverging colors, and the color palette are shown in the Color Palette Panel (Fig. 4A). After the chart submission, the user could input the refinement intent in natural language in the Refinement Request Panel (Fig. 4B). Under the input box are a Brand-new/Fine-tuned button group and a Get Refined Palettes button. The Brand-new/Fine-tuned button group is used for users to specify whether they want a brand-new color palette or a fine-tuned one. Once the user clicks on the Get Refined Palettes button, NL2Color would recommend new color palettes according to the user’s request. The returned new palette alternatives and the thumbnails of the charts colored with them are listed in the Refined Color Palettes Panel (Fig. 4E). The user can select the new color palette of interest to view it in the Refined Chart Panel (Fig. 4D).
6.2.2 Participants and Procedures
We recruited 12 participants (7 males and 5 females) with diverse academic backgrounds through word-of-mouth. They all self-reported
having no domain knowledge in chart design but having the need to create and modify charts in their daily life.
Fig. 4: Screenshot of the experimental website. The example of color palette refinement showcases that NL2Color meets our DR2 and DR3.
We discussed with two professional designers who helped us refine color palettes in the expert-designed condition of the crowd-sourcing study (Section 6.1) and designed two chart color palette refinement tasks for our within-subjects user study. To ensure that the two tasks are of similar difficulty, we chose two charts respectively for the two tasks whose color palettes contain the same number of colors. Also, as most of the formative study participants mentioned the without-reference requests that indicate the change directions of an original palette, we selected such requests in our task design. The two color palette refinement tasks are as follows:
• T1: Refine the color palette of the chart (Fig. 5(a)) to make it more edgy and bold.
• T2: Refine the color palette of the chart (Fig. 5(b)) to make it more playful and fun to appeal to a younger audience.
Fig. 5: The original charts for refinement in the user study. (a) The original chart for refinement in T1. (b) The original chart for refinement in T2.
After obtaining the participants’ consent, we asked each of them to complete these two tasks separately in the control and experiment conditions. For each task, participants needed to refine the color palette until they were satisfied with the results, without a time limitation. Before the tasks using NL2Color, we carefully introduced the experimental website to the participants and gave them 5 minutes to familiarize themselves with NL2Color. To alleviate the potential order effect, we counterbalanced the task assignment and the order of the two conditions. We recorded the video of these two user study sessions. At the end of each session, we asked the participants to fill out a questionnaire on a 7-point Likert scale to rate (1) user confidence in the final refined new palettes [52]; (2) the cognitive load during the tasks, measured using the NASA Task Load Index (NASA-TLX) [18]. In the in-task survey in the experiment condition, participants were also asked to rate their perceptions of NL2Color, including the usability, usefulness, and user satisfaction with the tool [28]. To better understand participants’ ratings and behavior, we further conducted a semi-structured interview with them upon the completion of the two sessions.
6.2.3 Data Analysis
As a series of Shapiro-Wilk tests showed that all quantitative measures (i.e., user behavior data coded from video recordings and participants’ responses on the questionnaires) have significant departures from the normal distribution, we conducted Wilcoxon signed-rank tests to compare the two conditions regarding each measure. As for the qualitative data, two authors of this paper conducted a thematic analysis on the transcripts of the post-study semi-structured interview and identified key themes in participants’ feedback.
6.2.4 Results
Here we summarize the quantitative results regarding participants’ task completion time, user confidence in the refined color palettes, and perceived cognitive load during the tasks, as well as qualitative findings from the user study.
Completion time. To inspect how well NL2Color helps users revise chart color palettes, we performed a statistical analysis of the participants’ task completion time. As shown in Fig. 6(a), participants using NL2Color (8.07, [4.40, 11.73] 95% CI) spent significantly less time
(Z = −2.00, p < .05) completing the color palette refinement tasks than when they followed their routine practices (14.02, [7.92, 20.12] 95% CI). In the control condition, the participants mainly used three categories of tools: 1) six participants applied manual tools, including Adobe Photoshop [3], Adobe Illustrator [2], Inkscape [6], and PowerPoint, where they selected colors from the color panels to refine the original color palettes; 2) two participants used the color palette generation support tool, Adobe Color [1]; 3) four participants searched for color schemes on color palette recommendation websites (e.g., Material UI [7], Coolors [5], Palettable [8]) to guide their palette refinement. Compared to utilizing these tools to complete color palette revision, all participants reflected in the interview that they preferred to use NL2Color. With NL2Color, they did not need to spend a lot of time and effort to learn the complex functions of the aforementioned professional software (P1, P6-7), search for pre-designed color palettes online for charts to be refined (P4-5, P11), and manually map and substitute the original colors with those in the refined palettes (P1, P9, and P11). P3 added that “Even though sometimes the results returned by NL2Color still need to be manually fine-tuned, it cost much less time than designing a new color palette from scratch, especially when the color palettes contain many colors”.
Fig. 6: Means and standard errors of the participants’ task completion time and user confidence in their refined color palettes (*: p < .05). (a) Task completion time. (b) User confidence.
User confidence. The results reveal that compared with the control condition (4.42, [3.73, 5.11] 95% CI), participants reported significantly higher confidence (Z = −2.49, p < .05; Fig. 6(b)) in their final refined color palettes in the experiment condition (5.67, [5.04, 6.29] 95% CI). For one thing, in the control condition, participants typically revised the colors in the original palette one by one, or searched for a pre-designed palette and then fine-tuned it to get the final new palette. Hence, they often could only come up with one new palette after the color palette refinement process. In contrast, NL2Color provides various refined palette alternatives and users can choose the satisfactory one after carefully comparing them, which makes participants feel more confident in their final palette design. For another, three participants claimed that they usually compromise on the quality of refined palettes due to their limited capability in chart design. On the contrary, when using NL2Color, users would pursue higher standards (e.g., color harmony, visual appeal) on palette refinement and they found that NL2Color could help them meet their standards well, improving their confidence in refined color palettes (P3).
Cognitive load. Using Wilcoxon signed-rank tests, we analyzed participants’ cognitive load during the color palette refinement process on each related dimension in the control condition and experiment condition. We found significant differences in the Mental Demand (Z = −2.29, p < .05), Physical Demand (Z = −2.17, p < .05), Temporal Demand (Z = −2.77, p < .01), Performance (Z = −2.12, p < .05), and Effort (Z = −2.10, p < .05) dimensions of cognitive load and a marginally significant difference in the Frustration (Z = −1.84, p = .07) dimension (Fig. 7). Participants explained that they perceived less cognitive load when using NL2Color because they only needed to think about how to accurately communicate their palette revision requirements with our tool instead of being overwhelmed by low-level technical issues they encountered in the control condition, such as what exact colors could achieve their desired effects, how to assign the colors in the refined palettes to the chart elements, and how to preserve color relationships in the original palettes.
Usability and usefulness. Participants generally gave positive feedback on the usability and usefulness of NL2Color (Fig. 8). In the post-study interview, they praised our tool as it is “satisfactory” (9/12), “convenient” (5/12), “intuitive” (3/12), and “user-friendly” (2/12). All participants appreciated that NL2Color makes it easier to know where to start modifying color palettes and greatly simplifies the palette refinement process. P2 explained, “Since I often only have vague and abstract needs for palette refinement, I do not know what colors I want at the beginning. However, the tool provides me with a variety of modified palettes from which I can choose one directly or use one as a basis for simple adjustments to obtain my desired new palettes”. In addition, three participants reported that NL2Color facilitated them to examine and compare alternative refined color palettes intuitively. Without our tool, users need to choose and try different colors based on subjective feelings until they find the proper colors to form the refined palettes, which “causes lots of trial-and-error tweaking” (P12). For example, as P5 complained, “In my chart palette revision routine, I always find palettes online that seem to meet my refinement requests but actually do not work well after being applied in the charts. In such cases, I need to go back to search for other palettes that may satisfy my requirements”. In comparison, NL2Color enables users to compare alternative palette designs in the charts adopting them. Moreover, P11 pointed out that NL2Color stimulated her creativity and inspired the palette refinement process since it recommended designs that she never thought of.
7 DISCUSSION
In this section, we discuss the generalizability of our work. Built upon the key findings in our user study, we then derive several design considerations and implications for NLIs for visualization. We also discuss the failure cases and limitations of our research.
7.1 Generalizability
Although our system, NL2Color, is designed for common charts, our proposed system design and pipeline could be easily extended to other types of visualizations (e.g., pictorial visualization and node-link diagram). Our approach could be adapted to help refine the color palettes of these visualizations based on users’ vague or abstract requests (e.g., “more vivid” or “softer”) by 1) adjusting the task description in prompts and 2) gathering and applying visualization-specific training data for few-shot learning. Although the prompt should be tailored to the specific type of visualization, our prompt design provides guidance for this, and the template of our prompt (Section 4.2.2), which contains the task goal, the descriptions of input and expected output, the explanation of special constraints, and few-shot learning examples, could be applicable to other visualizations.
Meanwhile, color coordination strongly influences the quality of visualizations’ color palettes [45]. The attributes related to color coordination can be easily incorporated into our system to improve the quality of color palette refinement and make our tool applicable for more complex charts (e.g., heat maps and cartograms) where color coordination is especially important. For example, we could add additional constraints, such as color harmony [30] and visual consistency [35], into our prompts to enhance the quality of the refined color palettes.
7.2 Design Considerations and Implications
7.2.1 Balance Automated and Manual Visualization Editing
Although all the participants in our user study appreciated the convenience of NL2Color, five of them expressed their demand for manual chart editing. This is because of the inherent uncertainty of vague or abstract natural language requests. With such refinement requests, participants sometimes may not obtain satisfactory outcomes from NL2Color even after going through multiple iterations and hope to make more fine-grained adjustments on the basis of the returned results (e.g., manually fine-tuning the brightness parameter of a certain component of the returned chart). Hence, we suggest balancing automated
and manual visualization editing in NLIs for visualizations to take advantage of both the convenience of automation and the accuracy of manual operations.
Fig. 7: Means and standard errors of the participants’ cognitive load in the color palette refinement process on a 7-point Likert scale (+: .05 < p < .1, *: p < .05, **: p < .01).
Fig. 8: User perception towards NL2Color.
7.2.2 Improve System Transparency to Promote NLI Discoverability and Debugging
In our user study, we observed that users had problems figuring out how to communicate effectively with NL2Color and how to evaluate its recommendations. Due to the black box nature of LLMs, users can only speculate about the reasons for the outcomes of the system [56]. Therefore, when receiving unexpected results, some users tried different expressions based on their speculation until obtaining satisfactory color palettes from NL2Color. Even if the returned results of NL2Color are basically satisfactory, the unfamiliarity with the logic of LLMs makes users doubt whether there are other ways of expressing refinement requests that may result in better outcomes, and users thus keep trying other expressions. Such issues cost a considerable amount of time during the color palette refinement process and negatively affect the effectiveness of our system. Therefore, when designing NLIs, designers can enhance model transparency and interpretability to promote NLI discoverability (i.e., users’ awareness of system-supported commands [16]) and debugging. For instance, in addition to the results of visualization manipulation, we could require LLMs to also provide explanations of their recommendations.
7.2.3 Learn from Users to Mitigate Ambiguity Issues
One pain point we found from the user study is the expression ambiguity of vague and abstract revision requests. Consider a statement: “I want the color palette to have the impression of a dark night”. It is not clear whether the user expects a black or a dark blue palette. This issue is hard to resolve using current LLMs. A model that incorporates an understanding of user intent is thus required in NLI design. For example, NLIs could proactively ask users for further deliberation or give users multi-level choices to decode their thoughts behind their ambiguous commands. Moreover, NLIs could learn users’ preferences from their historical decisions [21] so that they could infer user intents in ambiguous requests and provide personalized recommendations.
7.3 Failure Cases and Limitations
During the development of NL2Color, we observed some failure cases where our tool generates wrong or bad results. These failure cases mainly appear when fine-tuning the color palettes of charts. On the one hand, in some cases, such as when users desire a brighter color palette while the colors in the original palette are not particularly dark, the difference between the fine-tuned color palettes and the original ones may not be noticeable. To resolve this issue, our system can allow users to explicitly request in prompts that new color palettes should showcase greater differences from the original ones. On the other hand, the difference between the multiple options of fine-tuned color palettes may also not be noticeable. In practical use, we can set a difference threshold based on the theory of just-noticeable difference [44] to constrain our system to return diverse color palettes.
In addition, for some complicated and long statements, such as “Please revise the color palette to embody the essence of a summer sunset with hues that blend seamlessly from warm oranges to cool blues”, NL2Color may not segment the request correctly or provide satisfactory color palettes. We believe this issue can be mitigated with more training data. Moreover, our system can decompose the complex revision requests into a series of sub-requests, each mapped to a distinct step that can be processed by LLMs. In this way, complicated palette refinement can be achieved by chaining and aggregating the results of each step [51].
Our design also has some limitations in terms of capability. NL2Color currently only supports modifying color palettes for SVG-based charts. This may limit users’ flexibility during the color palette refinement process. Our system could be extended to be applicable to other types of files in the future. For example, we can apply algorithms such as K-means clustering [32] to enable color palette extraction from PNG-based or JPG-based charts.
8 CONCLUSION AND FUTURE WORK
In this paper, we presented NL2Color, a tool that enables novice users to refine the color palettes of charts using natural language requests. The tool uses a dataset of 131 triplets, each of which includes an original color palette of a chart, a vague or abstract request, and a new color palette designed by human experts according to the request. Our tool leverages the GPT-3 model to automatically fine-tune or generate brand-new color palettes by utilizing the triplets in our dataset whose refinement requests are similar to the user’s input as few-shot prompts. Through a crowd-sourcing study and a within-subjects user study, we demonstrated the effectiveness and usefulness of NL2Color in helping novices modify chart color palettes with natural language.
In the future, we would like to expand our color palette extraction approach to support more file types. Moreover, we will enhance our dataset by collecting more high-quality data. On the one hand, we will collect more color palette revision requests with diverse expression styles and patterns so that our system can understand user intents in different commands and the expression ambiguity issue can be mitigated. On the other hand, we will invite more designers to help build our training dataset and ask different designers to refine the color palette for the same pair of an input original chart and a palette refinement request. This way, our system could learn different design styles from different designers and provide refined color palettes of diversity and high quality for a single input revision request to satisfy various user preferences. Furthermore, more state-of-the-art LLMs (e.g., GPT-4) will be employed to improve our tool’s performance and robustness.
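As an illustration of the file-type extension mentioned in Section 7.3 and in the future work above, the following is a minimal sketch of K-means palette extraction from a raster chart. It is not part of NL2Color: the function name `extract_palette`, the farthest-point seeding, and the assumption that the image has already been decoded into a list of RGB tuples are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[float, float, float]


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two RGB points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(pixels: Sequence[RGB], centroids: Sequence[RGB]) -> List[List[RGB]]:
    """Group every pixel under its nearest centroid."""
    clusters: List[List[RGB]] = [[] for _ in centroids]
    for p in pixels:
        nearest = min(range(len(centroids)), key=lambda i: _dist2(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def extract_palette(pixels: Sequence[RGB], k: int, max_iter: int = 50) -> List[str]:
    """Return up to k hex colors, ordered from the largest cluster to the smallest.

    Assumption: `pixels` holds already-decoded RGB values in the 0-255 range.
    """
    if not pixels or k < 1:
        raise ValueError("need at least one pixel and k >= 1")
    k = min(k, len(set(pixels)))  # cannot have more clusters than distinct colors

    # Deterministic farthest-point seeding (an illustrative choice, not the paper's).
    centroids: List[RGB] = [tuple(float(c) for c in pixels[0])]
    while len(centroids) < k:
        far = max(pixels, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(tuple(float(c) for c in far))

    # Standard K-means loop: assign pixels, then move centroids to cluster means.
    for _ in range(max_iter):
        clusters = _assign(pixels, centroids)
        updated = [
            tuple(sum(ch) / len(members) for ch in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated

    sizes = [len(c) for c in _assign(pixels, centroids)]
    order = sorted(range(k), key=lambda i: -sizes[i])
    return ["#{:02x}{:02x}{:02x}".format(*(int(round(v)) for v in centroids[i]))
            for i in order]


# Toy "chart": two near-identical reds, plus blue and green marks.
toy = [(255, 0, 0)] * 6 + [(250, 5, 5)] * 4 + [(0, 0, 255)] * 3 + [(0, 255, 0)] * 2
print(extract_palette(toy, 3))  # ['#fd0202', '#0000ff', '#00ff00']
```

In this toy input the two reds are merged into one palette entry, which is the behavior one would want for anti-aliased or compressed chart images; a practical version would likely also need to drop background and text pixels before clustering.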