Foundation Models & Visualization Insights

Review Article

Abstract  Recent studies have indicated that foundation models, such as BERT and GPT, excel at adapting to various downstream tasks. This adaptability has made them a dominant force in building artificial intelligence (AI) systems. Moreover, a new research paradigm has emerged as visualization techniques are incorporated into these models. This study divides these intersections into two research areas: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In terms of VIS4FM, we explore the primary role of visualizations in understanding, refining, and evaluating these intricate foundation models. VIS4FM addresses the pressing need for transparency, explainability, fairness, and robustness. Conversely, in terms of FM4VIS, we highlight how foundation models can be used to advance the visualization field itself. The intersection of foundation models with visualizations is promising but also introduces a set of challenges. By highlighting these challenges and promising opportunities, this study aims to provide a starting point for the continued exploration of this research avenue.

Keywords  visualization; artificial intelligence (AI); machine learning; foundation models; visualization for foundation model (VIS4FM); foundation model for visualization (FM4VIS)

1 School of Software, Tsinghua University, Beijing 100084, China. E-mail: W. Yang, [email protected]; Z. Wang, [email protected]; S. Liu, [email protected] (corresponding author).
2 Microsoft, Redmond 98052, USA. E-mail: [email protected].
Manuscript received: 2023-10-04; accepted: 2023-11-15

1 Introduction

A foundation model is a large-scale machine learning model trained on a large amount of data across different domains, generally using self-supervision [1]. Notable examples include bidirectional encoder representations from transformers (BERT) [2] for natural language processing, Vision Transformer [3] and InternImage [4] for computer vision, Contrastive Language-Image Pretraining (CLIP) [5] for cross-modal learning, and the generative pre-trained transformer (GPT) series models [6–8] for text generation. Unlike traditional machine learning models, foundation models typically possess parameters ranging from hundreds of millions to billions and require extensive training on vast datasets over several weeks or months. These immense scales of parameters and training data enable foundation models to capture general knowledge regarding the world and serve as a “foundation” that adapts effectively to various downstream tasks such as information extraction, object recognition, image captioning, and instruction following [1]. To illustrate this, consider a BERT model. After pretraining on a substantial text corpus to predict randomly masked words, the BERT model acquires a foundational understanding of natural language. This enables the model to rapidly adapt to various natural language processing tasks, such as text classification, sentiment analysis, and question answering, often with minimal task-specific finetuning. Owing to this adaptability, foundation models have become a leading force in shaping the creation of versatile, high-performance artificial intelligence (AI) systems across multiple applications. A recent OpenAI report indicated that approximately 19% of jobs may have at least 50% of their tasks affected by these models [9].
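As a concrete illustration of this pretrain-then-adapt workflow, the following minimal sketch loads a pretrained BERT checkpoint and finetunes it for binary sentiment classification. It is illustrative only and not drawn from the cited studies; the texts, labels, and hyperparameters are placeholders.

```python
# Minimal sketch: adapting a pretrained BERT model to a downstream sentiment task.
# Assumes the `transformers` and `torch` packages; data and hyperparameters are illustrative.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # reuse the pretrained encoder, add a new classification head
)

texts = ["The visualization is clear and insightful.", "The chart is cluttered and hard to read."]
labels = torch.tensor([1, 0])  # 1 = positive, 0 = negative

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few task-specific finetuning steps
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
print(preds.tolist())  # task-specific predictions from a generally pretrained model
```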
In this era of big data and AI, the need to visualize large-scale datasets and machine learning models has been increasingly observed for efficient analyses. Recent studies have indicated that incorporating humans into the analysis process can make visualization techniques a critical bridge for the human comprehension of complex models [10–17]. This enhanced human–AI collaboration facilitates effective insight communication, informed decision-making, and improved AI trustworthiness. A new research paradigm has emerged from incorporating both visualization techniques and foundation models. Figure 1 shows the two promising research areas that arise from this paradigm: visualization for foundation model (VIS4FM) and foundation model for visualization (FM4VIS). In VIS4FM, visualization is an indispensable mechanism for facilitating the understanding, analysis, and refinement of foundation models. Conversely, FM4VIS focuses on how foundation models can be employed to improve visualization techniques by adapting them to different visualization-related tasks, such as automatically generating visualizations and communicating richer insights to users. Embracing these intersections between foundation models and visualizations will advance both fields and improve collaboration between humans and AI.

While the integration of foundation models and visualizations is promising, it also introduces some challenges and new opportunities. On the one hand, the increasing scale and complexity of foundation models make the models difficult to analyze and interpret with traditional methods. This highlights the need for novel visualization techniques tailored to large-scale models. On the other hand, while foundation models have demonstrated a capability to unlock new dimensions of visualization, methods for maximizing their capability and the seamless integration of humans and AI in developing visualizations are topics that remain largely underexplored. Despite the promising potential of combining foundation models and visualizations, to the best of our knowledge, no comprehensive review has been made available on this topic. Previous studies have primarily summarized the intersections between traditional machine learning models (e.g., boosting trees and convolutional neural networks) and visualizations, such as machine learning for visualization [15, 16, 18] and visualization for machine learning [12–14, 19]. In this survey, we take initial steps to highlight both the challenges and opportunities of this emerging research topic to invite further research.

Fig. 1  Intersections between visualizations and foundation models divided into two categories: VIS4FM and FM4VIS.

2 Overview

The intersections between visualizations and foundation models concern two perspectives: VIS4FM and FM4VIS.

2.1 VIS4FM

VIS4FM focuses on leveraging the power of visualization tools to understand, refine, and evaluate intricate foundation models. Figure 2 shows that foundation models involve two primary phases: training and adaptation [1].

Fig. 2  How visualizations enhance foundation models along the learning pipeline.

Data are the basis for building foundation models and are critical in determining the performance, reliability, and ethical standing of the resulting models. Therefore, ensuring that the data are of high quality, for example, with broad coverage and precise annotations, is crucial [46–49]. Given that foundation models often have billions or even trillions of parameters, they can learn from vast datasets and absorb both the beneficial and problematic aspects of the data. Consequently, the data must be not only extensive but also of high quality. Visualizations facilitate the data curation process in four aspects. First, visualizations guide the data generation process using real-time feedback regarding data coverage and correctness. This allows for immediate adjustments to be made such that the generated data adequately represent the intended scope and have the correct annotations.
Second, visualization is useful for integrating heterogeneous data from multiple sources into a coherent and high-quality dataset. This is required for training successful foundation models. Third, visualization assists in data selection by providing a visual representation of the dataset. This simplifies the identification of high-quality samples. The feedback provided by users through visualization is used to further refine the dataset. Fourth, visualization discloses anomalies and biases in the data and enables more targeted corrections. This improves both the efficiency and accuracy of the data correction process.

Training is the initial phase in building foundation models. The models are trained on vast datasets that often contain diverse and general information. This allows the models to learn a wide range of features, patterns, and knowledge from the data. During this phase, visualization is essential for training diagnoses [50, 51]. First, a model explanation task is conducted to reveal the working mechanism of the foundation models. Second, the model developers conduct a performance diagnosis to identify the root causes of low performance and make necessary refinements. Finally, an efficiency diagnosis is conducted to identify bottlenecks that impair the training speed or waste resources during training.

Foundation models are typically adapted using task-specific datasets to optimize their performance on specific downstream tasks. This adaptation process refines the general knowledge of the models to better align them with the desired task outputs. In this phase, visualizations are employed to facilitate the adaptation steering process in three manners: model finetuning, prompt engineering, and alignment via human feedback. In model finetuning, visualizations help in understanding the knowledge learned by the models and in analyzing whether the model is suitable for the downstream tasks. With a more comprehensive understanding, model developers can then compare multiple finetuned models and select the optimal model. In prompt engineering, visualization streamlines the trial-and-error process of crafting effective prompts that lead to desired outputs. In alignment via human feedback, the model is steered toward human preferences based on human feedback. Visualizations serve two functions: (1) aid in collecting human feedback to improve the training data and (2) offer an interactive platform to iteratively refine the model outputs.

In addition, visualization is a useful technique for enhancing the model evaluation process for both foundation and adapted models [45]. For quantitative evaluations with clear metrics, visualizations offer users a comprehensive and intuitive understanding of the model performance. In addition, given the adaptability of foundation models to various downstream applications, evaluating their performance across multiple tasks is important. Well-designed visualizations facilitate efficient comparative analyses based on different metrics, thereby enabling users to select the optimal model or obtain insights for additional refinements. For a qualitative evaluation lacking clear metrics, visualization serves as a valuable tool for incorporating human judgment into the evaluation process. For example, consider open-ended questions that lack definitive ground-truth answers; visualizations can summarize frequent patterns in model-generated answers and provide an informative overview. This enables users to evaluate the quality of the responses more efficiently. Once low-quality responses are identified, various strategies can be employed to enhance their quality. One such method involves enriching the dataset using various instances of the associated problematic questions.
Building on the above discussion, Table 1 summarizes the four main processes in VIS4FM. This table outlines existing initiatives and highlights areas where future research can be beneficial, particularly where few investigations have been conducted to date.

2.2 FM4VIS

FM4VIS leverages the power of foundation models to create more adaptive, user-friendly, and intelligent visualization techniques and systems. These efforts aim to advance the field of visualization. As illustrated in Fig. 3, the visualization pipeline transforms raw data into an interpretable visual representation that allows users to interact with and derive insights from the presented information [78]. FM4VIS focuses on enhancing each phase in this pipeline, from data transformation and visual mapping to view transformation and visual perception.

Fig. 3  How foundation models enhance visualizations along the visualization pipeline.

Data transformation converts raw data into a more suitable format for visualization and analysis purposes. Because foundation models are trained on diverse datasets, they can be used to perform feature extraction. This is the extraction of meaningful features from complex data for visualization purposes. It is particularly useful for unstructured data such as text and images, where traditional feature engineering methods often produce less informative features [79].
Foundation models can perform tasks such as classification, relationship extraction, and object detection to extract various patterns such as relationships, trends, and anomalies. These tasks provide visualization tools with richer pattern data, thereby enabling a multi-faceted understanding and analysis.

Visual mapping determines the way to visually represent underlying data. It involves mapping data and their values to certain marks (e.g., points, lines, or areas) and visual channels (e.g., positions, colors, or sizes, respectively). Foundation models can enrich this phase by facilitating visualization generation, including automatic content generation, style generation, and interaction generation. These models can learn patterns and user preferences from datasets. Therefore, they can recommend or generate optimal layouts that highlight important data trends. Moreover, they can understand the context of the data and suggest appropriate marks and visual channels. For example, these models can determine which color palettes best differentiate data categories and which shapes represent specific data points more effectively. By leveraging foundation models, we can generate more insightful and contextually relevant visual representations of the data. With code generation capabilities, foundation models can augment visualization with rich interactions.

View transformation involves converting abstract visual representations into concrete pixels on a screen. It is crucial to ensure that the final visual representation is effectively communicated to users. During this phase, foundation models play an important role in visualization understanding, which aims to enhance the understanding of visualization content and communicate the underlying information to users. First, the models contribute to the distillation and abstraction of key information from visual presentations. For example, a foundation model can be finetuned to extract an adaptable visualization template from a set of complex timeline visualizations [73]. This involves recognizing visual elements and understanding their hierarchical and relational significance. Second, they amplify the users' comprehension of visualizations by conveying key information in an engaging, multi-modal format, such as using a combination of natural language and visual elements. For example, the models can provide clear and accurate captions that the visualization designers aim to communicate through the visualizations [75].

Visual perception is a cognitive process that occurs in the mind. It interprets the visual representation and translates the colors, shapes, and patterns back into an understanding of the underlying data. Moreover, users can interact with visualizations, such as by zooming, panning, or selecting specific data points. These interactions promote a deeper understanding and reveal further insights. Here, foundation models can achieve active engagement. Active engagement enhances user interactions in two aspects: direct and predictive. Direct interaction enhancement employs foundation models to directly simplify user interactions. For example, in the context of three-dimensional (3D) scatterplots, foundation models can refine the shape of a lasso selection to make it more precise and contextually relevant [80]. In addition to visual selections, these models can interpret text descriptions provided by users. For instance, when a user describes a specific pattern or attribute, the models can process the description and highlight the corresponding visual patterns on display. Predictive interaction enhancement uses foundation models to predict and enhance user interactions for immediate responses and broader data exploration insights. The predictive capabilities of these models can be leveraged to predict user actions within the visualizations. For example, after observing user interactions with a scatter plot, the models can be used to predict where users are likely to click next, streamlining the exploration process [81]. A more advanced application of these models involves analyzing user interactions. Based on how users interact with visualizations, the models can predict the imminent actions of users as well as their broader attributes, such as their likely performance on a specific task or even specific aspects of their personality [82].

Based on the aforementioned discussion, Table 2 summarizes the four main processes in FM4VIS. In addition to overviewing existing efforts, Table 2 indicates potential research directions that few studies have addressed.

3 Existing VIS4FM efforts

This section discusses recent works on VIS4FM with a focus on data curation, training diagnosis, adaptation steering, and model evaluation (Fig. 2). Table 1 lists typical examples of each category.
…large volume of profiling data, such as execution time, resource utilization, and communication overhead. To address these issues, Wei et al. [33] proposed a visual analysis method for diagnosing parallel training processes. This method integrates detailed information regarding the parallelization strategy into a computational graph, which is visualized using a directed acyclic graph layout. To facilitate the analysis of the profiling data, Wei et al. designed an enhanced Marey's graph to visualize the execution time of the network layers, the peak memory of different devices, and the inter-device communication latency. In addition, an aggregation method is employed to handle the large volume of profiling data within Marey's graph.

3.3 Adaptation steering

Based on the methods used to align models with human preferences, existing VIS4FM efforts in adaptation steering can be divided into three categories: model finetuning, prompt engineering, and alignment via human feedback.

3.3.1 Model finetuning

Model finetuning is a widely used technique for adapting foundation models to downstream tasks by updating the model parameters using task-specific training data. In model finetuning, model developers aim to understand the knowledge that the models learn and whether this knowledge is suitable for downstream tasks. Visualizations have been demonstrated to be effective in providing insights into model behavior [34, 50, 83] and thus serve as a useful method for accelerating the finetuning process. For example, Wang et al. [34] developed CommonsenseVIS to analyze the commonsense knowledge learned by the models and whether the knowledge is used in the models' reasoning. First, it employs a knowledge graph to extract the commonsense knowledge from the input data. The alignment of the model behavior with human reasoning is then achieved using the overlap between the extracted and learned knowledge. Using interactive visualizations for the alignment, model developers can effectively understand and diagnose issues for which the models underperform in terms of learning. In addition to the finetuning of foundation models, a growing trend has been observed toward parameter-efficient methods, such as the adapter [84] and low-rank adaptation (LoRA) [85] methods. These methods add task-specific parameters to the foundation models and train only the new parameters. This reduces the training complexity and allows the adapters and LoRA modules to learn task-specific knowledge without modifying the weights of the foundation models. Consequently, many publicly available adapters and LoRA modules have been finetuned for different tasks and datasets [86]. Understanding what task-specific knowledge is acquired can help model developers select an appropriate adapter or LoRA module for their tasks. For example, Sevastjanova et al. [35] proposed a visual analysis method that compares the knowledge learned by different adapters by integrating three types of explanations: concept embedding similarity, concept embedding projection, and concept prediction similarity. This method enables developers to make informed decisions regarding which adapter best suits the downstream task of interest.
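To make the parameter-efficient idea concrete, the sketch below attaches LoRA modules to a pretrained BERT classifier using the Hugging Face peft library and trains only the injected low-rank matrices; the target modules and rank are illustrative choices, not those of the cited works.

```python
# Minimal sketch: parameter-efficient finetuning with LoRA (assumes `transformers` and `peft`).
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

config = LoraConfig(
    task_type=TaskType.SEQ_CLS,          # sequence classification head
    r=8,                                 # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "value"],   # attach LoRA to the attention projections (illustrative)
)
model = get_peft_model(base, config)

# Only the LoRA parameters (and the new head) are trainable; the foundation model
# weights stay frozen and can be shared across many task-specific modules.
model.print_trainable_parameters()
```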
3.3.2 Prompt engineering

Instead of using traditional finetuning methods, foundation models can be adapted for downstream tasks using prompting techniques. A prompt is a natural language description of a task that makes the task suitable for foundation models. The prompt can significantly influence the model performance, and designing a high-performing prompt requires deep expertise. To alleviate the burden of manually crafting prompts, Strobelt et al. [36] developed PromptIDE, which allows users to construct different prompts, compare their performance, and interactively refine them. Figure 5 illustrates the basic workflow.

Fig. 5  Prompt engineering workflow in PromptIDE.

First, the range of variables in a prompt template is specified, and a comprehensive set of prompts that spans all potential combinations can then be generated.
The generated prompts are evaluated using a small set of validation data with ground-truth labels to provide quantitative measures. Users can then compare their performance and refine the prompt template or a single prompt.
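The following sketch mirrors this workflow outside any particular tool: a prompt template is expanded over all combinations of its variable values, and each resulting prompt is scored on a few labeled validation examples. The model call is a keyword-based placeholder standing in for a real foundation model.

```python
# Minimal sketch of template-based prompt expansion and validation scoring.
from itertools import product

template = "{instruction} Review: {text} Sentiment ({labels}):"
variables = {
    "instruction": ["Classify the sentiment.", "Decide if the review is positive or negative."],
    "labels": ["positive/negative", "pos/neg"],
}
validation = [("I love this chart.", "positive"), ("The layout is confusing.", "negative")]

def query_model(prompt: str) -> str:
    # Placeholder stand-in for a foundation model call.
    return "positive" if "love" in prompt.lower() else "negative"

def expand(variables):
    keys = list(variables)
    for values in product(*(variables[k] for k in keys)):
        yield dict(zip(keys, values))

scores = {}
for assignment in expand(variables):
    correct = sum(
        label in query_model(template.format(text=text, **assignment)).lower()
        for text, label in validation
    )
    scores[tuple(assignment.values())] = correct / len(validation)

for combination, accuracy in scores.items():
    print(accuracy, combination)   # per-combination accuracies guide template refinement
```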
Similarly, ScatterShot [37] focuses on helping users interactively select informative samples and add them to the prompts. It employs a clustering technique to organize samples into clusters based on task-specific key phrases and offers a performance estimation for each cluster. Low-performance clusters are prioritized for further exploration and sample selection. For tasks without clear quantitative measures, such as text-to-image generation, visualization can assist in exploring the relationships between the input prompts and output results. For example, PromptMagician [38] streamlines the interactive refinement of text prompts in text-to-image generation tasks. It employs a prompt-recommendation model to retrieve prompt-image pairs that are similar to the input prompt from a preexisting database. The retrieved pairs are visualized in a two-dimensional (2D) space using t-distributed stochastic neighbor embedding (t-SNE) and organized using hierarchical clustering for efficient exploration. Important and relevant prompt keywords are extracted to facilitate prompt refinement.
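As a simplified stand-in for this kind of projection-plus-clustering view, the sketch below embeds a handful of prompts, projects them to 2D with t-SNE, and groups them with hierarchical clustering. It uses generic scikit-learn and SciPy components rather than PromptMagician's actual pipeline, and the embeddings are synthetic placeholders.

```python
# Minimal sketch: project prompt embeddings to 2D and cluster them for exploration.
import numpy as np
from sklearn.manifold import TSNE
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 64))      # placeholder prompt embeddings (30 prompts, 64-d)

coords = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(embeddings)
tree = linkage(embeddings, method="ward")   # hierarchical clustering of the prompts
groups = fcluster(tree, t=4, criterion="maxclust")

for cluster_id in np.unique(groups):
    members = np.where(groups == cluster_id)[0]
    print(f"cluster {cluster_id}: prompts {members.tolist()}")
# `coords` and `groups` can then drive a scatterplot in which similar prompt-image
# pairs are placed and grouped together for efficient exploration.
```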
Recently, the chain-of-thought technique has emerged as an effective strategy to enhance the performance of foundation models for handling complex tasks [87]. A chain of thought is a series of prompts that breaks down a complex task into a sequence of more manageable sub-tasks. Visual analysis tools can aid users with limited experience in authoring their own chains [39, 40]. For example, Wu et al. [40] developed AI Chains, which supports eight primitive operations that are well suited for language models. An interactive interface was designed to facilitate the examination and analysis of the chain structure and model outputs. Based on the analysis, users can adjust different granularities, ranging from refinement within an individual prompt to modifying the intermediate model outputs and even restructuring the entire chain.

3.3.3 Alignment via human feedback

Unlike model finetuning and prompt engineering, model alignment directly utilizes human feedback to steer the model outputs toward human preferences. Visualization techniques are suitable for collecting human feedback and communicating the associated changes in the model output. Through this human-in-the-loop process, users can iteratively align the model outputs with their preferences. Recently, TaleBrush [41] was developed to support writers in iteratively crafting stories. TaleBrush employs line-sketching interactions along with a GPT-based language model, allowing writers to dictate character fortune plots in line with their creative goals. Writers can refine the generated narrative by editing the text and modifying the initial sketches.

3.4 Model evaluation

Foundation models can be evaluated quantitatively and qualitatively.

Quantitative evaluation. Quantitative evaluation employs predefined quantitative measures to evaluate the model performance. Various visualization techniques have been developed to enrich the presentation of these quantitative measures, thereby offering a comprehensive and intuitive understanding of the model performance [42–44]. For example, Görtler et al. [44] developed Neo, which extends traditional confusion matrices to facilitate the evaluation of classification tasks with complex label structures. Users can efficiently explore confusion matrices related to hierarchical or multi-output labels and inspect model confusion patterns.

Qualitative evaluation. Qualitative evaluation lacks clear metrics and often relies on visualizations to integrate human judgment into the evaluation process. For example, Chen et al. [45] developed Uni-Evaluator, a unified evaluation method suitable for various tasks in computer vision, including image classification, object detection, and instance segmentation (Fig. 6).

Fig. 6  Uni-Evaluator interface. Reproduced with permission from Ref. [45], © IEEE 2024.
In addition to revealing class-level confusion patterns, Uni-Evaluator facilitates fine-grained examinations of the model capabilities and behaviors at the sample level. For example, when users visually compare model-generated segmentation masks with ground-truth masks, they tend to observe inadequate segmentations of the helicopter rotors. These rotors, due to their thin and limited surface area, are often overlooked or inadequately segmented in the model output. This observation has guided the enhancement of model performance by incorporating a boundary-based loss specifically for helicopter segmentations.

4 Existing FM4VIS efforts

This section introduces recent efforts on FM4VIS with a focus on feature extraction and pattern recognition, visualization generation, visualization understanding, and active engagement (Fig. 3). Table 2 lists typical examples of each category.

4.1 Feature extraction and pattern recognition

4.1.1 Feature extraction

Feature extraction transforms unstructured data, such as text and images, into semantic feature vectors. Foundation models pretrained on vast datasets often outperform traditional models in this task [1]. These high-quality semantic feature vectors facilitate the advancement of visualization techniques. Methods for enhancing visualizations include querying relevant data [52–57] and enriching metadata [58]. For example, Erato [52] is a human–machine cooperative system for generating data stories (Fig. 7).

Fig. 7  Erato interface. Reproduced with permission from Ref. [52], © IEEE 2022.

Once users determine key data facts for the story that they want to focus on, Erato utilizes an interpolation algorithm to generate intermediate data facts that smoothly connect the different key data facts. To achieve this, a BERT model is finetuned to generate high-quality fact embeddings for fact interpolation. Similarly, MetaGlyph [53] utilizes a pretrained sentence-BERT to transform both the descriptions of data attributes and data topics into semantic features. MetaGlyph then calculates the distances between these features and ranks the attributes according to the distances between the attribute descriptions and data topics. Attributes with smaller distances are prioritized for selection and subsequently visualized.
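A minimal sketch of this kind of semantic ranking, using the sentence-transformers package as a stand-in for the sentence-BERT model in MetaGlyph; the attribute descriptions and topic are illustrative.

```python
# Minimal sketch: rank data attributes by semantic closeness to a data topic.
# Assumes the `sentence-transformers` package; inputs are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

topic = "global temperature change over the last century"
attributes = ["annual mean temperature", "CO2 concentration", "stock price", "city population"]

topic_emb = model.encode(topic, convert_to_tensor=True)
attr_embs = model.encode(attributes, convert_to_tensor=True)

similarities = util.cos_sim(topic_emb, attr_embs)[0]
ranked = sorted(zip(attributes, similarities.tolist()), key=lambda x: -x[1])
for name, score in ranked:
    print(f"{score:.2f}  {name}")   # attributes with higher similarity are selected first
```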
4.1.2 Pattern recognition

Pattern recognition utilizes the extracted features to identify a range of patterns that enhance both understanding and analysis. Similar to existing methods that employ traditional machine learning models, foundation models are used to perform various tasks, such as classification [59–63], object detection [64, 65], and relationship extraction [66]. For example, LegalVis [59] employs a finetuned Longformer model to identify binding precedents (past legal decisions made by higher courts) in legal documents. Similarly, Teddy [60] utilizes a finetuned BERT model to extract fine-grained opinions (e.g., cleanliness and service) from review text and convey them to data scientists.

4.2 Visualization generation

Foundation models have been used to facilitate the visualization generation process by either directly generating visualization content (e.g., visualization types, data encodings, and annotations) [67, 68] or generating visualization styles (e.g., color schemes, layout styles, and typographies) [69].

4.2.1 Content generation

Content generation uses foundation models to produce desired visualization content. For example, Liu et al. [67] developed ADVISor to generate visualizations with annotations given tabular data and natural language questions. In ADVISor, a BERT model is first finetuned to extract the features of both the questions and table heads. Subsequently, several lightweight models are trained to determine the selected attributes, aggregation types, visualization types, and annotations that best address the target questions. A corresponding visualization is generated based on this information. Data Player [68] is another representative method designed to simplify the creation of data videos based on static input visualizations and corresponding narrative text.
As illustrated in Fig. 8, Data Player uses a large language model (LLM), OpenAI gpt-3.5-turbo, to establish semantic connections between the visualization components and narrative entities. These semantic connections are then used to generate narration–animation interplay in the resulting data videos.

4.2.2 Style generation

Foundation models have been leveraged to produce desired visualization styles. Xiao et al. [69] developed ChartSpark to simplify the generation of pictorial chart visualizations. ChartSpark employs a text-to-image diffusion model to generate the corresponding visualization style for given semantic text prompts. In addition, it can take a chart image as an additional input to ensure that the generated visualization approximates the given chart. To further enhance the quality of the final output, users can utilize image-to-image generation techniques to improve the harmony and consistency of the generated charts.

4.3 Visualization understanding

…used as visualization templates to create similar infographics using different data.

4.3.2 Information communication

With the capability of content generation, foundation models serve as valuable tools for communicating extracted content and underlying information to users [74–76]. For example, Sultanum and Srinivasan [74] proposed DataTales to create data-driven articles based on data visualizations. DataTales uses charts as input and leverages OpenAI gpt-3.5-turbo to generate corresponding narratives and titles. These generated narratives are then linked back to the original chart to improve the readability and overall comprehension of the given data. Liu et al. [75] developed AutoTitle, an interactive tool designed to generate meaningful titles for visualizations. It first extracts the underlying data from the visualizations and then computes high-level facts through operations such as aggregation and comparison. Based on the computed facts, a T5 [88] foundation model is finetuned to generate fluent and informative natural language descriptions.
…repetition, low coverage, and incorrect annotations. Although an initial effort to address undesirable repetition has been made [20], the issues of low coverage and incorrect annotations remain underexplored. For the issue of low coverage, visualizations offer a useful manner of exploring the distribution of generated datasets and identifying regions with insufficient training samples. Based on the findings, users can interactively steer the data generation strategies to generate more samples in those regions. For the issue of incorrect annotations, visualizations serve as a powerful tool for users to enhance the data quality. For example, with appropriate visualizations, specific subsets in which the samples tend to contain noisy annotations can be easily identified and corrected. These corrections provide valuable feedback for the foundation models and contribute to the generation of more accurate data. In addition, incorrect annotations can be addressed via data selection, which is facilitated by visualizations and is discussed in the following.

Data integration. Foundation model training typically requires the collection and preprocessing of vast amounts of data from multiple sources. Merging these heterogeneous data into a coherent and high-quality dataset poses considerable complexities, such as handling data inconsistencies and resolving semantic differences across different sources. These issues often require human feedback during the integration process. In this context, visualization techniques are typically crucial in facilitating more efficient data integration and governance processes. One interesting avenue for future research is the development of a visualization-guided preprocessing framework that enables interactive adjustments to the preprocessing procedure and continuous monitoring of data integrity. Another promising avenue is the investigation of visualization techniques that can simultaneously handle the large-scale and heterogeneous natures of training data. These techniques would facilitate comparisons of data distributions from different sources and the identification of inconsistencies.

Data selection. The training and adaptation of foundation models are computationally intense processes and typically require millions or even billions of training samples [8]. This large-scale data requirement introduces several complexities, including data storage, computational power, and processing time. Furthermore, the training of foundation models is becoming a serious source of carbon emissions that threaten our environment [90]. Recent studies have shown that selecting a subset of data for training can achieve comparable or even better performance [88, 91]. These findings suggest the possibility of reducing the computational and environmental costs associated with model training. Visualization is a valuable tool for exploring large-scale datasets and selecting high-quality training data [92, 93]. However, two major challenges must be addressed.

The first challenge is scalability. This is particularly important in the context of foundation models. The large amount of data for training and finetuning these models is too large to fit in memory, increasing the difficulty of simultaneously processing and visualizing all the data. This not only calls for out-of-memory sampling techniques but also poses real-time interaction challenges for visualization. Out-of-memory sampling techniques can be used to present an overview of the data distribution. This allows users to examine the general landscape quickly and identify regions that warrant closer inspection. Users can then zoom in on these targeted regions for a more granular analysis. Because the newly requested data are not yet loaded in memory, studying how to support real-time interactions is worthwhile.
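One simple instance of such an out-of-memory sampling technique is reservoir sampling, which maintains a fixed-size uniform sample while streaming over a dataset that never fully resides in memory. The sketch below is generic and not tied to any particular system; the corpus path is a placeholder.

```python
# Minimal sketch: single-pass reservoir sampling for an out-of-memory data overview.
import random

def reservoir_sample(stream, k, seed=0):
    """Keep a uniform random sample of k items from an arbitrarily large stream."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)       # replace an existing item with decreasing probability
            if j < k:
                reservoir[j] = item
    return reservoir

def lines(path):
    # Stream a text corpus line by line instead of loading it all into memory.
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield line.rstrip("\n")

# overview = reservoir_sample(lines("corpus.txt"), k=1000)  # "corpus.txt" is a placeholder path
```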
The second challenge stems from the unannotated and unstructured nature of training data. Most training data for foundation models, such as images or text crawled from websites, are unstructured without annotations. Their unannotated nature increases the difficulty in evaluating the quality of training data and selecting high-quality samples for training. One possible solution is to design multiple metrics to visually summarize the data characteristics from different perspectives. The unstructured nature of the data poses difficulties for users in quickly understanding the content of the samples. Innovative visualizations of the data are then required to alleviate the cognitive load. In addition, multi-modal data have been widely used in training foundation models. However, the visualization of alignments between different modalities remains underexplored and deserves further investigation.

The selection of test data shares challenges with the selection of training data, including scalability and the unstructured nature of the data.
However, some differences are noteworthy. The test data are primarily intended to faithfully convey the performance of the foundation models while exposing their potential weaknesses. Therefore, the test data must cover both the common samples that models process regularly and "edge case" samples where the models may fail. Visualization techniques are suitable for examining the selection balance between the two types of samples. Thus, the integration of visualization techniques with subset selection methods is worth exploring for a well-balanced selection.

5.1.2 Training diagnosis

Model explanation. The intrinsic nature of foundation models is defined by their vast number of parameters. Although this vastness is the source of the models' capabilities, it also makes model interpretation difficult. Understanding the complex interactions, transformations, and computations within these parameters is challenging. When a foundation model produces an output, the output is the result of a cascade of intricate operations influenced by millions, or even billions, of parameters. Tracing back these operations to identify the exact reasoning or mechanism is similar to navigating a vast, complex maze without a map. As a model increases in size and complexity, understanding the specific factors or processes contributing to the output becomes increasingly difficult.

The aforementioned challenge posed by the scale and complexity of foundation models requires innovative visualization solutions to incorporate human knowledge into the analysis process. These visualization tools can serve as "lenses" that allow users to investigate the intricacies of these models and offer insights that can be understood intuitively. In addition, exploration based on rich interaction techniques is important for explaining foundation models. These exploration methods aim to distill the complex behaviors of foundation models into more understandable forms without compromising their essence. This might involve developing multi-level interpretation mechanisms where users can select the granularity of the explanation, leveraging unsupervised techniques to automatically identify the most salient features or operations driving the model decisions, and presenting them for further analysis.

A multi-level interpretation mechanism is tailored to offer explanations at varying levels of detail, from high-level overviews to detailed, granular insights. At the highest level, these explanations provide a general summary of the models' decision-making logic. This is a surface-level interpretation. For example, for a text generation task, a surface-level explanation might state, "the model generated this sentence based on the overall sentiment of the input". In addition, it can summarize associated statistics, such as confidence and bias scores. The next level provides a component-level interpretation that aims to explain the role of specific model components, such as particular layers or attention heads. For example, "the 10th attention head focused primarily on the relationships between the subject and object in the sentence". The deepest potential level can provide a parameter-level interpretation. This enables the examination of the influence and interactions of specific parameters or groups of parameters. This can involve visualizing the weights, gradients, or activations associated with particular tokens or features. Given the vast amount of data present at each level, an effective sampling method that can easily capture human interest and display the corresponding data is in demand. This has motivated studies on interactive sampling strategies, which require the development of interactive visualizations to facilitate the detection of different user intents and provide tailored data subsets. These strategies enable users to seamlessly navigate through complex data layers. For example, they can probe deeper into specific areas of interest or take a step back for a broader perspective to enhance the overall understanding of the model functioning.
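As a small illustration of such a component-level view, the following sketch (a generic example, not taken from the cited systems) extracts the attention weights of a single head from a pretrained BERT model and renders them as a heatmap over the input tokens; the chosen layer and head are arbitrary.

```python
# Minimal sketch: visualize one attention head of a pretrained BERT model as a heatmap.
# Assumes `transformers`, `torch`, and `matplotlib`; the chosen layer/head are illustrative.
import torch
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

sentence = "The chart shows that sales increased sharply in December."
inputs = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions      # tuple: one tensor per layer

layer, head = 9, 0                               # e.g., inspect the 10th layer, first head
weights = attentions[layer][0, head].numpy()     # (seq_len, seq_len) attention matrix
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

plt.imshow(weights, cmap="Blues")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title(f"Layer {layer + 1}, head {head + 1} attention")
plt.tight_layout()
plt.show()
```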
Online training diagnosis. With the increasing complexity of foundation models, their training typically requires weeks or even months, even on high-end GPUs. Traditional offline methods gather relevant data after the training process and then feed them to an analysis tool. This is less effective in reducing unnecessary training trials. Moving the visual analysis earlier in the model development workflow can save vast amounts of time and computational resources, such as by halting ineffective and inefficient training. Therefore, visualization techniques suitable for monitoring results in real time and identifying performance and/or efficiency issues must be developed.
Two interesting avenues warrant further exploration.

The first promising avenue is to support an in-depth analysis of model performance during model training. Although some existing methods, such as TensorBoard [94], have supported the online monitoring of the training process, they consider only high-level performance metrics, such as the loss and prediction accuracy. These metrics are too abstract to effectively troubleshoot why the model does not perform as expected. To address this, it is necessary to integrate advanced data and model analysis modules into the visualizations to provide richer information. By analyzing the sample content and how the model processes it, model developers can obtain more insights into the performance issues and address them accordingly.
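For instance, in addition to scalar metrics, per-layer gradient distributions and a few misclassified samples could be streamed to TensorBoard during training. The sketch below shows only the logging side and assumes an existing PyTorch training loop; the function and tag names are illustrative.

```python
# Minimal sketch: logging richer diagnostics than loss/accuracy during training.
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/online-diagnosis")

def log_step(model, loss, step):
    writer.add_scalar("train/loss", loss.item(), step)
    # Per-layer gradient magnitudes help locate vanishing or exploding gradients.
    for name, param in model.named_parameters():
        if param.grad is not None:
            writer.add_histogram(f"grad/{name}", param.grad, step)

def log_hard_examples(texts, preds, labels, step):
    # Surface a few misclassified samples so developers can inspect the content, not just metrics.
    wrong = [t for t, p, y in zip(texts, preds, labels) if p != y][:5]
    writer.add_text("samples/misclassified", "\n".join(wrong), step)
```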
The second promising avenue lies in the management of large-scale profiling data for online diagnoses. Given the rapid generation of profiling data and the input/output overhead associated with transferring data from GPU to memory or even disk storage, storing all the data and then transferring them to a visualization tool for analysis is an impractical approach. In-situ visualization is a promising method for addressing this issue [95]. It generates visualizations directly within the computational environment in which the data are generated. Although in-situ visualization has been demonstrated to be useful for scientific visualizations [96, 97], whether it can be employed to streamline the efficiency diagnosis during model training remains unexplored.

5.1.3 Adaptation steering

Model finetuning. After a foundation model is finetuned for a specific task, it deviates from its pretrained version. The changes can be in terms of performance metrics as well as in model behavior, such as in processing different types of inputs and developing new input–output associations. By analyzing these behavior changes, model developers can understand how generic knowledge evolves into task-specific knowledge and identify where the model does not function as expected. Therefore, a promising research opportunity lies in using visualizations to effectively monitor behavioral changes and identify abnormal behavior during the finetuning process. With a deep understanding of behavioral changes, model developers can identify when the model begins to exhibit biases or vulnerabilities that downgrade its performance. Subsequently, visualizations can be leveraged as an efficient means of interactively steering the finetuning process, for example, by adding more balanced or targeted data. This method enhances the model performance as well as its reliability and robustness.

Prompt engineering. Recent studies have shown that providing high-quality examples within prompts can significantly enhance the model performance. This is known as the in-context learning ability [98]. In-context learning is a valuable component of prompt engineering. In this setup, prompt engineering is critical for curating and structuring examples that can effectively guide the model. To fully leverage the capabilities of foundation models and achieve satisfactory performance, the examples provided should be well suited to the downstream task. However, generating high-quality examples requires expertise and often involves iterative refinement through trial and error. Visualizations offer an efficient method to facilitate this refinement process by integrating humans into the analysis loop [11, 13, 99]. One promising solution involves employing visualizations to illustrate model responses across different in-context examples. The insights derived from the visualizations enable users to evaluate the effectiveness of the constructed examples and identify those most suitable for the current task. Once informed, users can then refine the examples for improved performance. In addition to interactively refining examples for each task, another promising direction lies in using visualizations to summarize general principles for in-context example selection [100]. By exploring different subsets of examples and comparing them, users can summarize the principles that determine which types of examples are beneficial and which are not. These principles contribute to a more systematic and informed example selection to craft effective prompts for the downstream task.

Alignment via human feedback. In the model adaptation process, aligning the model behavior with human preferences is essential. This alignment improves the user experience by generating more relevant responses and addresses ethical and societal concerns [7]. Recently, reinforcement learning from human feedback has been shown to be effective in aligning model behavior with human preferences [7].
This method first trains a reward model directly from human feedback, which predicts whether the response aligns with human preferences (high reward) or not (low reward). Subsequently, this reward information guides the optimization of foundation models through reinforcement learning. In this process, the key lies in collecting high-quality human feedback and using this data to train a reward model that accurately captures human preferences. Visualization techniques are suitable for both tasks.
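The sketch below illustrates the core of such a reward model: given response embeddings and pairwise human preferences (a chosen and a rejected response), a small network is trained so that the chosen response receives the higher score. The encoder and data are synthetic placeholders rather than any system described in this survey.

```python
# Minimal sketch: training a reward model from pairwise human preferences.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
    def forward(self, x):
        return self.score(x).squeeze(-1)

model = RewardModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

chosen = torch.randn(256, 64)            # embeddings of human-preferred responses (synthetic)
rejected = torch.randn(256, 64) - 0.5    # embeddings of dispreferred responses (synthetic offset)

for _ in range(100):
    # Pairwise (Bradley-Terry) loss: the chosen response should score higher than the rejected one.
    loss = -F.logsigmoid(model(chosen) - model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# The learned scores can then guide reinforcement learning on the foundation model.
```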
Interactive visualizations have already demonstrated their value in enhancing the process of collecting human feedback. For example, existing research on interactive data labeling has demonstrated the effectiveness of employing visualization techniques to facilitate the collection of human-generated data [101–103]. Moreover, visualizations offer an efficient method for diagnosing the training process of reward models and interactively refining them through additional human feedback. A tight integration of human feedback into this process better aligns the reward models with actual human preferences. This integration leads to more accurate and reliable reward information for the ongoing optimization of the foundation model.

The primary challenges in this context are rooted in the collection of high-quality human feedback and the complexities of integrating visualization techniques into reinforcement learning pipelines. First, collecting high-quality human feedback is difficult, and this difficulty is amplified when the data must be fed to the reward model that drives the reinforcement learning. Any errors or biases in the feedback collection can result in skewed training or unreliable models. Second, although visualization techniques offer the opportunity to collect human-generated data more effectively, seamlessly integrating these techniques with reinforcement learning pipelines presents additional complexities. Balancing real-time interactions with computational efficiency in a complex training process is another challenge that must be overcome.

Model selection. Recently, there has been an increasing trend among model developers to upload their models with metadata (e.g., descriptions, model architectures, and resource requirements) to learnware markets [86, 104, 105]. The increasing availability of publicly finetuned foundation models has opened new avenues for the efficient development of AI systems. When confronted with an AI task, users can search for and select a preexisting model that fits their needs from a learnware market. However, without sufficient expertise, navigating the expansive model space to determine the most suitable foundation model can be challenging [106]. The challenge lies in facilitating user exploration by capturing user requirements and recommending high-performance models. One potential solution is to employ visualization techniques to illustrate the model space. Using these visualizations, users can navigate the complex model space more easily, understand model behaviors, identify model limitations, and compare models from multiple perspectives, such as performance scores and resource requirements. Such a comprehensive understanding and comparison enable the identification of an optimal model for specific tasks.

5.1.4 Model evaluation

The field of visualization has extensively covered quantitative evaluations. Therefore, we discuss the research challenges and opportunities related to qualitative evaluations.

Evaluating free-form outputs. Recently, foundation models have achieved impressive performance in various tasks, particularly in answering open-ended questions without definitive ground-truth answers. However, evaluating the quality of free-form model responses remains challenging because of the high variability in possible responses and the absence of clear ground-truth answers. Addressing this challenge requires human involvement during the evaluation process. However, users are unable to manually inspect and assess each model response because of the huge volume of data. One possible solution is to semi-automatically create rules for evaluating model responses using active learning methods. Visualizations enhance this process by offering a comprehensive overview of the evaluation rules and their associated model responses. Subsequently, users can iteratively refine these rules based on their preferences. This ultimately leads to more accurate and reliable evaluations. Another potential solution involves using visualizations to highlight responses that are difficult for semi-automatic evaluation methods and present them to users for manual review.
To minimize redundancy and simplify this process, it is essential to cluster a massive volume of responses and summarize the clustering results in an intuitive visual form.

Robustness. Many foundation models, such as those in the GPT series [6, 8], are generative models. Although these models demonstrate impressive generation abilities, they can misinterpret inputs or generate off-target or incorrect outputs. Such inconsistencies pose challenges for the reliable deployment of these models, particularly in scenarios where a single error can have significant consequences. Therefore, clearly understanding their robustness is an urgent need. With this information, users can assess the performance of these models in different situations and identify weak areas that require finetuning to improve their performance [107, 108]. One possible solution is to construct a set of input samples with perturbations and compare the corresponding model responses with well-designed visualizations. This method effectively illustrates how small changes in the input can affect the model output. This provides insights into the robustness and sensitivity of the model. Visualizations provide an effective means of identifying critical samples for closer examination, interactively constructing perturbed samples for deeper behavioral insight, and summarizing multiple model responses for efficient analysis. Another solution involves analyzing numerous input samples collected in real-world scenarios to identify potential robustness issues. Models are often deployed in complex environments, where they encounter a wide range of inputs. The manual examination of each robustness issue is overwhelming. Visualizations offer an effective means of exploring and filtering a set of similar inputs that produce diverse results, which frequently indicates robustness issues. Once these issues are identified, visualization tools help enable "what–if" analyses. These analyses examine how the model behaves under various conditions and then identify specific areas where its robustness could be improved.
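A minimal sketch of the perturbation-based probing described above, using a generic sentiment pipeline as the model under test and simple character and word edits as perturbations; both are placeholders for a real foundation model and a richer perturbation scheme.

```python
# Minimal sketch: compare model responses on an input and its perturbed variants.
# Assumes the `transformers` package; the pipeline and perturbations are illustrative.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # stand-in for the foundation model under test

def perturb(text):
    yield text.replace("great", "grreat")    # typo injection
    yield text.replace("great", "decent")    # mild word substitution
    yield text.lower()                       # casing change

original = "The generated chart looks great and is easy to read."
base = classifier(original)[0]
print("original:", base["label"], round(base["score"], 3))

for variant in perturb(original):
    result = classifier(variant)[0]
    flipped = result["label"] != base["label"]
    print(("FLIPPED " if flipped else "stable  "), result["label"], round(result["score"], 3), variant)
# Inputs whose predictions flip under small perturbations are candidates for closer
# inspection and targeted finetuning.
```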
Fairness. Given that foundation models are increasingly being deployed in diverse cultural contexts and used by diverse user groups, it is crucial to prioritize culturally sensitive, ethically sound, and socially aligned explanations provided by VIS4FM techniques. Consequently, investigating how VIS4FM techniques can effectively navigate cross-cultural differences, address ethical dilemmas, and assess broader societal impacts is an essential avenue of exploration. These research directions are key to advancing the area of VIS4FM and ensuring responsible model deployments.

First, cross-cultural differences significantly affect how individuals perceive and interpret information. Cultural factors such as language, beliefs, values, and norms influence the understanding and acceptance of foundation models and their explanations. Therefore, how VIS4FM techniques account for and adapt to cross-cultural differences in explanation generation and presentation must be investigated. This involves studying cultural biases in foundation models, developing culture-aware explanation methods, and conducting user studies in diverse cultural contexts to assess the effectiveness and appropriateness of VIS4FM techniques.

Second, ethical considerations are important for the development and application of adapted models. Visualization techniques should adhere to ethical principles such as transparency, fairness, privacy, and accountability. This includes addressing issues such as algorithmic bias, discrimination, and the potential impact of VIS4FM explanations on vulnerable populations. Research on specific ethical frameworks and guidelines for VIS4FM can help ensure that adapted models with visual explanations are deployed in a responsible and ethical manner.

5.2 FM4VIS

5.2.1 Feature extraction and pattern recognition

Foundation models offer two notable opportunities that are unavailable with traditional machine learning models. First, because of their training on more diverse and extensive datasets, foundation models typically generate features of higher quality than those obtained from traditional machine learning models. These features better reveal the underlying patterns in the data, such as clusters [5, 60, 109] and important insights [52, 110, 111]. These high-quality features and patterns facilitate the design of suitable visualizations used to analyze the data. Second, previous feature extraction methods have primarily focused on single-modality data, such as latent Dirichlet allocation for textual data [112] and the scale-invariant feature transform for image data [113]. Recent research efforts have been made to train multi-modality foundation models, such as CLIP [5], to map multi-modality data into one unified feature space.
Foundation models meet visualizations: Challenges and opportunities 415
CLIP [5], to map multi-modality data into one unified Style generation. In computer vision, style
feature space. This enables researchers to design a transfer refers to the technique of applying the
unified visualization of multi-modal data to facilitate visual style of one image to the content of another
the disclosure of inter-modality relationships within image [120]. This often involves a content and a
the data. style image. The algorithm reconfigures the content
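As a minimal sketch of how such a unified feature space can feed a visualization, the snippet below assumes the Hugging Face transformers implementation of CLIP and scikit-learn; the image paths and text snippets are placeholders. Texts and images are embedded into the same space and then jointly projected to 2D, so a single scatterplot can reveal inter-modality relationships.

import numpy as np
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from sklearn.manifold import TSNE

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["a line chart of stock prices", "a node-link diagram of a social network"]
images = [Image.open(p) for p in ["chart.png", "network.png"]]  # placeholder paths

text_inputs = processor(text=texts, return_tensors="pt", padding=True)
image_inputs = processor(images=images, return_tensors="pt")
text_emb = model.get_text_features(**text_inputs).detach().numpy()
image_emb = model.get_image_features(**image_inputs).detach().numpy()

# Both modalities now live in one feature space; project them jointly to 2D
# so that texts and images can be inspected in the same scatterplot.
joint = np.concatenate([text_emb, image_emb], axis=0)
xy = TSNE(n_components=2, perplexity=2, init="random").fit_transform(joint)

With realistically sized collections, a larger perplexity (or an alternative projection such as UMAP) would be more appropriate than the toy setting above.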
5.2.2 Visualization generation

Prompted content generation. As widely studied foundation models, LLMs have demonstrated a capability to generate source code given natural language prompts. For example, Code Llama has exhibited state-of-the-art performance on several public code generation benchmarks [114]. An interesting avenue for future research could be to democratize visualization design by extending these capabilities to automatically generate advanced visualizations. By integrating well-known engines, such as D3 [115] and matplotlib [116], this method simplifies the process for individuals without prior experience in visualization design, allowing them to create advanced visual data representations and address complex challenges. Although the execution of this concept seems intuitive using existing public APIs, it has not been fully implemented. Several research efforts are still underway to improve the quality of generated visualizations. First, the development of a visualization-related instruction-tuning dataset is critical. Currently, visualization code such as D3 code comprises only a small portion of the training corpus of LLMs. Therefore, developing a dataset containing both instructions and accompanying visualization code is necessary to improve the performance of creating different visualization components with LLMs. The importance of visualization-specific datasets has been demonstrated using existing automatic graph layout methods [117]. Using such datasets and leveraging advanced finetuning techniques, such as reinforcement learning from human feedback, can significantly enhance the code-generation capabilities of a model in the visualization field. Second, prompt engineering is essential to ensure that the generated visualizations align with user intent. Existing research has shown that different prompts substantially influence the output generated by LLMs [118]. Therefore, effective prompts are critical. To alleviate human efforts in the tedious prompt curation process, recent techniques, such as automatic prompt optimization [119], can be leveraged.
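A minimal sketch of prompted visualization generation follows; the generate function is a stand-in for any code-generating LLM rather than a specific API, and model-produced code should in practice be executed in a sandbox and validated before use.

import matplotlib.pyplot as plt

PROMPT_TEMPLATE = """You are a visualization assistant.
Write Python code using matplotlib that fulfills this request:
{request}
The data is available in a variable named `data` (a list of (x, y) pairs).
Return only runnable code."""

def chart_from_prompt(request, data, generate):
    # `generate` takes a prompt string and returns generated source code.
    code = generate(PROMPT_TEMPLATE.format(request=request))
    scope = {"data": data, "plt": plt}
    exec(code, scope)  # sandbox this call in any real deployment
    return scope.get("fig", plt.gcf())

# Example usage, assuming a `generate` function is available:
# fig = chart_from_prompt("a scatter plot with a trend line", [(1, 2), (2, 3), (3, 5)], generate)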
Style generation. In computer vision, style transfer refers to the technique of applying the visual style of one image to the content of another image [120]. This often involves a content and a style image; the algorithm reconfigures the content image to assume the artistic style of the style image. For instance, StyleGAN [121] leverages generative adversarial networks to distill the style cues from reference images. The incorporation of style-based generator layers offers fine-grained control over the image attributes, which improves the quality and versatility of the generated images. Currently, these style-transfer models remain within the domain of natural image generation. However, the principles behind style transfer offer potential applications beyond the visual arts and open avenues to other fields, such as visualization. Effectively harnessing style transfer techniques in the field of visualization remains an open but important research avenue. This extension would allow users to easily transfer stylistic elements from one visualization to another. Moreover, it serves as a valuable resource for users with limited programming skills and facilitates the creation of user-centric visualizations with minimal effort, making complex data more accessible and understandable to a broader audience. A critical challenge in this endeavor is preserving data integrity in transferred visualizations. Unlike natural images, a visualization is a visual form of data; therefore, a reliable representation of these data is critical. Current style transfer techniques, when applied to visualization, may introduce subtle changes in visual elements, such as line-length adjustments, which may lead to perceptual errors. A promising research opportunity lies in adapting style transfer models to incorporate the original data used to generate visualizations, thereby ensuring data integrity when transferring styles. Another challenge is the automatic recommendation of styles, which is complicated by the multifaceted intricacies of human perception and divergent individual preferences. For example, one user might prioritize clarity and simplicity, whereas another might focus on intricate details and vibrant color schemes. Additionally, cultural background, professional training, and mood can influence what a user finds engaging or easy to interpret. These varying factors make the automatic process of recommending styles a complex endeavor because the system must account for a wide range of subjective preferences.
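Although learned style transfer for visualizations remains an open problem, the data-preserving intent can be illustrated with a rule-based sketch: stylistic attributes are read from a reference matplotlib chart and applied to another chart without touching its data. The attribute set below is an illustrative assumption, not an established specification.

import matplotlib.pyplot as plt

def extract_style(ax):
    # Read stylistic attributes only; the data behind the reference chart is untouched.
    return {
        "facecolor": ax.get_facecolor(),
        "grid": any(line.get_visible() for line in ax.get_xgridlines()),
        "line_colors": [line.get_color() for line in ax.get_lines()],
        "title_size": ax.title.get_fontsize(),
    }

def apply_style(ax, style):
    # Restyle the target chart while leaving its data, and hence its meaning, intact.
    ax.set_facecolor(style["facecolor"])
    ax.grid(style["grid"])
    for line, color in zip(ax.get_lines(), style["line_colors"]):
        line.set_color(color)
    ax.title.set_fontsize(style["title_size"])

_, ref_ax = plt.subplots()  # reference chart: the style source
ref_ax.plot([1, 2, 3], [2, 4, 3], color="#d62728")
ref_ax.grid(True)
_, tgt_ax = plt.subplots()  # target chart: the content (data) source
tgt_ax.plot([1, 2, 3, 4], [1, 3, 2, 5])
tgt_ax.set_title("Sales")
apply_style(tgt_ax, extract_style(ref_ax))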
Interaction generation. Interaction enables users to tailor their views according to specific information requirements and serves as a cornerstone for effective data exploration and analysis. However, creating intuitive and responsive interactions is a challenge that requires expertise in both visualization techniques and programming. The code-generation capabilities of foundation models offer significant opportunities here. An interesting avenue for research is simplified interaction design. As with the aforementioned prompted content generation, users can implement basic interactions by describing their intent using natural language. The challenge lies in the ambiguities that natural languages often present [18], which increases the difficulty of describing complex interactive functionalities clearly. Therefore, extending foundation models to accept other types of inputs, such as sketches and video examples, is an exciting opportunity for producing more accurate interaction designs. At a more advanced level, foundation models have the potential to simplify the programming of complex interactions such as multi-stage animation scheduling and sophisticated visual effects. However, ensuring that the generated code satisfies quality standards remains an issue. Hence, a potential avenue for future research is the development of automatic quality assurance mechanisms that can evaluate and refine the code generated by foundation models.
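One simple form such a quality assurance mechanism could take is sketched below: generated interaction code is compiled and executed in an isolated namespace, and the result is checked for the callbacks the interaction is expected to define (the callback name on_select is an assumption made for illustration). Failed checks can be fed back to the model for another refinement round.

import traceback

def check_generated_interaction(code, required_callbacks=("on_select",)):
    # Lightweight quality gate: the generated snippet must compile, run without
    # errors, and define the callback functions the interaction design expects.
    report = {"passed": True, "issues": []}
    scope = {}
    try:
        exec(compile(code, "<generated>", "exec"), scope)
    except Exception:
        report["passed"] = False
        report["issues"].append(traceback.format_exc())
        return report
    for name in required_callbacks:
        if not callable(scope.get(name)):
            report["passed"] = False
            report["issues"].append(f"missing callback: {name}")
    return report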
5.2.3 Visualization understanding

Content extraction. Previous research has highlighted the enhanced reasoning capabilities inherent in foundation models [6]. Using these capabilities, visualization researchers can adapt foundation models to comprehend complex visualizations, such as node-link diagrams and tree maps, and extract key information for in-depth analyses [122]. For example, when presented with a node-link diagram representing a complex social network, foundation models can effectively identify key information such as influential users, sub-communities, and their connections. Descriptive captions and concise summaries of this information can be generated and presented alongside visualizations. This greatly facilitates comprehension. A critical challenge in adapting current foundation models to understand complex visualizations is the lack of domain-specific data. Currently, existing public datasets in the visualization field often focus on simple charts such as bar and line charts [15]. Therefore, the creation of a public dataset that contains complex visualizations and extracted insights is critical. Another challenge lies in identifying contextually relevant information that matches the analytical focus. Interactive visualizations often excel at conveying useful patterns embedded in large amounts of data. For example, the visualization of a social network may present multiple interesting sub-communities that deserve exploration. A tailored summary of the sub-communities of interest is often more beneficial than a generic overview of the entire network. Consequently, the task of capturing the analytical focus of users and dynamically extracting relevant patterns and tailored summaries for visualization has emerged as a promising avenue for future investigation.
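A lightweight sketch of this kind of content extraction is shown below, assuming NetworkX for the graph analysis; the extracted facts are then turned into a captioning prompt for a foundation model. The prompt wording is illustrative.

import networkx as nx

def describe_social_network(graph):
    # Extract the key information a caption should mention: influential users,
    # sub-communities, and overall size.
    centrality = nx.degree_centrality(graph)
    influential = sorted(centrality, key=centrality.get, reverse=True)[:3]
    communities = list(nx.algorithms.community.greedy_modularity_communities(graph))
    facts = {
        "influential_users": influential,
        "num_communities": len(communities),
        "largest_community_size": max(len(c) for c in communities),
    }
    prompt = (
        "Write a one-sentence caption for a node-link diagram of a social network "
        f"with {graph.number_of_nodes()} users, {facts['num_communities']} sub-communities, "
        f"and the most influential users {', '.join(map(str, facts['influential_users']))}."
    )
    return facts, prompt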
Visual-question-answering-based communication. In computer vision, the development of machine learning models to answer questions about an image is an active research topic called visual question answering [123]. Using foundation models, users can engage in free-form and open-ended dialog regarding visualizations, which alleviates the cognitive load of understanding the visualizations. To achieve this, two key aspects must be considered. First, the model must have a robust linguistic comprehension capability and possess a large amount of knowledge to effectively address open-ended questions regarding the visualizations. While some foundation models have achieved remarkable accuracy rates exceeding 90% on the CommonsenseQA benchmark dataset [124], the ability to answer open-ended questions regarding visualization remains a topic for further study. Second, contextual awareness is a critical component that enables a smooth, multi-round dialog experience in foundation models. Currently, chat-centric models such as ChatGPT have demonstrated the ability to deliver desired results conditioned on previous user prompts in the dialog [7]. Adding the underlying data to the prompts can help the foundation model understand the visualizations more precisely and answer numerical questions. However, the incorporation of data into the prompts raises scalability issues. Directly incorporating all the data into the prompts is inefficient, as well as unfeasible, given the large volume of data. To solve this, the development of data abstraction techniques (e.g., sampling [125, 126] and statistical summaries) is necessary to extract the most important data closely linked to the generated visualizations.
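The data abstraction step can be as simple as the following sketch, which assumes pandas: a small sample of rows plus per-column statistics stand in for the full table when the underlying data are attached to a dialog prompt.

import pandas as pd

def abstract_for_prompt(df, max_rows=5):
    # Reduce a large table to a prompt-sized abstraction: a representative sample
    # plus per-column statistics that support numerical questions.
    sample = df.sample(n=min(max_rows, len(df)), random_state=0)
    stats = df.describe(include="all").round(3)
    return ("Data sample:\n" + sample.to_csv(index=False)
            + "\nColumn statistics:\n" + stats.to_csv())

The returned string is appended to the dialog prompt, so the model can answer numerical questions without receiving the entire dataset.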
5.2.4 Active engagement

Direct interaction enhancement. Currently, several widely used interactions, such as brushing and zooming, have been overlooked in the training of foundation models. Consequently, these models struggle to understand and enhance such user interactions. Two potential solutions exist to address this gap. A straightforward solution is to convert these interactions into formats that current foundation models can readily understand. For example, mouse-click interactions can be converted into textual descriptions and fed to LLMs. A more promising solution involves training or adapting foundation models to understand these interactions directly. Encouragingly, initial efforts have been made to enhance model capabilities in this direction. For example, DragGAN enables users to manipulate objects within images using drag-and-drop interactions [127]. These efforts are notable steps toward expanding the capabilities of interaction-aware foundation models.
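The straightforward solution can be sketched as follows; the event schema (an interaction type plus ranges or a clicked item) is an assumption made for illustration, and the resulting sentences are simply prepended to an LLM prompt that reasons about the session.

def interaction_to_text(event):
    # Convert a logged interaction event into a natural-language sentence.
    kind = event["type"]
    if kind == "brush":
        (x0, x1), (y0, y1) = event["x_range"], event["y_range"]
        return (f"The user brushed the region x in [{x0}, {x1}] and "
                f"y in [{y0}, {y1}] on the scatterplot.")
    if kind == "zoom":
        return f"The user zoomed {event['direction']} around ({event['cx']}, {event['cy']})."
    if kind == "click":
        return f"The user clicked on the mark representing '{event['item']}'."
    return f"The user performed a {kind} interaction."

log = [
    {"type": "brush", "x_range": (0.2, 0.6), "y_range": (10, 40)},
    {"type": "click", "item": "cluster 3"},
]
context = " ".join(interaction_to_text(e) for e in log)  # prepended to the LLM prompt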
Predictive interaction enhancement. Recently, several initiatives have been implemented to enhance the capabilities of foundation models by creating foundation-model-based AI agents [128]. These AI agents are designed to mimic human behaviors and typically include various modules, such as perception, memory, planning, and reflection, each of which is often supported by a foundation model. Such agents can actively identify human feedback and incorporate it into their reflection module, which adapts their actions in subsequent steps based on this feedback [129]. Employing these AI agents is feasible for visual analyses. Traditional approaches require domain experts to manually examine data through visualizations and identify patterns through sequences of interactions. This process is time-consuming and expertise-dependent. By contrast, AI agents may help simplify this analysis process by generating similar interaction sequences based on the interaction sequences performed by domain experts. However, achieving productive collaboration between humans and AI agents poses two challenges. The first challenge lies in finetuning a foundation model capable of automatically generating interaction sequences to extract useful patterns. To alleviate the effort of interacting with different visual analysis tools, foundation models can be used to generate interaction sequences, which are then used to automatically extract pattern candidates. Domain experts need only examine these candidates and find the most relevant patterns for further analysis. The second challenge is the efficient adaptation of the foundation model to specific visual analysis tools and domain experts. To achieve this, the capacity of the model for in-context learning must be boosted. The foundation model should be able to learn from a few example interaction sequences performed by experts and then extract more patterns from similar interactions.
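A sketch of this in-context adaptation is given below: a few expert interaction sequences, paired with the findings they led to, are formatted as few-shot examples, and a foundation model (the llm call is a placeholder rather than a specific API) is asked to propose the next interaction sequence for a new view.

def build_few_shot_prompt(expert_sessions, new_view):
    # Each expert session is a list of (interaction, finding) pairs; they serve as
    # in-context examples so the model proposes analogous interaction sequences.
    examples = []
    for session in expert_sessions:
        steps = "; ".join(f"{action} -> {finding}" for action, finding in session)
        examples.append(f"Example session: {steps}")
    return ("\n".join(examples)
            + f"\nNew view: {new_view}\n"
            + "Propose the next interaction sequence and the pattern it is likely to reveal.")

expert_sessions = [
    [("filter year=2023", "sales dip in Q3"), ("drill into region=EU", "dip driven by EU")],
]
prompt = build_few_shot_prompt(expert_sessions, "monthly sales dashboard, all regions")
# next_steps = llm(prompt)  # `llm` stands in for any foundation model interface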
6 Conclusions

The intersection of foundation models and visualizations represents a substantial step in the advancement of AI systems. On the one hand, VIS4FM is crucial in explaining the complexities of foundation models, which highlights the growing need for transparency, explainability, fairness, and robustness in the expanding role of AI. On the other hand, FM4VIS provides new pathways for further advances in visualization techniques. Although integrating these two fields presents certain challenges, their potential benefits and advancements are undeniable. The challenges must be confronted directly while embracing the vast opportunities that lie ahead. This confluence not only promises a brighter future for AI and visualization but also encourages a sustained journey of discovery and innovation in this emerging research topic.

Fundings

This work was supported by the National Natural Science Foundation of China (Grant Nos. U21A20469 and 61936002), the National Key R&D Program of China (Grant No. 2020YFB2104100), and grants from the Institute Guo Qiang, THUIBCS, and BLBCI.

Author contributions

Weikai Yang: Conceptualization, Writing - Original Draft, Writing - Review Editing. Mengchen Liu: Conceptualization, Writing - Original Draft, Writing - Review Editing. Zheng Wang: Writing - Original Draft, Writing - Review Editing. Shixia Liu: Conceptualization, Supervision, Writing - Original Draft, Writing - Review Editing, Funding acquisition.

Acknowledgements

The authors thank Dr. Xiting Wang, Dr. Changjian Chen, Jun Yuan, Yukai Guo, Jiangning Zhu, and Duan Li for their valuable comments.

Declaration of competing interest

The authors have no competing interests to declare that are relevant to the content of this article.

References

[1] Bommasani, R.; Hudson, D. A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M. S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.
[2] Devlin, J.; Chang, M. W.; Lee, K.; Toutanova, K. BERT: Pretraining of deep bidirectional transformers for language understanding. In: Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 4171–4186, 2019.
[3] Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In: Proceedings of the International Conference on Learning Representations, 2021.
[4] Wang, W.; Dai, J.; Chen, Z.; Huang, Z.; Li, Z.; Zhu, X.; Hu, X.; Lu, T.; Lu, L.; Li, H.; et al. InternImage: Exploring large-scale vision foundation models with deformable convolutions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 14408–14419, 2023.
[5] Radford, A.; Kim, J. W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In: Proceedings of the 38th International Conference on Machine Learning, 8748–8763, 2021.
[6] Brown, T. B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. In: Proceedings of the 34th Conference on Neural Information Processing Systems, 1877–1901, 2020.
[7] Ouyang, L.; Wu, J.; Jiang, X.; Almeida, D.; Wainwright, C.; Mishkin, P.; Zhang, C.; Agarwal, S.; Slama, K.; Ray, A.; et al. Training language models to follow instructions with human feedback. In: Proceedings of the 36th Conference on Neural Information Processing Systems, 27730–27744, 2022.
[8] OpenAI; Achiam, J.; Adler, S.; Agarwal, S.; Ahmad, L.; Akkaya, I.; Aleman, F. L.; Almeida, D.; Altenschmidt, J.; Altman, S. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023.
[9] Eloundou, T.; Manning, S.; Mishkin, P.; Rock, D. GPTs are GPTs: An early look at the labor market impact potential of large language models. arXiv preprint arXiv:2303.10130, 2023.
[10] Liu, S.; Wang, X.; Liu, M.; Zhu, J. Towards better analysis of machine learning models: A visual analytics perspective. Visual Informatics Vol. 1, No. 1, 48–56, 2017.
[11] Choo, J.; Liu, S. Visual analytics for explainable deep learning. IEEE Computer Graphics and Applications Vol. 38, No. 4, 84–92, 2018.
[12] Hohman, F.; Kahng, M.; Pienta, R.; Chau, D. H. Visual analytics in deep learning: An interrogative survey for the next frontiers. IEEE Transactions on Visualization and Computer Graphics Vol. 25, No. 8, 2674–2693, 2019.
[13] Yuan, J.; Chen, C.; Yang, W.; Liu, M.; Xia, J.; Liu, S. A survey of visual analytics techniques for machine learning. Computational Visual Media Vol. 7, No. 1, 3–36, 2021.
[14] Sacha, D.; Kraus, M.; Keim, D. A.; Chen, M. VIS4ML: An ontology for visual analytics assisted machine learning. IEEE Transactions on Visualization and Computer Graphics Vol. 25, No. 1, 385–395, 2019.
[15] Wang, Q.; Chen, Z. T.; Wang, Y.; Qu, H. A survey on ML4VIS: Applying machine learning advances to data visualization. IEEE Transactions on Visualization and Computer Graphics Vol. 28, No. 12, 5134–5153, 2022.
[16] Wu, A.; Wang, Y.; Shu, X.; Moritz, D.; Cui, W.; Zhang, H.; Zhang, D.; Qu, H. AI4VIS: Survey on artificial intelligence approaches for data visualization. IEEE Transactions on Visualization and Computer Graphics Vol. 28, No. 12, 5049–5070, 2022.
[17] Wang, J.; Liu, S.; Zhang, W. Visual analytics for machine learning: A data perspective survey. arXiv preprint arXiv:2307.07712, 2023.
[18] Shen, L.; Shen, E.; Luo, Y.; Yang, X.; Hu, X.; Zhang, X.; Tai, Z.; Wang, J. Towards natural language interfaces for data visualization: A survey. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 6, 3121–3144, 2023.
[19] Liu, S.; Wang, X.; Collins, C.; Dou, W.; Ouyang, F.; El-Assady, M.; Jiang, L.; Keim, D. A. Bridging text visualization and mining: A task-driven survey. IEEE Transactions on Visualization and Computer Graphics Vol. 25, No. 7, 2482–2504, 2019.
[20] Reif, E.; Kahng, M.; Petridis, S. Visualizing linguistic diversity of text datasets synthesized by large language models. arXiv preprint arXiv:2305.11364, 2023.
[21] Jin, Z.; Wang, X.; Cheng, F.; Sun, C.; Liu, Q.; Qu, H. ShortcutLens: A visual analytics approach for exploring shortcuts in natural language understanding dataset. IEEE Transactions on Visualization and Computer Graphics doi: 10.1109/TVCG.2023.3236380, 2023.
[22] Chen, C.; Yuan, J.; Lu, Y.; Liu, Y.; Su, H.; Yuan, S.; Liu, S. OoDAnalyzer: Interactive analysis of out-of-distribution samples. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 7, 3335–3349, 2021.
[23] Yang, W.; Li, Z.; Liu, M.; Lu, Y.; Cao, K.; Maciejewski, R.; Liu, S. Diagnosing concept drift with visual analytics. In: Proceedings of the IEEE Conference on Visual Analytics Science and Technology, 12–23, 2020.
[24] Liu, S.; Chen, C.; Lu, Y.; Ouyang, F.; Wang, B. An interactive method to improve crowdsourced annotations. IEEE Transactions on Visualization and Computer Graphics Vol. 25, No. 1, 235–245, 2019.
[25] Xiang, S.; Ye, X.; Xia, J.; Wu, J.; Chen, Y.; Liu, S. Interactive correction of mislabeled training data. In: Proceedings of the IEEE Conference on Visual Analytics Science and Technology, 57–68, 2019.
[26] Bäuerle, A.; Neumann, H.; Ropinski, T. Classifier-guided visual correction of noisy labels for image classification tasks. Computer Graphics Forum Vol. 39, No. 3, 195–205, 2020.
[27] Li, R.; Xiao, W.; Wang, L.; Jang, H.; Carenini, G. T3-Vis: Visual analytic for Training and fine-Tuning Transformers in NLP. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 220–230, 2021.
[28] DeRose, J. F.; Wang, J.; Berger, M. Attention flows: Analyzing and comparing attention mechanisms in language models. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 2, 1160–1170, 2021.
[29] Li, Y.; Wang, J.; Dai, X.; Wang, L.; Yeh, C. C. M.; Zheng, Y.; Zhang, W.; Ma, K. L. How does attention work in vision transformers? A visual analytics attempt. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 6, 2888–2900, 2023.
[30] Yeh, C.; Chen, Y.; Wu, A.; Chen, C.; Viégas, F.; Wattenberg, M. AttentionViz: A global view of transformer attention. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 262–272, 2024.
[31] Li, Z.; Wang, X.; Yang, W.; Wu, J.; Zhang, Z.; Liu, Z.; Sun, M.; Zhang, H.; Liu, S. A unified understanding of deep NLP models for text classification. IEEE Transactions on Visualization and Computer Graphics Vol. 28, No. 12, 4980–4994, 2022.
[32] Zhang, X.; Ono, J. P.; Song, H.; Gou, L.; Ma, K. L.; Ren, L. SliceTeller: A data slice-driven approach for machine learning model validation. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 842–852, 2023.
[33] Wei, Y.; Wang, Z.; Wang, Z.; Dai, Y.; Ou, G.; Gao, H.; Yang, H.; Wang, Y.; Cao, C. C.; Weng, L.; et al. Visual diagnostics of parallel performance in training large-scale DNN models. IEEE Transactions on Visualization and Computer Graphics doi: 10.1109/TVCG.2023.3243228, 2023.
[34] Wang, X.; Huang, R.; Jin, Z.; Fang, T.; Qu, H. CommonsenseVIS: Visualizing and understanding commonsense reasoning capabilities of natural language models. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 273–283, 2024.
[35] Sevastjanova, R.; Cakmak, E.; Ravfogel, S.; Cotterell, R.; El-Assady, M. Visual comparison of language model adaptation. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 1178–1188, 2023.
[36] Strobelt, H.; Webson, A.; Sanh, V.; Hoover, B.; Beyer, J.; Pfister, H.; Rush, A. M. Interactive and visual prompt engineering for ad-hoc task adaptation with large language models. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 1146–1156, 2023.
[37] Wu, S.; Shen, H.; Weld, D. S.; Heer, J.; Ribeiro, M. T. ScatterShot: Interactive in-context example curation for text transformation. In: Proceedings of the 28th International Conference on Intelligent User Interfaces, 353–367, 2023.
[38] Feng, Y.; Wang, X.; Wong, K. K.; Wang, S.; Lu, Y.; Zhu, M.; Wang, B.; Chen, W. PromptMagician: Interactive prompt engineering for text-to-image creation. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 295–305, 2024.
[39] Wu, T.; Jiang, E.; Donsbach, A.; Gray, J.; Molina, A.; Terry, M.; Cai, C. J. PromptChainer: Chaining large language model prompts through visual programming. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 359, 2022.
[40] Wu, T.; Terry, M.; Cai, C. J. AI chains: Transparent and controllable human-AI interaction by chaining large language model prompts. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 385, 2022.
[41] Chung, J. J. Y.; Kim, W.; Yoo, K. M.; Lee, H.; Adar, E.; Chang, M. TaleBrush: Sketching stories with generative pretrained language models. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 209, 2022.
[42] Alsallakh, B.; Hanbury, A.; Hauser, H.; Miksch, S.; Rauber, A. Visual methods for analyzing probabilistic classification data. IEEE Transactions on Visualization and Computer Graphics Vol. 20, No. 12, 1703–1712, 2014.
[43] Ren, D.; Amershi, S.; Lee, B.; Suh, J.; Williams, J. D. Squares: Supporting interactive performance analysis for multiclass classifiers. IEEE Transactions on Visualization and Computer Graphics Vol. 23, No. 1, 61–70, 2017.
[44] Görtler, J.; Hohman, F.; Moritz, D.; Wongsuphasawat, K.; Ren, D.; Nair, R.; Kirchner, M.; Patel, K. Neo: Generalizing confusion matrix visualization to hierarchical and multi-output labels. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 408, 2022.
[45] Chen, C.; Guo, Y.; Tian, F.; Liu, S.; Yang, W.; Wang, Z.; Wu, J.; Su, H.; Pfister, H.; Liu, S. A unified interactive model evaluation for classification, object detection, and instance segmentation in computer vision. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 76–86, 2024.
[46] Liu, S.; Andrienko, G.; Wu, Y.; Cao, N.; Jiang, L.; Shi, C.; Wang, Y. S.; Hong, S. Steering data quality with visual analytics: The complexity challenge. Visual Informatics Vol. 2, No. 4, 191–197, 2018.
[47] Jiang, L.; Liu, S.; Chen, C. Recent research advances on interactive machine learning. Journal of Visualization Vol. 22, No. 2, 401–417, 2019.
[48] Chen, C.; Wang, Z.; Wu, J.; Wang, X.; Guo, L. Z.; Li, Y. F.; Liu, S. Interactive graph construction for graph-based semi-supervised learning. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 9, 3701–3716, 2021.
[49] Chen, C.; Wu, J.; Wang, X.; Xiang, S.; Zhang, S. H.; Tang, Q.; Liu, S. Towards better caption supervision for object detection. IEEE Transactions on Visualization and Computer Graphics Vol. 28, No. 4, 1941–1954, 2022.
[50] Liu, M.; Shi, J.; Li, Z.; Li, C.; Zhu, J.; Liu, S. Towards better analysis of deep convolutional neural networks. IEEE Transactions on Visualization and Computer Graphics Vol. 23, No. 1, 91–100, 2017.
[51] Liu, M.; Shi, J.; Cao, K.; Zhu, J.; Liu, S. Analyzing the training processes of deep generative models. IEEE Transactions on Visualization and Computer Graphics Vol. 24, No. 1, 77–87, 2018.
[52] Sun, M.; Cai, L.; Cui, W.; Wu, Y.; Shi, Y.; Cao, N. Erato: Cooperative data story editing via fact interpolation. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 983–993, 2023.
[53] Ying, L.; Shu, X.; Deng, D.; Yang, Y.; Tang, T.; Yu, L.; Wu, Y. MetaGlyph: Automatic generation of metaphoric glyph-based visualization. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 331–341, 2023.
[54] Guo, Y.; Han, Q.; Lou, Y.; Wang, Y.; Liu, C.; Yuan, X. Edit-history vis: An interactive visual exploration and analysis on Wikipedia edit history. In: Proceedings of the IEEE 16th Pacific Visualization Symposium, 157–166, 2023.
[55] Tu, Y.; Qiu, R.; Wang, Y. S.; Yen, P. Y.; Shen, H. W. PhraseMap: Attention-based keyphrases recommendation for information seeking. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 3, 1787–1802, 2024.
[56] Li, X.; Wang, Y.; Wang, H.; Wang, Y.; Zhao, J. NBSearch: Semantic search and visual exploration of computational notebooks. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 308, 2021.
[57] Narechania, A.; Karduni, A.; Wesslen, R.; Wall, E. VITALITY: Promoting serendipitous discovery of academic literature with transformers & visual analytics. IEEE Transactions on Visualization and Computer Graphics Vol. 28, No. 1, 486–496, 2022.
[58] Shi, C.; Nie, F.; Hu, Y.; Xu, Y.; Chen, L.; Ma, X.; Luo, Q. MedChemLens: An interactive visual tool to support direction selection in interdisciplinary experimental research of medicinal chemistry. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 63–73, 2023.
[59] Resck, L. E.; Ponciano, J. R.; Nonato, L. G.; Poco, J. LegalVis: Exploring and inferring precedent citations in legal documents. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 6, 3105–3120, 2023.
[60] Zhang, X.; Engel, J.; Evensen, S.; Li, Y.; Demiralp, Ç.; Tan, W. C. Teddy: A system for interactive review analysis. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 108, 2020.
[61] Wu, Y.; Xu, Y.; Gao, S.; Wang, X.; Song, W.; Nie, Z.; Fan, X.; Li, Q. LiveRetro: Visual analytics for strategic retrospect in livestream E-commerce. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 1117–1127, 2024.
[62] Ouyang, Y.; Wu, Y.; Wang, H.; Zhang, C.; Cheng, F.; Jiang, C.; Jin, L.; Cao, Y.; Li, Q. Leveraging historical medical records as a proxy via multimodal modeling and visualization to enrich medical diagnostic learning. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 1238–1248, 2024.
[63] Tu, Y.; Li, O.; Wang, J.; Shen, H. W.; Powalko, P.; Tomescu-Dubrow, I.; Slomczynski, K. M.; Blanas, S.; Jenkins, J. C. SDRQuerier: A visual querying framework for cross-national survey data recycling. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 6, 2862–2874, 2023.
[64] Chen, Z.; Yang, Q.; Shan, J.; Lin, T.; Beyer, J.; Xia, H.; Pfister, H. IBall: Augmenting basketball videos with gaze-moderated embedded visualizations. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 841, 2023.
[65] Chen, Z. T.; Yang, Q.; Xie, X.; Beyer, J.; Xia, H.; Wu, Y.; Pfister, H. Sporthesia: Augmenting sports videos using natural language. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 918–928, 2023.
[66] Tu, Y.; Xu, J.; Shen, H. W. KeywordMap: Attention-based visual exploration for keyword analysis. In: Proceedings of the IEEE 14th Pacific Visualization Symposium, 206–215, 2021.
[67] Liu, C.; Han, Y.; Jiang, R.; Yuan, X. ADVISor: Automatic visualization answer for natural-language question on tabular data. In: Proceedings of the IEEE 14th Pacific Visualization Symposium, 11–20, 2021.
[68] Shen, L.; Zhang, Y.; Zhang, H.; Wang, Y. Data player: Automatic generation of data videos with narration-animation interplay. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 109–119, 2024.
[69] Xiao, S.; Huang, S.; Lin, Y.; Ye, Y.; Zeng, W. Let the chart spark: Embedding semantic context into chart with text-to-image generative model. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 284–294, 2024.
[70] Singh, H.; Shekhar, S. STL-CQA: Structure-based transformers with localization and encoding for chart question answering. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 3275–3284, 2020.
[71] Ma, W.; Zhang, H.; Yan, S.; Yao, G.; Huang, Y.; Li, H.; Wu, Y.; Jin, L. Towards an efficient framework for data extraction from chart images. In: Document Analysis and Recognition – ICDAR 2021. Lecture Notes in Computer Science, Vol. 12821. Lladós, J.; Lopresti, D.; Uchida, S. Eds. Springer Cham, 583–597, 2021.
[72] Song, S.; Li, C.; Sun, Y.; Wang, C. VividGraph: Learning to extract and redesign network graphs from visualization images. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 7, 3169–3181, 2023.
[73] Chen, Z. T.; Wang, Y.; Wang, Q.; Wang, Y.; Qu, H. Towards automated infographic design: Deep learning-based auto-extraction of extensible timeline. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 1, 917–926, 2020.
[74] Sultanum, N.; Srinivasan, A. DATATALES: Investigating the use of large language models for authoring data-driven articles. In: Proceedings of the IEEE Visualization and Visual Analytics, 231–235, 2023.
[75] Liu, C.; Guo, Y.; Yuan, X. AutoTitle: An interactive title generator for visualizations. IEEE Transactions on Visualization and Computer Graphics doi: 10.1109/TVCG.2023.3290241, 2023.
[76] Song, S.; Chen, J.; Li, C.; Wang, C. GVQA: Learning to answer questions about graphs with visualizations via knowledge base. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 464, 2023.
[77] Adhikary, J.; Vertanen, K. Text entry in virtual environments using speech and a midair keyboard. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 5, 2648–2658, 2021.
[78] Card, S. K.; Mackinlay, J. D.; Shneiderman, B. Readings in Information Visualization: Using Vision to Think. San Francisco, CA, USA: Academic Press, 1999.
[79] Zhou, C.; Li, Q.; Li, C.; Yu, J.; Liu, Y.; Wang, G.; Zhang, K.; Ji, C.; Yan, Q.; He, L.; et al. A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT. arXiv preprint arXiv:2302.09419, 2023.
[80] Chen, Z. T.; Zeng, W.; Yang, Z.; Yu, L.; Fu, C. W.; Qu, H. LassoNet: Deep lasso-selection of 3D point clouds. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 1, 195–204, 2020.
[81] Ottley, A.; Garnett, R.; Wan, R. Follow the clicks: Learning and anticipating mouse interactions during exploratory data analysis. Computer Graphics Forum Vol. 38, No. 3, 41–52, 2019.
[82] Brown, E. T.; Ottley, A.; Zhao, H.; Lin, Q.; Souvenir, R.; Endert, A.; Chang, R. Finding Waldo: Learning about users from their interactions. IEEE Transactions on Visualization and Computer Graphics Vol. 20, No. 12, 1663–1672, 2014.
[83] Wexler, J.; Pushkarna, M.; Bolukbasi, T.; Wattenberg, M.; Viégas, F.; Wilson, J. The what-if tool: Interactive probing of machine learning models. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 1, 56–65, 2020.
[84] Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In: Proceedings of the 36th International Conference on Machine Learning, 2790–2799, 2019.
[85] Hu, E. J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. In: Proceedings of the International Conference on Learning Representations, 2021.
[86] AdapterHub. Available at https://adapterhub.ml/
[87] Wei, J.; Wang, X.; Schuurmans, D.; Bosma, M.; Ichter, B.; Xia, F.; Chi, E.; Le, Q.; Zhou, D. Chain-of-thought prompting elicits reasoning in large language models. In: Proceedings of the 36th Conference on Neural Information Processing Systems, 24824–24837, 2022.
[88] Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research Vol. 21, No. 1, 5485–5551, 2020.
[89] Wang, Y.; Hou, Z.; Shen, L.; Wu, T.; Wang, J.; Huang, H.; Zhang, H.; Zhang, D. Towards natural language-based visualization authoring. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 1222–1232, 2023.
[90] Schwartz, R.; Dodge, J.; Smith, N. A.; Etzioni, O. Green AI. Communications of the ACM Vol. 63, No. 12, 54–63, 2020.
[91] Zhou, C.; Liu, P.; Xu, P.; Iyer, S.; Sun, J.; Mao, Y.; Ma, X.; Efrat, A.; Yu, P.; Yu, L.; et al. LIMA: Less is more for alignment. In: Proceedings of the 37th Conference on Neural Information Processing Systems, 2024.
[92] Zhou, Y.; Yang, W.; Chen, J.; Chen, C.; Shen, Z.; Luo, X.; Yu, L.; Liu, S. Cluster-aware grid layout. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 240–250, 2024.
[93] Yang, W.; Wang, X.; Lu, J.; Dou, W.; Liu, S. Interactive steering of hierarchical clustering. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 10, 3953–3967, 2021.
[94] Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, S. G.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
[95] Ma, K. L. In situ visualization at extreme scale: Challenges and opportunities. IEEE Computer Graphics and Applications Vol. 29, No. 6, 14–19, 2009.
[96] Rapp, T.; Peters, C.; Dachsbacher, C. Image-based visualization of large volumetric data using moments. IEEE Transactions on Visualization and Computer Graphics Vol. 28, No. 6, 2314–2325, 2022.
[97] Richer, G.; Pister, A.; Abdelaal, M.; Fekete, J. D.; Sedlmair, M.; Weiskopf, D. Scalability in visualization. IEEE Transactions on Visualization and Computer Graphics doi: 10.1109/TVCG.2022.3231230, 2022.
[98] Dong, Q.; Li, L.; Dai, D.; Zheng, C.; Wu, Z.; Chang, B.; Sun, X.; Xu, J.; Li, L.; Sui, Z. A survey on in-context learning. arXiv preprint arXiv:2301.00234, 2022.
[99] Liu, S.; Xiao, J.; Liu, J.; Wang, X.; Wu, J.; Zhu, J. Visual diagnosis of tree boosting methods. IEEE Transactions on Visualization and Computer Graphics Vol. 24, No. 1, 163–173, 2018.
[100] Yuan, J.; Liu, M.; Tian, F.; Liu, S. Visual analysis of neural architecture spaces for summarizing design principles. IEEE Transactions on Visualization and Computer Graphics Vol. 29, No. 1, 288–298, 2023.
[101] Khayat, M.; Karimzadeh, M.; Zhao, J.; Ebert, D. S. VASSL: A visual analytics toolkit for social spambot labeling. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 1, 874–883, 2020.
[102] Bernard, J.; Zeppelzauer, M.; Lehmann, M.; Müller, M.; Sedlmair, M. Towards user-centered active learning algorithms. Computer Graphics Forum Vol. 37, No. 3, 121–132, 2018.
[103] Yang, W.; Ye, X.; Zhang, X.; Xiao, L.; Xia, J.; Wang, Z.; Zhu, J.; Pfister, H.; Liu, S. Diagnosing ensemble few-shot classifiers. IEEE Transactions on Visualization and Computer Graphics Vol. 28, No. 9, 3292–3306, 2022.
[104] Zhou, Z. H.; Tan, Z. H. Learnware: Small models do big. Science China Information Sciences Vol. 67, No. 1, Article No. 112102, 2023.
[105] HuggingFace. Available at https://huggingface.co/models
[106] Wang, Q.; Yuan, J.; Chen, S.; Su, H.; Qu, H.; Liu, S. Visual genealogy of deep neural networks. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 11, 3340–3352, 2020.
[107] Cao, K.; Liu, M.; Su, H.; Wu, J.; Zhu, J.; Liu, S. Analyzing the noise robustness of deep neural networks. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 7, 3289–3304, 2021.
[108] Liu, M.; Liu, S.; Su, H.; Cao, K.; Zhu, J. Analyzing the noise robustness of deep neural networks. In: Proceedings of the IEEE Conference on Visual Analytics Science and Technology, 60–71, 2018.
[109] Qiu, R.; Tu, Y.; Wang, Y. S.; Yen, P. Y.; Shen, H. W. DocFlow: A visual analytics system for question-based document retrieval and categorization. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 2, 1533–1548, 2024.
[110] Shi, D.; Xu, X.; Sun, F.; Shi, Y.; Cao, N. Calliope: Automatic visual data story generation from a spreadsheet. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 2, 453–463, 2021.
[111] Chen, Q.; Chen, N.; Shuai, W.; Wu, G.; Xu, Z.; Tong, H.; Cao, N. Calliope-net: Automatic generation of graph data facts via annotated node-link diagrams. IEEE Transactions on Visualization and Computer Graphics Vol. 30, No. 1, 562–572, 2024.
[112] Blei, D. M.; Ng, A. Y.; Jordan, M. I. Latent Dirichlet allocation. Journal of Machine Learning Research Vol. 3, 993–1022, 2003.
[113] Lowe, D. G. Object recognition from local scale-invariant features. In: Proceedings of the 7th IEEE International Conference on Computer Vision, 1150–1157, 1999.
[114] Rozière, B.; Gehring, J.; Gloeckle, F.; Sootla, S.; Gat, L.; Tan, X. E.; Adi, Y.; Liu, J.; Sauvestre, R.; Remez, T.; et al. Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950, 2023.
[115] Bostock, M.; Ogievetsky, V.; Heer, J. D3 Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics Vol. 17, No. 12, 2301–2309, 2011.
[116] Hunter, J. D. Matplotlib: A 2D graphics environment. Computing in Science and Engineering Vol. 9, No. 3, 90–95, 2007.
[117] Kwon, O. H.; Ma, K. L. A deep generative model for graph layout. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 1, 665–675, 2020.
[118] Zamfirescu-Pereira, J. D.; Wong, R. Y.; Hartmann, B.; Yang, Q. Why Johnny can't prompt: How non-AI experts try (and fail) to design LLM prompts. In: Proceedings of the CHI Conference on Human Factors in Computing Systems, Article No. 437, 2023.
[119] Pryzant, R.; Iter, D.; Li, J.; Lee, Y. T.; Zhu, C.; Zeng, M. Automatic prompt optimization with "gradient descent" and beam search. arXiv preprint arXiv:2305.03495, 2023.
[120] Jing, Y.; Yang, Y.; Feng, Z.; Ye, J.; Yu, Y.; Song, M. Neural style transfer: A review. IEEE Transactions on Visualization and Computer Graphics Vol. 26, No. 11, 3365–3385, 2020.
[121] Abdal, R.; Qin, Y.; Wonka, P. Image2StyleGAN: How to embed images into the StyleGAN latent space? In: Proceedings of the IEEE/CVF International Conference on Computer Vision, 4432–4441, 2019.
[122] Chen, Q.; Cao, S.; Wang, J.; Cao, N. How does automation shape the process of narrative visualization: A survey of tools. IEEE Transactions on Visualization and Computer Graphics doi: 10.1109/TVCG.2023.3261320, 2023.
[123] Antol, S.; Agrawal, A.; Lu, J.; Mitchell, M.; Batra, D.; Zitnick, C. L.; Parikh, D. VQA: Visual question answering. In: Proceedings of the IEEE International Conference on Computer Vision, 2425–2433, 2015.
[124] Anil, R.; Dai, A. M.; Firat, O.; Johnson, M.; Lepikhin, D.; Passos, A.; Shakeri, S.; Taropa, E.; Bailey, P.; Chen, Z.; et al. PaLM 2 technical report. arXiv preprint arXiv:2305.10403, 2023.
[125] Zhao, Y.; Jiang, H.; Chen, Q. A.; Qin, Y.; Xie, H.; Wu, Y.; Liu, S.; Zhou, Z.; Xia, J.; Zhou, F. Preserving minority structures in graph sampling. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 2, 1698–1708, 2021.
[126] Yuan, J.; Xiang, S.; Xia, J.; Yu, L.; Liu, S. Evaluation of sampling methods for scatterplots. IEEE Transactions on Visualization and Computer Graphics Vol. 27, No. 2, 1720–1730, 2021.
[127] Pan, X.; Tewari, A.; Leimkühler, T.; Liu, L.; Meka, A.; Theobalt, C. Drag your GAN: Interactive point-based manipulation on the generative image manifold. In: Proceedings of the Special Interest Group on Computer Graphics and Interactive Techniques Conference, Article No. 78, 2023.
[128] Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. arXiv preprint arXiv:2308.11432, 2023.
[129] Park, J. S.; O'Brien, J.; Cai, C. J.; Morris, M. R.; Liang, P.; Bernstein, M. S. Generative agents: Interactive simulacra of human behavior. In: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, Article No. 2, 2023.