Statistics 695V: Data Visualization, Spring 2005
Data Visualization

All areas of learning from data (statistics, machine learning, and data mining) can benefit immensely from data visualization. Visualization provides a front line of attack in the analysis of data, revealing intricate structure that cannot be absorbed in any other way. We discover unimagined effects, and we challenge imagined ones.

Many approaches to learning from data today involve the use of complex tools that to some extent tailor themselves to the patterns of the data, a form of automated learning. But human guidance from data visualizations added to the algorithms can vastly increase their performance. Other approaches involve the building of complex statistical models that are used to describe relationships among variables and to carry out probabilistic inductive inference. Data visualization enables the building of models that follow the patterns of the data, resulting in valid inferences.

Tools matter. There are exceptionally powerful visualization tools, and there are others, some well known, that rarely outperform the best ones. The analyst learning from data needs to be hard-boiled in evaluating the efficacy of a visualization tool. It is easy to be dazzled by a display of data, especially if it is rendered with color or depth. Our tendency is to be misled into thinking we are absorbing relevant information when we see a lot. But the success of a visualization tool should be based solely on the amount we learn through the data about the phenomenon under study. Some tools in the course are new and some are old, but all have a proven record of success in the analysis of common types of statistical data that arise in science and technology.

Statistics 695V is a Research Course

While the prerequisites do not require previous knowledge of data visualization, the course nevertheless has a research orientation. The objectives are to provide students with the opportunity to
• understand basic frameworks for data visualization
• understand in depth selected tools for data visualization
• experience using data visualization to display data
• experience surveying, evaluating, and in some cases improving research ideas in a specific area of data visualization
• experience giving a research talk
• experience writing a research paper.

Prerequisites, Class Limit, Permissions, and Questions

Prerequisites are
• probability: basic
• statistics: basic, including least-squares fitting of parametric functions to data
• mathematics: calculus and linear algebra
• data visualization: no previous knowledge needed

The level of the course will be that of the book Visualizing Data by the instructor.

Permission of the instructor is required. The class will be limited to 20 students. Send questions about the course or requests for permission to attend to the instructor at wsc@[Link].

Student Responsibilities

Students will form groups ranging in size from one up to a maximum that will be determined by the class size. Each group will select either (1) a methodological area of data visualization or (2) a set of data. If (1), a group will
• read papers in the area
• give a talk on the area
• prepare a paper on the area that reviews and evaluates it, and if desired, try out a new idea and discuss it in the paper.

If (2), a group will
• use a collection of visualization tools to study the data set
• evaluate the use of the tools to show important characteristics of the data and lead to conclusions about the subject matter
• give a talk on the conclusions
• prepare a paper on the conclusions.

The TA will help facilitate the formation of groups, which should be completed by Jan. 21. Each group should discuss the selection of the area or data set with the instructor. Groups should be formed and topics selected by Feb. 9.

The Word or LaTeX template from the 2005 IEEE InfoVis Conference should be used for the paper. The 2004 templates may be found at [Link]/∼vis/Tasks/[Link] but it is expected that the 2005 templates will be available in Jan. The paper should be sent to the TA electronically as a PDF by May 3.

Talks should be given from either the classroom computer or a student computer using the classroom projector. At the end of each presentation, there will be an active student question and discussion session.

The course depends heavily on teamwork, joint planning, and feedback. It is important that students attend classes to foster this.

For all projects students should use S or R for the data visualization. The reason is that there is a particularly rich set of tools in S and R, and the course methodology is closely linked with S and R. Part of the course lectures will be devoted to design issues for the S and R graphics software.

Computing Environments

A research computing environment will be provided for students to carry out their work. This new instructional endeavor to provide such an environment is being carried out by ITaP and the Statistics Department. The operating system will be Linux running on a cluster of computers with the S language (S-PLUS and R implementations).

Proposed Instructor Lecture Topics

Lectures will take material in part from the book Visualizing Data. In addition there will be lectures about the trellis display framework for visualizing multidimensional databases, visualization of massive databases, and the design principles of the trellis display visualization framework in the S language. Written material, papers and book chapters, will be provided for the additional lecture topics.

Course Instructor

William S. Cleveland has been a Professor of Statistics and Computer Science at Purdue University since January 2004. Before this he was a Distinguished Member of Technical Staff in the Statistics and Data Mining Research Department at Bell Labs.

His areas of research include machine learning, data mining, data visualization, statistical methods and models, and computer networking.

Cleveland has introduced tools for local learning, as well as many visualization tools, that are widely used in engineering, science, medicine, and business. He has participated in the design and implementation of software for these tools that is now a part of many commercial systems. He has been involved in many projects applying learning and visualization tools to data from several fields including environmental science, customer opinion polling, visual perception, and computer networking.

His books The Elements of Graphing Data and Visualizing Data have been reviewed in many journals from a wide variety of disciplines, and Elements was selected for the Library of Science. More information about them, including reviews, is available at [Link] and [Link]/∼wsc/.