Science Mapping Software Tools: Review, Analysis, and Cooperative Study Among Tools

M.J. Cobo, A.G. López-Herrera, E. Herrera-Viedma, and F. Herrera


Department of Computer Science and Artificial Intelligence, CITIC-UGR (Research Center
on Information and Communications Technology), University of Granada, E-18071 Granada,
Spain. E-mail: {mjcobo, lopez-herrera, viedma, herrera}@decsai.ugr.es

Science mapping aims to build bibliometric maps that describe how specific disciplines, scientific domains, or research fields are conceptually, intellectually, and socially structured. Different techniques and software tools have been proposed to carry out science mapping analysis. The aim of this article is to review, analyze, and compare some of these software tools, taking into account aspects such as the bibliometric techniques available and the different kinds of analysis.

Received October 26, 2010; revised February 10, 2010; accepted February 10, 2010
© 2011 ASIS&T • Published online 2 May 2011 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.21525

JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 62(7):1382–1402, 2011
15322890, 2011, 7, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/asi.21525 by UFPE - Universidade Federal de Pernambuco, Wiley Online Library on [07/07/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

Introduction

Science mapping, or bibliometric mapping, is an important research topic in the field of bibliometrics (Morris & Van Der Veer Martens, 2008; van Eck & Waltman, 2010). It attempts to find representations of intellectual connections within the dynamically changing system of scientific knowledge (Small, 1997). In other words, science mapping aims at displaying the structural and dynamic aspects of scientific research (Börner, Chen, & Boyack, 2003; Morris & Van Der Veer Martens; Noyons, Moed, & Luwel, 1999a).

The general workflow of a science mapping analysis comprises several steps: data retrieval, preprocessing, network extraction, normalization, mapping, analysis, and visualization. At the end of this process, the analyst has to interpret the results and draw conclusions from them.

Data can be retrieved from several bibliographic sources, such as the ISI Web of Science (WoS) or Scopus. Moreover, a science mapping analysis can be performed using patent or funding data.

The preprocessing step is perhaps one of the most important, because the quality of the results depends on the quality of the data. Several preprocessing methods can be applied, for example, to detect duplicate and misspelled elements.

Different approaches have been developed to extract networks using the selected units of analysis (authors, documents, journals, and terms). Co-word analysis (Callon, Courtial, Turner, & Bauin, 1983) uses the most important words or keywords of the documents to study the conceptual structure of a research field. Co-author analysis examines the authors and their affiliations to study the social structure and collaboration networks (Glänzel, 2001; Peters & van Raan, 1991). Finally, the cited references are used either to analyze the intellectual base of the research field or to analyze the documents that cite the same references. In this sense, bibliographic coupling (Kessler, 1963) analyzes the citing documents, whereas co-citation analysis (Small, 1973) studies the cited documents. Other approaches, such as author bibliographic coupling (Zhao & Strotmann, 2008), author co-citation (White & Griffith, 1981), journal bibliographic coupling (Gao & Guan, 2009; Small & Koenig, 1977), and journal co-citation (McCain, 1991), are examples of macro analysis using aggregated data.

Once the network has been built, a normalization process is commonly performed over the relations (edges) between its nodes by using similarity measures. A review of the similarity measures used in science mapping was carried out by van Eck and Waltman (2009).

With the normalized data, different techniques can be used to build the map (mapping process; Börner et al., 2003). Dimensionality reduction techniques such as principal component analysis or multidimensional scaling (MDS), clustering algorithms, and Pathfinder networks (PFNETs) are widely used.

Analysis methods for science mapping allow us to extract useful knowledge from data. Network analysis (Carrington, Scott, & Wasserman, 2005; Cook & Holder, 2006; Skillicorn, 2007; Wasserman & Faust, 1994) allows us to perform a statistical analysis over the generated maps. This analysis can produce measures of the whole network, or, if a clustering algorithm has been applied, measures of the relationships or overlap among the detected clusters (Jaccard’s Index can be used for the latter). Temporal analysis (Garfield, 1994;
Price & Gürsey, 1975) aims to show the conceptual, intellectual, or social evolution of the research field, discovering patterns, trends, seasonality, and outliers. Burst detection (Kleinberg, 2003), a particular kind of temporal analysis, aims to find features that have high intensity over finite time periods. Finally, geospatial analysis (Batty, 2003; Leydesdorff & Persson, 2010; Small & Garfield, 1985) aims to discover where something happens and what impact it has on neighbouring areas.

Additionally, visualization techniques are used to represent a science map and the results of the different analyses. For example, the networks can be represented using heliocentric maps (Moya-Anegón et al., 2005), geometrical models (Skupin, 2009), thematic networks (Bailón-Moreno, Jurado-Alameda, & Ruíz-Baños, 2006; Cobo, López-Herrera, Herrera-Viedma, & Herrera, 2011), or maps where the proximity between items represents their similarity (Davidson, Wylie, & Boyack, 1998; Polanco, François, & Lamirel, 2001; van Eck & Waltman, 2010). To show the evolution across different time periods, Cluster string (Small, 2006; Small & Upham, 2009; Upham & Small, 2010) and thematic areas (Cobo et al., 2011) can be used.

Although science mapping analysis can be performed using generic social network analysis tools such as Pajek (Batagelj & Mrvar, 1998) and UCINET (Borgatti, Everett, & Freeman, 2002), or bioinformatics software such as Cytoscape (Shannon et al., 2003), other software tools have been developed specifically for this purpose. Some of these tools are conceived specifically for mapping science, whereas others may also be used in nonscientific domains. Some have been implemented only for visualizing science maps, whereas others allow us both to build and to visualize the maps. A list of generic software tools used in scientometrics research is given in Börner et al. (2010).

The aim of this article is to present an in-depth comparative study of nine representative science mapping software tools, showing their advantages, drawbacks, and most important differences. We analyze the following software tools: Bibexcel (Persson, Danell, & Wiborg Schneider, 2009), CiteSpace II (Chen, 2004, 2006), CoPalRed (Bailón-Moreno, Jurado-Alameda, Ruíz-Baños, & Courtial, 2005; Bailón-Moreno et al., 2006), IN-SPIRE (Wise, 1999), Leydesdorff’s Software, Network Workbench Tool (Börner et al., 2010; Herr, Huang, Penumarthy, & Börner, 2007), Science of Science (Sci2) Tool (Sci2 Team, 2009), VantagePoint (Porter & Cunningham, 2004), and VOSViewer (van Eck & Waltman, 2010). Each one provides its own view of the data because the tools implement different analysis techniques and algorithms. We should point out that they present complementary characteristics, and it could therefore be desirable to exploit their synergies to perform a complete science mapping analysis. We complete our analysis by showing the performance of all the software tools with an example and analyzing some positive synergies among them.

This article is organized as follows. In the Science Mapping section, some concepts on science mapping are presented. The Software Tools Designed to Perform a Science Mapping Analysis: A Survey section describes the software tools to be analyzed. In the Comparative Study section, a comparison is made among the described software tools. In the Analysis of Generated Maps: A Cooperative Study Among Tools section, we show the performance of the software tools with a set of data and analyze their possible positive synergies. In the Lessons Learned section, we note some lessons learned. Finally, some concluding remarks are made.

Science Mapping

Science mapping, or bibliometric mapping, is a spatial representation of how disciplines, fields, specialities, and individual documents or authors are related to one another (Small, 1999). It is focused on monitoring a scientific field and delimiting research areas to determine its cognitive structure and its evolution (Noyons, Moed, & van Raan, 1999b).

In this section, several important aspects of a science mapping analysis are described: (a) the data sources, (b) the units of analysis, (c) the data preprocessing, (d) the similarity measures that can be used to normalize the relations between the units of analysis, (e) the mapping step, (f) the types of analysis methods that can be employed, (g) some visualization techniques, and finally, (h) the interpretation of results.

Data Sources

Nowadays, there are several online bibliographic (and also bibliometric) databases where scientific works and documents and their citations are stored. These sources of bibliographic information allow us to search and retrieve information about the majority of scientific fields. Undoubtedly, the most important bibliographic databases are ISI WoS (http://www.webofknowledge.com), Scopus (http://www.scopus.com), Google Scholar (http://scholar.google.com), and NLM’s MEDLINE (http://www.ncbi.nlm.nih.gov/pubmed).

ISI WoS, Scopus, and Google Scholar do not cover scientific fields and journals in the same way, as several studies show (Bar-Ilan, 2010; Falagas, Pitsouni, Malietzis, & Pappas, 2008; Mikki, 2010). Moreover, downloading large datasets from Google Scholar is difficult, and a dump of the entire dataset is not available.

Other bibliographic sources can also be used, such as arXiv (http://arxiv.org), CiteSeerX (http://citeseerx.ist.psu.edu/), the Digital Bibliography & Library Project (DBLP; http://dblp.uni-trier.de/), the SAO/NASA Astrophysics Data System (ADS; http://adswww.harvard.edu/), and Science Direct (http://www.sciencedirect.com/).

Patent data and funding data are also frequently used. Patent data can be retrieved from specific data sources such as the United States Patent and Trademark Office (USPTO; http://www.uspto.gov/) or the Derwent Innovations Index provided by ISI WoS. Funding data can be downloaded from the National Science Foundation (http://www.nsf.gov/).
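As a concrete illustration of what the retrieval step returns, ISI WoS can export records as plain text. In this format each field is introduced by a two-letter tag (for instance, AU for authors, DE for author keywords, ID for Keywords Plus, CR for cited references), continuation lines are indented, and ER closes a record. The following is a minimal Python sketch under those assumptions, not a complete or official parser. The function name `parse_wos_tagged` and the sample record (its authors and title are invented) are hypothetical, and real exports may contain fields or quirks it does not handle.

```python
from collections import defaultdict


def parse_wos_tagged(text):
    """Split a WoS field-tagged plain-text export into records (a sketch).

    Assumed layout:
      * a field starts with a two-character tag, a space, and its value;
      * continuation lines start with three spaces and extend the previous field;
      * "ER" ends a record; "FN", "VR" (file header) and "EF" (end of file) are skipped.
    Multi-valued fields (AU, CR, ...) keep one list item per line; a field such
    as TI that wraps over several lines may need its items joined with spaces.
    """
    records = []
    current = defaultdict(list)
    last_tag = None
    for line in text.lstrip("\ufeff").splitlines():
        if not line.strip():
            continue  # blank lines separate records but carry no data
        if line.startswith("   ") and last_tag is not None:
            current[last_tag].append(line.strip())  # continuation of previous field
            continue
        tag, _, value = line.partition(" ")
        if tag == "ER":  # end of the current record
            records.append(dict(current))
            current = defaultdict(list)
            last_tag = None
        elif tag in ("FN", "VR", "EF"):  # file-level header and footer lines
            last_tag = None
        else:
            current[tag].append(value.strip())
            last_tag = tag
    return records


# Hypothetical export containing a single invented record.
sample = """FN Thomson Reuters Web of Science
VR 1.0
PT J
AU Doe, J
   Roe, R
TI Hypothetical example record
CR KESSLER MM, 1963, AM DOC, V14, P10
   SMALL H, 1973, J AM SOC INF SCI, V24, P265
ER

EF
"""

records = parse_wos_tagged(sample)
print(len(records), records[0]["AU"])  # 1 ['Doe, J', 'Roe, R']
```

Each record comes back as a dictionary of lists, so `records[0]["CR"]` holds the cited references on which co-citation and bibliographic coupling analyses operate.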
TABLE 1. Bibliometric techniques taxonomy.

Bibliometric technique          Unit of analysis used                    Kind of relation
Bibliographic coupling
  Author                        Author’s oeuvres                         Common references among author’s oeuvres
  Document                      Document                                 Common references among documents
  Journal                       Journal’s oeuvres                        Common references among journal’s oeuvres
Co-author
  Author                        Author’s name                            Authors’ co-occurrence
  Country                       Country from affiliation                 Countries’ co-occurrence
  Institution                   Institution from affiliation             Institutions’ co-occurrence
Co-citation
  Author                        Author’s reference                       Co-cited author
  Document                      Reference                                Co-cited documents
  Journal                       Journal’s reference                      Co-cited journal
Co-word                         Keyword, or term extracted from title,   Terms’ co-occurrence
                                abstract or document’s body
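The relations in Table 1 reduce to counting. The code below is a minimal sketch of that counting, and several of its elements are hypothetical: the record layout with `refs` and `keywords` fields, the function names, and the three toy documents. It derives co-citation, bibliographic coupling, and co-word counts, then normalizes one co-occurrence count with the four similarity measures listed in the Normalization Process subsection. In the formulas, c_ij is the number of documents containing both items, and c_i, c_j are the numbers of documents containing each item. Association strength is shown without its constant factor.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_counts(docs, field):
    """Co-occurrence (e.g., co-citation or co-word): for each pair of items,
    the number of documents whose `field` contains both items."""
    counts = Counter()
    for doc in docs.values():
        for a, b in combinations(sorted(set(doc[field])), 2):
            counts[(a, b)] += 1
    return counts


def coupling_counts(docs, field="refs"):
    """Bibliographic coupling: for each pair of citing documents, the number
    of references they have in common."""
    counts = {}
    for d1, d2 in combinations(sorted(docs), 2):
        shared = len(set(docs[d1][field]) & set(docs[d2][field]))
        if shared:
            counts[(d1, d2)] = shared
    return counts


def occurrence_counts(docs, field):
    """Number of documents in which each item appears."""
    return Counter(item for doc in docs.values() for item in set(doc[field]))


def similarity(c_ij, c_i, c_j, measure):
    """Normalize a co-occurrence count c_ij given occurrence counts c_i and c_j."""
    if measure == "cosine":       # Salton's Cosine
        return c_ij / sqrt(c_i * c_j)
    if measure == "jaccard":      # Jaccard's Index
        return c_ij / (c_i + c_j - c_ij)
    if measure == "equivalence":  # Equivalence Index
        return c_ij ** 2 / (c_i * c_j)
    if measure == "association":  # Association Strength, up to a constant factor
        return c_ij / (c_i * c_j)
    raise ValueError(f"unknown measure: {measure}")


# Hypothetical toy records: cited references and keywords of three citing documents.
docs = {
    "D1": {"refs": ["Kessler1963", "Small1973"], "keywords": ["co-word", "mapping"]},
    "D2": {"refs": ["Kessler1963", "Small1973", "White1981"], "keywords": ["citation", "mapping"]},
    "D3": {"refs": ["Small1973", "White1981"], "keywords": ["co-word", "mapping"]},
}

cocitation = cooccurrence_counts(docs, "refs")  # cited references as nodes
coupling = coupling_counts(docs)                # citing documents as nodes
coword = cooccurrence_counts(docs, "keywords")  # terms as nodes
occ = occurrence_counts(docs, "refs")

c = cocitation[("Kessler1963", "Small1973")]    # co-cited by D1 and D2
for m in ("cosine", "jaccard", "equivalence", "association"):
    print(m, round(similarity(c, occ["Kessler1963"], occ["Small1973"], m), 3))
# cosine 0.816, jaccard 0.667, equivalence 0.667, association 0.333
```

Co-citation takes the cited references as nodes, whereas bibliographic coupling takes the citing documents as nodes. This mirrors the distinction drawn in the Introduction.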

Units of Analysis in Bibliometric Techniques

The most common units of analysis in science mapping are journals, documents, cited references, authors (the author’s affiliation can also be used), and descriptive terms or words (Börner et al., 2003). The words can be selected from the title, the abstract, the body of the document, or some combination of them. Furthermore, we can analyze either the original keywords of the documents (author’s keywords) or the indexing keywords provided by the bibliographic data sources (e.g., ISI Keywords Plus).

Several relations among the units of analysis can be established. Usually, the science mapping process treats the units of analysis as co-occurrence data; that is, the similarity between two units is usually measured by counting the number of documents in which they appear together. Furthermore, direct linkage can be used to obtain the relations among units.

The relations among units can be represented as a graph or network, where the units are the nodes and each relation between two units is an edge between the corresponding nodes. In this way, different bibliometric networks can be built from the relationships among units of analysis. Table 1 presents a taxonomy of the most common bibliometric techniques according to the units of analysis used and the relationships established among them.

Different aspects of a research field can be analyzed depending on the selected units of analysis. For example, by using the authors (co-author or co-authorship analysis), the social structure of a scientific field can be analyzed (Glänzel, 2001; Peters & van Raan, 1991). Likewise, by using the authors’ affiliations (co-institution, co-university, or co-country), the international dimension of the research field can be studied. On the other hand, co-word analysis (Callon et al., 1983) is used to show the conceptual structure and the main concepts treated by a field. Co-citation (Small, 1973) and bibliographic coupling (Kessler, 1963) are used to analyze the intellectual structure of a scientific research field. The difference between them is that bibliographic coupling is a fixed and permanent relationship, because it depends on the references contained in the coupled documents. Co-citation, in contrast, will vary over time as new documents cite the same pair of references (Jarneving, 2005).

Bibliographic coupling and co-citation can be extended using journals and authors. In particular, author bibliographic coupling (Zhao & Strotmann, 2008) aims at discovering relationships between authors whose works cite the same references, whereas journal bibliographic coupling (Gao & Guan, 2009; Small & Koenig, 1977) aims at discovering the journals that cite the same references. On the other hand, author co-citation (White & Griffith, 1981) aims to discover the authors that are frequently cited together, whereas journal co-citation analysis (McCain, 1991) discovers the journals that are frequently co-cited. Furthermore, journal bibliographic coupling and journal co-citation can be extended to the journal category level. This supra-level of journal co-citation has been used to study the marrow of science (Moya-Anegón et al., 2007) using the ISI categories.

Finally, a relation between units can be established using direct linkages, for example, a document-document, author-author, or journal-journal citation network. Furthermore, a relation can be established between different kinds of units, for example, an author-paper (consumed/produced) network.

Data Preprocessing

The data retrieved from bibliographic sources normally contain errors, for example, misspellings in the author’s name, in the journal title, or in the reference list. Sometimes, additional information has to be added to the original data, for example, if the author’s address is incomplete or wrong. For this reason, a science mapping analysis cannot be applied directly to the data retrieved from the bibliographic sources; the retrieved data must first be preprocessed. In fact, the preprocessing step is perhaps one of the most important for improving the quality of the units of analysis (mainly authors and words) and thus for obtaining better results in the science mapping analysis.

Different preprocessing processes can be applied to prepare the data so that the science mapping analysis performs well:

• Detecting duplicate and misspelled items. Sometimes, the data contain items that represent the same object or concept but are spelled differently; for example, an author’s name can be written in different ways (e.g., Garfield, E.; Eugene
Garfield), and yet each way represents the same author. In other cases, the same concept can be represented with different words (lexical forms) or acronyms. Detecting duplicate items and misspellings enables these errors to be fixed.
• The time slice process is useful to divide the data into different time subperiods, or time slices, to analyze the evolution of the research field under study. This process is only necessary if the science mapping analysis is made in the context of a longitudinal study (Garfield, 1994; Price & Gürsey, 1975).
• Data reduction aims to select the most important data. Normally, we have a large amount of data, and with such a quantity of data it could be difficult to get good and clear results in the science mapping analysis. For this reason, the analysis is ordinarily carried out using a portion of the data. This portion could be, for example, the most cited articles, the most productive authors, or the journals with the best performance metrics.
• Network preprocessing can be used to select the most important nodes of the network of relationships between the units of analysis (bibliometric networks) according to different measures, removing isolated nodes, removing the less important links between nodes, etc.

Normalization Process

When the network of relationships between the selected units of analysis has been built, a transformation is first applied to the data to derive similarities from them or, more specifically, to normalize the data (van Eck & Waltman, 2009).

Different similarity measures have been used in the literature, the most popular being Salton’s Cosine (Salton & McGill, 1983), Jaccard’s Index (Peters & van Raan, 1993), the Equivalence Index (Callon, Courtial, & Laville, 1991), and Association Strength (Coulter, Monarch, & Konda, 1998; van Eck & Waltman, 2007). Association Strength is also known as the Proximity Index (Peters & van Raan, 1993; Rip & Courtial, 1984) or the Probabilistic Affinity Index (Zitt, Bassecoulard, & Okubo, 2000).

Usually, a normalization of the documents’ terms is also needed. For example, if a co-citation analysis is performed and various clusters are detected, then a label should be assigned to each one. This label should be selected using the most important terms of the documents in the cluster. Text normalization assigns each term a weight according to its importance in the corpus. Different text normalization measures can be applied (Baeza-Yates & Ribeiro-Neto, 1999; Chen, Ibekwe-SanJuan, & Hou, 2010; Salton & McGill, 1983): tf·idf, latent semantic analysis, log-likelihood ratio tests, log entropy, mutual information, etc.

The Mapping Step

The mapping step is the most important one. It is responsible for building the map by applying a mapping algorithm to the whole network formed by the relationships among the selected units of analysis.

Different techniques have been proposed to build the map (Börner et al., 2003). Dimensionality reduction techniques such as principal component analysis or MDS are used to transform the network into a low-dimensional space (often two-dimensional). Clustering algorithms are used to perform community detection, splitting the global network into different subnetworks. Recently, some authors have proposed new clustering algorithms to carry out this task, among them Streemer (Kandylas, Upham, & Ungar, 2010), spectral clustering (Chen et al., 2010), modularity maximization (Chen & Redner, 2010), and bootstrap resampling with significance clustering (Rosvall & Bergstrom, 2010). Finally, Pathfinder networks (PFNETs) are used to identify the backbone of the network (Quirin, Cordón, Santamaría, Vargas-Quesada, & Moya-Anegón, 2008; Schvaneveldt, Durso, & Dearholt, 1989). Furthermore, general graph mining techniques (Cook & Holder, 2006; Skillicorn, 2007) or social network analysis (Carrington et al., 2005; Wasserman & Faust, 1994) can be used in the mapping step.

The information obtained and the kind of map built will depend on the applied technique.

Analysis Methods

Once the map has been built, different analyses can be applied to extract useful knowledge.

Network analysis (Carrington et al., 2005; Cook & Holder, 2006; Skillicorn, 2007; Wasserman & Faust, 1994) allows us to perform a statistical analysis over the map generated in the mapping step. For example, several measures can be computed on the network, such as the total number of nodes and isolated nodes, the average degree, the number of weakly connected components, or the graph density. If a community detection algorithm was applied to build the map, then Callon’s centrality and density (Callon et al., 1991; Cobo et al., 2011) or other values that measure the relationships among the detected clusters can be used. Moreover, the overlap between the clusters can be measured using, for example, Jaccard’s Index. Furthermore, if documents are assigned to each cluster, a performance analysis can be carried out to obtain quantitative or qualitative measures of each cluster (Cobo et al., 2011).

Another important analysis that can be performed in a science mapping process is temporal analysis. It aims to identify the nature of phenomena represented by a sequence of observations, such as patterns, trends, seasonality, and outliers. In other words, it aims to analyze the evolution of the research field across different periods of time. This task can be performed using a longitudinal framework (Garfield, 1994; Price & Gürsey, 1975).

Burst detection is a kind of temporal analysis. It aims to find features that have high intensity over finite time periods. Kleinberg (2003) described an algorithm to deal with this problem.

Finally, geospatial analysis (Batty, 2003; Leydesdorff & Persson, 2010; Small & Garfield, 1985) aims to answer the question of where something happens and with what impact on neighbouring areas. Geospatial analysis requires spatial
attribute values or geolocations for the units of analysis; these data are usually extracted from affiliation data.

Visualization Techniques

As shown in the previous subsection, the output of each analysis method is different. The visualization technique employed is therefore very important for a good understanding and a better interpretation of the output.

The networks and subnetworks detected in the mapping step can be represented using heliocentric maps (Moya-Anegón et al., 2005), geometrical models (Skupin, 2009), and thematic networks (Bailón-Moreno et al., 2006; Cobo et al., 2011). Another approach consists of representing the networks in a map where the distance between two items reflects the strength of the relation between them (Davidson et al., 1998; Fabrikant, Montello, & Mark, 2010; Polanco et al., 2001; van Eck & Waltman, 2010). A smaller distance generally indicates a stronger relation (van Eck & Waltman, 2010).

If community detection is applied, then the different clusters detected (subnetworks) can be categorized using a strategic diagram. A strategic diagram (Callon et al., 1991; Cobo et al., 2011) is a two-dimensional space built by plotting themes according to different measures extracted in a post-network analysis.

To show the evolution of detected clusters in successive time periods (temporal analysis), different techniques have been used: Cluster string (Small, 2006; Small & Upham, 2009; Upham & Small, 2010), rolling clustering (Kandylas et al., 2010), alluvial diagrams (Rosvall & Bergstrom, 2010), ThemeRiver visualization (Havre, Hetzler, Whitney, & Nowell, 2002), and thematic areas (Cobo et al., 2011). Other authors propose laying out the graph of a given time period taking into account the previous and subsequent ones (Leydesdorff & Schank, 2008), or packing synthesized temporal changes into a single graph (Chen, 2004; Chen et al., 2010).

Geospatial results are usually visualized over a world or a thematic map. For example, if a co-author analysis is applied and then community detection is performed, the detected clusters of authors can be represented as a network in which each node is laid out over the author’s country.

Interpretation

When the science mapping analysis has finished, the analyst has to interpret the results and maps using their experience and knowledge. In the interpretation step, the analyst seeks to discover and extract useful knowledge that could be used to decide which policies to implement.

Software Tools Designed to Perform a Science Mapping Analysis: A Survey

Although science mapping analysis can be performed using generic software for social network analysis (Börner et al., 2010), other software tools have been developed specifically for science mapping analysis.

In this section, we present nine representative software tools specifically developed to analyze scientific domains by means of science mapping. These software tools are as follows:

• Bibexcel (Persson et al., 2009)
• CiteSpace II (Chen, 2004, 2006)
• CoPalRed (Bailón-Moreno et al., 2005, 2006)
• IN-SPIRE (Wise, 1999)
• Leydesdorff’s Software
• Network Workbench Tool (Börner et al., 2010; Herr et al., 2007)
• Sci2 Tool (Sci2 Team, 2009)
• VantagePoint (Porter & Cunningham, 2004)
• VOSViewer (van Eck & Waltman, 2010)

Some details of these software tools are given in Table 2.

TABLE 2. General information.

Software tool                    Last version  Year   Developed by
Bibexcel                         2010-09-22    2010   University of Umeå (Sweden)
CiteSpace                        2.2.R9        2010   Drexel University (USA)
CoPalRed                         1.0 beta      2005   University of Granada (Spain)
IN-SPIRE                         5             2010   Pacific Northwest National Laboratory
Leydesdorff’s Software           N/A           N/A    University of Amsterdam (The Netherlands)
Network Workbench Tool           1.0.0         2009   Indiana University (USA)
Science of Science (Sci2) Tool   0.0.3 alpha   2010   Indiana University (USA)
VantagePoint                     7             2010   Search Technology, Inc.
VOSViewer                        1.2.1         2010   Leiden University (The Netherlands)

Bibexcel

Bibexcel (http://www.umu.se/inforsk/Bibexcel; Persson et al., 2009) is a bibliometric tool developed at the University of Umeå (Sweden). This tool was specifically developed to manage bibliometric data and build maps that can be read by software such as Excel, SPSS, UCINET (Borgatti et al., 2002), and Pajek (Batagelj & Mrvar, 1998). Bibexcel is freely accessible for academic nonprofit use.

Bibexcel can read data retrieved from different bibliographic sources, such as ISI Web of Science (WoS) and Scopus, as well as files in the Procite export format.
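Bibexcel hands its maps to external programs such as Pajek, so it may help to see what such an exchange file can look like. The sketch below writes a weighted, undirected network in Pajek's plain-text .net format. In that format, vertices are numbered from 1 and listed with quoted labels under `*Vertices`, and undirected weighted links are listed as `i j weight` under `*Edges`. This is a minimal illustration written for this review, not Bibexcel's own export code. The function name `to_pajek_net` and the edge weights are hypothetical.

```python
def to_pajek_net(edges):
    """Serialize an undirected weighted network as Pajek .net text (a sketch).

    `edges` maps (label_a, label_b) -> weight. Vertices are numbered from 1 in
    alphabetical label order. Labels containing double quotes are not escaped.
    """
    labels = sorted({label for pair in edges for label in pair})
    index = {label: i for i, label in enumerate(labels, start=1)}
    lines = [f"*Vertices {len(labels)}"]
    lines += [f'{index[label]} "{label}"' for label in labels]
    lines.append("*Edges")  # undirected links; directed ones would go under *Arcs
    for (a, b), weight in sorted(edges.items()):
        lines.append(f"{index[a]} {index[b]} {weight}")
    return "\n".join(lines) + "\n"


# Hypothetical co-citation counts between three cited references.
net = to_pajek_net({("Small1973", "White1981"): 2, ("Kessler1963", "Small1973"): 2})
print(net)
```

Keeping the vertex numbering in one dictionary (`index`) guarantees that the `*Edges` section refers to the same numbers printed under `*Vertices`.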
Bibexcel allows different kinds of preprocessing to be performed on the textual data; for example, an English word stemmer can be applied and duplicate documents can be deleted. Moreover, Bibexcel enables the deletion of low-frequency items and keeps only the strongest links.

Different bibliometric networks can be extracted. The principal ones are as follows: co-citation, bibliographic coupling, co-author, and co-word. Furthermore, different co-occurrence matrices can be extracted using any document field, or some combination of fields. The matrices can be normalized using three different measures: Salton’s Cosine, Jaccard’s Index, and the Vladutz and Cook measures.

The user can then apply a clustering algorithm to the normalized data or prepare a matrix for a multidimensional scaling (MDS) analysis (using external software). Bibexcel does not provide an adequate visualization tool for its output, but it offers different export options that make data visualization possible in external software such as Pajek, UCINET, or SPSS. The bibliometric networks can also be exported.

CiteSpace II

CiteSpace (http://cluster.cis.drexel.edu/~cchen/citespace; Chen, 2004, 2006) was developed at Drexel University (USA) and can be freely downloaded. It is a software tool developed to detect, analyze, and visualize patterns and trends in scientific literature. Its primary goal is to facilitate the analysis of emerging trends in a knowledge domain.

CiteSpace can read different formats of bibliographic sources, such as WoS, PubMed, arXiv, and the SAO/NASA Astrophysics Data System (ADS). Furthermore, CiteSpace is able to read grants data such as NSF Awards and patent data from the Derwent Innovations Index.

Different kinds of bibliometric networks can be constructed: co-author, co-author institutions, co-author countries, co-grants,¹ subject categories co-occurrence, co-word, documents co-citation, author co-citation, journal co-citation, and documents bibliographic coupling. The networks, or graphs, can be built for different time periods to analyze the evolution of the studied domain. Moreover, the analyst can filter the items from which the networks are built to keep only the most important ones (e.g., the 100 most cited items from each time slice). The matrix (network) is normalized using Salton’s Cosine, Dice, or Jaccard’s Index.

Once the networks have been built, CiteSpace allows us to visualize them and perform several analyses on them, such as spectral clustering and citation burst detection. In addition, CiteSpace has three visualization modes (Chen, 2006): cluster view, time line, and time zone.

If clusters are detected, CiteSpace can assign a label to each one using the most important terms extracted from the keywords, title, or abstract. The terms are scored using tf·idf, log-likelihood ratio tests, or mutual information (Chen et al., 2010).

¹ Using the funding field of the document and analyzing the sponsors’ names of the grants that co-occur in the funding data.

CoPalRed

CoPalRed (http://ec3.ugr.es/copalred/; Bailón-Moreno et al., 2005, 2006) is a commercial software tool developed by the research group EC3 at the University of Granada (Spain). It is specifically designed to perform co-word analysis using the keywords of scientific documents. It is described as a Knowledge System, which collects the information contained in databases and transforms it into new knowledge.

This software tool reads files in comma-separated values (csv) format, generated through the reference manager software Procite.

One of the strengths of CoPalRed is that it contains a preprocessing module that allows users to normalize the keywords in a simple way. With this module, the user can unify items (lexical items) that represent the same concept. Once the keywords are unified, CoPalRed builds a co-occurrence matrix and normalizes it using the equivalence index (Callon et al., 1991).

CoPalRed performs three kinds of analysis: structural analysis, strategic analysis, and dynamic analysis.

• Structural analysis. It shows the knowledge in the form of thematic networks in which words and their relationships are drawn.
• Strategic analysis. It places each thematic network in a relative position within the global thematic network using two criteria: centrality (the intensity of its external relations) and density (the intensity of its internal cohesion).
• Dynamic analysis. CoPalRed analyzes the transformations of the thematic networks over time. It identifies approaches, bifurcations, appearances, and disappearances of themes.

CoPalRed visualizes the results using strategic diagrams, themes, and thematic networks (Bailón-Moreno et al., 2005, 2006; López-Herrera et al., 2009, 2010). Each theme is assigned a label: the name of the most central node (keyword) of its associated thematic network. Furthermore, each theme can be represented in the strategic diagram as a sphere whose volume is proportional to the number of documents belonging to it. In the same way, each node (keyword) of the thematic network can be represented as a sphere whose volume is proportional to the keyword’s frequency.

IN-SPIRE

IN-SPIRE (http://in-spire.pnl.gov; Wise, 1999) is a commercial visual document analysis software tool that gives the analyst the ability to uncover relationships, trends, and themes hidden within data to obtain new knowledge and insights. IN-SPIRE uses a landscape metaphor to help the user easily discover the relationships among documents and the sets of documents that are very similar. This tool uses statistical word patterns to characterize documents based on their context (Hetzler & Turner, 2005). IN-SPIRE is derived from the

SPIRE project funded by the Department of Energy and the U.S. intelligence agencies. It has been developed at Pacific Northwest National Laboratory (United States).

IN-SPIRE can read unformatted documents (ASCII text) or formatted documents such as HTML and XML. Furthermore, it can read data from Microsoft Excel documents and csv formatted files. The software allows the user to select the fields that will be used to measure the similarity among documents, as well as other meta-fields such as the title of the documents and the associated date.

Unlike the other analyzed software tools, IN-SPIRE does not extract bibliometric networks from the selected field. Instead, it uses a field or a set of fields to calculate the similarity among documents with its own text engine (Wise, 1999). In short, it uses the vector space model (Salton & McGill, 1983), so each document is represented as a vector of terms. Thus, if keywords are selected as the field, the similarity measure will show whether two documents have similar keywords. Although IN-SPIRE is able to build a map using any field, its text engine works better if words are selected as the field, and it needs a large amount of data to correctly detect the similarities among documents.

When the similarities among documents have been calculated, IN-SPIRE applies a clustering algorithm called “Fast Divisive Clustering” (Wise, 1999). At the end of the clustering process, several themes (sets of documents) are generated. Each theme is named after the most frequently appearing terms (using tf·idf) of its documents.

IN-SPIRE provides two different visualization techniques, which are the flagship of the software: Galaxies and ThemeScape™. The Galaxies visualization employs the metaphor of documents as stars in the night sky. ThemeScape, on the other hand, is constructed directly from the distribution of documents in the Galaxies visualization, representing themes as sedimentary layers that together create the appearance of a natural landscape. In the ThemeScape visualization, the height of the peaks corresponds to topic strength at those locations; the extent of the peaks corresponds to the area and brightness of the themes in the Galaxies visualization. In both visualizations, the proximity of two items (documents) reveals the similarity between them. Related documents are grouped together and common themes are highlighted.

IN-SPIRE provides a set of tools that help the analyst to discover knowledge within the corpus of studied documents:

• The Time Slicer allows us to see how particular themes grow or shrink over time and how the mix of themes in the galaxy changes over time.
• The Groups tool defines a collection of documents within the studied corpus.
• The Facets tool allows us to discover relationships between calculated themes as well as groups defined by the user.
• Robust query capabilities support boolean, word proximity, phrase, or example-based searches.
• The Correlation tool allows us to discover correlations between the groups.

Leydesdorff’s Software

Leydesdorff’s software (http://www.leydesdorff.net) is a set of command-line programs that enable science mapping to be performed with different analysis functions. It was developed at the University of Amsterdam (The Netherlands). The set of programs is freely accessible to the academic community.

The different programs allow several bibliometric analyses to be performed: co-word, co-author, author bibliographic coupling, journal bibliographic coupling, and author co-citation. The results can be visualized using external software such as Pajek (Batagelj & Mrvar, 1998), UCINET (Borgatti et al., 2002), Network Workbench Tool (see Subsection 3.6), or the Sci2 Tool (see Subsection 3.7). Moreover, international collaboration, institutional collaboration, and collaboration at the level of cities can be analyzed. The visualization of these collaboration networks can be done using Google Maps and external software. The different matrices are normalized using Salton’s Cosine measure.

There are programs for organizing the data downloaded from different sources (WoS, Scopus, Google Scholar, and Google) into a database. This database is the input file of the different analysis programs.

The set of programs does not allow the data to be preprocessed; so, for example, to perform a longitudinal analysis, external software is needed to split the data into different time periods.

Network Workbench Tool

The Network Workbench (NWB) Tool (http://nwb.slis.indiana.edu) is a general network analysis, modelling, and visualization toolkit for physics, biomedical, and social science researchers (Börner et al., 2010; Herr et al., 2007). It was developed by the Cyberinfrastructure for Network Science Center at Indiana University (USA) and is freely accessible.

The NWB Tool provides specific algorithms to deal with publications data to construct and analyze bibliometric networks and maps. The tool is able to read different bibliometric data formats such as ISI WoS, Scopus, BibTeX, and EndNote Export Format. It can also read funding data from the National Science Foundation (NSF) and other scholarly data in csv format.

The NWB Tool allows the user to preprocess the data, build different kinds of networks, perform graph analyses on the built networks, and finally visualize them. Moreover, the tool is able to carry out a temporal analysis.

• Data preprocessing is performed by removing duplicate records, dividing the data into different time periods, and detecting and unifying duplicate items with different spellings (e.g., items that represent the same author in a co-author analysis or terms that represent the same concept in a co-word analysis).
• The NWB Tool allows different kinds of networks to be built: documents co-citation, co-author, co-word, and documents bibliographic coupling. Furthermore, the tool can build networks by direct linkage; for example, it can build an

author-document network (consumed/produced) or a direct citation network.
• Several algorithms can be used to perform the mapping step and a graph analysis on the generated networks. Furthermore, the tool is able to carry out a burst detection to identify increases in the usage frequency of items.
• The visualization of the generated graphs is carried out through different external plugins (e.g., GUESS, Jung). Furthermore, several graph layouts can be applied, such as the DrL algorithm, which is the open-source successor of VxOrd (Davidson, Hendrickson, Johnson, Meyers, & Wylie, 2001), the layout used in the VxInsight program (Boyack, Wylie, & Davidson, 2002; Davidson et al., 1998).

Sci2 Tool

The Sci2 Tool (http://sci.slis.indiana.edu) is a modular toolset specifically designed for the study of science. It supports temporal, geospatial, topical, and network analysis and the visualization of datasets at the micro (individual), meso (local), and macro (global) levels (Sci2 Team, 2009). It was developed by the Cyberinfrastructure for Network Science Center at Indiana University (USA) and it is freely accessible.

The Sci2 Tool is similar to the NWB Tool (see Network Workbench Tool section), but it is specifically focused on science studies and has specific algorithms for this topic. Arguably its most important strength is that it provides several methods to prepare bibliometric data for later analysis.

Similarly to the NWB Tool, the Sci2 Tool is able to read different bibliographic data formats: ISI WoS, Scopus, BibTeX, and EndNote Export Format. It can also read funding data from the National Science Foundation (NSF) and other scholarly data in csv format.

The Sci2 Tool allows the user to prepare and preprocess the data, extract different types of networks, perform temporal, geospatial, topical, and network analyses, and finally visualize the results through different plugins and layout algorithms, including the DrL layout algorithm.

The data preparation step cleans the bibliographic data and creates different networks and tables that can be used in preprocessing, analysis, and visualization. Principally, the networks that can be extracted are as follows: co-author, co-PI (principal investigator), co-word, document co-citation, journal co-citation, author co-citation, author bibliographic coupling, document bibliographic coupling, and journal bibliographic coupling. Moreover, the tool can build different direct linkage networks such as author-citation, document-citation, source-citation paper, and, finally, author-document (consumed/produced) networks.

The tool contains several algorithms for the mapping step and for the subsequent analyses. The mapping step can be performed using community detection and backbone identification. Temporal analysis is performed by slicing the data into different time periods and through burst detection. Geospatial analysis is performed through geocoding and geospatial thematic maps. Topic analysis is performed through burst detection over the words and a co-word analysis. Network analysis covers statistical analysis and the application of different algorithms to the networks.

VantagePoint

VantagePoint (http://www.thevantagepoint.com/; Porter & Cunningham, 2004) is a powerful commercial text-mining software tool for discovering knowledge in search results from patent and literature databases, or generally from structured texts. It allows the user to analyze large volumes of structured text to discover patterns and relationships and quickly address who, what, when, and where. It was developed by Search Technology Inc. (USA). VantagePoint has been used to perform several science mapping analyses (Morel, Serruya, Penna, & Guimarães, 2009; Porter & Youtie, 2009a,b; Shapira, Youtie, & Porter, 2010).

VantagePoint has 180 import filters that allow the user to import data from almost any literature or patent database. Furthermore, it has import filters to load data from Microsoft Excel and Access and from the XML file format,² as well as user-defined filters.

² There is a wizard that allows an import filter to be created from XML data.

Once the dataset is loaded, VantagePoint shows the different fields included in the dataset; for example, if the dataset contains bibliographical information, then the fields could be the title, authors, affiliations, abstract, and references of the documents (records).

The VantagePoint graphic interface has three parts: the workspace, the title view, and the detail window. The workspace displays all of the lists, matrices, and map views generated by the user. The title view displays the titles of the records in the dataset for a selected set of items. Finally, the detail window shows the co-occurrence of items in one field (any field can be selected) with items or nodes selected using lists or charts.

This software tool allows us to build different lists from any field. The lists show all of the field’s items from the dataset. In the list view, for each item, the number of records in which the item appears and the number of instances (the number of times the item appears in the dataset, counting repeated occurrences within a record) are shown. The items of a list can be assigned to several groups. Groups are useful for defining a portion of the dataset to reduce the data used in the later analysis; for example, a group containing the top 30 authors can be built. An item can be associated with more than one group.

One strength of VantagePoint is its preprocessing and data cleaning tools. A list can be cleaned or reduced using the Cleanup function, which attempts to identify potentially equivalent items by performing fuzzy near matches on specific fields. Moreover, a list can be cleaned by applying a thesaurus. Although VantagePoint has several predefined thesauruses that can be easily used, the user can define his/her own thesaurus or edit an existing thesaurus using

the Thesaurus Editor. Any change performed on a list generates a new list, so the original data are always kept.

VantagePoint allows several kinds of matrices to be built that relate the records in the dataset through two given lists:

• Co-occurrence matrix: it shows the number of records in which the element i (from the first list) and the element j (from the second list) appear together.
• Auto-correlation matrix: it shows the correlations among items in a list.
• Cross-correlation matrix: it shows the correlations among items in a list based on the values of another list.
• Factor matrix: it is the result of a principal component analysis. The factor matrix shows the items in rows and the factors in columns.

VantagePoint also allows different matrices to be built that can be used as the input to the mapping process: co-author (using the author’s name, affiliation, or country), co-citation (using the reference, the reference’s author, or the source), and co-word (using any set of terms). Furthermore, if the lists selected to build the matrix are different, heterogeneous matrices can be built; for example, the user can build a matrix of authors per year to analyze the authors’ productivity. The matrices can be exported into a text file, or the user can directly copy a selection of the matrix and paste it into Microsoft Excel.

The correlation matrices can be normalized using Pearson’s r, Salton’s Cosine, or the Max Proportional measure. Furthermore, the co-occurrence matrix can be normalized using the tf·idf similarity measure.

VantagePoint includes three kinds of maps that correspond to the last three matrices: the cross-correlation map, the auto-correlation map, and the factor map. These maps are graphical representations of the corresponding matrices. In the cross-correlation maps, the similarity between items is measured using the cosine. In the factor and auto-correlation maps, the similarity measure used is Pearson’s r.

Finally, VantagePoint can also execute Visual Basic scripts to automate repetitive (and/or complex) actions that a user may require.

VOSViewer

VOSViewer (http://www.vosviewer.com; van Eck & Waltman, 2010) is a software tool specifically designed for constructing and visualizing bibliometric maps, paying special attention to their graphical representation. It is well suited to representing large maps, since it uses zoom functionality, special labelling algorithms, and density metaphors. The software tool was developed by the Centre for Science and Technology Studies at Leiden University (The Netherlands) and it is freely available to the bibliometric research community.

Although VOSViewer can be used to construct and visualize bibliometric maps of any kind of co-occurrence data, the software tool cannot extract or build a co-occurrence matrix from the bibliometric data; an external process is needed to do this. Furthermore, the software tool has no preprocessing modules to prepare the data for later analysis.

To lay out the elements on the maps, VOSViewer uses the VOS mapping technique (van Eck, Waltman, Dekker, & van den Berg, 2010). This technique builds a similarity matrix from a co-occurrence matrix using a similarity measure known as the association strength (van Eck & Waltman, 2007, 2009). The VOS mapping technique builds a two-dimensional map in which the elements are located in such a way that the distance between any pair of items reflects their similarity as accurately as possible. The idea of the VOS mapping technique is to minimize, through an optimization process, a weighted sum of the squared Euclidean distances between all pairs of items, where each pair is weighted by its similarity.

Although VOSViewer implements the VOS mapping technique, the program can also be used to view any two-dimensional map constructed with other techniques. VOSViewer can also perform community detection using the VOS clustering technique, which is related to modularity-based clustering (Waltman et al., 2010). Once the map is built, VOSViewer allows it to be examined through four views:

• Label view. In this view, each element is represented by a label and also by a circle. The more important an item, the larger its label and its circle. An intelligent algorithm shows only the most important (most frequent) labels depending on the level of zoom, so the software tool avoids overlapping labels. Circles with the same color belong to the same cluster. This color is the same as the corresponding cluster’s color in the cluster density view.
• Density view. In this view, each item is represented by a label in a similar way as in the label view. Each point in the map has a color that depends on the density of items at that point, which depends both on the number of neighbouring items and on the weights of these items. VOSViewer calculates the density of each point according to the equation defined by van Eck and Waltman (2010), which uses a Gaussian kernel function. The density is translated into color using a color scheme (for more information, see van Eck & Waltman, 2010).
• Cluster density view. This view is available only if items have been previously assigned to clusters. The cluster density view is similar to the ordinary density view except that the density of items is displayed separately for each cluster of items.
• Scatter view. This is a simple view in which items are indicated by a small circle and no labels are displayed.

Comparative Study

As mentioned above, in this article we also present a comparative study of the nine software tools described above. In this way, we are able to highlight the main differences and positive synergies among the different software tools. To do so, we analyze the nine software tools from five points of view: (a) the preprocessing methods, (b) the bibliometric networks available, (c) the normalization measures used, (d) the type of analysis, and finally, (e) other secondary aspects.

TABLE 3. Preprocessing methods.

Software tool De-duplication Time slicing Data reduction Network reduction

Bibexcel x x
CiteSpace x x x
CoPalRed x x x
IN-SPIRE x
Leydesdorff’s Software
Network Workbench Tool x x x x
Science of Science Tool x x x x
VantagePoint x x x
VOSViewer
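The four preprocessing modules compared in Table 3 can be illustrated with a small Python sketch. It assumes records are plain dictionaries with hypothetical `year` and `keywords` keys, that spelling variants are resolved through a user-supplied thesaurus, and that network links are stored as a dictionary of weighted pairs; the function names are illustrative and do not correspond to any of the reviewed tools.

```python
from collections import Counter


def merge_items(item_values, thesaurus):
    """De-duplication: map spelling variants to one preferred item and sum
    an attribute of the merged items (e.g. times cited).
    `thesaurus` is a hypothetical {variant: preferred} mapping."""
    merged = Counter()
    for item, value in item_values.items():
        merged[thesaurus.get(item, item)] += value
    return dict(merged)


def time_slices(records, periods):
    """Time slicing: group records into inclusive (start, end) year periods."""
    return {(start, end): [r for r in records if start <= r["year"] <= end]
            for start, end in periods}


def reduce_data(records, min_freq):
    """Data reduction: drop keywords found in fewer than `min_freq` records."""
    freq = Counter(k for r in records for k in set(r["keywords"]))
    return [{**r, "keywords": [k for k in r["keywords"] if freq[k] >= min_freq]}
            for r in records]


def prune_links(links, min_weight):
    """Network reduction: keep only links whose weight is at least `min_weight`."""
    return {pair: w for pair, w in links.items() if w >= min_weight}


# Hypothetical data, for illustration only.
cites = {"Doe, J": 10, "Doe, J.": 5, "Roe, R": 7}
print(merge_items(cites, {"Doe, J.": "Doe, J"}))   # {'Doe, J': 15, 'Roe, R': 7}

records = [
    {"year": 2005, "keywords": ["fuzzy sets", "decision making"]},
    {"year": 2008, "keywords": ["fuzzy sets", "ontologies"]},
    {"year": 2010, "keywords": ["fuzzy sets", "decision making"]},
]
slices = time_slices(records, [(2004, 2007), (2008, 2011)])
print({period: len(rs) for period, rs in slices.items()})  # {(2004, 2007): 1, (2008, 2011): 2}
print(reduce_data(records, min_freq=2)[1]["keywords"])     # ['fuzzy sets']
print(prune_links({("a", "b"): 3, ("a", "c"): 1}, min_weight=2))  # {('a', 'b'): 3}
```

Merging variants rather than deleting them keeps their attribute values, which matches the behaviour described for de-duplication modules in the Preprocessing Methods subsection.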

TABLE 4. Bibliometric networks.

Software tool | Bibliographic coupling: Author (ABCA), Document (DBCA), Journal (JBCA) | Co-author: Author (ACAA), Country (CCAA), Institution (ICAA) | Co-citation: Author (ACA), Document (DCA), Journal (JCA) | Co-word (CWA) | Direct linkage (DL) | Others

Bibexcel x x x x x x x x x
CiteSpace x x x x x x x x x
CoPalRed x
IN-SPIRE x
Leydesdorff’s Software x x x x x x x
Network Workbench Tool x x x x x
Science of Science Tool x x x x x x x x x x
VantagePoint x x x x x x x x
VOSViewer
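Table 4 distinguishes networks by their unit of analysis, but most of them reduce to the same operation. The Python sketch below assumes each record is a dictionary with hypothetical `id`, `authors`, `keywords`, and `refs` keys: co-word, co-author, and document co-citation links count pairs of items appearing in the same record, while bibliographic coupling links two documents by the references they share. It then applies some of the normalization measures compared in this article (Salton’s Cosine, Jaccard’s Index, the equivalence index, and the association strength). Taking c_i as the number of records that contain item i is an assumption; individual tools may count occurrences differently or scale the association strength by a constant.

```python
import math
from collections import Counter
from itertools import combinations


def co_occurrence(records, field):
    """Count unordered pairs of items that appear together in one record.

    field="keywords" gives co-word links, "authors" co-author links, and
    "refs" document co-citation links (two references cited together).
    """
    pairs = Counter()
    for rec in records:
        items = sorted(set(rec.get(field, [])))
        pairs.update(combinations(items, 2))
    return pairs


def bibliographic_coupling(records):
    """Link two documents by the number of references they share."""
    links = Counter()
    for a, b in combinations(records, 2):
        shared = len(set(a["refs"]) & set(b["refs"]))
        if shared:
            links[tuple(sorted((a["id"], b["id"])))] = shared
    return links


def item_counts(records, field):
    """c_i: number of records in which each item occurs."""
    return Counter(item for rec in records for item in set(rec.get(field, [])))


# Similarity measures on c_ij (co-occurrences) and c_i, c_j (occurrences).
MEASURES = {
    "cosine": lambda cij, ci, cj: cij / math.sqrt(ci * cj),    # Salton's Cosine
    "jaccard": lambda cij, ci, cj: cij / (ci + cj - cij),      # Jaccard's Index
    "equivalence": lambda cij, ci, cj: cij ** 2 / (ci * cj),   # equivalence index
    "association": lambda cij, ci, cj: cij / (ci * cj),        # association strength, up to a constant
}


def normalize(pairs, counts, measure="cosine"):
    """Turn raw co-occurrence counts into similarities with one measure."""
    sim = MEASURES[measure]
    return {(i, j): sim(cij, counts[i], counts[j]) for (i, j), cij in pairs.items()}


# Hypothetical records, for illustration only.
records = [
    {"id": "d1", "authors": ["A", "B"], "keywords": ["x", "y"], "refs": ["r1", "r2"]},
    {"id": "d2", "authors": ["B", "C"], "keywords": ["x", "y", "z"], "refs": ["r1", "r2", "r3"]},
    {"id": "d3", "authors": ["A"], "keywords": ["z"], "refs": ["r3"]},
]
co_word = co_occurrence(records, "keywords")
print(dict(co_word))                          # {('x', 'y'): 2, ('x', 'z'): 1, ('y', 'z'): 1}
print(dict(bibliographic_coupling(records)))  # {('d1', 'd2'): 2, ('d2', 'd3'): 1}
print(normalize(co_word, item_counts(records, "keywords")))
# {('x', 'y'): 1.0, ('x', 'z'): 0.5, ('y', 'z'): 0.5}
```

Because every network is stored as a dictionary of weighted pairs, the same `normalize` call applies equally to co-word, co-author, or co-citation counts.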

Preprocessing Methods

Dedicated modules for preprocessing the data are an important characteristic of a science mapping software tool. Table 3 shows the principal preprocessing modules available in each software tool.

The module to detect duplicate items is important, for example, in co-word or co-author analysis. With this module, a user can decide to join two or more items that represent the same concept or the same author. The module not only merges the items but also selects or sums up their attribute values, for example, the times cited of the original records.

A time slicing option is needed if the user wants to analyze the evolution of the domains under study. A data reduction module is useful if the user wants to filter the data to analyze only the most important information.

Finally, network reduction is useful to filter the nodes or links of a network (similar to the data reduction module), or to apply a pruning algorithm to the networks.

Only NWB Tool and Sci2 Tool have all four preprocessing modules. By contrast, Leydesdorff’s Software and VOSViewer do not have any of these modules, which is a strong drawback. IN-SPIRE performs the time slicing directly over the data; it does not need to preprocess the data to split the dataset into different slices.

Bibliometric Relation Between Units of Analysis

An important consideration in the use of some science mapping software tools is whether they are able to establish different relationships between the units of analysis, that is, whether they are able to extract different bibliometric networks.

Table 4 shows the different bibliometric networks available in each software tool. The column “Others” means that the software tool is able to build other uncommon or heterogeneous networks or matrices.

Although no software tool is able to build all the different varieties of bibliometric networks, Bibexcel, CiteSpace, Leydesdorff’s Software, Sci2 Tool, and VantagePoint are able to build the majority of them. By contrast, VOSViewer is not able to build any of them; it is focused only on visualizing bibliometric maps. CoPalRed is focused on only one kind of bibliometric network. Finally, although IN-SPIRE can construct the maps using any field of the dataset, its way of representing the documents, by using the vector space model, makes it difficult to generate the maps

TABLE 5. Normalization measures.

Software tool Measure

Bibexcel Salton’s Cosine, Jaccard’s Index, or the Vladutz and Cook measures
CiteSpace Salton’s Cosine, Dice or Jaccard Strength
CoPalRed Equivalence Index
IN-SPIRE Conditional Probability
Leydesdorff’s Software Salton’s Cosine
Network Workbench Tool User defined
Science of Science Tool User defined
VantagePoint Pearson’s r, Salton’s Cosine or the Max Proportional
VOSViewer Association Strength

TABLE 6. Methods of analysis.

Software tool Burst detection Geospatial Network Temporal

Bibexcel x
CiteSpace x x x x
CoPalRed x x
IN-SPIRE x x x
Leydesdorff’s Software
Network Workbench Tool x x x
Science of Science Tool x x x x
VantagePoint x x x x
VOSViewer x

using other fields such as the authors. It works better with words.

Some software tools allow the extraction of uncommon networks, for example, the co-grant networks available in CiteSpace, the co-PI networks available in Sci2 Tool, or the particular matrices that Bibexcel and VantagePoint extract from a set of specific document fields. Furthermore, some software tools, such as Bibexcel and VantagePoint, allow us to extract heterogeneous networks using different fields in the rows and columns; for example, a matrix showing the authors per year can be extracted.

Finally, NWB Tool and Sci2 Tool can extract bibliometric networks using direct linkage.

Normalization Measures

Once the bibliometric networks have been built, a normalization process can be carried out using different similarity measures. Table 5 shows the measures used by each software tool.

Four of the analyzed software tools (Bibexcel, CiteSpace, Leydesdorff’s Software, and VantagePoint) use Salton’s Cosine as a similarity measure. Other software tools, such as NWB Tool and Sci2 Tool, allow users to define their own measures.

Methods of Analysis

Different methods of analysis can be applied. Table 6 shows the methods of analysis available in each software tool.

Only CiteSpace, Sci2 Tool, and VantagePoint support all four kinds of analysis. Leydesdorff’s Software does not carry out any of them.

CiteSpace and Sci2 Tool have geocoding capabilities. CiteSpace uses Google’s and Yahoo!’s geocoders over the available institutional data. Sci2 Tool, on the other hand, uses Yahoo!’s geocoding service and an internal geocoder over any field that contains geographical data, such as institutional addresses and conference locations.

Other Aspects

In this subsection, we compare the software tools according to other aspects such as documentation/help, free or commercial availability, whether the source code is available, the possibility of installing the software on different platforms, and the extendability of the software.

NWB Tool and Sci2 Tool have comprehensive user guides in which the tools are explained in detail. Furthermore, these user guides explain important aspects of science mapping; they are the only tools whose documentation covers this issue. VantagePoint has a good user guide and online help, and its website provides a large number of video tutorials. IN-SPIRE has a comprehensive website with video tutorials and online help. VOSViewer has a good manual. CiteSpace has a large wiki where important issues are described. Leydesdorff’s Software has a good description and user guide for each of its command-line programs on its website.

Only three of the nine software tools described are commercial: CoPalRed, IN-SPIRE, and VantagePoint. The remaining software tools are freely available.

1392 JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY—July 2011
DOI: 10.1002/asi
15322890, 2011, 7, Downloaded from https://onlinelibrary.wiley.com/doi/10.1002/asi.21525 by UFPE - Universidade Federal de Pernambuco, Wiley Online Library on [07/07/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Regarding the availability of the source code, only NWB Tool and Sci2 Tool make their source code available.
CiteSpace, NWB Tool, Sci2 Tool, and VOSViewer were developed in the Java programming language, so they can be used on any platform (Windows, MacOS, Linux, etc.). On the other hand, Bibexcel, CoPalRed, IN-SPIRE, Leydesdorff's Software, and VantagePoint run only under Windows.
Finally, regarding extensibility, NWB Tool and Sci2 Tool are built on the Cyberinfrastructure Shell (CIShell; http://cishell.org/), so they can be extended using this platform. According to the description given on its website, CIShell "is an open source, community-driven platform for the integration and utilization of datasets, algorithms, tools, and computing resources." VantagePoint can be extended using VisualBasic scripts.

Analysis of Generated Maps: A Cooperative Study Among Tools

To complete the comparative study of the software tools, in this section we develop a cooperative study of the tools on a common dataset. This cooperative use of different software tools gives us the opportunity to discover the positive synergies that their joint use could generate.
To compare the tools fairly, a common science mapping analysis over a specific unit of analysis has to be performed. As was shown in Table 4, the analyzed software tools do not all extract the same bibliometric networks; the co-word network is the only one available in every tool. For this reason, we select words (or keywords) as the unit of analysis for the science mapping analysis.
As an example, we study the conceptual structure (Cobo et al., 2011) of the research field of fuzzy set theory (FST; Zadeh, 1965, 2008) using the publications that appeared during 2005 to 2009 in the most important and prestigious journals on the topic according to their impact factor,3 namely Fuzzy Sets and Systems and IEEE Transactions on Fuzzy Systems. Cobo et al. (2011) recently analyzed these journals in depth, studying their conceptual evolution across five periods of time. In this section, we use the last period of that analysis (2005-2009).
The number of documents analyzed was 1,576, and they were downloaded4 from the WoS. Specifically, 1,086 documents were published in Fuzzy Sets and Systems and 490 in IEEE Transactions on Fuzzy Systems.
The author's keywords and Keywords Plus of each document were used in the analysis. After a de-duplication step (carried out with CoPalRed), there were 5,034 keywords. CoPalRed allows us to export the documents with the preprocessed items to a csv file, so this file was used as the input for the remaining software tools whenever possible. The whole network built from the co-occurrence of these keywords contains 25,705 links.
3 According to the 2009 Journal Citation Report (ISI Web of Science).
4 The data was downloaded on January 15, 2010.
In what follows, the results obtained by the different software tools are shown. The comparative study was performed using only the tools able to visualize the results; for this reason, Bibexcel and Leydesdorff's Software were not used.
First, a co-word analysis was performed using CiteSpace. Because CiteSpace does not allow us to load the data in csv format, the dataset had to be loaded without any preprocessing from an ISI WoS format file. Figure 1 shows the map generated by CiteSpace, which was built using the top 200 keywords. The lines between nodes represent the cosine similarity measure. The shadowed nodes represent clusters, and each cluster was named by selecting its most important keywords according to the tf·idf measure. Inside each cluster there is a sphere that represents its centroid; the volume of the sphere is proportional to the size of the cluster.
It must be said that the printed map does not show the power of CiteSpace. To interpret the results well, the analyst should interact with CiteSpace's user interface, which allows a variety of analyses, different layouts, etc. Furthermore, the analyst can zoom in and out on the network to appreciate the details of a local area.
FIG. 1. Map generated by CiteSpace. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
Second, Figure 2 shows the results obtained by CoPalRed: Figure 2a shows the generated strategic diagram, and Figure 2b draws the thematic network of a specific theme (FUZZY-CONTROL). CoPalRed generated the maps using the keywords with a frequency equal to or higher than five and a co-occurrence value equal to or higher than three. After this pruning, the whole network contains 229 nodes and 432 links between them; the pruning retains the most frequent and important keywords. The strategic diagram shows the main themes of the FST field detected in the studied period, categorized into four classes according to their Callon's density and Callon's centrality measures. Each theme in the strategic diagram is associated with a sphere and a label. Labels were chosen by selecting the most central node of the associated thematic network, where each node corresponds to a keyword. The volume of each sphere represents the number of documents associated with the theme (or with the keyword, in thematic networks); this number is also shown next to the labels. Finally, the thickness of the lines in thematic networks represents the degree of association (equivalence index) between two nodes.
[Figure 2a: strategic diagram, 2005-2009; axes: Centrality, Density.]
FIG. 2a. CoPalRed's results: (a) Strategic diagram. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
FIG. 2b. CoPalRed's results: (b) Thematic network. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
Third, the csv file exported by CoPalRed was loaded into IN-SPIRE. After defining the dataset and selecting the terms, IN-SPIRE generated two maps: the Galaxy view (Figure 3) and the Theme view (Figure 4). In the Galaxy view, the shadows represent groups of documents that are considered similar. The names of these themes are generated using the most important keywords according to their tf·idf measure. In the Theme view, the height of each peak corresponds to the topic strength at that location, and the extent of each peak corresponds to the area and brightness of the corresponding theme in the Galaxy view.
As both the Galaxy and Theme views show, IN-SPIRE did not detect many themes, because of the way in which it interprets the data. Unlike the other tools analyzed, IN-SPIRE uses the vector space model to represent the documents, so it needs a large number of terms to detect the themes correctly. In our dataset, the documents do not contain enough keywords, so IN-SPIRE could not correctly determine the similarity among documents. Had we used the abstracts or the full text of the documents, IN-SPIRE would probably have obtained better results.
FIG. 3. IN-SPIRE's Galaxy view. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
FIG. 4. IN-SPIRE's Theme view. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
Fourth, the csv file was loaded into Sci2 Tool,5 and a co-occurrence network of the keywords (author's keywords and Keywords Plus) was created. We applied weak component clustering to the whole network obtained after dropping the keywords with a frequency below five and the links with a co-occurrence value below three (this network is the same as the one generated by CoPalRed). The biggest weak component is shown in Figure 5. The size of the nodes is proportional to the respective keyword's frequency, and the thickness of the lines represents the co-occurrence (without normalization) of the linked nodes. Only the names of the top 50 keywords are shown. The color of the nodes varies linearly from gray to black according to their frequency, and the color of the links varies from green to black according to their co-occurrence value. The network was laid out using the GUESS plugin.
5 In this example, Sci2 Tool and NWB Tool obtained the same results.
FIG. 5. Map generated by Science of Science (Sci2) Tool. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
Fifth, VantagePoint built a Factor Map (Figure 6) using the keywords with a frequency equal to or higher than five (after this pruning, the dataset contains 392 keywords). Each node represents a cluster of terms, and each theme is labeled with its most important term. The size of the nodes is proportional to the number of documents, and the lines between nodes represent the similarity (Pearson's r) between factors.
[Figure 6: Factor Map of Keywords (author's) + Keyword...; Factors: 10; % Coverage: 58% (901); legend of top links shown by similarity range (> 0.75, 0.50-0.75, 0.25-0.50, < 0.25).]
FIG. 6. Map generated by VantagePoint. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
Finally, the co-occurrence matrix generated by CoPalRed was converted to the VOSViewer format to visualize the results of a co-word analysis. Figure 7 shows the cluster view. We can observe how the different keywords are laid out along a horizontal line, which means that the keywords placed on the left are very dissimilar to those placed on the right side of the map. The size of each keyword's label is proportional to its frequency, and in the zoomed-out view VOSViewer displays only the labels of the most important (most frequent) keywords. VOSViewer assigns a different, randomly chosen color to each cluster. Inside each cluster, the intensity of the color at a point represents the density at that point, which is measured using a Gaussian kernel function (van Eck & Waltman, 2010).
FIG. 7. VOSViewer's cluster view. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]
As with CiteSpace, the printed map of VOSViewer does not show the power of its graphical user interface. In each view, the user can zoom in on a specific area to discover the items hidden under the most important ones. As an example, Figure 8 shows an enlarged, zoom-in view of the cluster visualization focused on the keywords FUZZY-TOPOLOGY and T-NORM.
FIG. 8. VOSViewer's cluster zoom-in view. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Lessons Learned

As has been shown in both the Comparative Study section and the Analysis of Generated Maps: A Cooperative Study among Tools section, each software tool has different characteristics. Some tools have powerful preprocessing techniques, others can generate a large number of bibliometric networks, and others focus on only one kind of bibliometric network. Finally, not all analysis methods are available in every tool. For this reason, a deep science mapping analysis requires the use of different tools.
The preprocessing capabilities of VantagePoint are one of its main strengths. It incorporates a large number of import filters that allow us to load data from almost all bibliographic sources. Moreover, the clean-up list method and the possibility of applying a thesaurus help the preprocessing task, especially the de-duplication process. VantagePoint can export the results to a csv file, so other software tools can read this data and perform their own science mapping analysis over the preprocessed data.
CoPalRed has a good de-duplication process too, but it focuses on only one kind of unit, the keywords. NWB Tool and Sci2 Tool have a de-duplication module, but it depends on a step that must be performed with external software. However, both NWB Tool and Sci2 Tool have a good network reduction process.
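The de-duplication and network reduction steps discussed above, together with the thresholds used in the cooperative study (keywords with a frequency of at least five, links with a co-occurrence value of at least three, then weak component clustering), can be sketched in a few lines. This is a minimal, hypothetical illustration, not the implementation used by VantagePoint, CoPalRed, NWB Tool, or Sci2 Tool: the thesaurus contents, the function names, and the toy records are assumptions. Weak components are computed with a simple union-find, which for an undirected co-occurrence network coincides with connected components.

```python
from collections import Counter
from itertools import combinations


def deduplicate(records, thesaurus):
    """Normalize keywords and merge variants via a hypothetical thesaurus.

    Each record is one document's keyword list; duplicates inside a document
    are collapsed, so frequencies count documents rather than mentions.
    """
    cleaned = []
    for keywords in records:
        canon = set()
        for kw in keywords:
            key = kw.strip().upper()
            canon.add(thesaurus.get(key, key))
        cleaned.append(sorted(canon))  # sorted, so every pair (a, b) has a < b
    return cleaned


def cooccurrence_network(records, min_freq=5, min_cooc=3):
    """Build a keyword co-occurrence network and prune it by thresholds."""
    freq = Counter(kw for kws in records for kw in kws)
    cooc = Counter()
    for kws in records:
        for pair in combinations(kws, 2):
            cooc[pair] += 1
    nodes = {kw for kw, f in freq.items() if f >= min_freq}
    edges = {
        (a, b): c
        for (a, b), c in cooc.items()
        if c >= min_cooc and a in nodes and b in nodes
    }
    return freq, nodes, edges


def weak_components(nodes, edges):
    """Connected components of the undirected network (union-find)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=len, reverse=True)  # biggest first


# Toy usage with lowered thresholds (the study used min_freq=5, min_cooc=3).
thesaurus = {"FUZZY CONTROL": "FUZZY-CONTROL", "FUZZY LOGIC": "FUZZY-LOGIC"}
records = deduplicate(
    [
        ["fuzzy control", "Stability", "LMI"],
        ["Fuzzy-Control", "stability", "lmi"],
        ["FUZZY-CONTROL", "STABILITY"],
        ["T-norm", "Fuzzy logic"],
        ["t-norm", "fuzzy logic", "fuzzy control"],
    ],
    thesaurus,
)
freq, nodes, edges = cooccurrence_network(records, min_freq=2, min_cooc=2)
components = weak_components(nodes, edges)
print(sorted(components[0]))  # ['FUZZY-CONTROL', 'LMI', 'STABILITY']
```

Because each document's keyword set is de-duplicated before counting, a keyword's frequency is the number of documents in which it appears, which is consistent with how the co-occurrence value of a pair is counted.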

TABLE 7. Characteristics summary.

Software tool | Preprocessing | Networks | Normalization | Analysis
Bibexcel | Data and networks reduction | DBCA, ACAA, CCAA, ICAA, ACA, DCA, JCA, CWA, Others | Salton's Cosine, Jaccard's Index, or the Vladutz and Cook measures | Network
CiteSpace | Time slicing and data and networks reduction | DBCA, ACAA, CCAA, ICAA, ACA, DCA, JCA, CWA, Others | Salton's Cosine, Dice or Jaccard Strength | Burst detection, geospatial, network, temporal
CoPalRed | De-duplication, time slicing, data reduction | CWA | Equivalence index | Network, temporal
IN-SPIRE | Data reduction | CWA | Conditional probability | Burst detection, network, temporal
Leydesdorff's Software | | ABCA, JBCA, ACAA, CCAA, ICAA, ACA, CWA | Salton's Cosine |
Network Workbench Tool | De-duplication, time slicing and data and networks reduction | DBCA, ACAA, DCA, CWA, DL | User defined | Burst detection, network, temporal
Science of Science Tool | De-duplication, time slicing and data and networks reduction | ABCA, DBCA, JBCA, ACAA, ACA, DCA, JCA, CWA, DL, Others | User defined | Burst detection, geospatial, network, temporal
VantagePoint | De-duplication, time slicing and data reduction | ACAA, CCAA, ICAA, ACA, DCA, JCA, CWA, Others | Pearson's r, Salton's Cosine or the Max Proportional | Burst detection, geospatial, network, temporal
VOSViewer | | | Association Strength | Network
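Several of the normalization measures listed in Table 7 can be written in terms of the number of documents c_i and c_j in which items i and j occur and the number of documents c_ij in which they co-occur. The following Python sketch uses common textbook formulations of Salton's Cosine, Jaccard's Index, Dice, the equivalence index, and association strength. It is an illustrative assumption, not the exact code of any of the tools; the helper name `normalize` is hypothetical, and some tools may apply additional constant factors (association strength, for instance, is sometimes scaled by the total number of documents).

```python
import math


def normalize(c_ij, c_i, c_j):
    """Return common similarity measures for one pair of items (hypothetical helper).

    c_i, c_j: number of documents containing item i (resp. item j)
    c_ij:     number of documents containing both items
    """
    names = ("cosine", "jaccard", "dice", "equivalence", "association")
    if c_ij == 0:
        return dict.fromkeys(names, 0.0)
    return {
        "cosine": c_ij / math.sqrt(c_i * c_j),       # Salton's Cosine
        "jaccard": c_ij / (c_i + c_j - c_ij),        # Jaccard's Index
        "dice": 2 * c_ij / (c_i + c_j),              # Dice coefficient
        "equivalence": c_ij ** 2 / (c_i * c_j),      # equals the cosine squared
        "association": c_ij / (c_i * c_j),           # association strength, up to a constant
    }


# Item i appears in 8 documents, item j in 2, and they co-occur in 2.
m = normalize(2, 8, 2)
# cosine 0.5, jaccard 0.25, dice 0.4, equivalence 0.25, association 0.125
for name, value in m.items():
    print(f"{name:12s} {value:.3f}")
```

The equivalence index is the square of Salton's Cosine, so both produce the same ranking of links; as the example shows, the measures mainly differ in how they weigh pairs whose items have very different frequencies.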

The software tools allow us to generate various kinds of bibliometric networks, but, as was shown in Table 4, no single software tool is able to extract all of them.
Taking into account the maps and visualizations generated by each software tool, as shown in the Analysis of Generated Maps: A Cooperative Study among Tools section, there are several differences between them:
• CiteSpace is able to visualize the networks using different layouts. The names of the detected clusters can be assigned using different metrics. Finally, its graphical user interface allows us to interact with the network and explore it thoroughly.
• CoPalRed groups the items (keywords) under themes, which are categorized in a strategic diagram according to their centrality and density. This categorization allows us to detect the motor themes of the field. For each theme, CoPalRed generates a thematic network showing the relations between its keywords.
• IN-SPIRE allows us to visualize two kinds of map, if sufficient data are provided. In the Theme view, the analyst can detect the most important zones of the map (where more documents are located). The Galaxy view allows us to easily detect similar documents based on their content.
• NWB Tool and Sci2 Tool generate similar visualizations. They allow us to visualize the networks using different plugins, applying different layouts and scripts to customize the view. Sci2 Tool incorporates thematic maps where the information is shown over a world map.
• VantagePoint has three kinds of map that allow several views to be created. In the map view, VantagePoint shows a legend that explains the size of the lines; it is the only tool that produces this legend. Another strength of this tool is its graphical user interface, which allows the user to select a set of items from the map, whereupon it shows the documents associated with these items and other information in the detail window.
• VOSViewer has a powerful graphical user interface that allows us to examine the generated maps easily. However, detecting the most important themes visually is not always easy, and in the cluster view it is difficult to say to which cluster the keywords lying between two clusters (borderline keywords) belong.
The methods of analysis available also differ between the tools; for example, geospatial analysis is available only in CiteSpace, Sci2 Tool, and VantagePoint, and only the first two have geocoding capabilities that allow us to represent the network over a world map (using Google Maps or Yahoo! Maps).
Table 7 shows a short summary of the tools' characteristics according to the four aspects considered in the analysis. As we can observe, CiteSpace, IN-SPIRE, NWB Tool, Sci2 Tool, and VantagePoint could be identified as the most complete tools.
We should point out that NWB Tool and Sci2 Tool have a great deal in common. NWB Tool is a network analysis, modelling, and visualization toolkit, whereas Sci2 Tool is a modular toolset specifically designed for the study of science; however, both were developed by the Cyberinfrastructure for Network Science Center and share several algorithms and methods. Nevertheless, some capabilities, such as geocoding, are unique to Sci2 Tool.
It is sometimes difficult to import a specific dataset into the nine described software tools. At other times, it is difficult to modify them or incorporate new measures, algorithms, and visualizations. For this reason, extension capabilities such as those provided by NWB Tool, Sci2 Tool, and VantagePoint are very useful.
As mentioned above, each software tool has different characteristics and implements different techniques that are carried out with different algorithms. Consequently, each software tool gives its particular view of the studied field. The combined use of different science mapping software tools could allow us to develop a complete science mapping analysis. Therefore, we think that cooperation among tools could generate a positive synergy that would allow us to extract knowledge that would otherwise remain undiscovered.

Concluding Remarks

An analysis of science mapping software tools has been carried out. Specifically, we have analyzed nine representative science mapping software tools: Bibexcel, CiteSpace II, CoPalRed, IN-SPIRE, Leydesdorff's Software, Network Workbench Tool, Sci2 Tool, VantagePoint, and VOSViewer.
These software tools present different characteristics; for example, some of them focus only on visualization, while others have different preprocessing modules. No single software tool can be considered the best one. Consequently, we think that a complete science mapping analysis of a particular field should be made using a variety of these software tools, to gather all the important knowledge and different perspectives. For example, the preprocessing step is very important, and the analyst has to use a software tool to help carry out this task. Not all the software tools are able to extract all the bibliometric networks, so different tools have to be used to analyze a field from different perspectives (intellectual, social, or conceptual). The software tools have different analysis methods (although some of them are common), which allow the analyst to discover different knowledge. Finally, because the visualizations differ from tool to tool, different views of the field can be generated, and these help to interpret and analyze the results. This cooperation among tools gives a positive synergy, which allows us to extract the knowledge hidden behind the data.
Considering the results obtained in the Analysis of Generated Maps: A Cooperative Study among Tools section, where co-word analysis was the only technique used to analyze the FST field, and the positive synergies of using several software tools described in the Lessons Learned section, we think that a thorough analysis of any field could be carried out by exploiting the strengths of each tool. For example, a co-word analysis performed by CoPalRed could be complemented by IN-SPIRE using the terms extracted from abstracts and titles. Moreover, IN-SPIRE could show the conceptual changes over time using its Time tools. In addition, CiteSpace and Sci2 Tool could perform an intellectual and social analysis: CiteSpace could be used for a document co-citation analysis and Sci2 Tool for a co-author analysis. The resulting network of authors could be displayed over a world map using the geolocation capabilities of Sci2 Tool. Finally, VantagePoint could be used to build a factor map on keywords and show the institutional affiliations related to the most interesting detected factors.
We should point out that this study does not incorporate all of the science mapping software tools used around the world. This is because researchers usually use their own ad hoc software tools and algorithms, perhaps motivated by the lack of flexibility of the available software tools. Although these tools could have characteristics similar to the ones presented here, they remain unpublished. Sometimes, they are unpublished because they were developed to perform a specific, ad hoc analysis, and these pieces of software remained in the background.

Acknowledgments

This work has been developed with the support of Project 90/07 (Ministry of Public Works and Transport) and Excellence Andalusian Project TIC5299. We thank the Pacific Northwest National Laboratory and Search Technology Inc. for providing us with a trial version of IN-SPIRE and VantagePoint, respectively, and for their help and support.

References

Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern information retrieval. Boston, MA: Addison-Wesley.
Bailón-Moreno, R., Jurado-Alameda, E., & Ruíz-Baños, R. (2006). The scientific network of surfactants: Structural analysis. Journal of the American Society for Information Science and Technology, 57(7), 949–960.
Bailón-Moreno, R., Jurado-Alameda, E., Ruíz-Baños, R., & Courtial, J.P. (2005). Analysis of the scientific field of physical chemistry of surfactants with the unified scientometric model. Fit of relational and activity indicators. Scientometrics, 63(2), 259–276.
Bar-Ilan, J. (2010). Citations to the "introduction to informetrics" indexed by WOS, Scopus and Google Scholar. Scientometrics, 82(3), 495–506.
Batagelj, V., & Mrvar, A. (1998). Pajek-Program for large network analysis. Connections, 21(2), 47–57.
Batty, M. (2003). The geography of scientific citation. Environment and Planning A, 35(5), 761–765.
Borgatti, S.P., Everett, M.G., & Freeman, L.C. (2002). Ucinet 6 for Windows: Software for social network analysis, analytic technologies, Harvard, MA.
Börner, K., Chen, C., & Boyack, K. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37, 179–255.
Börner, K., Huang, W., Linnemeier, M., Duhon, R., Phillips, P., Ma, N., & Price, M. (2010). Rete-netzwerk-red: Analyzing and visualizing scholarly networks using the network workbench tool. Scientometrics, 83(3), 863–876.
Boyack, K.W., Wylie, B.N., & Davidson, G.S. (2002). Domain visualization using VxInsight for science and technology management. Journal of the American Society for Information Science and Technology, 53(9), 764–774.
Callon, M., Courtial, J., & Laville, F. (1991). Co-word analysis as a tool for describing the network of interactions between basic and technological research: The case of polymer chemistry. Scientometrics, 22(1), 155–205.
Callon, M., Courtial, J.P., Turner, W.A., & Bauin, S. (1983). From translations to problematic networks: An introduction to co-word analysis. Social Science Information, 22(2), 191–235.
Carrington, P.J., Scott, J., & Wasserman, S. (2005). Models and methods in social network analysis. Structural Analysis in the Social Sciences. New York: Cambridge University Press.
Chen, C. (2004). Searching for intellectual turning points: Progressive knowledge domain visualization. Proceedings of the National Academy of Science of the United States of America (PNAS), 101(suppl. 1), 5303–5310.
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359–377.
Chen, C., Ibekwe-SanJuan, F., & Hou, J. (2010). The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. Journal of the American Society for Information Science and Technology, 61(7), 1386–1409.
Chen, P., & Redner, S. (2010). Community structure of the physical review citation network. Journal of Informetrics, 4(3), 278–290.
Cobo, M.J., López-Herrera, A.G., Herrera-Viedma, E., & Herrera, F. (2011). An approach for detecting, quantifying, and visualizing the evolution of a research field: A practical application to the fuzzy sets theory field. Journal of Informetrics, 5(1), 146–166.
Cook, D.J., & Holder, L.B. (2006). Mining graph data. Hoboken, NJ: John Wiley & Sons, Inc.
Coulter, N., Monarch, I., & Konda, S. (1998). Software engineering as seen through its research literature: A study in co-word analysis. Journal of the American Society for Information Science, 49(13), 1206–1223.
Davidson, G.S., Hendrickson, B., Johnson, D.K., Meyers, C.E., & Wylie, B.N. (1998). Knowledge mining with vxinsight: Discovery through interaction. Journal of Intelligent Information Systems, 11(3), 259–285.
Davidson, G.S., Wylie, B.N., & Boyack, K.W. (2001). Cluster stability and the use of noise in interpretation of clustering. Proceedings of the IEEE Symposium on Information Visualization (pp. 23–30). doi: 10.1109/INFVIS.2001.963275
Fabrikant, S.I., Montello, D., & Mark, D.M. (2010). The natural landscape metaphor in information visualization: The role of commonsense geomorphology. Journal of the American Society for Information Science and Technology, 61(2), 253–270.
Falagas, M.E., Pitsouni, E.I., Malietzis, G.A., & Pappas, G. (2008). Comparison of Pubmed, Scopus, Web of Science, and Google Scholar: Strengths and weaknesses. FASEB Journal, 22(2), 338–342.
… of the scientific structure of fuzzy sets research in Spain. Information Research, 14(4), paper 421.
McCain, K. (1991). Mapping economics through the journal literature: An experiment in journal co-citation analysis. Journal of the American Society for Information Science, 42(4), 290–296.
Mikki, S. (2010). Comparing Google Scholar and ISI Web of Science for earth sciences. Scientometrics, 82(2), 321–331.
Morel, C.M., Serruya, S., Penna, G., & Guimarães, R. (2009). Co-authorship network analysis: A powerful tool for strategic planning of research, development and capacity building programs on neglected diseases. PLoS Neglected Tropical Diseases, 3(8), art. no. e501.
Morris, S., & Van Der Veer Martens, B. (2008). Mapping research specialties. Annual Review of Information Science and Technology, 42(1), 213–295.
Moya-Anegón, F., Vargas-Quesada, B., Chinchilla-Rodríguez, Z., Corera-Álvarez, E., Herrero-Solana, V., & Munoz-Fernández, F. (2005). Domain analysis and information retrieval through the construction of heliocentric maps based on ISI-JCR category cocitation. Information Processing & Management, 41(6), 1520–1533.
Moya-Anegón, F., Vargas-Quesada, B., Chinchilla-Rodríguez, Z., Corera-Álvarez, E., Munoz-Fernández, F., & Herrero-Solana, V. (2007). Visualizing the marrow of science. Journal of the American Society for Information Science and Technology, 58(14), 2167–2179.
Noyons, E.C.M., Moed, H.F., & Luwel, M. (1999a). Combining mapping and citation analysis for evaluative bibliometric purposes: A bibliometric study. Journal of the American Society for Information Science, 50(2), 115–131.
Noyons, E.C.M., Moed, H.F., & van Raan, A.F.J. (1999b). Integrating research performance analysis and science mapping. Scientometrics,
Gänzel, W. (2001). National characteristics in international scientific co- 46(3), 591–604.
authorship relations. Scientometrics, 51(1), 69–115. Persson, O., Danell, R., & Wiborg Schneider, J. (2009). How to use Bibex-
Gao, X., & Guan, J. (2009). Networks of scientific journals: An exploration cel for various types of bibliometric analysis. In F. Åström, R. Danell,
of Chinese patent data. Scientometrics, 80(1), 283–302. B. Larsen, & J. Wiborg Schneider (Eds.), Celebrating scholarly commu-
Garfield, E. (1994). Scientography: Mapping the tracks of science. Current nication studies: A festschrift for Olle Persson at his 60th birthday (Vol.
Contents: Social & Behavioural Sciences, 7(45), 5–10. 5, pp. 9–24). Leuven, Belgium: International Society for Scientometrics
Havre, S., Hetzler, E., Whitney, P., & Nowell, L. (2002). Themeriver: Visual- and Informetrics.
izing thematic changes in large document collections. IEEE Transactions Peters, H.P.F., & van Raan, A.F.J. (1991). Structuring scientific activi-
on Visualization and Computer Graphics, 8(1), 9–20. ties by co-author analysis an exercise on a university faculty level.
Herr, B., Huang, W., Penumarthy, S., & Börner, K. (2007). Design- Scientometrics, 20(1), 235–255.
ing highly flexible and usable cyberinfrastructures for convergence. In Peters, H.P.F., & van Raan, A.F.J. (1993). Co-word-based science maps of
W.S. Bainbridge & M.C. Roco (Eds.), Progress in convergence: Tech- chemical engineering. Part I: Representations by direct multidimensional
nologies for human wellbeing (Vol. 1093, pp. 161–179). Boston: Annals scaling. Research Policy, 22(1), 23–45.
of the New York Academy of Sciences. Polanco, X., François, C., & Lamirel, J.C. (2001). Using artificial neu-
Hetzler, E., & Turner, A. (2005). Analysis experiences using information ral networks for mapping of science and technology: A multi-self-
visualization. IEEE Computer Graphic and Applications, 24(5), 22–26. organizingmaps approach. Scientometrics, 51(1), 267–292.
Jarneving, B. (2005).A comparison of two bibliometric methods for mapping Porter, A.L., & Cunningham, S.W. (2004). Tech mining: exploiting new
of the research front. Scientometrics, 65(2), 245–263. technologies for competitive advantage. Hoboken, NJ: John Wiley & Sons,
Kandylas, V., Upham, S.P., & Ungar, L.H. (2010). Analyzing knowl- Inc.
edge communities using foreground and background clusters. ACM Porter, A.L., & Youtie, J. (2009a). How interdisciplinary is nanotechnology?
Transactions on Knowledge Discovery from Data, 4(2), art. no. 7. Journal of Nanoparticle Research, 11(5), 1023–1041.
Kessler, M.M. (1963). Bibliographic coupling between scientific papers. Porter, A.L., & Youtie, J. (2009b). Where does nanotechnology belong in the
American Documentation, 14(1), 10–25. map of science? Nature Nanotechnology, 4, 534–536.
Kleinberg, J. (2003). Bursty and hierarchical structure in streams. Data Price, D., & Gürsey, S. (1975). Studies in scientometrics I: Transience and
Mining and Knowledge Discovery, 7(4), 373–397. continuance in scientific authorship. Ci. Informatics Rio de Janeiro, 4(1),
Leydesdorff, L., & Persson, O. (2010). Mapping the geography of science: 27–40.
Distribution patterns and networks of relations among cities and institutes. Quirin, A., Cordón, O., Santamaría, J., Vargas-Quesada, B., & Moya-
Journal of the American Society for Information Science and Technology, Anegón, F. (2008). A new variant of the pathfinder algorithm to generate
61(8), 1622–1634. large visual science maps in cubic time. Information Processing &
Leydesdorff, L., & Schank, T. (2008). Dynamic animations of journal maps: Management, 44(4), 1611–1623.
Indicators of structural changes and interdisciplinary developments. Jour- Rip, A., & Courtial, J. (1984). Co-word maps of biotechnology: An example
nal of the American Society for Information Science and Technology, of cognitive scientometrics. Scientometrics, 6(6), 381–400.
59(11), 1810–1818. Rosvall, M., & Bergstrom, C.T. (2010). Mapping change in large networks.
López-Herrera, A.G., Cobo, M.J., Herrera-Viedma, E., & Herrera, F. (2010). PLoS ONE, 5(1), e8694.
A bibliometric study about the research based on hybridating the fuzzy Salton, G., & McGill, M.J. (1983). Introduction to modern information
logic field and the other computational intelligent techniques: A visual retrieval. New York: McGraw-Hill.
approach. Internacional Journal of Hybrid Intelligent Systems, 17(7), Schvaneveldt, R.W., Durso, F.T., & Dearholt, D.W. (1989). Network struc-
17–32. tures in proximity data. In G. Bower (Ed.), The psychology of learning
López-Herrera, A.G., Cobo, M.J., Herrera-Viedma, E., Herrera, F., Bailón- and motivation: Advances in research and theory (Vol. 24, pp. 249–284).
Moreno, R., & Jimenez-Contreras, E. (2009). Visualization and evolution New York: Academic Press.
Sci2 Team. (2009). Science of Science (Sci2) Tool. Indiana University and SciTech Strategies. Retrieved from http://sci.slis.indiana.edu
Shannon, P., Markiel, A., Ozier, O., Baliga, N., Wang, J., Ramage, D., & Ideker, T. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498–2504.
Shapira, P., Youtie, J., & Porter, A.L. (2010). The emergence of social science research on nanotechnology. Scientometrics, 85(2), 595–611.
Skillicorn, D. (2007). Understanding complex datasets: Data mining with matrix decompositions (Chapman & Hall/Crc Data Mining and Knowledge Discovery Series). London: Chapman & Hall.
Skupin, A. (2009). Discrete and continuous conceptualizations of science: Implications for knowledge domain visualization. Journal of Informetrics, 3(3), 233–245.
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265–269.
Small, H. (1997). Update on science mapping: Creating large document spaces. Scientometrics, 38(2), 275–293.
Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science, 50(9), 799–813.
Small, H. (2006). Tracking and predicting growth areas in science. Scientometrics, 68(3), 595–610.
Small, H., & Garfield, E. (1985). The geography of science: Disciplinary and national mappings. Journal of Information Science, 11(4), 147–159.
Small, H., & Koenig, M.E.D. (1977). Journal clustering using a bibliographic coupling method. Information Processing and Management, 13(5), 277–288.
Small, H., & Upham, S.P. (2009). Citation structure of an emerging research area on the verge of application. Scientometrics, 79(2), 365–375.
Upham, S.P., & Small, H. (2010). Emerging research fronts in science and technology: Patterns of new knowledge development. Scientometrics, 83(1), 15–38.
van Eck, N.J., & Waltman, L. (2007). Bibliometric mapping of the computational intelligence field. International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 15(5), 625–645.
van Eck, N.J., & Waltman, L. (2009). How to normalize cooccurrence data? An analysis of some well-known similarity measures. Journal of the American Society for Information Science and Technology, 60(8), 1635–1651.
van Eck, N.J., & Waltman, L. (2010). Software survey: Vosviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523–538.
van Eck, N.J., Waltman, L., Dekker, R., & van den Berg, J. (2010). A comparison of two techniques for bibliometric mapping: Multidimensional scaling and vos. CoRR, abs/1003.2551.
Waltman, L., van Eck, N.J., & Noyons, E.C.M. (2010). A unified approach to mapping and clustering of bibliometric networks. Journal of Informetrics, 4(4), 629–635.
Wasserman, S., & Faust, K. (1994). Social network analysis: Methods and applications. Cambridge, UK: Cambridge University Press.
White, H.D., & Griffith, B.C. (1981). Author co-citation: A literature measure of intellectual structure. Journal of the American Society for Information Science, 32, 163–172.
Wise, J.A. (1999). The ecological approach to text visualization. Journal of the American Society for Information Science, 50(13), 1224–1233.
Zadeh, L. (1965). Fuzzy sets. Information and Control, 8(3), 338–353.
Zadeh, L. (2008). Is there a need for fuzzy logic? Information Sciences, 178(13), 2751–2779.
Zhao, D., & Strotmann, A. (2008). Evolution of research activities and intellectual influences in information science 1996–2005: Introducing author bibliographic-coupling analysis. Journal of the American Society for Information Science and Technology, 59(13), 2070–2086.
Zitt, M., Bassecoulard, E., & Okubo, Y. (2000). Shadows of the past in international cooperation: Collaboration profiles of the top five producers of science. Scientometrics, 47(3), 627–657.