
Discovering Dynamic Dipoles in Climate Data

Conference Paper · April 2011


DOI: 10.1137/1.9781611972818.10 · Source: DBLP


All content following this page was uploaded by Vipin Kumar on 16 January 2014.



Jaya Kawale†, Michael Steinbach†, Vipin Kumar†

Abstract
Pressure dipoles are important long-distance climate phenomena (teleconnections) characterized by pressure anomalies of opposite polarity appearing at two different locations at the same time. Such dipoles have proven important for understanding and explaining the variability in climate in many regions of the world. For example, the El Niño climate phenomenon is known to be responsible for precipitation and temperature anomalies worldwide. This paper presents a novel approach for dipole discovery that outperforms existing state-of-the-art algorithms. Our approach is based on a climate anomaly network that is constructed using the correlation of time series of climate variables at all the locations on the Earth. One novel aspect of our approach to the analysis of such networks is a careful treatment of negative correlations, whose proper consideration is critical for finding dipoles. Another key insight provided by our work is the importance of modeling the time-dependent patterns of the dipoles in order to better capture the impact of important climate phenomena on land. The results presented in this paper show that these innovations allow our approach to produce better results than previous approaches, both in matching existing climate indices with high correlation and in capturing the impact of climate indices on land.

1 Introduction
Teleconnections, i.e., long-distance connections between the climate of two places on the globe, have proven important for understanding and explaining the variability in climate in many regions of the world. Typically, these teleconnections are represented by time series known as climate indices [20], which are often used in studies of the impact of climate phenomena on temperature, precipitation, and other climate variables. One important class of climate indices is pressure dipoles,∗ which are characterized by pressure anomalies of opposite polarity appearing at two different locations at the same time.

† Department of Computer Science, University of Minnesota, {kawale, steinbac, kumar}@cs.umn.edu
∗ Climate variables other than pressure can be involved in dipoles. For example, the Dipole Mode Index (DMI) [4] has been investigated in relation to the Indian Monsoon.

Scientists have known of the existence of such dipoles for about a century. Two of the best known pressure dipoles are the North Atlantic Oscillation (NAO) and the Southern Oscillation (SO). NAO, which measures the difference in pressure anomalies between Akureyri in Iceland and Ponta Delgada in the Azores, captures the large-scale atmospheric fluctuations between Greenland and northern Europe. A positive NAO index, which involves higher than normal pressure in northern Europe and lower than normal pressure around Iceland, is believed to be connected to warm and wet winters in Europe and cold and dry winters in northern Canada and Greenland. Conversely, a negative NAO index is associated with colder conditions in Europe and milder winters in Greenland. Figure 1 shows the time series of pressure anomalies for both Ponta Delgada (measured at 37.5N, 25W) and Akureyri (measured at 65N, 17.5W).

[Figure 1 plot: "North Atlantic Oscillation"; y-axis: monthly mean subtracted anomaly (hPa); x-axis: years, 1993 to 2003; series: Ponta Delgada, Portugal and Akureyri, Iceland]
Figure 1: Pressure anomaly time series for the North Atlantic Oscillation. Note that these anomaly time series were constructed from the raw data by removing the monthly means from each time series.

The Southern Oscillation index (SOI) is measured as the difference in the pressure anomalies at

107 Copyright © SIAM.


Unauthorized reproduction of this article is prohibited.
Tahiti and Darwin, Australia, and captures fluctuations in pressure around the tropical Indo-Pacific region that correspond to the El Niño Southern Oscillation (ENSO) climate phenomenon [13]. A high value of SOI indicates higher pressure anomalies in the eastern tropical Pacific around Tahiti and lower pressure anomalies around Indonesia and northern Australia, while a low value of SOI is associated with the reverse conditions. Figure 2 shows the time series of pressure anomalies at Tahiti (measured at 17.5S, 150W) and Darwin (measured at 12.5S, 130E).

[Figure 2 plot: "Southern Oscillation"; y-axis: monthly mean subtracted anomaly (hPa); x-axis: years, 1993 to 2003; series: Tahiti and Darwin, Australia]
Figure 2: Pressure anomaly time series for the Southern Oscillation.

As mentioned, climate indices, including dipoles, are of great importance in understanding climate variability. Table 1 lists some dipoles that are well known to climate researchers. These dipoles have been discovered by observation, e.g., SOI and NAO, or by EOF analysis [12], e.g., AO. However, all these discoveries have required considerable research and insight on the part of the domain experts involved. Because of the amount of effort involved and the possibility of missing indices, an automated approach to climate index discovery could be quite useful.

Table 1: List of major pressure based climate indices.

| Dipole | Climate Variable | Description |
|---|---|---|
| North Atlantic Oscillation (NAO) | Sea Level Pressure, Air Temperature | Characterized by the pressure anomalies at Ponta Delgada and Akureyri in Iceland. |
| Southern Oscillation Index (SOI) | Sea Level Pressure, Air Temperature and Precipitation | Defined by pressure anomalies in Tahiti and Darwin, Australia. |
| Pacific/North American Index (PNA) | Sea Level Pressure | Anomalies at the North Pacific Ocean and North America. |
| Antarctic Oscillation (AAO) | Sea Level Pressure | The first leading mode of the EOF analysis of pressure anomalies from 20°S poleward. |
| Arctic Oscillation (AO) | Sea Level Pressure | The first leading mode of the EOF analysis of pressure anomalies from 20°N poleward. |
| Western Pacific (WP) | Sea Level Pressure | Low frequency variability over the North Pacific, with one center located over the Kamchatka Peninsula and another broad center of opposite sign covering portions of southeastern Asia and the low latitudes of the extreme western North Pacific. |

One of the first attempts in this direction was Steinbach et al. [8, 9, 10]. The approach used a shared nearest neighbor (SNN) [2] clustering approach to find climate indices. More specifically, it built a graph of all locations on a latitude-longitude grid based on the positive pairwise correlations between the anomaly time series of temperature or pressure at these locations and then found clusters in this graph. The centroids of these clusters, or the differences between two centroids, were then used as candidate climate indices. Many of the resulting candidate indices showed a high correlation with known climate indices and were similar in their level of impact on land climate variables such as temperature.
Tsonis et al. [14] pioneered the use of complex networks to study climate systems. The authors constructed networks using nodes on a 5° x 5° grid on the globe, where the edges of the network were defined in terms of the (absolute) correlation values between the anomaly time series of climate variables (SST, SLP) of all the pairs of nodes. From this complete
correlation graph, only the edges with significant correlation (> 0.5) were retained. In the tropics, the network had very high connectivity and resembled a complete graph, while away from the equator, the network showed characteristics typical of a scale-free network. The authors further showed that the supernodes in the scale-free network corresponded to major climate indices such as NAO and PNA [14, 15]. Other researchers have also applied complex networks to climate: for examining the structure of the climate system (Donges et al. [1]), analyzing hurricane activity (Fogarty et al. [3]), and finding communities in climate networks and relating them to known climate patterns (Steinhaeuser et al. [11]).
In our work, we present comprehensive techniques to systematically find climate indices that are dipoles from the climate data. In other approaches, negative correlations have often been ignored [8] or only absolute values of correlations have been considered [14]. However, as we show in Section 3, negative correlations are key for detecting dipoles and thus must be preserved in both sign and magnitude. In addition, a threshold is often used to eliminate spurious correlations, but using the same threshold for positive and negative correlations is not appropriate, since negative correlations are usually weaker and many nearby locations have high positive correlation. Unlike [8], we also study the change in climate indices over time. Although some of the approaches based on complex networks have taken time into consideration, we go further, defining dynamic climate indices and evaluating the resulting improvement in their impact on land.

1.1 Our Contributions: More specifically, the contributions of our paper are as follows:

1. We show the importance of treating negative correlations in climate data differently from positive correlations, unlike [8, 14, 1].

2. We present algorithms for discovering dipoles from climate data that are cognizant of the positive and negative nature of correlation. This includes an algorithm based on discovering communities in complex networks. These algorithms are able to identify most of the major existing dipoles in climate data with higher correlation than current techniques.

3. Our approach provides a novel framework for studying the change in the dipoles across both space and time. Investigations using this framework reveal that the area weighted impact on the land is higher if the dipole climate indices are defined by moving rather than fixed locations.

The utility of dynamic dipoles has been discussed by Portis et al. in [7]. However, to the best of our knowledge, this paper is the first one to compute dynamic climate indices automatically and study their impact on the local climate variables.

1.2 Organization of the paper The paper is organized as follows: Sections 2 and 3 describe the data and the preliminary analysis needed for the network construction, respectively. Our algorithms for dipole detection are presented in Section 4, while Section 5 discusses the dipoles discovered from the data. Section 6 provides an evaluation of the results with respect to existing climate indices and other work. Conclusions and possibilities for future work are presented in Section 7.

2 Dataset
For our analysis, we use pressure climate data from the NCEP/NCAR Reanalysis project provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, USA [17]. The NCEP/NCAR reanalysis project has data assimilated from 1948 to the present, which is available for public download at [18]. We focus on sea level pressure, which consists of monthly mean values at a grid resolution of 2.5° longitude x 2.5° latitude on the globe. In all, we have 62 years of data (corresponding to 744 monthly values) for 10512 grid locations on the globe. We chose pressure because it is an important climate variable and many of the well known climate indices are based upon it. Air temperature, although also important, is locally correlated with pressure.

3 Network Construction Method
We use a network construction method similar to [14] and [1], except that we do not threshold the networks by taking the absolute value of correlation and using a single threshold. Instead, we define separate thresholds for positive and negative correlations, as further discussed in Section 3.3. We also use smoothing. The details of network construction are provided in the following subsections.

3.1 Data Smoothing The NCEP/NCAR Reanalysis data consists of monthly mean values for each of the climate variables. When we consider the pairwise linear correlation between two time series, an anomalous peak or valley around a month can distort the correlation value. In order to remove such inconsistencies, we smooth the data by taking a three-month moving average.

3.2 Seasonality Removal Generally, climate data has a strong seasonality signal due to the Earth's revolution. The seasonality component is typically uninteresting and masks other more interesting signals. To handle this problem, we preprocess the raw data by removing the monthly means in order to obtain anomaly values for each month. The data normalization for every location is performed as follows:

$$\mu_m = \frac{1}{\mathit{end} - \mathit{start} + 1} \sum_{y=\mathit{start}}^{\mathit{end}} x_y(m), \quad \forall m \in \{1, \dots, 12\}$$

$$x_y(m) \leftarrow x_y(m) - \mu_m, \quad \forall y \in \{1948, \dots, 2009\}$$

In this formula, start and end are the first and last years used to compute the monthly means (in our case 1948 and 2009). $\mu_m$ is the mean of month $m$, and $x_y(m)$ represents the value of pressure for month $m$ of year $y$. Once we remove the monthly means, the resulting values are the anomaly time series for that location.

3.3 Edge Weight Estimation After we get the anomaly values for every node, the networks are constructed by looking at the similarity values between the anomaly time series of two nodes. We compute the similarity between two nodes by taking the Pearson correlation between the two time series at the nodes. Pearson correlation is a linear measure of similarity and is expressed as follows:

$$r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x s_y}$$

where $\bar{x}$ and $\bar{y}$ are the means of the two series X and Y, and $s_x$ and $s_y$ are the standard deviations of the two series.
We do not want to consider all the edges of the complete graph to be part of the network for analysis, as there are about 100 million edges in our case and most of them are uninteresting. Thus, the correlation threshold plays an important role in defining the network and must be chosen appropriately. Fig. 3 shows the distribution of the correlations in the pressure network. Due to autocorrelation in space, the positive correlation goes as high as 1. However, the negative correlation between any two nodes does not go as high. If we threshold the graph using a single absolute value (e.g., 0.5), we will be using a very harsh filter for negative correlations but a weak filter for positive correlations, which allows many spurious values to pass through. Fig. 4 shows the distribution of correlations for edges whose endpoints are more than 5000 km apart. Note that most of the high positive correlation edges have disappeared and the distributions of the positive and negative correlations are now quite similar. In particular, the distribution of negatively connected edges is quite similar in both Fig. 3 and Fig. 4. This gives credence to our assumption that most of the very high positive correlation comes from nearby links. It also makes it harder to prune edges based on positive correlation, since the pruning threshold needs to be cognizant of the physical distance between the nodes.

[Figure 3 plot: histogram; x-axis: Correlation; y-axis: Frequency (×10^6)]
Figure 3: Distribution of correlation

[Figure 4 plot: histogram; x-axis: Correlation; y-axis: Frequency (×10^6)]
Figure 4: Distribution of correlation after filtering edges < 5000 km

We formally define the network, or graph, to be an ordered pair G = (V, E), where V = {1, 2, ..., N} is the set of nodes on the globe grid and E is the set of undirected edges (i, j) such that $r_{ij}$ is significant, i.e., beyond the threshold, which is different for negative and positive correlations. We construct networks for the pressure variable using a 20-year window and slide the window by 5 years at a time, so as to get 9 separate networks spanning 20 years each for our 62
years of data. Constructing such networks enables us to study the changes over time and is important in understanding the dynamics of the climate network.

4 Our Approach to Discover Dipoles
Automatic discovery of candidate dipoles faces several challenges. First, a formal definition is needed. For this work, we define dipoles as pairs of regions whose locations have strong negative correlation with locations in the other region and strong positive correlation with locations in the same region. The reason to look at regions instead of single locations is that the correlation between single locations can easily be spurious. On the other hand, if the regions get too large, the climate phenomenon will be diluted or disappear, so a careful balance needs to be struck.
In the previous section, we presented a method to construct a weighted undirected graph where the nodes are locations on the Earth and the edge weights represent the strength of the correlation between the time series of climate data at the two end points of the edge. The correlation ranges from -1 to 1, where 1 means perfect correlation and -1 indicates perfect anti-correlation. From the definition of a dipole, two locations within the same region of the dipole should share a strong positive correlation with each other, while two locations in different regions of the dipole should have a strong negative correlation between them. Additionally, each region of the dipole should be geographically contiguous.

4.1 Algorithm for Constructing Dipoles (A1)
Our algorithm captures the essential characteristics of the dipole by a very simple mechanism. It first picks a negatively weighted edge from the graph and builds regions around it. This edge can be picked in several ways, and the results of the algorithm depend on this choice. We use a simple approach that starts with the most negative edge in the network. The two end points of the starting edge, say pt1 and pt2, constitute two points of the dipole (of opposite polarity). Consider two sets of K locations, N1 and N2, that have the most negative correlations with pt1 and pt2, respectively. Similarly, consider two sets of K locations, P1 and P2, that have the most positive correlations with pt1 and pt2, respectively. If a node in N1 (i.e., among the most negative edges of pt1) is also in P2 (i.e., very highly positively connected to pt2), then it becomes part of the dipole region containing pt2. Similarly, if a node in N2 is also in P1, then it becomes part of the region containing pt1. In other words, a node that is highly negatively connected with pt1 and highly positively connected with pt2 joins pt2's region of the dipole defined by pt1 and pt2. The details of the algorithm are presented below.

Algorithm 1 A1: Nearest neighbor approach to find dipoles
Require: Two starting points pt1, pt2 of the dipole; K, the number of nearest neighbors to examine
  Region1 ⇐ {pt1}
  Region2 ⇐ {pt2}
  P1 ⇐ K most positively correlated neighbors of pt1
  N1 ⇐ K most negatively correlated neighbors of pt1
  P2 ⇐ K most positively correlated neighbors of pt2
  N2 ⇐ K most negatively correlated neighbors of pt2
  for i = 1 to K do
    {For every node in N1, N2, check if it is in P2, P1, respectively}
    ind ⇐ find(N1(i), P2)
    if ind ≠ 0 then
      Region2 ⇐ Region2 ∪ {N1(i)}
    end if
    ind ⇐ find(N2(i), P1)
    if ind ≠ 0 then
      Region1 ⇐ Region1 ∪ {N2(i)}
    end if
  end for
  if size(Region1) ≥ MIN-DIPOLE-SIZE then
    if size(Region2) ≥ MIN-DIPOLE-SIZE then
      return (Region1, Region2)
    end if
  end if
  return "no dipoles found"

The starting edge may have a spurious correlation, in which case the regions around it do not lead to a dipole. To guard against this, we check whether both regions have grown sufficiently large, and only then label the two regions as a dipole. After finding a dipole, we remove the edges of the dipole from the network and continue finding further dipoles by picking the next most negative edge in the graph, until the resultant graph becomes very sparse or no remaining edge is more negative than a threshold. We used -0.4 as the threshold a negative edge must pass in order to be considered by the dipole algorithm. There are about 1.5 million edges in the graph with a correlation lower than -0.4.
Apart from the choice of starting node, this algorithm also depends on the value of K. We studied the impact of several different choices for the value of K (100, 300, 500, 1000, 2000). If we choose K to be very large, the regions of dipoles are very large (and often non-contiguous), whereas for a very small value of K the dipoles are small and might not be a good representative of the actual dipole.
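A minimal Python/NumPy sketch of Algorithm A1 follows. It is written for illustration and is not the authors' implementation. It assumes the full correlation matrix fits in memory, which holds for the toy example but not for the 10512-node grid. It builds that matrix with `np.corrcoef` on synthetic series rather than on the smoothed, deseasonalized NCEP/NCAR anomalies. The function names `k_neighbours` and `a1_dipole`, the toy data, `k=5`, and `min_dipole_size=3` are all hypothetical choices; the value of MIN-DIPOLE-SIZE is not given in this section. The outer loop that removes a found dipole's edges and restarts from the next most negative edge is omitted.

```python
import numpy as np


def k_neighbours(corr, node, k, positive):
    """Return the k nodes most positively (or most negatively) correlated with node."""
    others = np.array([j for j in range(corr.shape[0]) if j != node])
    values = corr[node, others]
    # Descending order for positive neighbours, ascending for negative ones.
    order = np.argsort(-values if positive else values, kind="stable")
    return [int(j) for j in others[order[:k]]]


def a1_dipole(corr, pt1, pt2, k, min_dipole_size):
    """Grow two opposite-polarity regions around the starting edge (pt1, pt2).

    min_dipole_size is a hypothetical stand-in for MIN-DIPOLE-SIZE.
    Returns (region1, region2) as sets of node indices, or None.
    """
    p1 = set(k_neighbours(corr, pt1, k, positive=True))
    n1 = k_neighbours(corr, pt1, k, positive=False)
    p2 = set(k_neighbours(corr, pt2, k, positive=True))
    n2 = k_neighbours(corr, pt2, k, positive=False)
    region1, region2 = {pt1}, {pt2}
    for a, b in zip(n1, n2):  # i = 1 .. K
        if a in p2:  # anti-correlated with pt1 and correlated with pt2
            region2.add(a)
        if b in p1:  # anti-correlated with pt2 and correlated with pt1
            region1.add(b)
    if len(region1) >= min_dipole_size and len(region2) >= min_dipole_size:
        return region1, region2
    return None  # "no dipoles found"


# Hypothetical toy data: nodes 0-4 follow +signal, nodes 5-9 follow -signal,
# nodes 10-19 are pure noise; 240 steps stand in for 20 years of monthly anomalies.
rng = np.random.default_rng(0)
signal = rng.standard_normal(240)
series = rng.standard_normal((20, 240))
series[:5] += 2 * signal
series[5:10] -= 2 * signal
corr = np.corrcoef(series)

# Start from the most negative edge, provided it passes the -0.4 threshold.
pt1, pt2 = (int(i) for i in np.unravel_index(np.argmin(corr), corr.shape))
result = a1_dipole(corr, pt1, pt2, k=5, min_dipole_size=3) if corr[pt1, pt2] < -0.4 else None
```

On this toy data, the two recovered regions are the groups driven by the shared signal with opposite signs. The noise-only nodes stay out of both regions because they are never among the K most negative neighbours of either endpoint.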
Based on these empirical observations, we set K to be 300. This choice of K ensures that the two regions of dipoles are of a reasonable size. However, we evaluate our results for different values of K.
Another key point to observe about this algorithm is that we do not explicitly check whether region 1 or region 2 is contiguous. Since we consider the top 300 positive neighbors of a point, spatial autocorrelation makes them very likely to be contiguous; however, this is not guaranteed.

4.2 Community Based Method In the previous subsection, the method presented considered all the locations on the Earth for building the dipole regions, but the regions around the dipoles could be non-contiguous. To overcome this limitation, we partition the network before running A1. Network partitioning makes the resulting process more robust by constraining the search for dipoles to a much smaller region than the whole Earth, i.e., to a community consisting of positively and negatively correlated nodes.
We use a community detection algorithm for partitioning the network. The main goal of a community based approach is to partition the network into several smaller subsets of nodes and make algorithm A1 less sensitive to the variability at smaller non-contiguous locations that should not be part of the dipole. The aim of clustering is to find regions containing nodes that are highly positively or negatively correlated. We can achieve this goal by guiding the community detection algorithm toward appropriate clusters through the choice of correlation thresholds, as discussed in Section 3.3. Using the histogram of correlations and empirical evaluation, we set the positive threshold to be very high (close to 0.85), so that only nearby contiguous locations form an edge, and the negative threshold to be lower in magnitude (close to -0.4), so that the significant dipoles are still captured.
For community detection, we chose the Walktrap algorithm [6], which is based upon random walks. This algorithm relies on the fact that random walks tend to become trapped in dense parts of the network corresponding to communities (clusters). Once we get communities from the entire network, we find dipoles within a community by using algorithm A1 and picking the most negative edge within the community as the starting edge.

4.3 Summary: To summarize, our algorithm for dipole detection consists of the following steps:

1. Construct the anomaly series from the smoothed data by removing the seasonality, as described in Section 3.2.

2. Construct the network using the correlations of the anomaly data for different time windows, and threshold each network to retain only the edges with significant correlations, as described in Section 3.3.

3. Generate clusters from the network data using clustering/community detection, as described in Section 4.2. (Alternatively, we can run A1 directly on the entire graph.)

4. Using algorithm A1 within a cluster, separate the two ends of all negatively correlated edges within it into two buckets, such that nodes within a bucket are positively correlated and nodes in opposite buckets are negatively correlated.

5. Return the two buckets formed in step 4 as a dipole if the size of each bucket is greater than a threshold.

5 Results
We ran our dipole detection algorithm A1 and its community version on the NCEP/NCAR pressure data set. We constructed the anomaly data using the entire 62-year duration as the base for the monthly means. The networks were constructed for periods of 20 years each, with a sliding window of 5 years, so as not to introduce abrupt changes between networks. Thus we had 9 networks spanning 20 years each. Before running the community detection algorithm, we threshold the graph using 0.85 for positive correlation and -0.4 for negative correlation. The intuition behind these thresholds is that they retain all the significantly negative edges in the graph while remaining very strict for positive edges, since we need positive edges only to construct homogeneous regions around each end of the dipoles. With these thresholds, each of the major dipoles fell within a single community. To find communities, we used walktrap version 2 with default parameters (random walk length = 4). The following paragraphs describe some of the well known dipoles that we discovered from the data.

• Southern Oscillation: The SO is one of the most important dipoles in climate. It is clearly seen in all the networks with correlations close to 0.9.

• North Atlantic Oscillation: NAO is a well established fluctuation in opposite phase of the climate of Greenland/Iceland and northern Europe. From the data we see that NAO is one of the strongest signals. When we pick the most negative edge in the graph, most of the time it is
in the NAO region. The North Atlantic Oscillation is seen very clearly in all 9 networks of 20-year periods.

• Arctic Oscillation: The Arctic Oscillation is the pressure anomaly around the North Pole and is defined on the basis of the first leading component of an EOF analysis of the region north of 20N latitude. It does not have a pair of physical locations associated with it. However, using our method we are able to find it in all 9 networks with a very high correlation.

• Antarctic Oscillation: The Antarctic Oscillation measures the anomaly of pressure around the Antarctic region. This oscillation is the analog of the Arctic Oscillation in the southern hemisphere and is also defined by an EOF analysis, here of locations south of 20S. We see the Antarctic Oscillation in all the climate networks. However, the Climate Prediction Center provides this index only from 1979 onwards [19]. Hence we can only compare its correlation with known climate indices for the last two networks.

• Western Pacific Index: The Western Pacific index is a north-south dipole around the western Pacific, with one end located over the Kamchatka Peninsula and the other end in southeastern Asia and the subtropical North Pacific.

Figure 5: Different phases of NAO seen in pressure data

Figure 6: Different phases of SOI seen in pressure data

6 Experimental Evaluation
In order to evaluate the goodness of the dipole regions generated, we look at three things:

1. Strength of the negative correlation between the two regions of the dipole. Higher negative correlation implies a stronger dipole.

2. Correlation with known dipole indices. This highlights the ability to reproduce known dipoles.

3. Impact of the dipole indices on land, measured by computing an area weighted correlation of land temperature anomalies with the dipole indices. This highlights the ability of data driven dipoles to potentially outperform known dipoles.

6.1 Negative Correlation within regions of Dipole From the definition of the dipole, the two regions forming a dipole should be negatively correlated with each other. To compute the strength of the negative correlation across the two regions, we look at three values:

1. The mean of the correlations between all location pairs across the two regions constituting the dipole. We call this value the mean of all pairs.

2. The best correlation between the two regions of the dipole, represented by the most negative edge between them. We call this value the best pair.

3. The correlation between the mean anomaly series of the two regions, where each mean is taken over all the locations in that region. We call this value the pair of means.

Table 2 shows the three correlation values of the dipole regions discovered by our algorithms. The table reports the mean values over all 9 networks.
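The three measures of Section 6.1 can be sketched as follows. The sketch assumes each region is given as a set of row indices into an array of anomaly series, with one row per location. It is an illustrative NumPy version, not the authors' code. The function name `dipole_strength` and the toy data are hypothetical. It computes the full cross-correlation block between the two regions, which is manageable for regions of a few hundred locations.

```python
import numpy as np


def dipole_strength(anomalies, region1, region2):
    """Return (mean of all pairs, best pair, pair of means) for one dipole.

    anomalies: array of shape (num_locations, num_months) holding anomaly series.
    region1, region2: iterables of row indices for the two dipole regions.
    """
    a = anomalies[sorted(region1)]
    b = anomalies[sorted(region2)]
    # Correlate every location in region 1 with every location in region 2.
    full = np.corrcoef(np.vstack([a, b]))
    cross = full[: len(a), len(a):]
    mean_of_all_pairs = float(cross.mean())
    best_pair = float(cross.min())  # the most negative cross-region edge
    # Correlate the two region-averaged anomaly series with each other.
    pair_of_means = float(np.corrcoef(a.mean(axis=0), b.mean(axis=0))[0, 1])
    return mean_of_all_pairs, best_pair, pair_of_means


# Hypothetical toy data: rows 0-2 follow +s and rows 3-5 follow -s, plus noise.
rng = np.random.default_rng(1)
s = rng.standard_normal(240)
anomalies = rng.standard_normal((6, 240))
anomalies[:3] += 2 * s
anomalies[3:] -= 2 * s
mean_all, best, means = dipole_strength(anomalies, {0, 1, 2}, {3, 4, 5})
```

A minimum can never exceed the mean of the same values, so the best pair is always at least as negative as the mean of all pairs. This is the ordering seen in every row of Table 2.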
Figure 7: Different phases of AO seen in pressure data

[Figure 8 panels: Years 1983-1993, Years 1988-1998, Years 1993-2003; world maps spanning 90°S to 90°N and 180°W to 180°E]
Figure 8: Different phases of Antarctic Oscillation seen in pressure data

From the table it can be seen that all the regions are strongly negatively correlated, indicating that the regions indeed consist of strong opposing pressure polarities.
We performed a further analysis of the SOI region and found that the negative correlation between Tahiti and Darwin is not as strong as that of several other location pairs. Fig. 9 shows the correlation between Tahiti and Darwin as well as the best pair results from our two dipole finding algorithms. This result indicates that the underlying phenomenon leading to the negative correlation is not fixed at Tahiti and Darwin, and that SOI and other climate indices are perhaps better captured with dynamic clusters.

[Figure 9 plot: y-axis: Correlation (-0.75 to -0.4); x-axis: Networks (0 to 10); series: Tahiti-Darwin, A1 (best pair), A1 + Community (best pair)]
Figure 9: Best negative correlation for the SOI cluster. Lower curves are better.

6.2 Comparison with known Climate Indices In order to evaluate the goodness of the dipole clusters found, we compared them with some well known climate indices. For each of the 9 network periods, we generated a set of dipoles from the corresponding network. For every dipole belonging to a time period, we took the two clusters belonging to the dipole and computed their centroids by taking the mean of the anomaly at those locations during that time period. We computed the difference between the two cluster centroids to create a time series, which was then compared with all the climate indices over that period using linear correlation. We kept track of the best correlation to the climate indices during the period and recorded the dipole cluster that best matched each climate index. We performed this step for all the time periods. Table 3 shows the best correlation to each climate index of the dipoles found using the two variations of algorithm A1 with a bin size of 300. Although A1 + community shows weaker correlation than A1 in a number of cases, the impact of A1 + community is sometimes still better, as is shown later in the paper.
From Table 3 we see that, using algorithm A1 with or without communities, we are able to match SOI with an average precision of 0.88 and 0.86, respectively. Another important dipole that we find with very high correlation is the Arctic Oscillation. Climate scientists define the Arctic Oscillation as the first leading component of an EOF analysis, and thus it has no interpretation in terms of pairs of locations. However, using our method we are able to find a pair of negatively
Dipole A1 A1 + community
Mean of all pairs Best pair Pair of means Mean of all pairs Best pair Pair of means
SOI -0.4425 -0.5993 -0.3658 -0.4482 -0.6221 -0.3426
NAO -0.4584 -0.7171 -0.6997 -0.4598 -0.7170 -0.7019
AO -0.5071 -0.7405 -0.6950 -0.5063 -0.7405 -0.6974
AAO -0.4187 -0.5478 -0.4988 -0.4173 -0.5639 -0.4345
WP -0.4000 -0.5139 -0.4424 -0.4069 -0.5301 -0.4635

Table 2: Strength of negative correlation of identified dipoles using our algorithm.

correlated clusters whose difference correlates very well with the AO climate index (as high as 0.87; see Table 3).

Table 3: Correlation of our dynamic indices with known climate indices (K = 300)

          A1                                             A1 + community
Network   SOI      NAO      AO       AAO      WP         SOI      NAO      AO       AAO      WP
1         0.8885   0.7686   0.8665   -        0.7166     0.8761   0.7686   0.8665   -        0.7163
2         0.8696   0.7729   0.8506   -        0.7231     0.7378   0.7711   0.8529   -        0.7232
3         0.9012   0.7312   0.8560   -        0.7400     0.8952   0.7317   0.8580   -        0.7399
4         0.8895   0.8044   0.8353   -        0.7306     0.8828   0.8043   0.8353   -        0.7298
5         0.8983   0.7279   0.8037   -        0.7523     0.8540   0.7283   0.8037   -        0.5017
6         0.9214   0.7498   0.8648   -        0.7602     0.9195   0.7488   0.8702   -        0.7408
7         0.8387   0.7769   0.8137   -        0.7604     0.8318   0.7819   0.8137   -        0.7318
8         0.8946   0.7581   0.8407   0.8763   0.7240     0.8933   0.7582   0.8407   0.8797   0.4369
9         0.8746   0.7609   0.8597   0.8835   0.7103     0.8737   0.7621   0.8597   0.8809   0.7095
Mean      0.8863   0.7612   0.8434   0.8799   0.7353     0.8627   0.7608   0.8445   0.8803   0.6642

To evaluate the sensitivity of our analysis to the choice of the dipole bin size K, we looked at the mean correlation with the climate indices for different values of K. These results are shown in Fig. 10. As expected, a small value of K gives very focused patterns for SOI and NAO and leads to better correlation, because these indices are actually defined by single-point locations. However, at very small values of K the correlation of the dipole cluster with AO or AAO is not as high as for larger values of K. This is because the AO and AAO patterns are not defined using single-point locations, but instead summarize the behavior of a large region.

[Figure 10 plot: correlation (y-axis, 0.65 to 1) vs. K (x-axis: 100, 300, 500, 1000, 2000); series: SOI, NAO, AO, WP, AAO]
Figure 10: Effect of varying the region size K on A1

6.2.1 Comparison with existing approaches
We also compare our approach for finding dipoles with existing approaches. In [8], Steinbach et al. present an SNN clustering based approach to find climate indices. We use the numbers reported in [8] for our comparison. The SNN clustering numbers are shifted correlations, whereas for our numbers we use linear correlations only; computing shifted correlations can only improve our numbers further. From Table 4 we see that both variants of our algorithm correlate more strongly than the SNN clustering approach with every climate index for which [8] reports a value. Note that for A1, we report the mean values obtained with K = 300, as shown in Table 3.

                                                         Our approach
Climate Indices   SNN Clustering (Shifted Correlation)   A1       A1 + community
SOI               0.7312                                 0.8863   0.8627
NAO               0.7519                                 0.7612   0.7608
AO                0.7577                                 0.8434   0.8445
AAO               -                                      0.8799   0.8803
WP                0.2857                                 0.7353   0.6642

Table 4: Comparison with existing approaches.

6.3 Area weighted correlation with land temperature
From the previous sections we see that we can generate dipoles that change dynamically over time, and that their correlation with known climate indices is very high. In order to study the changes in the dipole clusters over time, we take their centroids and plot them on the globe. Fig. 11 shows the moving centroids of the Arctic Oscillation dipole, and Fig. 12 shows the moving centroids of the North Atlantic Oscillation cluster.

Figure 11: Moving Centroids of Arctic Oscillation

We hypothesize that the climate indices are better captured by considering them to be moving. To verify this hypothesis, we compute the area weighted correlation with the land temperature anomalies for both the static and the dynamic index for each of the 9 network periods. Our dynamic index for SOI is computed by taking the mean of each of the two regions of the dipole and then their difference. We also generated a random baseline and computed the correlation of land

temperature anomalies using any two random locations. For each network period, we picked 100 pairs of random locations from the globe having a correlation > 0 between them, computed the difference between their anomalies, and calculated their average impact on land temperature anomalies. This constitutes the random baseline. The mean of the random baseline is shown in Figs. 13 and 14, and the box around the mean shows the interquartile range and the median.

Figure 12: Moving Centroids of North Atlantic Oscillation

Fig. 13 compares the impact on land of our index, SOI, and the random baseline. These results also show that both algorithms A1 and A1 + community have a stronger impact on land temperature anomalies than SOI, sometimes up to 90% better. In fact, A1 + community gives better performance for all the network periods. Note that A1 + community gives better results than A1 even though A1 showed slightly better correlation with the static SOI, as shown in Table 3.

Instead of looking only at the centroid, we also compared the index generated from the best correlation pair, i.e., the edge with the strongest negative correlation in the dipole cluster. Fig. 14 shows the area weighted correlation obtained from the best pair. The best pair does not always perform better than the mean, which provides more stability to the index, but it still has a stronger impact on land temperature anomalies than the SOI index. Figs. 15 and 16 show the correlation with land temperature anomalies for a single network (network 2). From the figures it can be seen that the dynamic index has a spatial pattern similar to that of the static index but a much stronger correlation.
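The dynamic index used in Sections 6.2 and 6.3 (the mean anomaly of each dipole cluster, then their difference), the search for the dipole that best matches a known climate index, and the best correlation pair can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: anomalies are assumed to be stored as a (T, N) array with one column per location, a dipole is assumed to be two lists of column indices, matches are scored by absolute Pearson correlation (the sign of a difference series, A minus B versus B minus A, is arbitrary), and the best pair is assumed to be taken over edges joining one location from each cluster. The function names and the toy data are hypothetical.

```python
import numpy as np


def dipole_index(anomalies, region_a, region_b):
    """Centroid-difference time series for one dipole (hypothetical helper).

    anomalies: (T, N) array, one column per grid location.
    region_a, region_b: lists of column indices for the two clusters.
    """
    centroid_a = anomalies[:, region_a].mean(axis=1)  # mean anomaly of cluster A per time step
    centroid_b = anomalies[:, region_b].mean(axis=1)  # mean anomaly of cluster B per time step
    return centroid_a - centroid_b


def best_matches(anomalies, dipoles, climate_indices):
    """For each climate index, return (position of best dipole, |r|).

    Assumption: |r| is used because the sign of a dipole difference is arbitrary.
    """
    best = {}
    for name, series in climate_indices.items():
        scores = [
            abs(np.corrcoef(dipole_index(anomalies, a, b), series)[0, 1])
            for a, b in dipoles
        ]
        k = int(np.argmax(scores))
        best[name] = (k, float(scores[k]))
    return best


def best_pair(anomalies, region_a, region_b):
    """Return (i, j, r) for the cross-cluster pair with the most negative correlation.

    Assumption: an edge joins one location from each cluster.
    """
    xa = anomalies[:, region_a]
    xb = anomalies[:, region_b]
    # Standardize with ddof=0 so that (za.T @ zb) / T is the Pearson correlation matrix.
    za = (xa - xa.mean(axis=0)) / xa.std(axis=0)
    zb = (xb - xb.mean(axis=0)) / xb.std(axis=0)
    corr = za.T @ zb / anomalies.shape[0]  # shape (len(region_a), len(region_b))
    ia, ib = np.unravel_index(np.argmin(corr), corr.shape)
    return region_a[ia], region_b[ib], float(corr[ia, ib])


# Toy usage (hypothetical data): columns 0-1 follow a signal, 2-3 its negation, 4-5 are noise.
rng = np.random.default_rng(0)
T = 200
signal = rng.standard_normal(T)  # stands in for a known index such as SOI
anomalies = np.column_stack([signal, signal, -signal, -signal, np.zeros(T), np.zeros(T)])
anomalies = anomalies + 0.1 * rng.standard_normal((T, 6))
dipoles = [([0, 1], [2, 3]), ([4], [5])]
matches = best_matches(anomalies, dipoles, {"toy_index": signal})
pair = best_pair(anomalies, [0, 1], [2, 3])
print(matches, pair)
```

Averaging over each cluster before differencing is what makes the centroid index less sensitive to any single location than the best pair, which is consistent with the observation above that the mean provides more stability.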

[Figure 13 plot: impact on land (area weighted correlation) vs. networks 1 to 9; series: SOI Index, A1, A1+Community, Mean Random]
Figure 13: Area weighted correlation of land temperature anomalies using SOI index vs our indices generated from the cluster centroids

[Figure 14 plot: impact on land (area weighted correlation) vs. networks 1 to 9; series: SOI Index, A1 (best pair), A1+Community (best pair), Mean Random]
Figure 14: Area weighted correlation of land temperature anomalies using SOI index vs best correlation pair in our dipole cluster given by the algorithms
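The "impact on land" plotted in Figs. 13 and 14, together with the random baseline, can be sketched in the same style. The exact weighting is not restated in this section, so the sketch assumes one common choice: the mean of |r| between an index and the land temperature anomaly at each land grid cell, weighted by the cosine of the cell's latitude as a proxy for its area. The baseline follows the description in Section 6.3 (100 random location pairs with correlation > 0, with the difference of their anomalies used as the index). The function names, the parameters `n_pairs`, `seed`, and `max_tries`, and the synthetic data are hypothetical.

```python
import numpy as np


def area_weighted_impact(index, land_temp, lats):
    """Assumed definition: cos(latitude)-weighted mean of |corr(index, land cell)|.

    index: (T,) series; land_temp: (T, G) anomalies at G land cells;
    lats: (G,) latitudes in degrees. Cells must not be constant in time.
    """
    z_idx = (index - index.mean()) / index.std()
    z_land = (land_temp - land_temp.mean(axis=0)) / land_temp.std(axis=0)
    r = z_land.T @ z_idx / index.shape[0]  # Pearson correlation at each land cell
    w = np.cos(np.deg2rad(lats))  # cell area is roughly proportional to cos(latitude)
    return float(np.sum(w * np.abs(r)) / np.sum(w))


def random_baseline(anomalies, land_temp, lats, n_pairs=100, seed=0, max_tries=100_000):
    """Impacts of difference series built from random, positively correlated location pairs."""
    rng = np.random.default_rng(seed)
    n_loc = anomalies.shape[1]
    impacts = []
    for _ in range(max_tries):
        if len(impacts) == n_pairs:
            break
        i, j = rng.choice(n_loc, size=2, replace=False)
        if np.corrcoef(anomalies[:, i], anomalies[:, j])[0, 1] <= 0:
            continue  # keep only pairs whose anomalies correlate positively
        impacts.append(area_weighted_impact(anomalies[:, i] - anomalies[:, j], land_temp, lats))
    if len(impacts) < n_pairs:
        raise ValueError("not enough positively correlated pairs found")
    return np.array(impacts)


# Toy usage (hypothetical data): land cells follow `signal`; location anomalies share an unrelated factor.
rng = np.random.default_rng(1)
T, n_loc, n_land = 240, 30, 50
signal = rng.standard_normal(T)
common = rng.standard_normal(T)
anomalies = common[:, None] + rng.standard_normal((T, n_loc))
land_temp = signal[:, None] + rng.standard_normal((T, n_land))
lats = rng.uniform(-60.0, 60.0, n_land)

impact = area_weighted_impact(signal, land_temp, lats)
baseline = random_baseline(anomalies, land_temp, lats)
print(impact, baseline.mean(), np.median(baseline), np.percentile(baseline, [25, 75]))
```

Returning every baseline impact rather than only their mean lets the caller compute the median and interquartile range that the boxes in Figs. 13 and 14 display.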

[Figures 15 and 16 panels: "Static Index" and "Dynamic Index" world maps (latitude vs. longitude), correlation color scale from -0.5 to 0.5]
Figure 15: Area weighted correlation of land temperature anomalies using SOI index for network 2
Figure 16: Area weighted correlation of land temperature anomalies using our dynamic index generated from A1 + Community for network 2.

7 Conclusion and Future Work
This paper presents a novel approach to find dipoles using climate data. The problem of finding dipoles has been of key interest to climate scientists, as it helps in a greater understanding of teleconnections and several important extreme phenomena. Finding dipoles has been particularly interesting to the data mining community, as the underlying data is not only large but also spatio-temporal in nature, presenting challenges such as seasonality, high variability, and autocorrelation. In this setting, we propose a method based on greedy heuristics to identify dipoles. Our methodology appears to produce considerably better results than the current state-of-the-art algorithms.

The algorithm A1 proposed in the paper and its community version are effective and efficient to implement. Our community-based approach, which first partitions the large network of all locations on the globe, narrows the search space for the A1 algorithm, generates fewer candidate dipoles, removes spurious connections, and is able to match the performance of A1. However, further investigation is needed to determine whether one of these algorithms is clearly preferable to the other.

A larger significance of this work, which might impact how climate scientists perceive climate indices, is that it shows climate indices are better explained as centroids of dynamic clusters. So far, climate scientists have mostly considered climate indices to be fixed. The evidence that supports our claim is that the area weighted correlation of the SOI index with land temperature anomalies is improved by up to 90% by capturing the index as a centroid of moving clusters rather than fixed locations. Given the importance of the Southern Oscillation for the climate of the globe, this result has significant implications for prediction in climate science. The Southern Oscillation is closely tied to the El Niño phenomenon, which drives extreme weather events like tropical cyclones, droughts, and hurricanes. A thorough evaluation of this is part of future work.

In addition to further evaluation and improvement of the approaches presented in the paper, we need to go beyond comparisons to current climate indices to see if any novel dipoles can be discovered. Although it is unlikely that any of these would be as significant as NAO or SOI, such dipoles could still be of great regional importance.

Acknowledgement
This work was supported by NSF grants III-0713227, IIS-0905581, and IIS-1029711. We also thank Dr. Stefan Liess and Dr. Shyam Boriah for their comments and feedback.

References
[1] Donges, J. F., Zou, Y., Marwan, N. Complex networks in climate dynamics. In European Physical Journal Special Topics, 174 (1), pp. 157–179, 2006.
[2] Ertoz, L., Steinbach, M., Kumar, V. A new shared nearest neighbor clustering algorithm and its applications. In Workshop on Clustering High Dimensional Data and its Applications, SIAM Data Mining, 2002.
[3] Fogarty, E. A., Elsner, J. B., Jagger, T. H., Tsonis, A. A. Network Analysis of U.S. Hurricanes. Hurricanes and Climate Change, 1–15, 2009.
[4] Gadgil, S., Vinayachandran, P. N., Francis, P. A., and Gadgil, S. Extremes of the Indian summer monsoon rainfall, ENSO and equatorial Indian Ocean oscillation. In Geophysical Research Letters, 174 (1), pp. L12213–1, 2004.
[5] Gozolchiani, A., Yamasaki, K., Gazit, O., Havlin, S. Pattern of climate network blinking links follows El Niño events. In Europhysics Letters, vol. 83, issue 2, 2008.
[6] Pons, P., Latapy, M. Computing Communities in Large Networks Using Random Walks. Journal of Graph Algorithms and Applications, 10(2): 191–218, 2006.
[7] Portis, D. H., Walsh, J. E., El Hamly, M., and Lamb, P. J. Seasonality of the North Atlantic Oscillation. Journal of Climate, vol. 14, pp. 2069–2078, 2001.
[8] Steinbach, M., Tan, P., Kumar, V., Klooster, S., and Potter, C. Discovery of climate indices using clustering. In SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 446–455, 2003.
[9] Steinbach, M., Tan, P., Kumar, V., Potter, C., and Klooster, S. Data mining for the discovery of ocean climate indices. In Mining Scientific Datasets Workshop, 2nd Annual SIAM International Conference on Data Mining, 2002.
[10] Steinbach, M., Tan, P., Kumar, V., Potter, C., Klooster, S. Clustering earth science data: Goals, issues and results. In Proceedings of the 4th KDD Workshop on Mining Scientific Datasets, 2001.
[11] Steinhaeuser, K., Chawla, N. V., Ganguly, A. R. An exploration of climate data using complex networks. KDD Workshop on Knowledge Discovery from Sensor Data, pp. 23–31, 2009.
[12] Storch, H. V. and Zwiers, F. W. Statistical Analysis in Climate Research. Cambridge University Press, 1999.
[13] Taylor, G. H. Impacts of the El Niño/Southern Oscillation on the Pacific Northwest. Technical report, Oregon State University, USA, 1998.
[14] Tsonis, A. A., Swanson, K. L., Roebber, P. J. What Do Networks Have to Do with Climate? In Bulletin of the American Meteorological Society, vol. 87, no. 5, pp. 585–595, 2006.
[15] Tsonis, A. A., Swanson, K. L., Wang, G. On the role of atmospheric teleconnections in climate. In Bulletin of the American Meteorological Society, vol. 21, issue 12, 2008.
[16] Tsonis, A. A. and Swanson, K. L. Topology and Predictability of El Niño and La Niña Networks. In Physical Review Letters, vol. 100, no. 22, 2008.
[17] Kalnay, E., et al. The NCEP/NCAR 40-Year Reanalysis Project. Bulletin of the American Meteorological Society, vol. 77, no. 3, pp. 437–470, 1 March 1996.
[18] http://www.esrl.noaa.gov/psd/data/
[19] http://www.cpc.ncep.noaa.gov/
[20] http://www.cgd.ucar.edu/cas/catalog/climind/
