Chapter 4

Software Requirements and Data Analysis in Confocal Raman Microscopy

Thomas Dieing and Wolfram Ibach

Abstract In confocal Raman microscopy experiments, tens of thousands of spectra
are commonly acquired in each measurement. Every spectrum carries a wealth of
information on the material at the position where it was recorded. From each of
these spectra the relevant information can be extracted to allow, e.g., the
determination of the various phases present in the sample or of variations in the
strain state. For this purpose, the spectra need to be prepared (e.g., by background
subtraction) before the relevant information can be extracted using appropriate
filters and algorithms. This information can then be visualized as an image, which
can be further processed and exported to present the results of the experiment.
In this chapter, the requirements for the software in terms of handling the data
streams and maintaining the spatial and spectral correlation between the spectra
and the created images are illustrated. Spectral data processing features, simple
and multivariate algorithms for image creation, and advanced data processing
features are discussed.

4.1 Introduction

In the previous chapters the theoretical background as well as the instrumentation
requirements for a confocal Raman microscope were illustrated. In addition to the
hardware, the software for data handling and analysis plays a crucial role in the
acquisition and presentation of confocal Raman data.
Generally, the software included with commercially available confocal Raman
microscopes facilitates data acquisition as well as data evaluation to varying degrees.
Other software available on the market is purely focused on data evaluation.
The requirements for the software for data acquisition are described in Sect. 4.2
before a general description of the data sets acquired in confocal Raman microscopy
experiments is presented in Sect. 4.3.
Once the data are acquired, they can be treated in various ways. In most cases,
pre-processing steps such as cosmic ray removal, background subtraction, or
smoothing of the spectra are applied first. These are usually applied to single
spectra as well as to multi-spectral data sets and are described in Sect. 4.4.
Multi-spectral data sets are then commonly evaluated further to extract the
information in a way that it can be displayed. For data sets originating from
confocal Raman microscopy experiments, where a full spectrum was recorded at each
image pixel, this evaluation results in an image. This image generation can be
performed using either univariate or multivariate methods. The resulting images and
masks can then be evaluated further in combination with the multi-spectral data sets
in order to obtain, for example, average spectra originating from certain areas on
the sample. Combining the information contained in single spectra with the
multi-spectral data sets allows further enhancement of the image contrast. These
steps are described in Sects. 4.5–4.7. Combining the generated images allows the
information contained in various images to be displayed in one multi-colored image
(Sect. 4.8). From these images, phase separation and/or mixing can easily be
identified.
In Sect. 4.9 it will be illustrated that even very noisy spectra acquired with
extremely little signal can be treated through the described methods to extract the
relevant information and obtain images with excellent contrast.
All the above-mentioned data processing techniques will be shown using a few
sample systems which are introduced together with the acquisition parameters in
Sect. 4.10.

4.2 Requirements for Data Acquisition Software

The requirements for any data acquisition software for confocal Raman microscopy
are extensive. The main tasks can, however, be sorted into several groups, which
will be described in the following.

4.2.1 Data Acquisition

4.2.1.1 Acquisition of Spectra


For the acquisition of spectral data the software must first read and store the data
acquired by the spectroscopic CCD camera. While this may at first appear
straightforward, it can already pose some significant challenges for the software.
Consider a spectroscopic CCD camera with 1600 × 200 pixels. Such cameras can
be read out in full vertical binning mode, by reading out only a certain region of
interest, or by reading out the entire chip as an image. The first two modes, however,
allow a significantly faster readout. In this case a single spectrum read out from such
a CCD camera consists of 2 × 1600 integer values (pixel number and intensity),
which corresponds to 2 × 1600 × 2 bytes/integer = 6400 bytes per spectrum.
State-of-the-art CCD cameras such as EMCCDs can acquire single spectra in
extremely short times (e.g., 0.76 ms has already been demonstrated [1]). This
corresponds to more than 1300 spectra/s. The software must therefore be capable of
handling a data stream of 1300 spectra/s × 6400 bytes/spectrum ≈ 7.9 MB/s.
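The arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch that only restates the numbers given in the text (1600 pixels, 2 bytes per integer, 0.76 ms per spectrum); the megabyte values assume binary units (1 MB = 2^20 bytes), which is the convention that reproduces the 7.9 MB/s quoted above.

```python
PIXELS = 1600             # spectral pixels of the example camera
BYTES_PER_INT = 2         # 16-bit integers
VALUES_PER_PIXEL = 2      # pixel number and intensity
READOUT_TIME_S = 0.76e-3  # fastest single-spectrum time quoted in the text

bytes_per_spectrum = VALUES_PER_PIXEL * PIXELS * BYTES_PER_INT
max_spectra_per_s = 1.0 / READOUT_TIME_S  # about 1316, i.e. "more than 1300"

# Data rate for the rounded 1300 spectra/s used in the text,
# expressed in binary megabytes (2**20 bytes).
rate_mb_per_s = 1300 * bytes_per_spectrum / 2**20

print(bytes_per_spectrum)        # 6400
print(round(max_spectra_per_s))  # 1316
print(round(rate_mb_per_s, 1))   # 7.9
```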
The challenge for the data acquisition software is that most modern cameras are
connected to the PC via USB, and thus the data stream needs to be handled by the
processor, which at the same time has to manage all other tasks for the control of
the microscope. Additionally, the data stream must be handled in a way that allows
data acquisition without interruptions. If the software needs to pause after a
certain number of acquired spectra to process and/or store the data, the advantage
of the fast data acquisition is lost.
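One common way to avoid such pauses is to decouple reading from storing: one thread only reads spectra and places them in a buffer, while a second thread drains that buffer. The sketch below illustrates this producer/consumer pattern with Python's standard `queue` and `threading` modules. It is an illustration under assumptions, not a description of any particular vendor's software: `fake_camera` is a hypothetical stand-in for a real camera driver call, and copying into a preallocated NumPy array stands in for writing to disk.

```python
import queue
import threading

import numpy as np

N_PIXELS = 1600


def acquire(n_spectra, buffer, read_spectrum):
    """Producer: read spectra and hand each one off immediately, so the
    readout loop never waits for processing or storage."""
    for i in range(n_spectra):
        buffer.put((i, read_spectrum()))
    buffer.put(None)  # sentinel: acquisition finished


def store(buffer, out):
    """Consumer: drain the buffer in a separate thread and copy each spectrum
    into a preallocated array (a stand-in for writing to disk)."""
    while True:
        item = buffer.get()
        if item is None:
            break
        i, spectrum = item
        out[i] = spectrum


def run(n_spectra, read_spectrum):
    out = np.empty((n_spectra, N_PIXELS), dtype=np.uint16)
    buffer = queue.Queue()  # unbounded: put() never blocks the producer
    consumer = threading.Thread(target=store, args=(buffer, out))
    consumer.start()
    acquire(n_spectra, buffer, read_spectrum)  # producer runs in this thread
    consumer.join()
    return out


# Hypothetical simulated camera: Poisson-distributed counts around 500.
rng = np.random.default_rng(0)


def fake_camera():
    return rng.poisson(500, N_PIXELS).astype(np.uint16)


data = run(1000, fake_camera)
print(data.shape)  # (1000, 1600)
```

The unbounded queue keeps the producer from ever blocking, at the cost of memory growth if storage is persistently slower than acquisition; a bounded buffer sized to the available memory is the usual compromise. In CPython both threads share the global interpreter lock, so the sketch shows the structure of the solution rather than its performance.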
If the entire CCD chip of the camera must be read out (e.g., because a reference or
calibration spectrum has to be acquired with each spectrum recorded), the data
stream is multiplied by the vertical size of the chip (200 in the example above),
thus lowering the maximum acquisition speed significantly.
An additional challenge for the software is the memory required by the recorded
spectra. If, for example, a Raman image of 512 × 512 spectra is recorded, this
results in a data amount of

512 × 512 spectra × 6400 bytes/spectrum ≈ 1.56 GB

This is the amount for only one layer. Note also that once data processing starts, the
data are typically converted from integer to double precision to increase the accuracy
of the calculations. This almost triples the storage space needed per spectrum.
Fortunately, such a high resolution is not always necessary, and given the time
required per Raman spectrum it is often impractical anyway. A resolution in the range
of 150 × 150 pixels is therefore generally sufficient.
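The same per-spectrum size gives the raw storage for an image scan. The small helper below is hypothetical and only restates the text's assumptions (16-bit pixel number plus intensity, 1600 pixels, binary units). It shows why reducing the resolution from 512 × 512 to 150 × 150 makes such a large difference:

```python
BYTES_PER_SPECTRUM = 2 * 1600 * 2  # (pixel number, intensity) x 1600 pixels x 2 bytes


def raw_image_bytes(nx: int, ny: int, layers: int = 1) -> int:
    """Raw integer storage of an nx x ny image scan with `layers` layers."""
    return nx * ny * layers * BYTES_PER_SPECTRUM


print(raw_image_bytes(512, 512) / 2**30)            # 1.5625 (binary GB)
print(round(raw_image_bytes(150, 150) / 2**20, 1))  # 137.3 (binary MB)
```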
Adding to the issue of memory space is that many analysis methods, such as
multivariate analysis, require a substantial amount of additional memory.
Programmers must therefore balance computation time against available memory.

4.2.1.2 Control of the Microscope and Sample Positioning


In addition to the acquisition of the spectra discussed in the previous section, the
software also needs to control all other parameters of the microscope. These include,
but are not limited to, the positioning of the grating within the spectrometer and the
position at which the spectra are recorded. The latter is set either by moving the
sample through the focused laser spot or by moving the laser spot itself. To achieve
high-resolution images, both the control electronics and the controlling software must
perform this positioning with a high level of accuracy. The software should also allow
the acquisition of white-light images at the position where measurements are taken.
For high-resolution objectives, the field of view is sometimes small compared to the
desired measurement area. In these cases it is advantageous if the software allows
automated stitching of white-light images in order to obtain an overview image.

4.2.2 Correlation of Spatial and Spectral Data

Apart from the data acquisition discussed above, the software needs to establish
correlations between and among the acquired data. For example, the software should
be capable of indicating where the spectra were recorded on the bitmap acquired
from the white-light image. Also, after a confocal Raman image has been generated
through, e.g., an integral filter over a certain spectral region, the software needs to
allow the display of the spectrum at each position of the image with a simple mouse
click to facilitate the analysis. Additionally, if spectra were taken with different
gratings (and thus different spectral resolutions), the software needs to be able to
correlate these spectra with each other. Figure 4.1 displays these correlations using
the example of several measurements on a polymer blend sample (PS-PMMA on
glass).
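Both kinds of correlation reduce to simple bookkeeping once the scan geometry and the spectral axes are stored with the data. The sketch below is hypothetical (the class and function names are ours, not those of any particular software) and assumes a regular, axis-aligned scan grid. It maps an image pixel to stage coordinates and back, returns the spectrum behind a clicked pixel, and finds the data point at a given wave number in spectra recorded with different gratings.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ImageScan:
    """Hypothetical container linking a Raman image scan to stage coordinates."""
    origin_um: tuple[float, float]  # stage (X, Y) of pixel (row 0, col 0)
    step_um: tuple[float, float]    # pixel pitch in X and Y
    spectra: np.ndarray             # shape (ny, nx, n_spectral)

    def pixel_to_stage(self, row: int, col: int) -> tuple[float, float]:
        x0, y0 = self.origin_um
        dx, dy = self.step_um
        return (x0 + col * dx, y0 + row * dy)

    def stage_to_pixel(self, x_um: float, y_um: float) -> tuple[int, int]:
        """Nearest image pixel (row, col) for a stage position, e.g. a point
        picked on the white-light image."""
        x0, y0 = self.origin_um
        dx, dy = self.step_um
        return (round((y_um - y0) / dy), round((x_um - x0) / dx))

    def spectrum_at(self, row: int, col: int) -> np.ndarray:
        """The spectrum to display when the user clicks on pixel (row, col)."""
        return self.spectra[row, col]


def index_at(axis_rel_cm: np.ndarray, wavenumber: float) -> int:
    """Index of the data point closest to `wavenumber` on a spectral axis.
    Calling it on the axes of two gratings links the same spectral position."""
    return int(np.argmin(np.abs(axis_rel_cm - wavenumber)))


scan = ImageScan((10.0, 20.0), (0.5, 0.5), np.zeros((4, 6, 100)))
print(scan.pixel_to_stage(2, 3))        # (11.5, 21.0)
print(scan.stage_to_pixel(11.5, 21.0))  # (2, 3)
```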

Fig. 4.1 Spectral and spatial correlation within the data acquisition software. The spectra recorded
(a, c, e) are linked to the position where they are recorded as indicated by the yellow and blue
crosses. The spatial correlation between the video image (b) and the confocal Raman image (d) is
indicated by the red box. Additionally, the spectral axes are correlated for spectra recorded with
different gratings as indicated by the green bar for a spectrum recorded using a 600 g/mm grating
(a) and an 1800 g/mm grating (c). (e) shows one of the spectra recorded for the confocal Raman
image (d), with its position indicated by the blue cross

4.3 Description of the Data Sets Acquired in Confocal Raman Microscopy

Data acquired in confocal Raman measurements are generally five- or even
six-dimensional. The dimensions are
• The spatial X, Y, and Z coordinates of the point where the spectrum was recorded
(typically given in μm).
• The spectral position, given as the wave number (cm−1), relative wave number
(rel. cm−1), or wavelength (typically given in nm).
• The intensity recorded at this spatial and spectral position (typically given in CCD
counts or counts per second [cps]).
• Time, which may also be present as a sixth dimension.
Such a data set is sometimes referred to as a hyperspectral data set.
Individual spectra can of course be displayed in a straightforward way (intensity
vs. spectral position) with the coordinates (and time if applicable) added in writing.
Displaying data sets containing more than one spectrum, however, becomes more
complicated. Line scans (spectra collected along a single line) as well as time series
(spectra recorded at the same position as a function of time) are sometimes displayed
in a so-called waterfall display as shown in Fig. 4.2.
For confocal Raman image scans, an entire Raman spectrum is collected at every
image point. A confocal Raman image scan consisting of 256 × 256 points will
therefore contain 256 × 256 = 65,536 individual spectra. One may distinguish
between a single image scan, in which the spectra are recorded in one layer in
three-dimensional space, and a multi-layered or stack scan, in which several parallel
layers offset by a specific distance are recorded.
In either case, the information contained within each spectrum needs to be
reduced to a single value, which will then determine the coloring of the pixel at
this position (see also Sect. 4.5).
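In array terms, this reduction collapses the spectral axis. The NumPy sketch below assumes the intensities are held in one array whose last axis is spectral, with shape (ny, nx, n_spectral) for a single image scan or (nz, ny, nx, n_spectral) for a stack scan. `reduce_to_image` is a hypothetical name, and the reducer (here simply the maximum) stands in for the filters of Sect. 4.5.

```python
import numpy as np


def reduce_to_image(cube: np.ndarray, reducer) -> np.ndarray:
    """Map every spectrum in `cube` to one value.

    The last axis of `cube` is spectral; all leading axes are spatial
    (and possibly temporal), so the result keeps those leading axes.
    """
    flat = cube.reshape(-1, cube.shape[-1])        # one row per spectrum
    values = np.array([reducer(s) for s in flat])  # one value per spectrum
    return values.reshape(cube.shape[:-1])


rng = np.random.default_rng(0)
stack = rng.random((2, 32, 32, 200))  # small stack: 2 layers of 32 x 32 spectra
images = reduce_to_image(stack, np.max)
print(images.shape)                   # (2, 32, 32)
```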

Fig. 4.2 Display of a line scan recorded along the line represented in red in the video image (b) in
the form of a waterfall plot (a)

In the case of an image stack one can then display each layer of the stack
individually or combine them with software in order to display the distribution
in three dimensions. Some examples of this can be found in Chap. 12 by Thomas
Wermelinger and Ralph Spolenak.

4.4 Pre-processing of Raman Spectra

Pre-processing of Raman spectra refers to the treatment of the spectra before images
are generated or the spectra are finally presented. The steps described below apply
to all recorded spectra and should generally be carried out before any further
processing.

4.4.1 Cosmic Ray Removal

Cosmic rays are high-energy particles from outer space which interact with atoms
and molecules in the Earth’s atmosphere. Because of their high energy, each impact
generates a large number of particles (often called a shower), mainly charged mesons.
These quickly decay into muons. Because of their relativistic speeds (and thus time
dilation), some of these muons reach the surface of the Earth. Despite this reaction
path, the term “cosmic ray” is also used (even if not 100% correct) for the muons
interacting with devices on the Earth’s surface, and for simplicity this term will be
used in the following as well.

Fig. 4.3 Cosmic ray removal. The red spectrum was recorded with a short integration time and
shows two cosmic rays near 3000 cm−1. The blue spectrum is the same as the red, but after having
undergone cosmic ray removal, and the black spectrum is the spectrum of this component (PMMA)
recorded with a longer integration time for a better signal to noise ratio

If such a cosmic ray hits a CCD detector it will generate a false signal in the
shape of a very sharp peak in the spectrum that is not related to the Raman signal.
An example can be seen in red in Fig. 4.3.
Cosmic rays can be filtered out as shown in Fig. 4.3 and described below.
It is, however, also possible to minimize the number of cosmic rays recorded through
the choice of the CCD readout method. As already described in Sect. 4.2.1.1, one
method is the full vertical binning mode, which is the fastest readout method. In this
case all pixels are read out, even though typically only a few percent of the pixels are
exposed to the Raman signal. If one limits the readout to the few lines of the detector
on which the Raman photons hit the CCD chip, one typically excludes more than 90%
of the pixels from the readout. This of course reduces the probability of recording a
cosmic ray. The disadvantages of this method are that it is a slower readout method
and that the light hitting the camera cannot be recorded during the readout. This
method is therefore recommended for single spectra, whereas it is not very suitable
for Raman imaging.
Once the spectra are recorded, various mathematical methods can be used to
filter the cosmic rays from the spectra. Two principal approaches can be
distinguished, which are discussed in the following.

4.4.1.1 Spectral Cosmic Ray Removal


Filtering spectra in the spectral domain for cosmic rays has the advantage that this
method works for true, one-shot exposures as well as for multiple accumulations,
time series, or Raman image scans.
The principle of this method is that each pixel is compared to its adjacent pixels
and if it exceeds a certain threshold, then it is identified as a cosmic ray. This method
is heavily dependent on the size of the filter (= the number of adjacent pixels taken
into account) and the threshold (or multiplicative factor). Therefore, any spectro-
scopic software should allow the adjustment of these values as well as the preview
of the resulting spectrum to allow the user to select the correct parameters.
While this method can be problematic for very sharp atomic emission lines, it
works very well for Raman spectra, in which (with only a few exceptions) the natural
line widths (FWHM) are typically >3 cm−1. Additionally, Raman lines typically
display a relatively broad base, similar to Lorentzian curves. Raman lines therefore
rise gradually, which is beneficial for this type of detection algorithm. The
performance of course also depends on the spectral resolution of the spectrometer
used.
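The neighbor comparison described above can be sketched in a few lines. This is an illustration under assumptions, not the algorithm of any specific software package. Each pixel is compared with the median of its `half_width` neighbors on either side (the filter size), and it is flagged if it exceeds that median by more than `factor` (the multiplicative threshold) times a robust noise estimate, taken here from the median absolute deviation (MAD) of the residuals. Flagged pixels are replaced by the neighbor median.

```python
import numpy as np


def remove_spikes_spectral(spectrum, half_width=3, factor=8.0):
    """Spectral cosmic ray removal for a single spectrum.

    half_width: neighbours taken into account on each side (filter size).
    factor:     multiplicative threshold applied to the noise estimate.
    Returns (cleaned spectrum, boolean mask of replaced pixels).
    """
    y = np.asarray(spectrum, dtype=float)
    n = y.size
    neighbour_median = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        # neighbours on both sides, excluding the pixel under test
        window = np.concatenate((y[lo:i], y[i + 1:hi]))
        neighbour_median[i] = np.median(window)
    residual = y - neighbour_median
    # robust noise estimate: scaled median absolute deviation of the residual
    mad = np.median(np.abs(residual - np.median(residual)))
    sigma = 1.4826 * mad if mad > 0 else 1.0
    spikes = residual > factor * sigma  # only upward outliers are cosmic rays
    cleaned = y.copy()
    cleaned[spikes] = neighbour_median[spikes]
    return cleaned, spikes
```

The parameter dependence discussed above shows up directly here. If a Raman peak spans only a few pixels, its maximum stands well above the median of its neighbors, and a small `factor` or a wide window will start flagging the peak top itself. This is why a preview of the result is important.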
Other methods for cosmic ray removal involving further computational methods
are reviewed in the literature (e.g., [2]), but discussing these would exceed the scope
of this chapter.

4.4.1.2 Temporal Cosmic Ray Removal


Removal of cosmic rays based on variations over time is also a very popular method.
In this case spectra recorded one after another are compared, and each pixel is
examined for its variation in the time domain.
This method works well when evaluating single spectra with several acquisitions at
the same position. It requires that there are only negligible changes from spectrum
to spectrum. If the sample changes its spectral signature rapidly (for example, due to
a chemical process taking place), the use of this type of algorithm is problematic.
For confocal Raman imaging data sets, this method can also be applied. The user
must be aware that in this case the spectra are recorded not only at different times
but also at different spatial positions. The result then additionally depends on the
compositional variation of the sample compared to the resolution of the scan. If the
changes from spectrum to spectrum are too dramatic, one again faces the problem
that the algorithm might filter out real peaks.
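A minimal sketch of the temporal approach follows. It assumes a series of at least three spectra recorded one after another at the same position, and a noise level that is roughly the same for all pixels, so that a single MAD-based estimate over the whole series can be used. Each value is compared with the median of the same pixel over time; values exceeding it by more than `factor` noise units are replaced by that median. The function name and the default threshold are our own choices.

```python
import numpy as np


def remove_spikes_temporal(series, factor=6.0):
    """Temporal cosmic ray removal for a time series of spectra.

    series: array of shape (n_spectra, n_pixels), n_spectra >= 3.
    Returns (cleaned series, boolean mask of replaced values).
    """
    s = np.asarray(series, dtype=float)
    if s.shape[0] < 3:
        raise ValueError("need at least three spectra for a temporal median")
    med = np.median(s, axis=0)                # per-pixel median over time
    dev = s - med                             # deviation of every value from it
    sigma = 1.4826 * np.median(np.abs(dev))   # one pooled noise estimate (assumed uniform)
    if sigma == 0:
        sigma = 1.0
    spikes = dev > factor * sigma             # only upward outliers are cosmic rays
    cleaned = np.where(spikes, med, s)        # med broadcasts over the time axis
    return cleaned, spikes
```

For imaging data the same function can be applied to consecutive spectra along the scan line. As noted above, however, there the "time series" also spans different sample positions, so a real compositional change between neighboring pixels can be mistaken for a spike.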

4.4.2 Smoothing

Smoothing is a common practice used to reduce the noise associated with a recorded
spectrum.
Most smoothing algorithms rely on the fact that spectral data (in our case Raman)
can be assumed to vary somewhat gradually from one spectral data point to the next,
whereas the noise associated with the spectrum typically changes very quickly.
In this case, it can be useful to replace each value by a value calculated from the
surrounding values in order to reduce the noise. Therefore, most filters for this
purpose can be considered low-pass filters. Independent of the filter used, care
must be taken to avoid altering the true Raman signal. Overly extensive smoothing,
for example, will result in a “smearing” of the Raman peaks, thus altering their
height and/or width. Additionally, small shoulder peaks might be lost.
The various filters which can be applied to the data differ most significantly in the
way the replacement value is calculated. Most filters allow the user to define how
many values around the value to be replaced are taken into account. The software
should allow the user to preview the results in order to facilitate the choice of
parameters.
The following filters are a typical selection of those used to smooth spectral
data:

1. Moving average This filter is arguably the simplest filter for smoothing. For this
filter a definable number of values to the left and right of the current value are
averaged, and the current value is replaced by the result. Then this “window” moves
to the next value, and so on. For very slowly changing signals (as might be the case
in photoluminescence [PL]) this filter can be suitable.
2. Weighted average This filter differs from the moving average in that it does not
give each value the same weight, but multiplies each one by a binomial weighting
factor or a Gaussian distribution. Table 4.1 shows the binomial coefficients used for
the average calculation for various filter sizes. This filter keeps the resulting value
closer to the real value than the moving average filter does, even if the signal
changes more rapidly.

Table 4.1 Matrices for the graph average filter

Filter size   Range   Filter coefficients
1             3       1/4 × (1, 2, 1)
2             5       1/16 × (1, 4, 6, 4, 1)
3             7       1/64 × (1, 6, 15, 20, 15, 6, 1)
4             9       1/256 × (1, 8, 28, 56, 70, 56, 28, 8, 1)

3. Median The median filter is generally less influenced by single data points that
fall out of range of the “normal” signal. For example, if a cosmic ray is within the
search window, the median filter will be less influenced by this than an average
filter. This filter is a good choice for removing spikes in a line graph without
heavy rounding of the edges of Raman peaks.
4. Savitzky–Golay The Savitzky–Golay filter (sometimes also known as DISPO
(Digital Smoothing Polynomial) filter) essentially uses the surrounding values in
a weighted way and fits a polynomial to these points in order to determine
the “fitted” value at the current position. While the smoothing of this filter is
not as strong as, for example, that of a moving average filter, it smooths the
data considerably while largely maintaining the curve shape (peak width, peak
intensities, ...). A detailed discussion of the functionality as well as examples
of this filter can be found in [3]. This filter has the additional advantage of
allowing the calculation of the derivative of the spectrum, which can be useful
for peak location (see black vertical lines in Fig. 4.4). Figure 4.4 shows an
example of the usage of the Savitzky–Golay filter. The use of this filter is
especially recommended if the widths of the peaks in the spectrum are
comparable.
5. Wavelet transformation techniques Wavelet transformation is a mathematical
technique somewhat similar to Fourier transformation but with the advantage
that both time and frequency information are maintained. Wavelets consist
essentially of a family of basis functions which can be used to model the signal.
Each level of the wavelet decomposition results in an approximation and a detail
result. The approximation result is then used as the basis for the next
decomposition, and this is repeated until a defined decomposition level is reached.
By using the correct combination of the detail results (one available per
decomposition level) and the approximation result (typically only the last one is
used here), one can perform the reconstruction (or inverse discrete wavelet
transformation [IDWT]) to obtain a spectrum with strong noise reduction, a
removed background, or both. More detailed descriptions as well as some
illustrative examples can be found in [4, 5].
6. Maximum entropy filter The maximum entropy method uses the fact that certain
aspects of the instrument functionality are known. Through this, neighboring
pixels in a spectrum are not statistically independent anymore and a filtering of
these values is therefore possible.
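The first four filters in the list can be sketched with NumPy and SciPy. This is an illustration, not the implementation of any particular software. `half_width` is our name for the number of values taken on each side (the "filter size" of Table 4.1), so the window length is 2 × half_width + 1. The Savitzky–Golay filter uses SciPy's `scipy.signal.savgol_filter`; with `deriv=1` it returns the smoothed first derivative mentioned above.

```python
from math import comb

import numpy as np
from scipy.ndimage import median_filter
from scipy.signal import savgol_filter


def moving_average(y, half_width):
    """Equal weights over 2*half_width + 1 points. np.convolve pads with zeros,
    so the first and last half_width values are biased low."""
    n = 2 * half_width + 1
    return np.convolve(y, np.ones(n) / n, mode="same")


def binomial_average(y, half_width):
    """Binomial weights as in Table 4.1, e.g. (1, 2, 1)/4 for half_width = 1."""
    n = 2 * half_width
    kernel = np.array([comb(n, k) for k in range(n + 1)], dtype=float) / 2**n
    return np.convolve(y, kernel, mode="same")


def median_smooth(y, half_width):
    """Running median; a single spike inside the window barely affects it."""
    return median_filter(y, size=2 * half_width + 1, mode="nearest")


def savitzky_golay(y, half_width, polyorder=3, deriv=0):
    """Local least-squares polynomial fit (window must exceed polyorder)."""
    return savgol_filter(y, 2 * half_width + 1, polyorder, deriv=deriv)
```

Smoothing a single unit impulse with `binomial_average` and `half_width = 1` returns 0.25, 0.5, 0.25 around the impulse, which is the first row of Table 4.1.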

Fig. 4.4 Effect of the Savitzky–Golay filter. The black points in the top spectrum correspond to
the data points recorded from the CCD camera (background subtracted) and the red line shows the
spectrum after smoothing using the Savitzky–Golay filter. The blue curve in the bottom is the
derivative of the spectrum obtained through the Savitzky–Golay filter

4.4.2.1 Smoothing in Raman Imaging


The filters discussed above can be applied to single spectra as well as to entire
spectral data sets. In the case of Raman imaging, the experimenter can also take
the spatial correlation between various spectra into account. In other words: for
pure spectral filtering only the spectrally neighboring pixels are taken into account,
whereas for Raman imaging, the spatially neighboring pixels may also be used. This
is especially true for the case of the Median and Average filters.
Images generated from Raman data (see Sect. 4.5) can also be smoothed through
filters and here Median and Average filters (in two dimensions) are the most promi-
nent, but Fourier space filtering can also be employed.
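For a data cube with shape (ny, nx, n_spectral), spatial smoothing means filtering across neighboring pixels while leaving the spectral axis alone. The sketch below uses `scipy.ndimage` and assumes this axis order. A window of size 1 along the spectral axis ensures that different wave numbers are never mixed; the function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import median_filter, uniform_filter


def spatial_median(cube, size_xy=3):
    """Median over size_xy x size_xy spatially neighbouring spectra,
    computed independently for every spectral point."""
    return median_filter(cube, size=(size_xy, size_xy, 1), mode="nearest")


def smooth_image(image, size=3):
    """Two-dimensional moving-average (box) smoothing of a generated image."""
    return uniform_filter(image, size=size, mode="nearest")


cube = np.ones((5, 5, 10))
cube[2, 2, :] = 50.0                  # one outlier spectrum
print(spatial_median(cube)[2, 2, 0])  # 1.0
```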

4.4.3 Background Subtraction and Subtraction of Reference Spectra

Any Raman spectrum read from a CCD camera shows some background signal. The
sources of the background can be divided into two main categories, and the
recommended removal of the background may differ somewhat between them. These
two categories as well as the respective background removal are described below.

4.4.3.1 Background Originating from the CCD Camera


A CCD camera generally adds a DC voltage to the detected signal in order to ensure
that the A/D converter, which converts the analog charge into a digital signal, will
not receive a negative voltage due to noise. This results in a constant background of
typically a few hundred to a few thousand CCD counts.
Most cameras additionally show a slight non-linear background due to
inhomogeneities in the chip itself or non-homogeneous cooling of the CCD chip.

Removal of a Background Originating from the CCD Camera


The constant background originating from the CCD camera can be removed quite
easily by subtracting the constant value added by the camera. One can, for example,
find this value by averaging the part of the spectrum where the Raman edge filter is
still blocking the signal but no Rayleigh peak is present.
The non-linear background of the CCD camera is best eliminated by subtracting a
reference or “dark” spectrum. A dark spectrum is ideally recorded using exactly the
same integration parameters as were used for recording the Raman spectrum. To
minimize the noise of the dark spectrum, it is recommended to record it with the
same integration time as the Raman spectrum, but with many accumulations at this
integration time. The software then needs to offer a simple functionality to calculate
[Raman spectrum] − [dark spectrum]. However, the raw data should still be available
to the user.
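Both corrections are simple array operations. The sketch below rests on stated assumptions: spectra are NumPy arrays of raw counts, `blocked` is a hypothetical slice or mask marking the pixels where the edge filter blocks all light, and the dark accumulations are stacked along the first axis. The subtraction returns a new array so that the raw data remain untouched, as recommended above.

```python
import numpy as np


def offset_from_blocked_region(spectrum, blocked):
    """Constant camera offset: mean of the pixels selected by `blocked`
    (a slice or mask) where the edge filter blocks all light."""
    return float(np.mean(np.asarray(spectrum, dtype=float)[blocked]))


def master_dark(dark_accumulations):
    """Average of many dark accumulations (shape (n_accumulations, n_pixels))
    recorded with the same integration time as the Raman spectrum."""
    return np.mean(np.asarray(dark_accumulations, dtype=float), axis=0)


def subtract_dark(raw, dark):
    """[Raman spectrum] - [dark spectrum], returned as a new array so that
    the raw data stay available."""
    return np.asarray(raw, dtype=float) - dark
```

Averaging N dark accumulations reduces the random noise of the dark reference by roughly a factor of √N. The subtraction then adds little noise beyond that already present in the Raman spectrum.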

4.4.3.2 True Signal Background


There are very few “real-world” materials that show a background of exactly zero
in their Raman spectrum. Good confocality of the system, however, will largely
reduce this background, as described in the earlier chapters. Nevertheless, a
remaining background, such as fluorescence or some signal from the substrate,
often underlies the Raman spectrum.

Removal of Signal Background


There are several ways to subtract the background from the recorded spectrum.
Some of the most prominent are:

Single-pass polynomial background subtraction


For this type of background subtraction, a polynomial is fitted to the spectrum and
subtracted. Care must be taken, however, to ensure that the regions of the spectrum
containing Raman peaks are not included in the values used to calculate the fit of
the polynomial. This selection can be done either automatically through a peak
recognition routine or manually by defining the regions in the spectrum where no
peaks are present. The order of the polynomial is the next variable which needs to
be adjusted. A very high-order (e.g., ninth-order) polynomial often fits the data well
at first glance, but will often introduce wave-like artifact oscillations around the
zero level of the background. An order that is too low, on the other hand, might not
allow the polynomial to follow the spectrum closely enough.
Moving average background subtraction
The calculation of the moving average was already described in Sect. 4.4.2. Using
this method for background subtraction essentially means calculating the moving
average over definable regions, which should be the ones without Raman peaks. For
the regions containing peaks, an interpolation between these points is calculated.
The resulting curve is then subtracted. With this method, great care must be taken to
ensure that no relevant information in the spectrum is altered.
Wavelet transformation techniques
The principles of wavelet transformation techniques have already been described
in Sect. 4.4.2. Through an appropriate combination of the detail results (one
available per decomposition level) and the approximation result (typically only the
last one is used here), one can subtract the background. In this case the
approximation result is generally omitted from the reconstruction.
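Two of these methods are sketched below. The polynomial fit uses NumPy's `numpy.polynomial.Polynomial.fit` on the peak-free points only, with a boolean mask `peak_free` supplied by the user (the manual selection described above). For the wavelet method, the sketch uses the Haar wavelet, the simplest choice, purely as an assumption for illustration; practical implementations typically use smoother wavelet families. Following the text, the background-free spectrum is reconstructed from the detail results with the final approximation set to zero.

```python
import numpy as np


def polynomial_background(x, y, peak_free, order=3):
    """Fit a polynomial of the given order to the peak-free points only and
    evaluate it on the full axis. `peak_free` is a boolean mask."""
    fit = np.polynomial.Polynomial.fit(x[peak_free], y[peak_free], order)
    return fit(x)


def haar_dwt(y, levels):
    """Multi-level Haar decomposition. Returns the final approximation and the
    list of detail results (finest level first). len(y) must be divisible
    by 2**levels."""
    a = np.asarray(y, dtype=float)
    if a.size % 2**levels:
        raise ValueError("length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))
        a = (even + odd) / np.sqrt(2)
    return a, details


def haar_idwt(approx, details):
    """Inverse Haar transformation (IDWT) from an approximation and details."""
    a = np.asarray(approx, dtype=float)
    for d in reversed(details):  # coarsest level first
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a


def wavelet_background_removed(y, levels=5):
    """Reconstruct with the final approximation omitted (set to zero)."""
    approx, details = haar_dwt(y, levels)
    return haar_idwt(np.zeros_like(approx), details)
```

With the Haar wavelet, the final approximation reconstructs to a staircase of local means over 2^levels points. The number of levels therefore decides which spectral features count as "background", and, like the polynomial order, it has to be checked against the data.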

4.5 Image Generation


Confocal Raman imaging data sets (as described in Sect. 4.3) are typically five- or
six-dimensional, and thus the dimensionality needs to be reduced in order to display
the information. As stated earlier, the dimensions are X, Y, Z, wave number,
intensity, and time.
Considering a typical Raman image, which is a single scan along one plane, the
dimensionality reduces to the two directions of the plane (let us assume X and Y for
simplicity), the wave number, and the intensity. While this is still a four-dimensional
data set, the information it contains can be displayed in a quite straightforward way.
For the image generation two main methods may be distinguished: univariate
and multivariate analysis. The primary difference is that for a univariate analysis
one spectrum at a time is evaluated and used for the generation of one data point per
resulting image. Here the surrounding points also are considered for the value of the
current pixel. In contrast to this all spectra in the data set play a role for each data
point in the case of the multivariate data analysis. Typical univariate and multivariate
methods will be described in the following sections.

4.5.1 Univariate Image Generation

As stated above, in univariate data analysis, each spectrum determines one value of
the corresponding pixel in the image or the images. The value of these pixels can be
determined by simple filters or by fitting procedures.

4.5.1.1 Simple Filters


Simple filters typically evaluate a certain part of the spectrum. Figure 4.5 shows an
example of an integrated intensity (sum) filter evaluating the integrated intensity of
various specific peaks found in the image scan of an oil–water–alkane immersion.
The resulting images (Fig. 4.5c–e) show high intensities of these peaks as brightly
colored areas and low intensities as darker areas. Other types of filters can evaluate,
for example, the peak width or the peak position and display this as an image.

Fig. 4.5 Usage of an integrated intensity filter with an oil–water–alkane immersion. The spectra
(a) are integrated in three different spectral areas. The water peak (blue) is evaluated without
background subtraction and results in image (c). The oil peak is integrated as shown in green, with
the pixels adjacent to the higher wave number side of the peak used as the background level, and
results in image (d). For the alkane, pixels to the left and right of the integration area are used for
background calculation (red), resulting in image (e). Image (b) shows the combined image of (c),
(d), and (e)
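A sum filter of the kind shown in Fig. 4.5 can be sketched as follows. This is a hypothetical illustration, not the software's implementation. The band limits `lo` and `hi` are given in the units of the spectral axis. When `bg_pixels` is positive, the sketch subtracts a straight line running from the mean of the `bg_pixels` points just left of the band (taken as the level at the first band pixel) to the mean of those just right of it (the level at the last band pixel), as for the alkane band. Setting `bg_pixels` to zero corresponds to the water band, which is integrated without background subtraction.

```python
import numpy as np


def sum_filter(axis, cube, lo, hi, bg_pixels=0):
    """Integrated intensity between lo and hi for every spectrum in `cube`.

    cube has shape (..., n_spectral); the result has shape cube.shape[:-1],
    i.e. one value per spectrum, which becomes one image pixel.
    """
    idx = np.flatnonzero((axis >= lo) & (axis <= hi))
    i0, i1 = idx[0], idx[-1]
    if bg_pixels and (i0 - bg_pixels < 0 or i1 + 1 + bg_pixels > axis.size):
        raise ValueError("band too close to the end of the spectrum")
    band = cube[..., i0:i1 + 1]
    total = band.sum(axis=-1)
    if bg_pixels > 0:
        left = cube[..., i0 - bg_pixels:i0].mean(axis=-1)
        right = cube[..., i1 + 1:i1 + 1 + bg_pixels].mean(axis=-1)
        # The sum of a straight line from `left` (first band pixel) to
        # `right` (last band pixel) over n pixels is n * (left + right) / 2.
        total = total - band.shape[-1] * (left + right) / 2
    return total
```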

It should be noted that many of the filters used allow the extraction of a large
amount of information from the data. However, there is also the danger of misinter-
pretation. The list below shows some typical simple filters and their usage as well
as considerations which should be taken into account to avoid misinterpretations.

• Integrated intensity filter (sum filter)


Information content:
A typical measure of the amount and scattering strength of a certain material
within the focus.
Misinterpretation dangers:

– Other materials present in the sample might also have peaks at this position.
– The amount of material between the objective and the focal point might change
and this would also have an influence on the absolute intensity of the peak.
– The polarization direction of the laser relative to the structure can also have an
influence on this peak intensity.
– If the software does not provide good background subtraction methods, then
changes in the background can influence the result.
74 T. Dieing and W. Ibach

• Peak width (i.e., FWHM)


Information content:
A measure of the crystallinity and of the structural orientation relative to the polarization direction of the laser. Inhomogeneous peak broadening can also occur, for example, in samples under stress.
Misinterpretation dangers:

– Non-resolved shoulder peaks can influence this result.
– If the software does not provide good background subtraction methods, then changes in the background can influence the result.

• Peak position (e.g., center of mass position)


Information content:
A measure of the strain within a material and of the general chemical neighborhood of the molecule.
Misinterpretation dangers:

– Non-resolved shoulder peaks can influence this result.
– If the software does not provide good background subtraction methods, then changes in the background can influence the result.

Simple filters have the advantage of being relatively low in processor load and
thus they can be applied during ongoing data acquisition.
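Each simple filter touches only one spectrum and performs a handful of arithmetic operations. That is why it can run while data are still being acquired. The following sketch assumes a NumPy cube of shape (ny, nx, n_pixels) and background-subtracted spectra. It shows a center-of-mass peak position filter and an FWHM peak width filter. The function names are hypothetical, and the FWHM routine deliberately ignores shoulder peaks, which is exactly the misinterpretation risk listed above.

```python
import numpy as np


def center_of_mass(cube, axis, lo, hi):
    """Peak position filter: intensity-weighted mean wavenumber inside [lo, hi].

    Assumes background-subtracted spectra; negative values are clipped to zero
    so that noise below the baseline does not shift the result."""
    sel = (axis >= lo) & (axis <= hi)
    w = np.clip(cube[..., sel], 0.0, None)
    return (w * axis[sel]).sum(axis=-1) / w.sum(axis=-1)


def fwhm(spectrum, axis):
    """Peak width filter: FWHM of the strongest peak of one background-subtracted
    spectrum, with linear interpolation at both half-maximum crossings.
    Returns NaN if there is no positive peak fully inside the spectrum."""
    s = np.asarray(spectrum, dtype=float)
    i = int(np.argmax(s))
    if s[i] <= 0:
        return np.nan
    half = s[i] / 2.0
    left, right = i, i
    while left > 0 and s[left - 1] > half:
        left -= 1
    while right < len(s) - 1 and s[right + 1] > half:
        right += 1
    if left == 0 or right == len(s) - 1:
        return np.nan

    def crossing(a, b):
        # wavenumber where the straight line from pixel a to pixel b reaches `half`
        return axis[a] + (half - s[a]) * (axis[b] - axis[a]) / (s[b] - s[a])

    return crossing(right, right + 1) - crossing(left - 1, left)


def fwhm_image(cube, axis):
    """Apply the FWHM filter to every spectrum of an (ny, nx, n_pixels) cube."""
    ny, nx, _ = cube.shape
    return np.array([[fwhm(cube[y, x], axis) for x in range(nx)] for y in range(ny)])
```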

4.5.1.2 Fitting Filters


Fitting filters are a more sophisticated way of extracting the required information from the spectra. In this case the peaks of the spectra are fitted with polynomial or peak functions. Depending on the state (solid, liquid, or gas) and the environment of the molecules observed by Raman spectroscopy, the line shape of the emitted Raman line can lie anywhere between a Gaussian and a Lorentzian curve.
Voigt curves represent a convolution of Lorentzian and Gaussian curves, whereas pseudo-Voigt curves are calculated as the weighted sum

$$ m_u \cdot \mathrm{Gauss} + (1 - m_u) \cdot \mathrm{Lorentzian} $$

where m_u is the profile shape factor. Pseudo-Voigt curves can be further differentiated by the degree of freedom of the FWHM (identical for the Gaussian and Lorentzian parts, or variable).
In addition to the mixture of Lorentzian and Gaussian line profiles discussed above, the Raman lines may also be distorted by the local sample environment, and instrument functions play an additional role. How strongly these instrument functions change the signal depends heavily on the microscope and spectrometer design [6]. Whether they need to be taken into account for peak fitting therefore depends on the instrument.
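The line profiles discussed above can be written down directly. The sketch below assumes area-normalized profiles parameterized by their FWHM. With this choice, the `area` argument of the pseudo-Voigt corresponds to the integrated intensity listed in Table 4.2. The function names and argument order are illustrative, not those of a specific fitting package.

```python
import numpy as np


def gauss(x, x0, fwhm):
    """Area-normalized Gaussian centered at x0 with full width at half maximum fwhm."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((x - x0) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def lorentz(x, x0, fwhm):
    """Area-normalized Lorentzian centered at x0 with full width at half maximum fwhm."""
    gamma = fwhm / 2.0
    return gamma / (np.pi * ((x - x0) ** 2 + gamma ** 2))


def pseudo_voigt(x, x0, area, m_u, fwhm_g, fwhm_l=None, offset=0.0):
    """m_u * Gauss + (1 - m_u) * Lorentzian, scaled to the integrated intensity `area`.

    With fwhm_l=None both parts share one FWHM; otherwise the Gaussian and
    Lorentzian widths are independent parameters."""
    if fwhm_l is None:
        fwhm_l = fwhm_g
    shape = m_u * gauss(x, x0, fwhm_g) + (1.0 - m_u) * lorentz(x, x0, fwhm_l)
    return offset + area * shape
```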
Table 4.2 Fitting filters for spectral data

Filter            Information content
Linear fit        Slope, offset
Quadratic fit     Peak position, curvature, peak intensity
Gaussian fit      Peak position, width, integrated intensity, offset
Lorentzian fit    Peak position, width, integrated intensity, offset
Pseudo-Voigt fit  Peak position, width (Gaussian and Lorentzian), integrated intensity, offset, profile shape factor

For these reasons, no single peak function matches all Raman peaks perfectly. Table 4.2 shows some common fitting filters and the information that can be obtained through them.
As can be seen from Table 4.2, all of the filters listed deliver more than one result per fitted curve. Applying such a filter to a confocal Raman image data set therefore yields multiple images, as can be seen in Fig. 4.6. In addition to the results listed in Table 4.2, the software should provide an error image. This lets the user quickly see whether the fitting error is larger in a certain region, for example because of a line distortion. For Fig. 4.6, a confocal Raman image of a Vickers indent in Si was recorded and the resulting spectra were fitted with a Lorentzian function. Both the peak shift and the broadening can clearly be seen in the spectra (Fig. 4.6a) as well as in the images (Fig. 4.6b and c).
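A fitting filter of this kind can be sketched as a per-pixel least-squares fit. The example below assumes a Lorentzian on a constant offset. It uses a small hand-written Levenberg-Marquardt loop, a standard nonlinear least-squares method swapped in here for illustration; the chapter does not state which optimizer is used. Besides position, width and integrated intensity, the sketch returns the RMS residual of every fit as the error image mentioned above. All names are hypothetical.

```python
import numpy as np


def lorentz_model(x, p):
    """Lorentzian peak on a constant offset; p = (height, x0, fwhm, offset)."""
    h, x0, w, c = p
    u = (x - x0) / (w / 2.0)
    return c + h / (1.0 + u ** 2)


def lorentz_jacobian(x, p):
    """Partial derivatives of lorentz_model with respect to (height, x0, fwhm, offset)."""
    h, x0, w, c = p
    u = (x - x0) / (w / 2.0)
    d = 1.0 / (1.0 + u ** 2)
    return np.column_stack([
        d,                               # d/d height
        4.0 * h * u * d ** 2 / w,        # d/d x0
        2.0 * h * u ** 2 * d ** 2 / w,   # d/d fwhm
        np.ones_like(x),                 # d/d offset
    ])


def fit_lorentz(x, y, n_iter=100):
    """Levenberg-Marquardt fit of one spectrum; returns parameters and RMS residual."""
    c0 = y.min()
    i = int(np.argmax(y))
    # crude start values: maximum for height/position, a tenth of the range for the width
    p = np.array([y[i] - c0, x[i], (x[-1] - x[0]) / 10.0, c0], dtype=float)
    r = y - lorentz_model(x, p)
    cost, lam = r @ r, 1e-3
    for _ in range(n_iter):
        J = lorentz_jacobian(x, p)
        A = J.T @ J
        step = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ r)
        r_new = y - lorentz_model(x, p + step)
        if r_new @ r_new < cost:          # accept: behave more like Gauss-Newton
            p, r, cost, lam = p + step, r_new, r_new @ r_new, lam / 10.0
        else:                             # reject: behave more like gradient descent
            lam *= 10.0
    p[2] = abs(p[2])                      # the model depends only on fwhm squared
    return p, np.sqrt(cost / len(x))


def fit_image(cube, x):
    """Fit every spectrum; return position, width, integrated intensity and error images."""
    ny, nx, _ = cube.shape
    pos, width, area, err = (np.empty((ny, nx)) for _ in range(4))
    for iy in range(ny):
        for ix in range(nx):
            (h, x0, w, _c), rms = fit_lorentz(x, cube[iy, ix])
            pos[iy, ix], width[iy, ix], err[iy, ix] = x0, w, rms
            area[iy, ix] = np.pi * h * w / 2.0   # integral of the peak above the offset
    return pos, width, area, err
```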

4.5.2 Multivariate Image Generation

As mentioned in a previous section, multivariate image generation uses the information of the entire hyperspectral data set to determine the value (and thus the color) of each image pixel. In the following, the two multivariate data analysis methods most commonly used for Raman spectra are presented. First, principal component analysis is discussed. This method is used not only for image generation but also for data reduction and for distinguishing sample properties based on the principal components.
The second method described is cluster analysis, which can be used for direct image generation or for calculating average spectra for further data analysis (see, for example, Sect. 4.7).
Fig. 4.6 Lorentzian fitting of first-order Si peaks around a Vickers indent. The spectra (a) are
extracted from the points indicated by the corresponding colors in the images. (b) Shows the
position of the first-order Si peaks and (c) shows the width of the line. Both are results from a
Lorentzian fit

Vertex component analysis (VCA) is another multivariate method, which finds the most dissimilar spectra in the data set, as described in Chap. 7 by Christian Matthäus et al.

4.5.2.1 Principal Component Analysis


Principal component analysis (PCA) was first proposed in 1901 by Karl Pearson [7]. However, because of the large computational effort it requires, the method could only be used as we know it today once computers became available. PCA underlies many other multivariate methods, since it is very effective for data reduction. A spectrum consisting of 1600 pixels, and thus 1600 dimensions, can often be reduced to approximately 4–6 dimensions. In some cases, this is also the number of different components in the sample, since all the other spectra are linear combinations of the pure component spectra (see also Sect. 4.7).
In the following, this reduction is explained in a greatly simplified way. The reader is referred to, for example, [8, 9] for further reading and a detailed explanation of the method.

Principal Function of the PCA


If we assume a single spectrum consisting of 1600 pixels, then this spectrum can be
described as a single point in 1600-dimensional space. Each of the CCD pixels is one
axis and the value recorded at this CCD pixel is the value along this axis. If we now
look at a confocal Raman image, then we might have, for example, 150 × 150 (=
22,500) spectra and we will thus have 22,500 points in our 1600-dimensional space.
Now let us consider a Raman peak which is present in some of the spectra. Most Raman peaks are significantly wider than a single pixel, at least with commonly used spectrometers and gratings. The pixels describing such a peak are therefore not statistically independent. Furthermore, most Raman spectra consist of more than one peak. Therefore, the pixels whose values increase when such a Raman spectrum is recorded are linked. In our 1600-dimensional space, this corresponds to a certain direction. Analyzing the entire data set results in a variety of such directions. These directions span a subspace of the 1600-dimensional space in which all spectra can be found. They are orthogonal to each other, but cannot be identified as pure components, since they might, for example, point in a negative direction.
Each of these directions has an eigenvalue associated with it, whose magnitude reflects the amount of variation within the data set explained by that direction. The eigenvalues are widely spread, and some may be hundreds of times larger than others. Sorting them by value allows the relevant directions (the principal components) to be determined quickly. If the sample consists, for example, of three components, then three principal components should describe the data set sufficiently, while the remaining directions describe only noise.
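The procedure described above can be reproduced with a singular value decomposition (SVD) of the mean-centered data matrix. This yields the same directions and eigenvalues as diagonalizing the covariance matrix. The following minimal sketch assumes the spectra are stacked as rows of an (n_spectra, n_pixels) array. It uses a synthetic, hypothetical three-component sample on a 200-pixel axis; with three components, three eigenvalues stand far above the rest.

```python
import numpy as np


def pca(spectra):
    """PCA of an (n_spectra, n_pixels) matrix via SVD of the mean-centered data.

    Returns the eigenvalues (variance along each direction, descending), the
    principal components (one unit-length spectrum per row), the scores
    (coordinates of every spectrum along the components) and the mean spectrum."""
    mean = spectra.mean(axis=0)
    U, s, Vt = np.linalg.svd(spectra - mean, full_matrices=False)
    eigenvalues = s ** 2 / (len(spectra) - 1)
    scores = U * s
    return eigenvalues, Vt, scores, mean


# Synthetic data: three hypothetical pure components (Gaussian bands) mixed
# in random proportions, plus noise.
rng = np.random.default_rng(1)
pix = np.arange(200)
pure = np.stack([np.exp(-0.5 * ((pix - c) / 5.0) ** 2) for c in (50, 100, 150)])
data = 100.0 * rng.uniform(0.0, 1.0, (500, 3)) @ pure + rng.normal(0.0, 1.0, (500, 200))

eigenvalues, components, scores, mean = pca(data)
print(eigenvalues[:6].round(1))   # three large values, then a plateau at the noise level
```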

Noise and PCA


PCA can only work if the data set is not dominated by noise. Otherwise, even the principal components with the highest eigenvalues only describe noise. Figure 4.7 illustrates this for a simple two-dimensional example. Figure 4.7(a) shows a data set with two principal components, PC1 and PC2, which describe the data set perfectly.
In the noise-dominated data set in Fig. 4.7(b), no such principal components (or directions) can be defined, because all directions are equally good or bad.
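The two-dimensional situation of Fig. 4.7 can be imitated numerically. The sketch compares the eigenvalues of the covariance matrix of a structured point cloud with those of pure noise. The data are synthetic and the spread values are arbitrary assumptions. The point is only that structured data give one dominant eigenvalue, whereas for isotropic noise both eigenvalues are almost equal and no direction is preferred.

```python
import numpy as np


def covariance_eigenvalues(points):
    """Eigenvalues (descending) of the covariance matrix of an (n, dims) point set."""
    return np.linalg.eigvalsh(np.cov(points, rowvar=False))[::-1]


rng = np.random.default_rng(2)
n = 2000
# (a) points spread along one direction with little scatter across it
t = rng.normal(0.0, 3.0, n)
structured = np.column_stack([t, 0.5 * t]) + rng.normal(0.0, 0.2, (n, 2))
# (b) noise only: the same spread in every direction
noise = rng.normal(0.0, 1.0, (n, 2))

for name, pts in (("structured", structured), ("noise", noise)):
    ev = covariance_eigenvalues(pts)
    print(name, ev.round(2), "ratio", round(ev[0] / ev[1], 1))
```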

Results from a PCA


Following the PCA, a variety of results can be extracted. The results most commonly used for the analysis of confocal Raman data are:

Fig. 4.7 (a) Principal components of a typical two-dimensional data set. (b) Noise-dominated data set in which principal components can no longer be found

Reduced data set


Each of the 22,500 spectra used for the PCA is now described by principal components. As mentioned above, a relatively small number of components is sufficient to describe each spectrum. We can therefore export these “spectra of components”, which now consist of only a few values instead of 1600 at each image pixel. These can be used for further investigations such as cluster analysis, which will then be much faster.
Images of the principal components
Each of the spectra used for the analysis can be described by a weighted com-
bination of the principal components. If we now consider only one component,
then this will result in one value per image pixel and we can of course display
this, which then corresponds to the Raman image describing the abundance of a
certain spectrum (direction) within the data set.
Reconstructed spectra from the principal components
If we calculate the spectra using, for example, only five components, we obtain spectra with much lower noise. The noise is statistical, so it is not described by any of the first (approximately 20) directions, and it is therefore reduced.
Please note that this is only true if the spectra are recorded with a sufficient signal-to-noise ratio (S/N). For an S/N close to 1, PCA is not possible, because the noise would determine the “directions” of the principal components.
Cross correlation plots of various components
If we consider our spectra again and plot a point for each spectrum at a corre-
sponding position in a coordinate system of principal component A vs. principal
component B (or C, D, etc.) we can visualize the correlation between the com-
ponents. We might have clearly separated point clusters, or we might have them
aligned along a diagonal, for example.
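The results listed above all follow from the same decomposition. This sketch assumes a cube of shape (ny, nx, n_pixels) and a chosen number of components `k`. It returns the reduced data set (k scores per spectrum), one image per principal component, and the spectra reconstructed from the first k components. The synthetic two-component cube and the function name `pca_results` are hypothetical.

```python
import numpy as np


def pca_results(cube, k):
    """Reduced data set, principal component images and k-component
    reconstruction for a hyperspectral cube of shape (ny, nx, n_pixels)."""
    ny, nx, n_pix = cube.shape
    X = cube.reshape(-1, n_pix)
    mean = X.mean(axis=0)
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = (U * s)[:, :k]                  # reduced data set: k values per spectrum
    images = scores.T.reshape(k, ny, nx)     # one image per principal component
    recon = scores @ Vt[:k] + mean           # spectra rebuilt from k components only
    return scores, images, recon.reshape(ny, nx, n_pix)


rng = np.random.default_rng(4)
pix = np.arange(300)
pure = np.stack([np.exp(-0.5 * ((pix - c) / 6.0) ** 2) for c in (100, 200)])
clean = (rng.uniform(0.0, 1.0, (30 * 30, 2)) @ pure * 50.0).reshape(30, 30, 300)
noisy = clean + rng.normal(0.0, 5.0, clean.shape)

scores, images, recon = pca_results(noisy, k=2)
print("rms noise before:", np.sqrt(((noisy - clean) ** 2).mean()).round(2))
print("rms noise after: ", np.sqrt(((recon - clean) ** 2).mean()).round(2))
```

A cross correlation plot is simply a scatter plot of one column of `scores` against another, for example `scores[:, 0]` versus `scores[:, 1]`.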

4.5.2.2 Cluster Analysis


Cluster analysis applied to confocal Raman images is essentially the sorting of the tens of thousands of spectra in a data set according to their similarity. The result is a certain number of areas or masks, which indicate where the spectra belonging to the various clusters were acquired, together with the average spectrum of each cluster. Other applications include the identification of bacterial strains, and even of their position in the life cycle, or the identification of pathogenic cells.
Cluster analysis has the advantage of being an automated and objective method for finding similar regions in spectral data sets. It can, however, require significant processing power and time.
There are various ways of clustering the data, each with its advantages and disadvantages. In the following, some clustering principles are briefly introduced before two typical clustering methods and one variation are described. For detailed descriptions of cluster analysis, the reader is referred to the literature, for example, [9].

Distance Calculation in Cluster Analysis


As already introduced in Sect. 4.5.2.1, each spectrum can be seen as a single point in a 1600-dimensional space (if the spectra contain 1600 pixels). For a confocal Raman image, this might amount to tens of thousands of points, and the cluster analysis tries to group (or cluster) these points. The clustering depends mainly on the distance between the points. Spectra which differ significantly from each other are farther apart in the 1600-dimensional space than spectra that are similar or close to identical. The distance therefore determines which spectrum belongs to which cluster.
There are various ways of defining the distance between spectra. The Euclidean and the Manhattan distances are but two examples of methods which work well for Raman imaging.
Care must be taken if some of the spectra in the data set contain a high fluorescence background. Depending on the distance calculation method, the fluorescence might have a strong influence on the clustering. In general, the background of the spectra should be subtracted prior to cluster analysis.
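The two distance measures and the influence of a fluorescence background can be illustrated with synthetic spectra. In the sketch below, a sloped background is assumed for material A. It makes A appear far more different from itself than from a second material B. After a deliberately crude, hypothetical straight-line baseline subtraction, the difference disappears. Real background correction would fit only peak-free regions or use more flexible models.

```python
import numpy as np


def euclidean(a, b):
    return np.sqrt(np.sum((a - b) ** 2))


def manhattan(a, b):
    return np.sum(np.abs(a - b))


def remove_linear_baseline(spectrum, x):
    """Crude background subtraction: subtract the least-squares straight line
    through the whole spectrum (peaks included)."""
    return spectrum - np.polyval(np.polyfit(x, spectrum, 1), x)


x = np.linspace(0.0, 1.0, 500)


def band(center):
    return 100.0 * np.exp(-0.5 * ((x - center) / 0.01) ** 2)


A, B = band(0.3), band(0.6)              # two hypothetical materials
A_fluo = A + 200.0 + 300.0 * x           # material A with a sloped fluorescence background

print("A vs B      :", euclidean(A, B).round(1), manhattan(A, B).round(1))
print("A vs A_fluo :", euclidean(A, A_fluo).round(1), manhattan(A, A_fluo).round(1))
A_c, A_fluo_c = remove_linear_baseline(A, x), remove_linear_baseline(A_fluo, x)
print("after baseline removal:", euclidean(A_c, A_fluo_c))
```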

Hierarchical Cluster Analysis


As the name indicates, hierarchical cluster analysis creates a hierarchy of clusters. This hierarchy is often represented by a tree. In this analogy, the trunk represents the main cluster containing all spectra. The clusters are then split into sub-clusters of various sizes (the branches) and these into further sub-clusters (smaller branches, then twigs, smaller twigs, etc.). The leaves are the individual spectra in the data set.
In order to calculate the clusters, one either starts with the base cluster and splits it up (divisive clustering) or starts with the individual spectra and merges them (agglomerative clustering). The distance between clusters must then also be taken into account, and again there is a variety of methods to calculate it. Examples are the maximum distance, the minimum distance, or the mean distance between elements of each cluster. Once the cluster tree is calculated, the height, or extraction level, must be defined, and from this the masks and average spectra can be extracted.
While this method is almost completely unsupervised (with the exception of the extraction level), it requires a huge amount of processing power and/or time.
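Agglomerative clustering with the mean inter-cluster distance (average linkage) can be sketched naively as follows. The code assumes Euclidean distances between spectra. It stops merging once the cheapest merge exceeds the extraction level `level`. Average-linkage merge heights never decrease, so this is equivalent to cutting the tree at that height. The naive loop scales roughly with the cube of the number of spectra and only shows the principle. It also illustrates why the method is expensive for tens of thousands of spectra. The function name and synthetic data are hypothetical.

```python
import numpy as np


def agglomerative(spectra, level):
    """Average-linkage agglomerative clustering cut at the extraction level `level`.
    Returns one integer label per spectrum."""
    n = len(spectra)
    # all pairwise Euclidean distances between spectra
    D = np.sqrt(((spectra[:, None, :] - spectra[None, :, :]) ** 2).sum(axis=-1))
    clusters = [[i] for i in range(n)]        # start: every spectrum is a leaf
    while len(clusters) > 1:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()  # mean distance between members
                if d < best:
                    best, pair = d, (a, b)
        if best > level:                      # every remaining merge lies above the level
            break
        a, b = pair
        clusters[a] += clusters.pop(b)        # b > a, so index a stays valid
    labels = np.empty(n, dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


rng = np.random.default_rng(5)
centers = 10.0 * np.eye(3, 50)                # three well-separated hypothetical spectra
spectra = np.vstack([c + rng.normal(0.0, 0.1, (15, 50)) for c in centers])
labels = agglomerative(spectra, level=5.0)
print(labels)
```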

K-Means Cluster Analysis


The K-means cluster analysis is often assigned to the partitional clustering methods. It differs from hierarchical cluster analysis in that the number of clusters must be selected prior to the clustering. However, the method can then also be applied to the sub-clusters in order to sort them further into sub-clusters and thus generate a pseudo-hierarchical cluster tree, as shown in Fig. 4.8. Here an oil – water – alkane immersion is first clustered into three clusters, and each of these is then split into sub-clusters. In this example one can clearly see the mixed phases. These are partly due to edge effects and to positions at which the laser spot hit the boundary between two phases. Note that water does not exist as a pure component in this sample but is always mixed with the alkane to some degree.

Fig. 4.8 K-means cluster analysis of an oil – water – alkane immersion. (a) Cluster tree with the root cluster on the left, the first level of clusters in the middle, and a further division into sub-clusters on the right. The sub-clusters each show a mixed phase marked in black. (b) Alkane spectrum (red) and the mixed phase (black) as extracted from the top two clusters. (c) Mixed alkane and water spectrum (blue) and the mixed phase (black) as extracted from the middle two clusters. The water phase does not exist as a pure phase in this sample. (d) Oil spectrum (green) and the mixed phase (black) as extracted from the bottom two clusters
Once the number of clusters (N) is defined for the K-means cluster analysis, the algorithm first defines N centers in the 1600-dimensional space and assigns each point (spectrum) to the center closest to it. Then the centroid (which one might also call an average spectrum) of each group is computed. The spectra are then sorted again according to their distance to the calculated centroids, and the procedure is repeated. The algorithm typically stops once the assignment of the points (spectra) to their groups ceases to change.
While this method needs somewhat more supervision than hierarchical clustering and depends heavily on the selection of the N initial centers, it requires much less processing power. On some commercial confocal Raman microscopes it can even be applied as an online evaluation tool during confocal Raman measurements with acquisition speeds of more than 600 spectra/s.
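The K-means iteration described above can be sketched in NumPy as follows. The initialization is an assumption. The N initial centers are chosen by farthest-first traversal: start with one spectrum, then repeatedly add the spectrum farthest from all centers chosen so far. This is one simple deterministic option among several. The synthetic data and the function name are hypothetical.

```python
import numpy as np


def kmeans(spectra, n_clusters, n_iter=100):
    """K-means (Lloyd's algorithm) on an (n_spectra, n_pixels) array.
    Returns one label per spectrum and the centroids (average spectra)."""
    spectra = np.asarray(spectra, dtype=float)
    # farthest-first initialization of the N centers
    chosen = [0]
    d2min = ((spectra - spectra[0]) ** 2).sum(axis=1)
    for _ in range(1, n_clusters):
        i = int(d2min.argmax())
        chosen.append(i)
        d2min = np.minimum(d2min, ((spectra - spectra[i]) ** 2).sum(axis=1))
    centers = spectra[chosen].copy()

    labels = np.full(len(spectra), -1)
    for _ in range(n_iter):
        # assign every spectrum to its nearest center (squared Euclidean distance)
        d2 = ((spectra[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # assignments stopped changing
        labels = new_labels
        # recompute each centroid as the average spectrum of its members
        for k in range(n_clusters):
            members = spectra[labels == k]
            if len(members):                   # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return labels, centers


rng = np.random.default_rng(6)
true_centers = 10.0 * np.eye(3, 50)            # three hypothetical pure spectra
spectra = np.vstack([c + rng.normal(0.0, 0.5, (30, 50)) for c in true_centers])
labels, centers = kmeans(spectra, 3)
print(labels)
```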

Fuzzy Clustering

In the hierarchical and K-means cluster algorithms, each spectrum either belongs to a cluster or does not. This is why the image outputs of these algorithms are binary masks (one for each extracted cluster).
In fuzzy clustering, the spectra can belong “to a certain degree” to a cluster. A spectrum located near the center of a cluster belongs to it to a higher degree than one at its edge. Image outputs of this algorithm display this variation and are therefore not binary; instead, each pixel typically has a value between 0 and 1 (or 0 and 100%).
This method directly reveals gradients in the images, because each pixel now has a certain probability of belonging to one cluster or another. One can also interpret this value as a measure of how well the spectrum fits the corresponding cluster. However, the resulting clusters cannot be clustered further, as is possible with classical K-means clustering.
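A common fuzzy clustering algorithm is fuzzy c-means. In it, the membership of a spectrum in each cluster falls off with its distance to the cluster center, and all memberships of a spectrum sum to 1. The chapter does not say which fuzzy algorithm is meant. The following is therefore a sketch of fuzzy c-means with the frequently used fuzzifier m = 2. It is applied to a synthetic gradient between two hypothetical pure spectra, and the function names are illustrative.

```python
import numpy as np


def fuzzy_memberships(spectra, centers, m=2.0):
    """Degree (0..1) to which each spectrum belongs to each cluster; rows sum to 1."""
    d = np.sqrt(((spectra[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1))
    d = np.maximum(d, 1e-12)                 # avoid division by zero on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def fuzzy_cmeans(spectra, centers, m=2.0, n_iter=100):
    """Fuzzy c-means: alternate membership and membership-weighted center updates."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        w = fuzzy_memberships(spectra, centers, m) ** m
        centers = (w.T @ spectra) / w.sum(axis=0)[:, None]
    return fuzzy_memberships(spectra, centers, m), centers


pix = np.arange(100)
A = np.exp(-0.5 * ((pix - 30) / 4.0) ** 2)
B = np.exp(-0.5 * ((pix - 70) / 4.0) ** 2)
t = np.linspace(0.0, 1.0, 11)
spectra = t[:, None] * A + (1.0 - t[:, None]) * B    # gradient from pure B to pure A
u, centers = fuzzy_cmeans(spectra, np.stack([A, B]))
print(u.round(2))
```

Reshaping one column of `u` to the image dimensions (ny, nx) gives the non-binary image described above.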

4.6 Image Masks and Selective Average Calculation

In order to present the results of a confocal Raman imaging experiment, representative or average spectra found in the sample are typically shown. Extracting individual spectra from the data set is often undesirable, because these measurements typically have signal-to-noise (S/N) ratios which are unsuitable for presentation.
Therefore, subsets of the data set are generally averaged in order to calculate a smooth average spectrum. These subsets can be defined as binary images (masks) with the same pixel resolution as the recorded confocal Raman image. In these masks, each pixel defines whether the spectrum recorded at this position is used for the average calculation or not. This process is sometimes referred to as selective average calculation.
The definition of the masks can be performed in various ways:

Manual definition of the pixels


In order to define areas of interest manually, the user must be able to clearly identify those areas. This can be done with an integrated intensity filter and the resulting image. For example, the bright areas in Fig. 4.1 could be marked. In this example, the bright areas indicate a high PMMA content, and the mask could then be used to calculate the spectrum of PMMA with a good S/N ratio. For this purpose, the user should be able to mark any arbitrary area within the image and define it as a mask.
While this method is very easy to implement, its major disadvantage is that it is quite subjective: the user can easily mark areas which contain a different spectrum. The average spectrum will then not be representative of the true single component. Additionally, it can be very laborious if many small domains need to be marked.
Mask definition through calculation
Not all samples have clearly defined single-component areas like those in the example shown in Fig. 4.1. It can therefore be desirable to apply mathematical and logical operators to images in order to define the mask. One simple example is thresholding, in which each pixel of the image is tested with a “<” or “>” operator to generate the mask. It might furthermore be desirable to combine these operations using a calculator-style interface. The software needs to let the user preview the mask in order to facilitate the calculation. The masks generated in this way can then be used for selective average calculation.
Compared with the manual definition described above, this method makes marking regions of interest significantly easier. If several components show relatively strong intensities, it can become necessary to link several images mathematically in order to isolate a single component.
Cluster analysis
The cluster analysis typically generates binary masks as outputs in addition to the average spectra of the clusters. If the input to the cluster analysis is the result of a PCA, then the resulting spectra will be averaged PCA spectra. In this case, one can therefore apply the mask output to the original hyperspectral data set to calculate the selective average again.
This is certainly the most objective method for finding the masks. Its only disadvantage is that in K-means cluster analysis, small clusters tend to be included in large ones and can be difficult to extract.
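Threshold masks, their logical combination and the selective average reduce to a few array operations. The sketch assumes a cube of shape (ny, nx, n_pixels) and images of shape (ny, nx). The function names, the synthetic two-phase sample and the threshold values are hypothetical.

```python
import numpy as np


def threshold_mask(image, lower=None, upper=None):
    """Binary mask of the pixels with lower < value < upper (either bound optional)."""
    mask = np.ones(image.shape, dtype=bool)
    if lower is not None:
        mask &= image > lower
    if upper is not None:
        mask &= image < upper
    return mask


def selective_average(cube, mask):
    """Average spectrum of all pixels of an (ny, nx, n_pixels) cube where mask is True."""
    if not mask.any():
        raise ValueError("the mask selects no pixels")
    return cube[mask].mean(axis=0)


rng = np.random.default_rng(7)
pix = np.arange(120)
A = np.exp(-0.5 * ((pix - 40) / 3.0) ** 2)
B = np.exp(-0.5 * ((pix - 80) / 3.0) ** 2)
cube = np.empty((20, 20, 120))
cube[:, :10], cube[:, 10:] = A, B             # left half phase A, right half phase B
noisy = cube + rng.normal(0.0, 0.3, cube.shape)

img_a = noisy[..., 35:46].sum(axis=-1)        # sum-filter images of the two bands
img_b = noisy[..., 75:86].sum(axis=-1)
# "A band strong AND B band not strong", combined with logical operators
mask = threshold_mask(img_a, lower=5.0) & ~threshold_mask(img_b, lower=5.0)
avg_a = selective_average(noisy, mask)
```

A cluster mask is obtained in the same way from a label image, for example `labels.reshape(ny, nx) == k`. It can be passed to `selective_average` together with the original hyperspectral data set.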

4.7 Combination of Single Spectra with Multi-spectral Data Sets


The fundamentals of image generation were described in the previous section. This section describes the generation of images based on fitting so-called “basis spectra” to the data set. This procedure produces images with much better contrast than if only a small part of the spectrum is used (as is the case with a sum filter, for example).

4.7.1 Basis Spectra

Basis spectra can be the spectra of the pure components present in the sample. This is the ideal case. Care must be taken, however, to record these spectra with exactly the same settings as used for the confocal Raman image. Typically, the same integration time and many accumulations are chosen in order to obtain basis spectra with a good S/N ratio. An additional point to consider is whether the pure component is actually present in the sample, or whether it might have undergone a chemical reaction to form a new component.
Quite often, the pure spectra cannot (easily) be acquired separately. In this case the basis spectra should be extracted from the scan itself. The selective averaging described in Sect. 4.6 is one method of doing this. Care must be taken to ensure that the extracted spectra are pure and not themselves mixed. Mixed spectra need to be de-mixed; otherwise the fitting procedure will not work properly, as described in the following.

4.7.2 Fitting Procedure

The fitting procedure essentially fits each recorded spectrum with the basis spectra. It tries to minimize the fitting error D described by the equation

$$ D = \left| \overrightarrow{\mathrm{RecordedSpectrum}} - a \cdot \overrightarrow{BS_A} - b \cdot \overrightarrow{BS_B} - c \cdot \overrightarrow{BS_C} - \cdots \right|^2 \tag{4.1} $$

by varying the weighting factors a, b, c, ... of the basis spectra BS.
In order to improve such a fit, it is not advisable to use the entire recorded spectral range (i.e., from −100 to 3500 cm⁻¹). The Rayleigh peak and parts common to all spectra (such as a glass substrate, for example) are best excluded from the fit. Additionally, parts of the spectra that do not contain Raman information should not be taken into account, as they only contribute noise.
After all of the tens of thousands of spectra have been fitted (using (4.1) for each recorded spectrum), the algorithm constructs one image per basis spectrum showing the factors a, b, c, ..., plus one image for the fitting error D.
Care must be taken to avoid using mixed spectra as basis spectra. If such spectra are used, the weighting factors can become negative. If the fit is instead constrained to weighting factors greater than zero, the fit will not work properly.
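Eq. (4.1) is linear in the weighting factors, so the fit for all spectra can be performed with one linear least-squares solve. This sketch assumes the basis spectra were processed like the data and that the spectral range to be used is given as a boolean mask. The fit is unconstrained, so negative factors can appear, which the text identifies as a sign of mixed basis spectra. A non-negative fit would need a different solver, for example SciPy's `scipy.optimize.nnls`. The synthetic data and names are hypothetical.

```python
import numpy as np


def basis_analysis(cube, basis, use):
    """Fit every spectrum as a weighted sum of basis spectra (Eq. 4.1).

    cube  : (ny, nx, n_pixels) recorded spectra
    basis : (n_basis, n_pixels) basis spectra, processed like the cube
    use   : boolean mask (n_pixels,) of the spectral range included in the fit
    Returns the weighting-factor images (n_basis, ny, nx) and the error image D.
    """
    ny, nx, n_pix = cube.shape
    Y = cube.reshape(-1, n_pix)[:, use].T           # (n_used, n_spectra)
    B = basis[:, use].T                              # (n_used, n_basis)
    # one least-squares solve handles all spectra: each column of Y is a spectrum
    coef, *_ = np.linalg.lstsq(B, Y, rcond=None)     # (n_basis, n_spectra)
    D = ((Y - B @ coef) ** 2).sum(axis=0)            # squared residual norm per spectrum
    return coef.reshape(-1, ny, nx), D.reshape(ny, nx)


rng = np.random.default_rng(8)
pix = np.arange(400)


def band(center, width=8.0):
    return np.exp(-0.5 * ((pix - center) / width) ** 2)


basis = np.stack([band(150) + 0.5 * band(300), band(220)])   # two hypothetical components
weights = rng.uniform(0.0, 2.0, (2, 15, 15))
cube = np.einsum("kyx,kp->yxp", weights, basis) + rng.normal(0.0, 0.05, (15, 15, 400))
cube[..., :20] += 50.0                       # stand-in for the Rayleigh region
use = pix >= 20                              # exclude it from the fit
coef, D = basis_analysis(cube, basis, use)
print("largest weight error:", np.abs(coef - weights).max())
```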
Figure 4.9 shows an example of basis analysis. Here a thin layer of a PS-PMMA
polymer blend was investigated with very short integration times (4.3 ms). The indi-
vidual spectra were thus relatively noisy as can be seen from the red and blue spectra
in Fig. 4.9(c) and (e).
Fig. 4.9 Basis analysis of a PS-PMMA polymer blend. (a) Average spectra used for the fitting
procedure. (c) and (e) Original spectra (red and blue, respectively) and fitted spectra (black) for PS
(red) and PMMA (blue). The original spectra were recorded at the crosses indicated in the images
on the right with the corresponding colors. (b) and (d) Resulting image showing the distribution of
PS [b red frame] and PMMA [d blue frame] following the basis analysis. Brighter colors indicate
a higher fitting factor and thus a higher signal intensity of the basis spectrum at the corresponding
position

Using selectively averaged spectra with a good signal to noise ratio (see Fig. 4.9a)
one can fit the individual, noisy spectra using the information contained within the
entire spectral range. This results in a significant improvement of the S/N ratio and
the contrast of the resulting images (see Fig. 4.9b and d).

4.8 Combination of Various Images


Once the images have been generated from the confocal Raman data, they need to be presented in a suitable way. If only the distribution of a single component or the shift of a Raman line needs to be presented, this is relatively straightforward.
However, if multiple images, such as the results of the basis analysis, need to be presented, the number of images can quickly become too large. It might also be of interest to see whether certain components are present as pure or mixed phases.
In such cases, combining images is a good way to illustrate the distribution of components. In Fig. 4.5(b) the distribution of oil, water, and alkane is shown in green, blue, and red, respectively.
First, the color scale of each component is adjusted so that each component has an individual color scale (red, green, and blue (RGB) in this case). The images are then combined. One way is to layer them and make the upper layers transparent depending on the value of each pixel. Another is to combine the colored pixels additively, which illustrates the mixing of phases better. The colors are then mixed: where, in the example above, water (blue) and alkane (red) are both present, the resulting color is violet.
Note that the range chosen for the color scale has a significant influence on the appearance of the images, and care must be taken to choose appropriate settings.
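The additive combination can be sketched as follows. It assumes that each component image is mapped through its own color-scale limits and then added into an RGB image. The function name and the choice to clip the sum to [0, 1] are assumptions. Note how the `limits` argument directly controls the appearance, which is the effect described above.

```python
import numpy as np


def combine_rgb(images, colors, limits):
    """Additive color combination of component images.

    images : list of 2-D arrays (e.g. basis-analysis results)
    colors : list of RGB triples in [0, 1], one per image
    limits : list of (low, high) color-scale limits, one per image
    """
    rgb = np.zeros(images[0].shape + (3,))
    for img, color, (lo, hi) in zip(images, colors, limits):
        scaled = np.clip((img - lo) / (hi - lo), 0.0, 1.0)   # apply the color scale
        rgb += scaled[..., None] * np.asarray(color, dtype=float)
    return np.clip(rgb, 0.0, 1.0)                            # saturate where colors add up


water = np.array([[1.0, 0.0], [1.0, 0.2]])
alkane = np.array([[1.0, 1.0], [0.0, 0.2]])
rgb = combine_rgb([water, alkane], [(0, 0, 1), (1, 0, 0)], [(0.0, 1.0), (0.0, 1.0)])
print(rgb[0, 0])   # water and alkane both present: red + blue = violet
```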

4.9 The Law of Numbers

In this section, a very noisy example data set is evaluated to illustrate the implementation and capability of the methods described above. It also shows that the algorithms can produce images and spectra of high quality, even though the S/N ratio of the individual spectra appears insufficient at first glance.
The sample investigated was a PET-PMMA polymer blend spin coated onto a glass slide. The data were acquired in EMCCD mode, and Fig. 4.10 shows three of the 22,500 spectra recorded.

Fig. 4.10 Noisy PET-PMMA Raman image. The spectra (a) on the left show the three differ-
ent components acquired at the positions marked in the image (b) with the corresponding colors
(green = glass, red = PET, blue = PMMA). The image (b) on the right displays the distribution of
PMMA in the sample as derived through a simple sum filter
The noise level in the spectra is significant, and the S/N ratio is only slightly above 1. Even at this noise level, the sum filter applied to the CH-stretching region, where PMMA shows a signal, produces some image contrast, as can be seen in Fig. 4.10(b).
Following cosmic ray removal and background subtraction, a K-means cluster analysis was performed on the data set, resulting in three clusters. The average spectra of these clusters as well as the color-coded cluster map are shown in Fig. 4.11.
It can be clearly seen that the quality of the spectra is dramatically improved, as is the spatial assignment of the pixels and thus the image contrast. As can be seen from Fig. 4.11(a), all spectra still contain the glass background. This is due to the limited depth resolution of the confocal setup and the small thickness of the film (< 50 nm). It is also noticeable that the glass spectrum still shows a small peak in the CH-stretching region. This is due to edge effects of the clusters, as already discussed in Sect. 4.5.2.2.
The spectra can now be de-mixed further by correctly subtracting them from each other, which results in the spectra shown in Fig. 4.12(c).
These spectra are used for the basis analysis, and the resulting images for PET and PMMA are shown in Fig. 4.12(a) and (b), respectively. The scale bars to the right of Fig. 4.12(a) and (b) indicate the fitting value. The glass background image (not shown here) now shows a homogeneous distribution. Fig. 4.12(d) shows the combined image of the three components following basis analysis.
Figure 4.13 illustrates the effect on image contrast for the PMMA phase by simply displaying Figs. 4.10(b) and 4.12(b) next to each other. The enhancement in image contrast due to the multivariate data methods and the de-mixing is clearly visible. It is especially apparent in the region where PET is located, where the contrast is strongly enhanced.

Fig. 4.11 PET-PMMA Raman image and spectra following K-means cluster analysis. The spectra on the left (a) show the average spectra of the three clusters (green = glass, red = PET, blue = PMMA) and the image on the right (b) shows the combined cluster map with the pixels color-coded as the spectra according to their cluster affiliation
Fig. 4.12 The results of the basis analysis for PET (a) and PMMA (b), the de-mixed spectra used for the basis analysis (c) and the combined image of the three components (d) (green = glass, red = PET, blue = PMMA)

Fig. 4.13 Comparison of the image contrast before and after data evaluation. Image (a) was
obtained through the sum filter and image (b) after basis analysis.
The scale bars on the right-hand side of the images are an additional indicator that the sum filter only uses the number of detected electrons¹ in the CH-stretching band. The fitted image, in contrast, uses the photons in the entire spectral range, which is reflected in the higher scale bar values in Fig. 4.13(b).
This example clearly demonstrates that, with a sufficiently large number of spectra (22,500 in this case), a great amount of spectral and spatial information can still be obtained from the data sets. This requires multivariate methods and advanced data analysis algorithms, and holds even though the individual spectra have a very low signal-to-noise ratio. However, the signals in the spectra still need to be sufficiently strong for the cluster analysis to distinguish one cluster from another.
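The improvement rests on averaging. If the noise in different spectra is uncorrelated, the noise of an average over N spectra falls as 1/√N while the signal stays the same. Averaging N spectra therefore improves the S/N by a factor of √N. The following synthetic simulation checks this scaling numerically. It uses an arbitrary, hypothetical band and a single-spectrum S/N of about 1.

```python
import numpy as np

rng = np.random.default_rng(9)
n_pix = 400
# hypothetical band of height 1 centered at pixel 300; pixels 0-199 are band-free
signal = np.exp(-0.5 * ((np.arange(n_pix) - 300) / 8.0) ** 2)


def noise_of_average(n_spectra, sigma=1.0):
    """Noise level (standard deviation in the band-free region) of the average
    of n_spectra noisy copies of `signal`; single-spectrum S/N is about 1/sigma."""
    spectra = signal + rng.normal(0.0, sigma, (n_spectra, n_pix))
    return spectra.mean(axis=0)[:200].std()


for n in (1, 100, 2500):
    print(n, "spectra: noise", round(noise_of_average(n), 3),
          "expected", round(1.0 / np.sqrt(n), 3))
```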

4.10 Materials and Methods

In order to illustrate the data processing of Raman spectra, a few example data sets
were utilized throughout this chapter. The samples as well as the acquisition details
will be explained in the following.
All data presented were recorded using an alpha300R confocal Raman micro-
scope from WITec GmbH, a frequency-doubled Nd:YAG laser (532 nm) and a spec-
trometer equipped with a 600 g/mm and a 1800 g/mm grating as well as a back-
illuminated CCD camera. The samples were
• A PS-PMMA polymer blend, either dropped or spin-coated onto a glass slide
• A Si[100] wafer with an indent produced using a nano-indenter
• An oil-alkane-water (O-A-W) mixture placed between two cover slips
• A PET-PMMA polymer blend spin-coated onto a glass slide

Either a 100× air objective with an NA of 0.9 or 0.95 or, for the oil-alkane-water sample, a
100× oil-immersion objective with an NA of 1.25 was used. Further experimental details are
listed in Table 4.3.

Table 4.3 Experimental details for the example data sets

Figure | Sample | Integration time [s] | Scan size [μm] | Scan resolution [pixels] | Grating [g/mm] | Layer thickness
1 | PS-PMMA | 0.062 | 30 × 30 | 200 × 200 | 600 | < 1 μm
1 | PS-PMMA | 10 × 0.512 | single spectrum | – | 600 | < 1 μm
1 | PS-PMMA | 10 × 1.512 | single spectrum | – | 1800 | < 1 μm
2 | PS-PMMA | 5 × 0.512 | 22.4 (line scan) | 15 | 600 | < 1 μm
3 | PS-PMMA | 0.062 | single spectrum (extracted) | – | 600 | < 1 μm
4 | PS-PMMA | 0.062 | single spectrum (extracted) | – | 600 | < 1 μm
5 | O-A-W | 0.017 | 100 × 100 | 128 × 128 | 600 | N.A.
6 | Si indent | 0.069 | 10 × 10 | 100 × 100 | 1800 | N.A.
8 | O-A-W | 0.017 | 100 × 100 | 128 × 128 | 600 | N.A.
9 | PS-PMMA | 0.0043 | 40 × 40 | 120 × 120 | 600 | < 1 μm
10–13 | PET-PMMA | 0.050 | 25 × 25 | 150 × 150 | 600 | < 50 nm

¹ Since this measurement was performed in EMCCD mode, the signal is strongly amplified and thus
does not represent the number of photons.
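The imaging entries of Table 4.3 can be cross-checked with simple arithmetic. Assuming square scans and ignoring stage movement and CCD readout overhead (so real measurement times are longer), the number of spectra is the square of the points per line, the pixel spacing is the scan size divided by the points per line, and the pure integration time is the number of spectra times the integration time per spectrum. For the PET-PMMA data set this reproduces the 22,500 spectra mentioned in the discussion of Fig. 4.13. The helper name scan_summary is hypothetical, and the figure labels assume that the table's figure numbers refer to Figs. 4.1 to 4.13 of this chapter.

```python
def scan_summary(scan_size_um, points_per_line, integration_time_s):
    """Return (number of spectra, pixel spacing in um, pure integration time in s).

    Assumes a square scan with equal points per line and lines per image, and
    ignores stage movement and CCD readout overhead.
    """
    n_spectra = points_per_line * points_per_line
    pixel_spacing_um = scan_size_um / points_per_line
    total_integration_s = n_spectra * integration_time_s
    return n_spectra, pixel_spacing_um, total_integration_s


# Imaging rows of Table 4.3: (label, scan size [um], points per line, integration time [s])
rows = [
    ("Fig. 4.1 (PS-PMMA)", 30.0, 200, 0.062),
    ("Figs. 4.5 and 4.8 (O-A-W)", 100.0, 128, 0.017),
    ("Fig. 4.6 (Si indent)", 10.0, 100, 0.069),
    ("Fig. 4.9 (PS-PMMA)", 40.0, 120, 0.0043),
    ("Figs. 4.10-4.13 (PET-PMMA)", 25.0, 150, 0.050),
]
for label, size, points, t_int in rows:
    n, dx, total = scan_summary(size, points, t_int)
    print(f"{label}: {n} spectra, {dx * 1000:.0f} nm spacing, {total / 60:.1f} min integration")
```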

References
1. WITec GmbH, Ultrafast confocal Raman imaging – application examples. http://www.witec.de/en/download/Raman/UltrafastRaman.pdf (2008)
2. L. Quintero, S. Hunt, M. Diem, Denoising of Raman spectroscopy signals. Poster presented at the 2007 R2C Multi Spectral Discrimination Methods Conference (2007)
3. W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical Recipes in C, 2nd edn., chap. Savitzky-Golay Smoothing Filters, pp. 650–655 (The Press Syndicate of the University of Cambridge, 1999)
4. P. Ramos, I. Ruisánchez, J. Raman Spectrosc. 36, 848 (2005)
5. G. Gaeta, C. Camerlingo, R. Ricio, G. Moro, M. Lepore, P. Indovina, Proc. SPIE 5687, 170 (2005)
6. T. Dieing, O. Hollricher, Vib. Spectrosc. 48, 22 (2008)
7. K. Pearson, Philos. Mag. 2(6), 559 (1901)
8. C. Bishop, Pattern Recognition and Machine Learning (Springer, New York, NY, 2007)
9. T. Hastie, R. Tibshirani, J. Friedman, The Elements of Statistical Learning (Springer, New York, NY, 2009)
