Unidecnmr: Automatic Peak Detection For NMR Spectra in 1-4 Dimensions

UnidecNMR is a new software tool designed for automatic peak detection in NMR spectra, significantly improving the efficiency of resonance identification compared to manual methods. It utilizes a Bayesian deconvolution algorithm to analyze 1-4 dimensional NMR data, demonstrating superior performance on various experimental spectra, including those from proteins. The software offers an interactive graphical user interface for users to process raw data and obtain peak lists, thereby streamlining the workflow for both novice and experienced spectroscopists.

Article https://doi.org/10.1038/s41467-024-54899-3

UnidecNMR: automatic peak detection for NMR spectra in 1-4 dimensions

Charles Buchanan 1,2, Gogulan Karunanithy 1, Olga Tkachenko 1, Michael Barber 1, Michael T. Marty 1,3, Timothy J. Nott 4, Christina Redfield 4 & Andrew J. Baldwin 1,2

Received: 16 November 2022
Accepted: 25 November 2024

To extract information from NMR experiments, users need to identify the
number of resonances in the spectrum, together with characteristic features
such as chemical shifts and intensities. In many applications, particularly those
involving biomolecules, this procedure is typically a manual and laborious
process. While many algorithms are available to tackle this problem, their
performance tends to be inferior to that of an experienced user. Here, we
introduce UnidecNMR, which identifies resonances in NMR spectra using
deconvolution. We demonstrate its favourable performance on 1 and 2D
simulated spectra, strongly overlapped 1D spectra of oligosaccharides and 2D
HSQC, 3D HNCO, 3D HNCA and 3/4D methyl-methyl NOE experimental spectra
from a range of proteins. UnidecNMR outperforms a number of freely available
algorithms and provides results comparable to those generated manually.
Introducing additional restraints, such as a 2D peak list when analysing 3 and
4D data and incorporating reflection symmetry in NOE analysis further
improves the results. UnidecNMR outputs a back-calculated spectrum and a
peak list, both of which can be easily examined using the supplied GUI. The
software allows interactive processing using nmrPipe, allowing users to go
directly from raw data to processed spectra with picked peak lists.

Nuclear magnetic resonance spectroscopy is the most widely used experimental technique for characterising molecules, offering atomic resolution structural and dynamical information about chemical and biochemical systems. While NMR spectroscopy is ubiquitous, the analysis of NMR data is largely manual1,2, which presents a substantial bottleneck. In related fields, for example, X-ray crystallography3 and cryogenic electron microscopy4,5, many viable automated analysis techniques have been devised, allowing them to become largely unsupervised high throughput methods6. This is not the case for NMR owing in large part to the specific challenges associated with identifying resonances in spectra7. We present here UnidecNMR, a general method for identifying resonances in NMR spectra, the quality of which is comparable to the results identified by experienced NMR spectroscopists. We demonstrate the versatility of UnidecNMR through its application to a wide variety of spectra including small molecule 1D, 2D and 3D spectra of proteins, together with 3D and 4D NOE spectra, much of which is low signal to noise. We recently demonstrated the utility of UnidecNMR by implementing it into a full analysis pipeline for saturation transfer experiments, uSTA8.

The first problem confronting an NMR spectroscopist once they have acquired and processed their data is to determine the number of resonances in their spectra, typically noting their chemical shifts and intensities. Three major issues render peak detection challenging in NMR: low signal-to-noise ratios, spectral overlap, and artefacts such as T1 noise9. Many computational tools have been written to perform peak picking10–17 although a ‘standard’ has yet to emerge that can

1Physical and Theoretical Chemistry, University of Oxford, Oxford, UK. 2Kavli Institute for Nanoscience Discovery, Oxford, UK. 3Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona, USA. 4Department of Biochemistry, University of Oxford, Oxford, UK.
e-mail: [email protected]

Nature Communications | (2025)16:449 1



operate in 1-4D. While peak-picking algorithms generally have outstanding performances on near-perfect data, their performance on ‘real’ problems tends to degrade, particularly in cases where signal-to-noise ratios start to approach 1, where resonances are heavily overlapped and where there are large variations in intensity due to their dynamics. Progress is largely made through manual analysis18. This unfortunately is not an effective option for either new students or for researchers unable to devote the necessary time for training and provides an argument for researchers to turn to other tools. We aim to address this here by providing a computational tool, UnidecNMR (Universal deconvolver for NMR). This software and associated graphical user interface (GUI) allow a user to process their multi-dimensional FIDs easily and interactively into spectra, execute our resonance identification algorithm and inspect the results, modifying where necessary. UnidecNMR can accelerate and simplify the workflow of both experienced and novice practitioners.

UnidecNMR is based on a Bayesian deconvolution algorithm that was previously developed for the analysis of mass spectrometry data19. Deconvolution aims to separate ‘sources’ of resonances from acquired data using a given point spread function, analogous to how super-resolution methods in light microscopy can locate light sources to better precision than the diffraction limit20 (Fig. 1a). In this application, the point spread function is a peak shape function which needs to be effectively removed in order to locate the underlying resonances. To produce UnidecNMR, we optimised the naïve core of the mass-spec Unidec algorithm on a synthetic 1D dataset (Supplementary Fig. 4), before demonstrating successful application to a synthetic 2D dataset (Fig. 2).

When applied to experimental data, UnidecNMR is able to identify the individual multiplets of resonances in an extremely overlapped 1D NMR spectrum of a series of sugars (Figs. 1, 3). The algorithm is then tested against experimental data acquired on 5 uniformly labelled 13C/15N/1H proteins of molecular weight ranging from 8.6 to 24.8 kDa dimer (2D 15N HSQCs and 3D HNCO and HNCA spectra, Figs. 4, 5, Supplementary Table 1), and then on a 25.4 kDa 236-residue disordered protein where the spectra are highly crowded (Fig. 6). The testing dataset ranges from cases where all resonances are sharp and easily identified, to cases where signal to noise is low, and many resonances have low signal to noise because of exchange broadening. We finally analyse the performance of the algorithm on data acquired from 2 deuterated 13CH3 ILV labelled proteins (3 and 4D methyl-NOESY spectra, Fig. 7).

We compare the peak-picking results from UnidecNMR to four frequently encountered algorithms that can be freely downloaded and already have user bases: PICKY, which relies on a singular value decomposition of spectra10, WaVPeak, which takes advantage of a wavelet smoothing and clustering technique11, NMRNet, which employs a convolutional neural network based on machine learning12, and the intrinsic peak picker in Sparky21, which in 2D (and 3D when used with a 2D peak list to restrict the search space) can detect local

Fig. 1 | UnidecNMR method and applications. a A schematic of the UnidecNMR method: all locations above a noise threshold are initially possible peak locations before UnidecNMR iteratively filters them down to final locations. b UnidecNMR application to a 2D 15N-1H HSQC spectrum of αB-crystallin (i, blue). The reconvolved spectrum (iii, red) is easily compared against raw data (iv). Running UnidecNMR without the clustering results in a spectrum whose apparent sensitivity has been enhanced (ii, green). c Application to a 1D spectrum of 2,3-α-sialyllactose8 showing a remarkable ability to discern individual assigned multiplets from a highly overlapped region of the spectrum. d Application to 3D HNCA spectrum of αB-crystallin, showing raw (blue) and reconvolved spectra in projection, and in 1 and 2D slices visualising how UnidecNMR simplifies resonance detection.
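The scheme in panel a can be illustrated with a short sketch. It applies the multiplicative update given as Eq. (1) in the Theory section, f_i(t+1) = f_i(t) · I_i / (f(t) ∗ g)_i, to a 1D spectrum. This is a minimal sketch under stated assumptions and is not the UnidecNMR implementation: the function names, the Gaussian kernel, the stopping tolerance and the iteration cap are all illustrative choices, and neither the ‘fac’ broadening nor the clustering step is included.

```python
import math


def gaussian_kernel(fwhm_pts, half_width):
    """Unit-height Gaussian peak shape sampled on integer point offsets."""
    sigma = fwhm_pts / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    return [math.exp(-0.5 * (k / sigma) ** 2)
            for k in range(-half_width, half_width + 1)]


def convolve_same(f, g):
    """Convolve f with a centred, odd-length kernel g; output has len(f)."""
    half = len(g) // 2
    out = [0.0] * len(f)
    for i, fi in enumerate(f):
        if fi == 0.0:
            continue
        for k, gk in enumerate(g):
            j = i + k - half
            if 0 <= j < len(f):
                out[j] += fi * gk
    return out


def deconvolve(spectrum, kernel, noise_threshold, max_iter=500, tol=1e-6):
    """Iterate f <- f * I / (f * g) on all points above the noise threshold."""
    # Every point above the threshold starts as a possible peak location,
    # initialised to the raw intensity found at that position.
    f = [s if s > noise_threshold else 0.0 for s in spectrum]
    for _ in range(max_iter):
        back = convolve_same(f, kernel)  # back-calculated spectrum (f * g)
        new_f = [fi * s / b if (fi > 0.0 and b > 0.0) else 0.0
                 for fi, s, b in zip(f, spectrum, back)]
        change = max(abs(a - b) for a, b in zip(new_f, f))
        f = new_f
        if change < tol:  # assumed convergence test on the largest change
            break
    return f, convolve_same(f, kernel)


# Noise-free demonstration: one resonance at point 50, FWHM of 6 points.
kernel = gaussian_kernel(6.0, 20)
truth = [0.0] * 101
truth[50] = 1.0
spectrum = convolve_same(truth, kernel)
sources, reconvolved = deconvolve(spectrum, kernel, noise_threshold=0.01)
```

In this toy case the source intensity progressively concentrates on the true centre while the reconvolved trace stays close to the input, which is the qualitative behaviour the caption describes.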


Fig. 2 | Testing of peak picking algorithms on synthetic data. Following optimisation of the UnidecNMR algorithm on 1D data (Supplementary Fig. 4) performance of a range of freely available peak pickers was assessed on simulated 2D data (7200 spectra). Two Gaussian resonances were simulated with a function of signal to noise (9 values ranging from 2 to 10) and separation (8 values, ranging from 0.85 to 2.6 in units of the full-width half maximum, FWHM of the simulated resonances). For each separation/SN combination, the seed for the random number that generates the noise was varied allowing us to derive an ensemble average over 100 repeats (see extended methods). In the UnidecNMR analysis, a Gaussian peak shape with the same width used for the simulation was used to deconvolve the data, with the one manually tuneable parameter, ‘fac’ set to 1.6. In the tests that follow, the width of the peak shape used by UnidecNMR was purposely mis-set for the purpose of testing which demonstrated that the results are reasonably agnostic of the precise value used, indicating that the peak shape chosen for analysis needs only to be ‘roughly correct’ (Supplementary Fig. 1). a Simulated data illustrate a range of signal-to-noise and separation values. The numbers 1–9 indicate where these example data were taken from the full dataset, c. b Illustrative examples of three spectra that provide a representative discrimination of the various algorithms tested that could be easily automated to iterate over the full dataset. Only NMRNet12 and UnidecNMR were able to resolve the peak with the closest separation whereas only WaVPeak11, PICKY10 and UnidecNMR were able to distinguish the peaks with the widest separation. NMRNet overpicked one of the peaks on the third spectrum, which we have indicated in blue for clarity. c Overall performance on the entire simulated 2D dataset. Only UnidecNMR, PICKY10 and WaVPeak11 achieve 100% accuracy at the relatively trivial high separation/high signal-to-noise limit. The blue contour line represents the threshold above which 100% accuracy is achieved, allowing a convenient means by which to compare the different algorithms in this test. UnidecNMR outperforms the other algorithms on these synthetic datasets.
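The benchmark described in this caption can be sketched as follows. The snippet builds one simulated 2D spectrum with two Gaussian resonances and lays out a 9 × 8 × 100 grid that reproduces the quoted total of 7200 spectra. Several details are assumptions made only for illustration: the evenly spaced grid values, the definition of signal to noise as peak height divided by the noise standard deviation, the grid size, and the name two_peak_spectrum. The authors' extended methods should be consulted for the actual simulation protocol.

```python
import math
import random


def two_peak_spectrum(separation_fwhm, snr, seed, fwhm_pts=8.0, size=64):
    """Simulate a 2D spectrum with two unit-height Gaussian resonances.

    separation_fwhm: separation along the first axis, in units of the FWHM.
    snr: peak height divided by the noise standard deviation (assumed).
    seed: seed of the random number generator that produces the noise.
    """
    rng = random.Random(seed)
    sigma = fwhm_pts / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    centre = (size - 1) / 2.0
    offset = 0.5 * separation_fwhm * fwhm_pts
    centres = [(centre - offset, centre), (centre + offset, centre)]
    noise_sd = 1.0 / snr
    spectrum = []
    for i in range(size):
        row = []
        for j in range(size):
            signal = sum(
                math.exp(-((i - ci) ** 2 + (j - cj) ** 2) / (2.0 * sigma ** 2))
                for ci, cj in centres
            )
            row.append(signal + rng.gauss(0.0, noise_sd))
        spectrum.append(row)
    return spectrum, centres


# Assumed, evenly spaced grids that match the counts quoted in the caption.
snr_values = [2 + k for k in range(9)]             # 2, 3, ..., 10
separations = [0.85 + 0.25 * k for k in range(8)]  # 0.85, ..., 2.60
n_repeats = 100                                    # one noise seed per repeat
n_spectra = len(snr_values) * len(separations) * n_repeats

example, example_centres = two_peak_spectrum(separations[1], snr_values[3], seed=0)
```

Varying only the seed at a fixed separation and signal to noise gives the ensemble over which a success rate can be averaged.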

maxima. UnidecNMR demonstrated superior performance to the first three of these algorithms (we could not easily automate Sparky) on a synthetic 2D dataset (Fig. 2). On experimental data, UnidecNMR again substantially outperforms the other algorithms tested, and the resulting peak lists were either similar to those obtained by an experienced spectroscopist (Figs. 4–6, Supplementary Table 1), or in the case of the NOESY spectra, superior, as evidenced by the larger number of NOEs successfully identified that are consistent with the known structure (Fig. 7). We further demonstrate that supplying a 2D peak list when analysing 3 and 4D data (all 3 and 4D results in Figs. 4–7 use this approach) and including reflection symmetry when analysing NOE data can both substantially improve the results (Fig. 7). The algorithm is tested ‘to destruction’ and errors, both false positives and false negatives on real data have been individually characterised (Supplementary Figs. 5–7), each of which reflects ambiguous decisions on the edge of human judgement. The software is released with a GUI that facilitates rapid inspection and manual adjustment of the results (Supplementary Fig. 3). For example, false negatives can easily be identified within the GUI by overlaying the reconvolved and raw spectra (Supplementary Figs. 2, S7cii). The time of the calculation depends on the size and dimensionality of the data, but it completes in 5–30 s for a 2D spectrum, 30–120 s for a 3D, and several minutes for a 4D on a 2021 MacBook Pro equipped with an M1 Pro processor and 16GB of RAM. The package has been tested on Windows, Mac and Linux environments. Overall, our algorithm is a tool for the analysis of 1-4D NMR data that is free for academic use, that at the very least, provides an excellent ‘starting point’ for both new and experienced users to facilitate rapid and effective analysis of NMR data.

Results
Theory
The kernel of the algorithm was originally developed to analyse mass spectrometry data19, a problem that shares many features with NMR data analysis. The method relies on the assumption that a spectrum of intensities, I, can be reasonably expressed as a convolution of a peak shape function, g, and an array of delta functions or sources, f, each of which spans the same set of spectral frequencies, i. The algorithm aims to perform the deconvolution that removes the peak shape function from the data, providing a user with a list of sources, f, that dictates both the peak positions and intensities. The kernel iterates (t) on the intensities of each spectral element:

$$ f_i^{t+1} = f_i^{t}\,\frac{I_i}{\left(f^{t} * g\right)_i} \qquad (1) $$

When the back-calculated spectrum, (f∗g), has the same intensity as the data I at frequency i, then the ratio is equal to 1, $f_i^{t+1} = f_i^{t}$ and the algorithm has converged19. The action of the algorithm is to suppress sources that are adjacent to the true ‘centre’ of a resonance. A user supplies a noise threshold and a peak shape function g (Fig. 1a). In our


Fig. 3 | Performance of UnidecNMR on 1D spectra. 1D proton spectra (blue) of N-Acetylneuraminic acid (a) and 2,3-α-sialyllactose (b) together with the reconvolved spectrum from UnidecNMR (red) as shown previously8. The deconvolution was performed with the tuneable parameter fac = 1.4 using a peak shape fitted on isolated resonances using the GUI. In each case, the multiplets identified by UnidecNMR (yellow) can be mapped to known assignments obtained from standard multi-dimensional methods8. The peak shape used for the analysis was obtained by fitting the most intense peak using the UnidecNMR GUI. There are small variations in peak shape over the spectrum (see e.g., peak 6c at 3.45 ppm in b), but this does not compromise performance. a Two interconverting anomeric forms, α and β, are observable (panel labels: α-anomer 5.76%, β-anomer 94.24%). UnidecNMR functions well even when evolution due to J coupling is present in the spectrum, imposed by the delays associated with using excitation sculpted water suppression. b Two interconverting anomeric forms associated with sugar a are observable (inset). Unique assignment cannot be obtained from a single 1D NMR spectrum, but each peak identified by UnidecNMR corresponds to assignments achieved using standard methods8, as indicated (green). No water suppression was used for this spectrum and so there are no distortions due to J coupling present, which will yield optimal UnidecNMR performance.
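The peak shape mentioned in this caption is described in the text as a pseudo-Voigt function (Eq. 2): a mixture of a Gaussian and a Lorentzian, parametrised by FWHM values σG and σL in ppm, the distance x from the peak centre, and a fraction n (n = 0 purely Gaussian, n = 1 purely Lorentzian). The sketch below is one way to write such a function. It assumes that both components are normalised to unit height at x = 0 and that each falls to one half at x equal to half its FWHM, which follows from the stated definitions but should be checked against the published equation.

```python
import math


def pseudo_voigt(x, n, sigma_l, sigma_g):
    """Mixed Gaussian/Lorentzian peak shape with unit height at x = 0.

    x: distance from the peak centre (ppm)
    n: Lorentzian fraction (0 = pure Gaussian, 1 = pure Lorentzian)
    sigma_l, sigma_g: Lorentzian and Gaussian FWHM (ppm)
    """
    # Gaussian written in terms of its FWHM: exp(-4 ln2 x^2 / FWHM^2)
    gauss = math.exp(-4.0 * math.log(2.0) * x * x / sigma_g ** 2)
    # Lorentzian with half width at half maximum sigma_l / 2
    half_l = 0.5 * sigma_l
    lorentz = half_l ** 2 / (x * x + half_l ** 2)
    return (1.0 - n) * gauss + n * lorentz


# Hypothetical widths, chosen only for illustration.
profile = [pseudo_voigt(0.001 * k, 0.3, 0.020, 0.030) for k in range(-50, 51)]
```

Fitting n, σL and σG to a few intense, isolated resonances, as the caption describes for the GUI, would then amount to minimising the difference between this function and the observed line shape.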

implementation, the initial intensities are set to the initial intensities of the raw data found at each position, and then iteratively adjusted according to Eq. (1) until convergence, where the final values in f provide the central locations and intensities of the picked peaks. To determine convergence, the changes made in intensities are assessed in each step, and when these fall below a user-specified threshold, or when the calculation exceeds a pre-specified number of maximum iterations, the calculation stops. The selected peak shape g should be a reasonable match for the average resonance in a spectrum, although the final results are reasonably tolerant to this parameter being mis-set (Supplementary Fig. 1a), as expected in the case of experimental data where there is a wide range of peak shapes. The algorithm can work on data of arbitrary dimensionality, which renders it highly amenable to NMR analysis.

Performance on synthetic data
To test the algorithm, a dataset of 40,500 1D NMR spectra was simulated by placing two resonances at predefined separations, and introducing noise sampled from a normal distribution to obtain a predefined signal-to-noise ratio (Supplementary Fig. 4a, b, e). Naïve implementations of Eq. (1) showed promise for peak detection but tended to ‘over-pick’ the spectrum (Supplementary Fig. 4a, c). This result follows from there being no unique solution when fitting an arbitrary number of Gaussian functions to a Gaussian function22, and so the final arrangement of peak locations, f, themselves, tend to resemble a Gaussian function. From the perspective of peak detection, this amounts to a failure, as too many resonances are ‘picked’. This does, however, provide an unexpected feature of this algorithm: one can apply it ‘naïvely’ with a peak shape set to be deliberately too wide


Fig. 4 | Relative performance of UnidecNMR versus a series of alternative peak-picking algorithms on ‘real’ NMR data acquired on 4 different proteins. Counts of correct (i) and incorrect picks (ii) are shown for a set of 4 × 2D 15N1H HSQCs (A), 3D HNCO (B) and 3D HNCA (C) spectra as described in the text. The results were scored against independently determined peak lists by a skilled user. A detailed tabulation of these results and scoring criteria is provided (Supplementary Table 1). When running UnidecNMR in 2D, the tuneable parameter ‘fac’ was set to 1.4, and in 3D, 1.6. In all cases, the peak shape parameters were fitted on isolated resonances using the GUI, and when running in 3D, a 2D peak list was provided (‘boring’ mode) as described in the text.

Bar labels recovered from the figure (correct / incorrect picks):
A: NMRNet 221 / 3; SPARKY 221 / 6; WaVPeak 214 / 30; PICKY 206 / 24; UnidecNMR 224 / 1; Manual 225 / 0.
B: NMRNet 228 / 10; WaVPeak 219 / 37; PICKY 219 / 17; UnidecNMR 230 / 0; Manual 230 / 0.
C: NMRNet 411 / 27; WaVPeak 382 / 139; PICKY 383 / 56; UnidecNMR 431 / 2; Manual 431 / 0.
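Counts of correct and incorrect picks of this kind require a rule for matching a picked peak list against a reference list. The actual scoring criteria are given in Supplementary Table 1 and are not reproduced here; the sketch below shows one common approach, assumed purely for illustration, in which each picked peak is matched to at most one reference peak if it lies within a per-dimension tolerance. The function name, the tolerances and the example shifts are hypothetical.

```python
import math


def score_picks(picked, reference, tolerances):
    """Match picked peaks one-to-one with reference peaks.

    picked, reference: lists of peak positions (tuples of shifts in ppm).
    tolerances: allowed deviation per dimension (ppm); a pair matches when
    its tolerance-scaled distance is at most 1.
    """
    candidates = []
    for p_idx, p in enumerate(picked):
        for r_idx, r in enumerate(reference):
            d = math.sqrt(sum(((a - b) / t) ** 2
                              for a, b, t in zip(p, r, tolerances)))
            if d <= 1.0:
                candidates.append((d, p_idx, r_idx))
    # Greedy assignment: closest pairs are accepted first.
    candidates.sort()
    used_picked, used_reference = set(), set()
    for _, p_idx, r_idx in candidates:
        if p_idx not in used_picked and r_idx not in used_reference:
            used_picked.add(p_idx)
            used_reference.add(r_idx)
    correct = len(used_picked)
    return {
        "correct": correct,                     # picks matched to the reference
        "incorrect": len(picked) - correct,     # false positives
        "missed": len(reference) - correct,     # false negatives
    }


# Hypothetical 1H/15N positions for a small HSQC-like example.
reference_list = [(8.50, 120.0), (8.52, 121.0), (9.10, 115.0)]
picked_list = [(8.501, 120.02), (9.40, 110.0), (8.519, 121.05)]
result = score_picks(picked_list, reference_list, tolerances=(0.02, 0.2))
```

A greedy assignment is simple but not guaranteed to be optimal in crowded regions, where an assignment solver could be substituted.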

Fig. 5 | Analysis of 3D HNCA of αB-crystallin using UnidecNMR and representative examples of errors obtained using the various algorithms. a After processing, the slice for each peak can be selected to inspect the peak picks and to compare the raw data to the back-calculated spectrum. The location of the slice in the 3D is indicated, with respect to the overall projections. b Representative errors from the various peak pickers10–12,21. (i) In a 3D HNCA spectrum, three highly overlapped resonances can be seen. UnidecNMR could identify this while the other algorithms failed. (ii) In a 3D HNCA, the overlap between these two weak peaks is suggested by the asymmetry. Only UnidecNMR correctly identified the underlying resonances. (iii) WaVPeak11 and PICKY10 were unable to pick relatively weak resonances defined by relatively few data points, as exemplified by this HNCA.

and obtain a new NMR spectrum with substantially enhanced apparent resolution (Fig. 1bii).

After some development, the final version of UnidecNMR runs in stages. Equation (1) is initially executed with a peak shape function g whose FWHM is artificially increased by a factor ‘fac’. This suppresses the tendency of the algorithm to pick peaks remote from the true centre (Supplementary Fig. 4a, b). Second, we implemented a clustering algorithm that combines intensity within a predefined window whose width is characterised by the parameter ‘squash’. Both are specified as a multiple of the FWHM of the input peak shape. Running the algorithm against our database of simulated data allowed values of fac and squash to be optimised in order to produce the maximum number of correct results (Supplementary Fig. 4e, f). In this test, a broad range of parameter space was identified where 100% accuracy was achieved. Within the final implementation, ‘squash’ was fixed to 0.725/FWHM and is determined automatically from the user-supplied peak width, and ‘fac’ is the only user-supplied parameter. In practical implementations, this can be adjusted typically with the range 1.4–1.6, with a value of 1.4 being suitable for 1D/2D data and 1.6 better suited for 3D/4D (see also ‘Standard settings for using UnidecNMR’ in


Fig. 6 | Comparison of different peak-picking methods on the 236-residue intrinsically disordered protein DDX4N1. A An NH projection of an HNCA acquired on a 750 MHz spectrometer, superimposed on a peak list derived from UnidecNMR results from a high-resolution HSQC (Supplementary Fig. S6B)40. As expected for a large IDP, the centre of the spectrum is heavily overlapped. B An indication of the false negatives and false positives of the different methods tested on a 2D 15N HSQC, and 3D HNCO and HNCA spectra. (i) On a high-resolution 2D 15N HSQC, similar performance was found for Sparky, WaVPeak, PICKY, NMRNet, and DEEP picker with ca. 30 false negatives and between 0 and 66 false positives. By contrast, UnidecNMR picked exactly the same resonances as an experienced user with two false positives (Supplementary Fig. S7A). For 2D analysis, the UnidecNMR tuneable parameter ‘fac’ was set to 1.4. (ii/iii) Similar performance was found in 3 dimensions, with UnidecNMR picking over 100 more resonances in the two spectra that were missed by the other methods. The tuneable parameter ‘fac’ was set to 1.6 for 3D data. The specific peaks that were missed by UnidecNMR are analysed (Supplementary Fig. 7), arising from resonances appearing in the spectrum that were not in the HSQC, most likely due to some sample degradation. C A slice from a heavily overlapped region in the HNCA illustrates the performance of UnidecNMR and the places where the other algorithms fail to spot resonances. Overall, the performance of UnidecNMR is almost identical to that obtained by an experienced user. As these are N-H slices through a 3D spectrum, the distance of an identified peak from the current, carbon, slice is indicated with the colour of the peak, as shown in the key (bottom right).
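The second stage of the procedure described in the text, a clustering step that combines intensity within a window whose width is set by ‘squash’, can be illustrated in 1D. The text does not specify the clustering rule, so everything below is an assumption made for illustration: sources are grouped greedily around the strongest remaining source, the window is taken as squash × FWHM centred on that source, and the cluster position is the intensity-weighted mean. The name cluster_sources and the example values are hypothetical.

```python
def cluster_sources(sources, fwhm_pts, squash=0.725):
    """Merge neighbouring deconvolved sources into single picked peaks.

    sources: deconvolved intensities f on a 1D grid (zeros where empty).
    fwhm_pts: FWHM of the input peak shape, in grid points.
    squash: window width as a multiple of the FWHM (assumed interpretation).
    Returns a list of (position, summed intensity), sorted by position.
    """
    window = squash * fwhm_pts
    remaining = {i: v for i, v in enumerate(sources) if v > 0.0}
    peaks = []
    while remaining:
        # Seed each cluster on the strongest source not yet assigned.
        seed = max(remaining, key=remaining.get)
        members = [i for i in remaining if abs(i - seed) <= window / 2.0]
        total = sum(remaining[i] for i in members)
        position = sum(i * remaining[i] for i in members) / total
        peaks.append((position, total))
        for i in members:
            del remaining[i]
    return sorted(peaks)


# Three adjacent sources around point 20 and an isolated source at point 40.
f = [0.0] * 60
f[19], f[20], f[21], f[40] = 0.4, 1.0, 0.4, 0.5
picked = cluster_sources(f, fwhm_pts=6.0)
```

The first stage, running Eq. (1) with a kernel whose FWHM is multiplied by ‘fac’ (1.4 for 1D/2D and 1.6 for 3D/4D, as quoted in the captions), would be applied before this step.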

Supplementary). These values effectively set a limit on the resolving power of our algorithm yet still allow asymmetric ‘shoulders’ to be readily identified.

With optimised values of ‘fac’ and ‘squash’, the performance of UnidecNMR was effectively perfect, provided that the signal-to-noise ratio is greater than 3 and the separation of resonances is greater than 1.13 FWHM (Supplementary Fig. 4d for 1D, Fig. 2c for 2D). This is a physically reasonable outcome, as these limits approximately coincide with limits where a user would be confident in visually identifying two resonances. Below these thresholds, the success of the algorithm falls away from 100%, and its success depends on the exact shape of the noise profile in an individual spectrum. It nevertheless remains reasonably successful outside these windows and typically fails in cases where an experienced user would also be unconfident.

We next performed a similar test on 7200 simulated 2D spectra using the optimised parameters (Fig. 2). Here, it was possible to compare results from UnidecNMR to the other freely downloadable algorithms. The success of UnidecNMR in 2D was very similar to its performance in 1D and was more effective than the other methods tested (Fig. 2c). PICKY and WaVPeak both showed excellent performance, but require higher S/N, and larger peak separations than UnidecNMR to obtain a 100% success rate. Notably NMRNet performed very poorly in this test (Fig. 2c), although we note this program performed substantially better when tested against ‘real’ experimental data recorded on protein samples (Fig. 4).

To quantify the accuracy of the respective algorithm’s identified peak locations, we calculated the difference between the found and known locations in ‘correct’ spectra (Supplementary Fig. 8). This shows that UnidecNMR performs better than all other algorithms at higher signal-to-noise ratios and lower separations, a problem typically faced in experimental data (i.e., heavy peak overlap where signal to noise is reasonably high). Further, we quantified the reliability of intensities extracted by UnidecNMR and found that except at very low separations, the error was <10% (Supplementary Fig. 9).

To run UnidecNMR in general, both a noise threshold and parameters that describe the peak shape in all dimensions must be supplied. For convenience, we have implemented a general pseudo-Voigt function that describes a mixed Gaussian/Lorentzian function. This is parametrised by Lorentzian (σL) and Gaussian (σG) FWHM values in ppm, the Euclidean distance (x) from the centre of the peak in ppm and a mole fraction n that determines the degree of Gaussian (n = 0) and Lorentzian (n = 1) character.

$$ P_i(x, n, \sigma_L, \sigma_G) = (1 - n)\,\exp\!\left(-\frac{\left(2\sqrt{2\ln 2}\,x\right)^2}{2\sigma_G^2}\right) + n\,\frac{\left(\sigma_L/2\right)^2}{x^2 + \left(\sigma_L/2\right)^2} \qquad (2) $$

Optimal performance is obtained when the shape characteristics either match or are slightly narrower than those observed although the algorithm is remarkably tolerant to mis-setting the peak shape (Supplementary Figs. 1a, 4f). In practical applications, the algorithm can be


Fig. 7 | Automatic picking of 3 and 4D methyl NOE spectra. Results from a 3D methyl NOESY spectrum from ATCase (a) and a 4D methyl NOESY spectrum of EIN (b). These figures were generated from outputs within our GUI. (i) The location of two selected slices is indicated with respect to the relevant projections. It is desirable when analysing 4D spectra to take 2D slices that have lower resolution than the reference planes that include the direct dimension, as shown for EIN. (ii) The corresponding slices focusing on identified NOEs from a pair of resonances, and a cross peak between them. The specific cross peak feature is indicated (orange). Orthogonal views can be selected within the GUI allowing a user to verify that all resonances are centred on the selected plane. The deconvolved version of the spectrum can also be shown side-by-side with the raw data. (iii) The cross peak signal intensity, shown as a mean and standard deviation of the two reciprocal resonances, shown versus the expected C-C distance from the corresponding structure. 460 pairs of cross-peaks were identified for ATCase and 660 for EIN, overall 60% more than obtained from a manual analysis29 (294, 420 respectively). The picked peaks fall within a sensible range of distances indicating that the NOEs are consistent with the expected structures (iii).
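Panel (iii) reports each NOE as the mean and standard deviation of two reciprocal cross peaks, which relies on the reflection symmetry of NOE spectra: a cross peak from methyl a to methyl b should have a partner from b to a. The sketch below pairs reciprocal cross peaks from a picked list and discards unpaired ones. It is an illustration only: the dictionary layout, the methyl labels and the use of the sample standard deviation are assumptions, and the way UnidecNMR applies the symmetry restraint during deconvolution is not shown.

```python
import statistics


def pair_reciprocal(cross_peaks):
    """Pair cross peaks (a, b) with their reflections (b, a).

    cross_peaks: dict mapping (methyl_a, methyl_b) to a picked intensity.
    Returns a dict mapping each sorted pair to (mean, standard deviation)
    of the two reciprocal intensities; unpaired peaks are left out.
    """
    pairs = {}
    for (a, b), intensity in cross_peaks.items():
        if a == b:
            continue  # diagonal peaks have no reciprocal partner
        if (b, a) not in cross_peaks:
            continue  # no reflection found, so the NOE is not confirmed
        key = tuple(sorted((a, b)))
        if key not in pairs:
            both = [intensity, cross_peaks[(b, a)]]
            pairs[key] = (statistics.mean(both), statistics.stdev(both))
    return pairs


# Hypothetical methyl labels and intensities.
picked_noes = {
    ("I12", "L45"): 1.0,
    ("L45", "I12"): 0.8,
    ("V7", "L45"): 0.5,   # no reciprocal peak in the list
    ("I12", "I12"): 9.0,  # diagonal
}
confirmed = pair_reciprocal(picked_noes)
```

Plotting the mean intensity of each confirmed pair against the corresponding C-C distance from a structure would give a comparison of the kind shown in panel (iii).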

run iteratively and compared to the raw data to manually determine appropriate peak shape parameters. Within our GUI, it is also possible to select several intense, isolated resonances in a spectrum and run a conventional optimiser to obtain a reasonable estimate for parameters to describe the peak shape (Supplementary Fig. 3).

Performance against experimental data
Having demonstrated that our algorithm can function well against synthetic data, we tested it against experimental data. Initially, we ran the algorithm on 1D NMR spectra of sugar molecules that contain a very large number of highly overlapped resonances. The algorithm was able to identify individual multiplets even in heavily overlapped regions (Fig. 3), each of which was consistent with the known assignment obtained using conventional methods.

To test further, 2D 15N HSQC, and 3D HNCO and HNCA spectra from three proteins, HSP271,23, ubiquitin24 and αB-crystallin25 (NMR data unpublished) were analysed. Spectra from these proteins provide a range of difficulties, from ubiquitin and HSP27, where the resonances are sharp and well resolved, to αB-crystallin, where there are substantial contributions from chemical exchange leading to a range in peak intensity and shape25. The performance of the algorithm was measured against a peak list determined manually.

When assigning the backbone of a protein, 3D spectra such as the HNCO and HNCA are analysed simultaneously26. Mirroring this, a 2D H/N peak list was prepared using only the 15N HSQC and HNCO results and supplied to the algorithm as a restraint for the starting locations for peak positions. The resulting ‘boring’ mode produced excellent results. We provide this as an option in the software and recommend using this when analysing 3 and 4D data. In this vein, the HNCO and 15N HSQC spectra were analysed in isolation while we employed a 2D peak list for the HNCA (Fig. 4). As with the synthetic data, UnidecNMR substantially outperformed the other algorithms, achieving almost 100% success rates and missing only 1 resonance (Fig. 4, Supplementary Table 1). This error deserves some attention. In fact, we would not expect this missing resonance to be identified by a user given only this spectrum (Supplementary Fig. 5), due to overlap, though its presence is confirmed by additional 3D experiments. NMRNet, however, did identify two resonances in this location, and so we report it as a peak missed by UnidecNMR. We note that in this test, NMRNet tended to over-pick NMR spectra (Fig. 4). As our algorithm provides a back-calculated spectrum, it is straightforward to compare its results to the raw data within our GUI (Supplementary Fig. 3).

One spectrum of note was an HNCO acquired on HSP16.5 (unpublished data), where the decoupler was mis-set, resulting in a triplet for each peak in the 13C dimension (Supplementary Fig. 1b). By increasing the effective peak width used for calculations, we were able to account for this deficiency. By contrast, the other algorithms found analysing this specific experiment highly challenging.

To increase the challenge further, we then analysed 15N HSQC and HNCO/HNCA assignment spectra from a 236-residue intrinsically disordered protein DDX4N1 (Fig. 6). Disordered proteins are relatively challenging targets because the range of chemical shifts spanned is much narrower than for a folded protein and so overlap of resonances is substantially higher. As before, UnidecNMR outperformed the other algorithms and returned results comparable to those obtained from an entirely manual analysis. The small number of specific errors where UnidecNMR diverged from the manual peak picks were analysed (Supplementary Figs. 6, 7) and were classified as ambiguous in the human-derived assignment.
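
The comparison of an automatically picked peak list against a manually determined one, as used throughout this section, can be illustrated with a short sketch. The scoring actually used is described in Supplementary Table 1 and is not reproduced here; the function name `score_peak_list`, the greedy nearest-match rule and the per-dimension tolerances below are all assumptions made purely for illustration.

```python
import math


def score_peak_list(picked, reference, tolerances):
    """Greedily match picked peaks to reference peaks (hypothetical scheme).

    picked, reference: lists of tuples of chemical shifts (ppm), one value
    per dimension. tolerances: maximum allowed deviation per dimension.
    Returns counts of reference peaks found, reference peaks missed, and
    picked peaks that match no reference peak ("extra").
    """
    unmatched = list(range(len(picked)))
    found = 0
    for ref in reference:
        best, best_dist = None, None
        for idx in unmatched:
            # deviation in each dimension, scaled by that dimension's tolerance
            scaled = [abs(p - r) / t
                      for p, r, t in zip(picked[idx], ref, tolerances)]
            if max(scaled) <= 1.0:
                dist = math.sqrt(sum(s * s for s in scaled))
                if best_dist is None or dist < best_dist:
                    best, best_dist = idx, dist
        if best is not None:
            unmatched.remove(best)  # each picked peak is used at most once
            found += 1
    return {"found": found,
            "missed": len(reference) - found,
            "extra": len(unmatched)}


# Illustrative (made-up) 1H/15N positions in ppm
manual = [(8.10, 120.0), (7.50, 115.2), (9.00, 128.4)]
auto = [(8.11, 120.1), (7.49, 115.1), (6.00, 110.0)]
print(score_peak_list(auto, manual, tolerances=(0.03, 0.3)))
```

In this picture, "extra" entries correspond to over-picking and "missed" entries to resonances present in the manual list but absent from the automatic one. A greedy match is simple but order-dependent in crowded regions, where an optimal assignment method may be preferable.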


Finally, we sought to test the algorithm against methyl-NOE data. Two spectra were analysed: a 3D dataset from a dimer of regulatory chains of aspartate transcarbamoylase from E. coli ATCase, acquired in the laboratory of Prof. Lewis Kay27 and a 4D spectrum from the N-terminal domain of E. coli Enzyme I (EIN1), acquired in the laboratory of Prof. Marius Clore28. In both cases, a 2D H/C peak list was initially prepared from a high-resolution 2D HMQC spectrum. As the proteins are deuterated with only 13CH3 methyl groups labelled, the expected distance of NOEs will be substantially longer than those spanned in uniformly labelled samples, frequently extending to a C-C distance of 10 Å29. Moreover, cross-peaks are expected to be symmetric with minimal spin diffusion, such that if A->B is present then we also expect B->A30. We provide an option in UnidecNMR to impose this requirement on the spectrum and to seek out only cross-peaks related by reflection symmetry. The resulting peak list is returned as a set of correlations between two resonances in the original 2D peak list, ready for use in assignment software or in structure calculations.

For ATCase/EIN, 460/660 cross-peaks were picked by UnidecNMR, an increase of 60% from those picked previously by an experienced user whose results were used for assignment (294/420)29. The intensity of the resonances was plotted against the known distance spanned in the protein to determine if the picked NOEs are consistent with the expected structural forms (1d0931/1eza32 ATCase/EIN) (Fig. 6A). Both plots revealed the vast majority of the NOEs to be within 10 Å and so are consistent with the known structures (Fig. 6A) and have a similar pattern to those picked manually29. For ATCase, several relatively intense resonances were identified both manually and by UnidecNMR that would indicate a distance >10 Å29. As noted previously27, these indicate a fluctuation in the loop adjacent to leucine 7 of ATCase present in the truncated dimer that is not present in the WT structure used for analysis. In the case of methyl NOE spectra, as 60% more cross-peaks were identified, UnidecNMR performance exceeds that of an experienced user.

Discussion
When publishing the first ‘triple resonance’ 13C/15N/1H backbone assignment experiment, the HNCO, the authors noted that “because of the low level of resonance overlap in the 3D spectra, much of the 3D peak picking can be done in a fully automated manner.”33. While this experiment remains widely used, the goal of full automation has yet to be realised.

To move further towards this goal, UnidecNMR provides a powerful computational tool for picking resonances in 1-4D NMR spectra. In practice, a user supplies parameters that describe a peak shape that will be ultimately used to back-calculate the spectrum, together with a noise threshold. Both can be either estimated by the software or manually adjusted in an iterative manner by a user. The target spectrum and the back calculation are in nmrPipe34 format. The resulting picked peaks and back-calculated spectra can be inspected either in our GUI or in any other preferred visualisation software. Against both synthetic and experimental data, the performance of UnidecNMR is substantially improved over the alternative freely downloadable peak-picking algorithms tested in this article.

The algorithm offers two additional modes that can use prior knowledge to improve performance, neither of which is typically possible with existing peak-picking software. Providing a 2D peak list limits the search space substantially in 3 and 4D applications, and in the case of 3 and 4D methyl NOE spectra, a user can further specify the additional requirement that symmetry will be imposed on any cross-peaks. These peak lists can then be used for assignment29 or passed to structural calculations.

The GUI is written in wxPython and uses the Python package matplotlib35 to generate figures (Figs. 5a, 6, Supplementary Figs. 2, 3). Matplotlib is widely used by NMR users to visualise data35 and our software makes it easy for the plotting functions to be manually edited to cater for individual preferences. We are taking advantage of the outstanding nmrGlue package36, which allows NMR spectra to be read into Python. The deconvolution software is written in C++ which can be executed from the command line and so can be incorporated into automated workflows outside of our own GUI. The GUI and deconvolution program have been tested in Linux, Mac and Windows environments. The simulated data comprising 40,500 1D and 7200 2D simulated spectra (Supplementary Figs. 2, 4) and the experimental data (Supplementary Table 1) will be made freely downloadable to enable rapid and systematic comparison of peak-picking algorithms going forward.

Overall, against the experimental data tested here, we find the performance of UnidecNMR to be either comparable or, in the case of methyl NOE data, substantially superior to the results generated by an experienced spectroscopist (Pritišanac et al.29). A GUI is provided to enable a user to quickly screen through the picks to both check the results and manually amend, as required. The GUI also generates and executes nmrPipe34 and SMILE37 scripts for interactive processing of 2-4D NMR. This allows a user to go straight from FIDs to processed and picked spectra within one software environment using a few ‘clicks’. While it is possible to add the algorithm to a fully automatic pipeline, we nevertheless recommend inspecting the results manually. Either way, by substantially reducing the time taken to analyse 3 and 4D spectra, UnidecNMR has promise to both accelerate the workflow of spectroscopists and reduce the barriers for non-specialist laboratories to undertake a biomolecular NMR analysis to address their research questions. The software and benchmark are free for academic use and can be downloaded from http://UnidecNMR.chem.ox.ac.uk.

Methods
Simulation methods
NMR spectra with 2 peaks were simulated in 1 (Supplementary Fig. 4) or 2 dimensions (Fig. 2) using a Gaussian function with a predefined width. The difficulty of each spectrum was controlled by two axes: the separation of the two peaks and the signal-to-noise ratio (y-axis; 1D: Supplementary Fig. 4c, d, f; 2D: Fig. 2c; varied between 2 and 10 in both cases).

For 1D spectra, a peak location function of 400 frequency points was defined as zero everywhere except two points separated by the given separation value (expressed as a function of FWHM; x-axis; 1D: Supplementary Fig. 4c, d, f; 2D: Fig. 2c; ranging from 0.85 to 2.6 in both cases). A standard Gaussian peak shape function was defined and convolved with this peak location to give a noiseless spectrum.

To define the signal to noise for each spectrum, a random noise function was produced by drawing from a normal distribution. To mirror what is typically done in the analysis of experimental spectra, the highest intensity/standard deviation was taken from a defined region of this signal-less function. The magnitude of the noise function was then adjusted before being added to the signal to produce a final spectrum with the required signal-to-noise ratio.

500 1D spectra were simulated for each of 9 signal-to-noise values and 9 inter-peak separations (Supplementary Fig. 4). The same protocol was repeated for 2D spectra, and we chose 100 spectra for each of the 9 signal-to-noise ratios and 8 separations (Fig. 2). As we were producing so many spectra, care was taken to vary the random seed to ensure the noise distribution was not being repeated.

Experimental methods
All data, processing scripts and analysis are included as part of a benchmark available for download as described in the data and code availability statement. Methods to produce the materials described and data acquisition are described below.

αB-crystallin (86 residues, present as a dimer, 19.8 kDa):
Sequence:
MRLEKDRFSVNLDVKHFSPEELKVKVLGDVIEVHGKHEERQDEHGFI
SREFHRKYRIPADVDPLTITSSLSSDGVLTVNGPRKQVS


Residues 68–153 of Human αB-crystallin are known to form a predominant dimer (19.8 kDa) at equilibrium, although all spectra show signs of exchange broadening, consistent with previously observed monomer-dimer interchange25,38. All spectra of αB-crystallin core domain were recorded at 288 K on an Oxford Instruments 600 MHz spectrometer equipped with a Varian Inova console and a 5 mm triple resonance room temperature probe with z-axis gradients.

The 2D 15N-1H sensitivity-enhanced HSQC was recorded with 100 (15N) and 1024 (1H) complex points and 2110 Hz (15N) and 9000 Hz (1H) sweep widths using an interscan delay of 1 s and 16 scans per FID for a total duration of 60 min. The sensitivity-enhanced HNCO was recorded with 40 (13C), 15 (15N) and 1298 (1H) complex points with 1400 Hz (13C), 1400 Hz (15N) and 9000 Hz (1H) sweep widths using an interscan delay of 1.5 s and 8 scans per FID for a total duration of 8 h 52 min. The sensitivity-enhanced HNCA was recorded with 30 (13C), 15 (15N) and 1152 (1H) complex points with 4500 Hz (13C), 1210 Hz (15N) and 9000 Hz (1H) sweep widths using an interscan delay of 1.2 s and 16 scans per FID for a total duration of 10 h 43 min.

HSP27core (88 residues, present as a dimer 19.8 kDa)1,23:
Sequence:
GVSEIRHTADRWRVSLDVNHFAPDELTVKTKDGVVEITGKHEERQ
DEHGYISRCFTRKYTLPPGVDPTQVSSSLSPEGTLTVEAPMPK
Residues 86–171 (a.k.a. the core domain) of Human HSP27 are known to form a predominant dimer (19.8 kDa) at equilibrium, although all spectra show signs of exchange broadening, consistent with previously observed monomer-dimer interchange. The 2D 15N-1H sensitivity-enhanced HSQC was recorded at 298 K on an Oxford Instruments 750 MHz spectrometer equipped with a Bruker Avance III HD console and a 5 mm TCI cryoprobe with z-axis gradients.

128 (15N) and 1024 (1H) complex points were acquired with 2257 Hz (15N) and 10,000 Hz (1H) sweep widths using an interscan delay of 1.4 s and 16 scans per FID for a total duration of 1 h 46 min. Both the HNCO and HNCA were recorded at 298 K on an Oxford Instruments 600 MHz spectrometer equipped with a Varian Inova console and a 5 mm triple resonance room temperature probe with z-axis gradients.

The sensitivity-enhanced HNCO was recorded with 50 (13C), 25 (15N) and 1532 (1H) complex points with 1256 Hz (13C), 1056 Hz (15N) and 8992 Hz (1H) sweep widths using an interscan delay of 1 s and 8 scans per FID for a total duration of 13 h 10 min.

The sensitivity-enhanced HNCA was recorded with 40 (13C), 30 (15N) and 1532 (1H) complex points with 2700 Hz (13C), 1056 Hz (15N) and 8993 Hz (1H) sweep widths using an interscan delay of 1 s and 16 scans per FID for a total duration of 24 h 55 min.

Ubiquitin (76 residues, 8.5 kDa):
Sequence:
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGK
QLEDGRTLSDYNIQKESTLHLVLRLRGG
All experiments were recorded at 298 K on a 500 MHz spectrometer equipped with a Varian console, as published previously24. The 2D 15N HSQC was recorded with 128 (15N) and 1024 (1H) complex points and 2000 Hz (15N) and 4000 Hz (1H) sweep widths using an interscan delay of 1.3 s and 4 scans per FID for a total duration of 26 min.

The HNCO was recorded with 40 (13C), 40 (15N) and 1024 (1H) complex points with 1500 Hz (13C), 2000 Hz (15N) and 3000 Hz (1H) sweep widths using an interscan delay of 1.1 s and 8 scans per FID. The HNCA was recorded with 60 (13C), 40 (15N) and 512 (1H) complex points with 4000 Hz (13C), 2000 Hz (15N) and 4000 Hz (1H) sweep widths using an interscan delay of 1.3 s and 4 scans per FID.

HSP16.5 (113 residues, present as a dimer 24.8 kDa):
Sequence:
GSSSTGIQISGKGFMPISIIEGDQHIKVIAWLPGVNKEDIILNAVGDTLE
IRAKRSPLMITESERIIYSEIPEEEEIYRTIKLPATVKEENASAKFENGVLSVILP
KAESSIK
Methanococcus jannaschii heat shock protein 16.5 (HSP16.5) forms a predominant dimer (24.8 kDa) when truncated to the ‘core’ alpha-crystallin domain and shows many variable peak heights in spectra, again consistent with subunit exchange. The HNCO was recorded at 323 K on an Oxford Instruments 600 MHz spectrometer equipped with a Varian Inova console and a 5 mm triple resonance room temperature probe with z-axis gradients. 50 (13C), 25 (15N) and 1298 (1H) complex points were acquired with 1400 Hz (13C), 1400 Hz (15N) and 9000 Hz (1H) sweep widths using an interscan delay of 1.5 s and 16 scans per FID for a total duration of 36 h 57 min.

DDX4N (236 residues, 25.4 kDa):
Sequence:
MGDEDWEAEINPHMSSYVPIFEKDRYSGENGDNFNRTPASSSEMD
DGPSRRDHFMKSGFA
SGRNFGNRDAGECNKRDNTSTMGGFGVGKSFGNRGFSNSRFED
GDSSGFWRESSNDCEDN
PTRNRGFSKRGGYRDGNNSEASGPYRRGGRGSFRGCRGGFGLG
SPNNDLDPDECMQRTGG
LFGSRRPVLSGTGNGDTSQSRSGSGSERGGYKGLNEEVITGSGKN
SWKSEAEGGES
Residues 1–236 of human DDX4 protein (sequence termed DDX4N139).

DDX4N experiments were performed at 303 K on an Oxford Instruments 750 MHz spectrometer equipped with a Bruker Avance III HD console and a 5 mm TCI CryoProbe with z-axis gradients. The 2D 15N-1H BEST-TROSY HSQC was recorded with 128 (15N) and 1024 (1H) complex points and respective sweep widths of 1597 Hz and 9803 Hz using an interscan delay of 0.2 s and 32 scans per FID for a total duration of 55 min. The 3D BEST-TROSY HNCO was recorded with 64 (13C), 64 (15N) and 1024 (1H) complex points and 2832 Hz (13C), 1597 Hz (15N) and 9804 Hz (1H) sweep widths using an interscan delay of 0.2 s and 32 scans per FID for a total duration of 3 h 25 min.

The 3D BEST-TROSY HNCA was recorded with 64 (13C), 64 (15N) and 1024 (1H) complex points and 5291 Hz (13C), 1597 Hz (15N) and 9804 Hz (1H) sweep widths using an interscan delay of 0.2 s and 156 scans per FID for a total duration of 16 h 6 min. All spectra were processed using the UnidecNMR software, which relies on NMRPipe and nmrGlue.

Generalised instructions to run UnidecNMR
For best performance, we recommend a Lorentz-to-Gauss window function in all dimensions, which will result in a peak shape that well matches a pseudo-Voigt function (Eq. 2). Empirically, care to minimise long Lorentzian tails tends to generate optimal results.

This choice is not essential; other window functions, including exponential and sine-bell, can be used. In practice, our experimental dataset was processed using a small range of apodization functions (Supplementary Table 2) and results are largely independent of this choice.

The next step is to obtain general peak shape parameters for use as the peak shape filter. There are a range of ways to do this. First, a peak shape can be ‘guessed’, a trial UnidecNMR calculation is performed, the result inspected, then the values adjusted based on whether the result has obviously over- or under-picked the spectrum. Second, the UnidecNMR GUI loads in the most intense peaks detected, allowing a user to either manually or algorithmically fit them within the software (the ‘Fit Peaks’ tab). Sliders allow the various parameters to be adjusted until the shape is as desired.

Once a peak shape has been determined, UnidecNMR can be run, the result inspected, then the peak shape further tweaked until the simulated spectrum well resembles the original and the positions of the selected peaks look reasonable. The overlay of the reconvolved spectrum and the original, as presented in the GUI, makes this a straightforward exercise (Figs. 1, 3, 5–7, S2, S7). Either the projections in 3D/4D data or the overlay of the 3D/4D data specifically can be compared in this way.
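
As a rough illustration of the peak shape step, the sketch below defines a height-normalised pseudo-Voigt and a crude way to obtain a starting width from an intense, isolated 1D peak. It is not the UnidecNMR implementation: the exact parameterisation of Eq. 2 is not reproduced here, so the common linear-mixing form (a weighted sum of a Lorentzian and a Gaussian sharing one FWHM) is assumed, and the names `pseudo_voigt`, `estimate_fwhm` and `eta` are hypothetical.

```python
import math


def pseudo_voigt(x, x0, fwhm, eta):
    """Height-normalised pseudo-Voigt (assumed linear-mixing form).

    eta = 1 gives a pure Lorentzian, eta = 0 a pure Gaussian; both
    components share the same full width at half maximum (fwhm).
    """
    u = (x - x0) / fwhm
    lorentz = 1.0 / (1.0 + 4.0 * u * u)
    gauss = math.exp(-4.0 * math.log(2.0) * u * u)
    return eta * lorentz + (1.0 - eta) * gauss


def estimate_fwhm(xs, ys):
    """Estimate the FWHM of one isolated peak from sampled points.

    Assumes a single positive peak that falls below half height on both
    sides within the sampled window. The two half-height crossings are
    located by linear interpolation between neighbouring points.
    """
    top = max(range(len(ys)), key=lambda i: ys[i])
    half = ys[top] / 2.0
    i = top
    while i > 0 and ys[i] > half:  # walk left to the first point at/below half
        i -= 1
    left = xs[i] + (half - ys[i]) * (xs[i + 1] - xs[i]) / (ys[i + 1] - ys[i])
    j = top
    while j < len(ys) - 1 and ys[j] > half:  # walk right likewise
        j += 1
    right = xs[j - 1] + (half - ys[j - 1]) * (xs[j] - xs[j - 1]) / (ys[j] - ys[j - 1])
    return right - left


# Synthetic isolated peak: centre 0, FWHM 2 (arbitrary units), 30% Lorentzian
xs = [i * 0.1 for i in range(-100, 101)]
ys = [pseudo_voigt(x, 0.0, 2.0, 0.3) for x in xs]
print("estimated FWHM:", round(estimate_fwhm(xs, ys), 3))
```

A width obtained this way is only a starting guess; following the protocol above, it would then be refined by running the calculation and comparing the reconvolved spectrum with the original.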


Finally, a noise threshold has to be set. A useful protocol for this is to inspect the data in the projections window, place the lower contour in a desirable position, and then press ‘set’. In practical applications, we typically run with a high threshold to enable a rapid computation allowing assessment of the peak shape, before honing by lowering the threshold to a level suitable to detect all relevant spectral features.

The number of CPUs can be set to any value, and the calculation will be parallelised at the level of the Fourier transform via FFTW3. Improvements in calculation time are achieved, but owing to the complexities of parallelising Fourier transforms, reduction in calculation time will not be linear with the number of CPUs.

When running UnidecNMR, there are two further considerations. The first is the convergence of the algorithm. Two numbers can be selected: the maximum number of iterations and the convergence threshold. These can be set in the GUI to ‘quick’, ‘medium’ and ‘accurate’. Increasing the maximum number of iterations and decreasing the convergence threshold will result in a longer but more thorough calculation. All results shown in the paper were achieved with either ‘medium’ or ‘accurate’ settings, but when initialising a calculation, ‘quick’ settings are helpful.

The values for convergence of the algorithm (maxIter, convergence) were set as ‘quick’ (25, 10^−5), ‘medium’ (50, 10^−7) and ‘accurate’ (100, 10^−8). In calculations shown in this paper, the maximum number of iterations typically halts the calculation. The final peak list tends to vary slightly between ‘quick’ and ‘medium’, and rarely varies when comparing ‘medium’ and ‘accurate’. In cases where the algorithm performance seems poor, the maximum number of iterations should be increased.

Finally, as described in the text, the optimiser has one further user-scalable parameter, ‘fac’. We find excellent results for 2D data with fac = 1.4, and for 3D, 1.6 when using the ‘boring’ mode. If the program is missing peaks that are highly overlapped, decrease fac. If it picks too many, increase it. The settings become highly intuitive after running the program a small number of times.

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability
Benchmarking data is available for download from http://UnidecNMR.chem.ox.ac.uk. The 2D HSQC and 3D HNCA/HNCO spectra from the four proteins in the benchmark (ubiquitin, αB-crystallin, HSP27 and DDX4) are present, with input files that allow their processing and peak picking to be performed precisely as described in this manuscript (in conjunction with the program, see “Code availability” statement). These act as a template to allow users to adapt their own data to our environment and as a validation that the software works as described in this manuscript. Experimental details for the different systems and spectrometer acquisition settings are found in the “Methods” section. Supplementary Table 2 describes detailed settings for processing the FIDs into spectra. Supplementary Table 1 describes how the different peak-picking methods were scored.

Code availability
Software and benchmarking data are available for download from http://UnidecNMR.chem.ox.ac.uk. The core algorithm is written in C++ and will be available in pre-compiled binary form. The Python code will be distributed. This provides a GUI allowing access to the processing functions of nmrPipe, a 1/2/3/4D spectrum viewer, from which the peak-picking functions of UnidecNMR can also be directly accessed. The Software is distributed “AS IS” under this Licence solely for non-commercial use. If you are interested in using the Software commercially, please contact the technology transfer company of the University to negotiate a licence. Contact details are: “[email protected]”. We will thoroughly welcome community input both in improving the user experience and expanding the benchmark to enable future development and improvement of computational tools such as these.

References
1. Alderson, T. R. et al. Local unfolding of the HSP27 monomer regulates chaperone activity. Nat. Commun. 10, 1068 (2019).
2. Huang, C., Rossi, P., Saio, T. & Kalodimos, C. G. Structural basis for the antifolding activity of a molecular chaperone. Nature 537, 202 (2016).
3. Rasmussen, S. G. F. et al. Crystal structure of the human β2 adrenergic G-protein-coupled receptor. Nature 450, 383 (2007).
4. Gauto, D. F. et al. Integrated NMR and cryo-EM atomic-resolution structure determination of a half-megadalton enzyme complex. Nat. Commun. 10, 2697 (2019).
5. Hofmann, S. et al. Conformation space of a heterodimeric ABC exporter under turnover conditions. Nature 571, 580–583 (2019).
6. Scheres, S. H. W. RELION: implementation of a Bayesian approach to cryo-EM structure determination. J. Struct. Biol. 180, 519–530 (2012).
7. Selenko, P. Quo vadis biomolecular NMR spectroscopy? Int. J. Mol. Sci. 20, 1278 (2019).
8. Buchanan, C. J. et al. Pathogen-sugar interactions revealed by universal saturation transfer analysis. Science 377, eabm3125 (2022).
9. Ernst, R. R., Bodenhausen, G. & Wokaun, A. Principles of Nuclear Magnetic Resonance in One and Two Dimensions (Oxford University Press, 1987).
10. Alipanahi, B., Gao, X., Karakoc, E., Donaldson, L. & Li, M. PICKY: a novel SVD-based NMR spectra peak picking method. Bioinformatics 25, 268–275 (2009).
11. Liu, Z., Abbas, A., Jing, B.-Y. & Gao, X. WaVPeak: picking NMR peaks through wavelet-based smoothing and volume-based filtering. Bioinformatics 28, 914–920 (2012).
12. Klukowski, P. et al. NMRNet: a deep learning approach to automated peak picking of protein NMR spectra. Bioinformatics 34, 2590–2597 (2018).
13. Koradi, R., Billeter, M., Engeli, M., Güntert, P. & Wüthrich, K. Automated peak picking and peak integration in macromolecular NMR spectra using AUTOPSY. J. Magn. Reson. 135, 288–297 (1998).
14. Cheng, Y., Gao, X. & Liang, F. Bayesian peak picking for NMR spectra. Genomics Proteomics Bioinformatics 12, 39–47 (2014).
15. Tikole, S., Jaravine, V., Rogov, V., Dötsch, V. & Güntert, P. Peak picking NMR spectral data using non-negative matrix factorization. BMC Bioinformatics 15, 46 (2014).
16. Wurz, J. M. & Guntert, P. Peak picking multidimensional NMR spectra with the contour geometry based algorithm CYPICK. J. Biomol. NMR 67, 63–76 (2017).
17. Li, D.-W., Hansen, A. L., Yuan, C., Bruschweiler-Li, L. & Brüschweiler, R. DEEP picker is a deep neural network for accurate deconvolution of complex two-dimensional NMR spectra. Nat. Commun. 12, 5229 (2021).
18. Klukowski, P., Gonczarek, A. & Walczak, M. J. A Benchmark for Automated Peak Picking of Protein NMR Spectra. In Proc. 2015 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), 1–8 (2015).
19. Marty, M. T. et al. Bayesian deconvolution of mass and ion mobility spectra: from binary interactions to polydisperse ensembles. Anal. Chem. 87, 4370–4376 (2015).
20. Hell, S. W. & Wichmann, J. Breaking the diffraction resolution limit by stimulated emission: stimulated-emission-depletion fluorescence microscopy. Opt. Lett. 19, 780–782 (1994).
21. Goddard, T. D. & Kneller, D. G. SPARKY 3, University of California, San Francisco.


22. Bromiley, P. A. Products and convolutions of Gaussian probability density functions. Tina-Vis. Memo. 3, 1 (2003).
23. Alderson, T. R., Benesch, J. L. P. & Baldwin, A. J. Proline isomerization in the C-terminal region of HSP27. Cell Stress Chaperones 22, 639–651 (2017).
24. Harris, R. The ubiquitin NMR resource. ACS Symp. Ser. 969, 114–126 (2007).
25. Hochberg, G. K. & Benesch, J. L. Dynamical structure of alphaB-crystallin. Prog. Biophys. Mol. Biol. 115, 11–20 (2014).
26. Frueh, D. P. Practical aspects of NMR signal assignment in larger and challenging proteins. Prog. Nucl. Magn. Reson. Spectrosc. 78, 47–75 (2014).
27. Velyvis, A., Schachman, H. K. & Kay, L. E. Assignment of Ile, Leu, and Val methyl correlations in supra-molecular systems: an application to aspartate transcarbamoylase. J. Am. Chem. Soc. 131, 16534–16543 (2009).
28. Venditti, V., Fawzi, N. L. & Clore, G. M. Automated sequence- and stereo-specific assignment of methyl-labeled proteins by paramagnetic relaxation and methyl-methyl nuclear overhauser enhancement spectroscopy. J. Biomol. NMR 51, 319–328 (2011).
29. Pritišanac, I. et al. Automatic assignment of methyl-NMR spectra of supramolecular machines using graph theory. J. Am. Chem. Soc. 139, 9523–9533 (2017).
30. Abragam, A. Principles of Nuclear Magnetism (Clarendon Press, 1961).
31. Jin, L., Stec, B., Lipscomb, W. N. & Kantrowitz, E. R. Insights into the mechanisms of catalysis and heterotropic regulation of Escherichia coli aspartate transcarbamoylase based upon a structure of the enzyme complexed with the bisubstrate analogue N-phosphonacetyl-L-aspartate at 2.1 Å. Proteins 37, 729–742 (1999).
32. Garrett, D. S. et al. Solution structure of the 30 kDa N-terminal domain of enzyme I of the Escherichia coli phosphoenolpyruvate:sugar phosphotransferase system by multidimensional NMR. Biochemistry 36, 2517–2530 (1997).
33. Kay, L. E., Ikura, M., Tschudin, R. & Bax, A. Three-dimensional triple-resonance NMR spectroscopy of isotopically enriched proteins. J. Magn. Reson. 89, 496–514 (1990).
34. Delaglio, F. et al. NMRPipe: a multidimensional spectral processing system based on UNIX pipes. J. Biomol. NMR 6, 277–293 (1995).
35. Hunter, J. D. Matplotlib: a 2D graphics environment. Comput. Sci. Eng. 9, 90–95 (2007).
36. Helmus, J. J. & Jaroniec, C. P. Nmrglue: an open source Python package for the analysis of multidimensional NMR data. J. Biomol. NMR 55, 355–367 (2013).
37. Ying, J., Delaglio, F., Torchia, D. A. & Bax, A. Sparse multidimensional iterative lineshape-enhanced (SMILE) reconstruction of both non-uniformly sampled and conventional NMR data. J. Biomol. NMR 68, 101–118 (2017).
38. Tkachenko, O. Polydisperse Chaperone Proteins and the Mechanisms by Which They Inhibit Aggregation (University of Oxford, 2018).
39. Nott, T. J., Craggs, T. D. & Baldwin, A. J. Membraneless organelles can melt nucleic acid duplexes and act as biomolecular filters. Nat. Chem. 8, 569–575 (2016).
40. Crabtree, M. D. et al. Ion binding with charge inversion combined with screening modulates DEAD box helicase phase transitions. Cell Rep. 42, 113375 (2023).

Acknowledgements
Many thanks to the following for providing the data for this project: Reid Alderson provided the HSP27 data; The Clore laboratory provided the 4D dataset of methyl labelled EIN; The Kay laboratory provided the 3D dataset of methyl labelled ATCase. Many thanks to Richard Harris and Paul Driscoll for the Ubiquitin NMR resource. Thank you to Aziz Khan for preparing the 2,3-sialyllactose sample. Reid Alderson also kindly provided comments on the manuscript. Thanks also to Pembroke College, Oxford, for supporting this project. A.J.B. has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 101002859).

Author contributions
Materials and raw data: O.T., M.B., G.K., T.J.N., C.R. Algorithm initial design: M.T.M., A.J.B. Algorithm implementation, development, benchmarking and testing: C.B., A.J.B. The manuscript was written by C.B. and A.J.B. with input from all authors.

Competing interests
The authors declare no competing interests.

Additional information
Supplementary information The online version contains supplementary material available at https://doi.org/10.1038/s41467-024-54899-3.

Correspondence and requests for materials should be addressed to Andrew J. Baldwin.

Peer review information Nature Communications thanks Gary Thompson and the other anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Reprints and permissions information is available at http://www.nature.com/reprints

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2025
