Music Deep Learning: Deep Learning Methods for Music Signal Processing—A Review of the State-of-the-Art
ABSTRACT The discipline of Deep Learning has been recognized for its strong computational tools, which have been extensively used in data and signal processing, with innumerable promising results. Among the many commercial applications of Deep Learning, Music Signal Processing has received an increasing amount of attention over the last decade. This work reviews the most recent developments of Deep Learning in music signal processing. The two main applications discussed are Music Information Retrieval, which spans a plethora of tasks, and Music Generation, which can cover a range of musical styles. After a review of both topics, several emerging directions are identified for future research.
INDEX TERMS Deep learning, music signal processing, music information retrieval, music generation,
neural networks, machine learning.
For emotion classification, a GAN is proposed in [115] that utilizes a double-channel fusion strategy to extract local and global features of an input voice or image. There are five emotion classes considered: sad, happy, quiet, lonely, and miss. The information used in the experiments comes from a number of websites, such as Kuwo Music Box, Baidu Heartlisten, and others. The recognition rates achieved are between 87.6% and 91.2% for all emotions.

In [116], an architecture combining computer vision and note recognition is proposed for music notation recognition. The experiments make use of several datasets, including the JSB Chorales [117], Maestro [118], Video Game [119], Lakh MIDI [82], [83], and another MIDI dataset. The recognition accuracy ranges from 0.88 to 0.92 for all the datasets. The proposed model's intended application is music education.

For singing voice separation, in [120], a GAN with a time-frequency masking function is used. The databases MIR-1K [48], iKala [72], and DSD100 [62], [89] are used in the experiments, and the model outperforms a conventional DNN.
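To make the masking idea concrete, the following is a minimal sketch of time-frequency masking for vocal separation, in the spirit of the approach in [120] but not its actual implementation: a model predicts a soft vocal mask over the mixture magnitude spectrogram, the mask is applied, and the mixture phase is reused for reconstruction. The mask_model callable and all parameter values are illustrative placeholders.

```python
import numpy as np
import librosa

def separate_vocals(mix, sr, mask_model, n_fft=1024, hop=256):
    """Apply a soft time-frequency mask predicted by `mask_model` to a mixture.

    `mask_model` is any callable mapping a magnitude spectrogram
    (freq x time) to a mask of the same shape with values in [0, 1];
    in a GAN-based separator this role is played by the generator.
    """
    stft = librosa.stft(mix, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(stft), np.angle(stft)
    mask = np.clip(mask_model(mag), 0.0, 1.0)       # soft vocal mask
    vocal_stft = mask * mag * np.exp(1j * phase)    # reuse the mixture phase
    accomp_stft = (1.0 - mask) * mag * np.exp(1j * phase)
    vocals = librosa.istft(vocal_stft, hop_length=hop, length=len(mix))
    accomp = librosa.istft(accomp_stft, hop_length=hop, length=len(mix))
    return vocals, accomp
```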
F. CONVOLUTIONAL RNNs (CRNN)
Complementary to standard models, more complex ones have been developed that utilize couplings between different architectures, often in a series interconnection, to combine their characteristics and improve performance. Convolutional RNNs (CRNNs) are one of these examples.

For music classification, a CRNN was considered in [121], which is a CNN network with the last layers replaced by an RNN. The CNN part is used for feature extraction and the RNN part as a temporal summarizer. The Million Song Dataset [80] is used for training, to predict genre, mood, instrument, and era. The model outperforms other architectures with respect to AUC-ROC.
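As an illustration of this CNN-then-RNN pattern (and not of the exact configuration in [121]), a minimal PyTorch sketch is given below: a small convolutional front end extracts local features from a mel-spectrogram, a GRU summarizes them over time, and a sigmoid layer outputs multi-label tag probabilities. All layer sizes are assumptions chosen for brevity.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """CNN front end for local spectro-temporal features, RNN as temporal summarizer."""

    def __init__(self, n_mels=96, n_tags=50):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
        )
        self.rnn = nn.GRU(input_size=64 * (n_mels // 4), hidden_size=128,
                          num_layers=2, batch_first=True)
        self.head = nn.Linear(128, n_tags)

    def forward(self, spec):                     # spec: (batch, 1, n_mels, time)
        x = self.cnn(spec)                       # (batch, 64, n_mels/4, time/4)
        x = x.permute(0, 3, 1, 2).flatten(2)     # (batch, time/4, 64 * n_mels/4)
        _, h = self.rnn(x)                       # final hidden state summarizes time
        return torch.sigmoid(self.head(h[-1]))   # multi-label tag probabilities

tags = CRNN()(torch.randn(4, 1, 96, 256))        # -> (4, 50)
```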
For MR, a CRNN is used in [122] for classifying and recommending music, in the categories of classical, electronic, folk, hip-hop, instrumental, jazz, and rock music. The database used is the Free Music Archive [123]. The system was tested on a group of 30 users, and the best architecture was the one that implemented a cosine similarity, along with information on music genre.

G. CNN-LSTM
Similarly to CRNNs, some works combine the architectures of CNNs with LSTMs. For emotion classification, a model in [124], consisting of a 2D input through a CNN-LSTM and a 1D input through a DNN, combines two types of features and improves audio and lyrics classification performance. Four classes are considered: angry, happy, relaxed, and sad. The dataset used is the Last.fm tag subset of the Million Song Dataset [80], with an average accuracy of 78%. In [125], a novel database of Turkish songs is constructed for experimentation. The model uses a CNN as the feature extractor and an LSTM with a DNN as the classifier. An accuracy of over 99% is obtained. In [126], the model extracts features from the lyrics, combining a word vector and a CNN-LSTM architecture, with a word frequency weight vector along with a DNN. The outputs of the two architectures are combined on a matching attention mechanism to derive the text emotion classification. Four classes are considered: happy, sad, healing, and calm. The classification accuracy for all emotions ranges between 0.809 and 0.903.

For music score recognition, the architecture proposed in [127] takes as input an image of a music score and outputs the duration, pitch, and coordinates of each note. Data from MuseScore [128] were used for the experiments, and the model outperforms other architectures with respect to all accuracy measures.

For sound event recognition, [129] considers polyphonic sounds, for a wide family of 61 classes, including music, taken out of a dataset of ten different daily contexts, like a sports game, a bus, a restaurant, and more [130]. The model achieves an average F1 score of around 65%.

H. ARCHITECTURE OVERVIEW
From the above review, it is clear that the ‘‘classical’’ DL models perform well in a variety of MIR tasks. However, the models under consideration need to be appropriately designed, so that they can achieve good results for their set problem. Thus (in accordance with the no free lunch theorem), there is no architecture that can be considered holistically better than the rest. On the contrary, complex architectures that incorporate layers of different types are the most promising, since they combine the best characteristics of each DL module, as discussed in Section IV.

TABLE 1. Deep learning methods for music information retrieval.

III. DL METHODS FOR MUSIC GENERATION
In this section, the application of DL in MG is reviewed. Automatic MG utilizes the MIR techniques mentioned in the previous section to generate novel music scores of desired characteristics, like genre, rhythm, tonality, and underlying emotion. The resulting output can either be a music track in the form of audio, so it can be directly listened to, or it can be in a symbolic notation form. Along with the generation of novel tracks, some tasks can be considered adjacent to MG. One such application is Genre Transfer (GT). This refers to preserving key content characteristics of a music score and applying style characteristics that are typical of a different genre. An example would be transforming a pop song into its heavy metal cover. Another application is Music Inpainting (MI), which refers to filling a missing part of a music track, using information from the rest of its content. Again, the section is divided into subsections based on the DL architecture used. The public databases used in each work are also mentioned. Table 2 summarizes the reviewed works for MG, categorized by their architecture.

The MG architectures can be evaluated both objectively and subjectively. Objective evaluation refers to using mathematical and statistical tools to measure the similarity of the generated music tracks to the training dataset, as well as other characteristics that can measure their similarity to real music. For objective evaluation, there are several measures, including the loss and accuracy of the training process, the empty bar rate, polyphonicity, notes in scale, qualified note rate, tonal distance, and note length histogram, among others. Most studies consider a subset of these measures or similar ones, so the reader can refer to each work for details.
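As a concrete illustration, the sketch below computes three of the measures named above on a boolean piano-roll array of shape (time steps, 128 pitches); the bar length and the C-major scale are assumptions, and individual works may define these quantities slightly differently.

```python
import numpy as np

def empty_bar_rate(roll, steps_per_bar=16):
    """Fraction of bars with no active notes. `roll` is a boolean array (time, 128)."""
    n_bars = roll.shape[0] // steps_per_bar
    bars = roll[: n_bars * steps_per_bar].reshape(n_bars, steps_per_bar, -1)
    return float(np.mean(~bars.any(axis=(1, 2))))

def polyphonicity(roll):
    """Fraction of time steps where more than one pitch sounds at once."""
    active = roll.sum(axis=1)
    return float(np.mean(active > 1))

def notes_in_scale(roll, scale=(0, 2, 4, 5, 7, 9, 11)):
    """Fraction of note activations whose pitch class lies in a given scale (C major here)."""
    pitch_classes = np.arange(roll.shape[1]) % 12
    in_scale = np.isin(pitch_classes, scale)
    total = roll.sum()
    return float(roll[:, in_scale].sum() / total) if total else 0.0
```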
For subjective evaluation, a test audience is usually given a collection of DL-generated tracks from different architectures, along with human compositions, and is asked to rate them with respect to different aspects, usually on a five-point Likert scale. Variations of this include comparing pairs of tracks and choosing which one they prefer the most, or being asked to decide if a track is computer or human-made. In the following sections, we point out which works have conducted subjective evaluations, as the positive audience perception of AI music tracks is essential for the future applicability of MDL. The reader can again refer to each work for the extensive presentation of the evaluation results.

As a closing note, it is worth mentioning an issue that emerges from the field of AI-based MG, that of copyrighting [131], [132]. As AI methods use different software and sample databases, legal problems may arise when claiming authorship of the final musical product. It is thus important that legislators update the existing policies, to avoid raising such issues in the future.

A. RNNs
As with MIR, RNNs have proved popular for MG tasks. For works on classical music, the model termed SampleRNN [133] generates one audio sample at a time, with the resulting signals receiving positive evaluation from human listeners. Three different datasets were considered, one containing a female English voice actor, one containing human sounds like breathing, grunts, coughs, etc., and one containing Beethoven's piano sonatas, taken from the Internet Archive [134]. The models were evaluated by a human group, with the samples of the 3-tier model gaining the highest preference. In [117], an RNN model termed DeepBach is designed, for generating hymn-like scores mimicking the style of Bach. The dataset is taken from the music21 library [135]. The model offers some control to the user, allowing the placement of constraints like notes, rhythms, or cadences to the score. The model was evaluated by human listeners of varying expertise, who were given several samples and had to guess between Bach or computer generated. Around 50% of the time, the computer tracks were passed as real samples, which is a very satisfying result for such complex music. The work was expanded in [136], with an architecture termed Anticipation-RNN, which again offered control to the user to place defined positional constraints. The music21 library [135] was used once again.
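The recipe these recurrent generators share is autoregressive prediction: a network outputs a distribution over the next symbol (an audio sample in SampleRNN, a note in score-level models), and generation repeatedly samples from that distribution and feeds the result back in. The sketch below illustrates this loop at the note level; the vocabulary, dimensions, and sampling temperature are illustrative and do not correspond to any specific cited model.

```python
import torch
import torch.nn as nn

class NoteRNN(nn.Module):
    """Next-symbol language model over a discrete music vocabulary (notes, rests, etc.)."""

    def __init__(self, vocab=130, emb=64, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, tokens, state=None):           # tokens: (batch, steps)
        h, state = self.lstm(self.emb(tokens), state)
        return self.out(h), state                    # logits for the next symbol

@torch.no_grad()
def sample(model, start_token=0, steps=64, temperature=1.0):
    tokens, state = [start_token], None
    for _ in range(steps):
        inp = torch.tensor([[tokens[-1]]])
        logits, state = model(inp, state)
        probs = torch.softmax(logits[0, -1] / temperature, dim=-1)
        tokens.append(int(torch.multinomial(probs, 1)))   # feed the sample back in
    return tokens

melody = sample(NoteRNN().eval())
```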
In [137], a Graphical User Interface (GUI) system termed BachDuet was developed for promoting classical music improvisation training through user and computer interaction. The JSB chorales data from the music21 dataset [135] is used for training. The GUI was warmly received by test users, who found the improvisation interaction easy to use, enjoyable, and helpful for improving their counterpoint improvisation skills. Additionally, a second group of participants were asked to listen to music clips, rate them, and also decide whether they resulted from a human-machine improvisation using BachDuet, or human-human interaction. Both types of tracks received similar scores, and the listeners were also unable to differentiate between the duets, as they wrongly classified them around 50% of the time.

In [138], the model produces drum rhythms for a seven-piece drum kit. Natural language translation was used to express the hit sequences. An online interface was designed and evaluated by users, who gave an overall average to positive score.

In [139], the effects of different conditioning inputs on the performance of a recurrent monophonic melody generation model are studied. The model was trained on the FolkDB dataset [140] and a novel Bebop Jazz dataset. The validation Negative Log Likelihood (NLL) loss can be as low as 0.190 for the pitch and 0.045 for the duration.

In [141], the problem of inpainting was considered. The model combines a VAE that takes as input past and future context sequences with an RNN that takes as input the latent vectors from the VAE and outputs a latent vector sequence, which is passed through a decoder to create the inpainting sequence. A folk dataset from The Session [142] is used for testing. The model outperforms others with respect to the NLL measure. The architecture was also tested by users, who were given pairs of segmented sequences and had to choose among
excerpts that fit. The model performance was on the same level as other architectures.

B. LSTMs
LSTMs have been considered for several scenarios. In [143], data preprocessing has been applied to improve the quality of the generated music, and also reduce training time.

In [144], BLSTM networks are used for chord generation. The database used was Wikifonia, which is now inactive and included sheets for several music genres [145]. The user evaluation showed a preference for the BLSTM model over others, although the original music still received the highest score.

In [146], BLSTM is used for chord generation. The model consists of three parts: a chord generator, which uses some starting chords as input, a chord-to-note generator, which generates the melody line from the generated chords, and a music styler, which combines the chords and melody into a final music piece. Multiple music genres were used as a training database, including Nottingham [147], a collection of British and American folk tunes, Wikifonia [145], and the McGill-Billboard Chord Annotations [148]. The model was evaluated by listeners, who gave a score ranging from neutral to positive, taking into consideration harmony, rhythm, and structure.

In [149], a combination of two LSTM models, termed CLSTMS, is used to build chords that can match a given melody. One sub-model is used for the analysis of measure note information, and the other is used for chord transfer information. Wikifonia is used, with data taken from [144] and [145].

In [150], a variation of Biaxial LSTM was used, and a model termed DeepJ was developed for MG. The model was tested on three types of music, baroque, classical, and romantic, with test participants being able to successfully categorize the generated samples most of the time. The Piano-MIDI dataset [151] was used. The model is also capable of mixing musical styles by tuning the values of a single input vector.

In [152], a two-stage architecture is proposed that utilizes BLSTM, where the harmony and rhythm templates are first produced, and the melody is then generated and conditioned on these templates. The Wikifonia dataset is used [145]. In the subjective evaluation, participants were given a collection of tracks and were asked to rate them according to how much they found them pleasing and coherent, and whether they believe they were human or AI-generated. The highest scores were achieved by the model where the melody generator is conditioned on an existing chord and rhythm scheme from a real song. This melody is also perceived as human-made by many participants. The authors also noted that there are high standard deviations in all answers, and slightly more so in the models rated positively, indicating that there is a much wider perception of what is considered good-sounding music than a bad one.

In [153], an architecture combining LSTM with a Recurrent Temporal Restricted Boltzmann Machine is designed. Experiments were conducted on MuseData [154], a classical music dataset, and the JSB Chorales [155] dataset. The model outperforms other architectures with respect to Log-likelihood (LL) and frame-level accuracy (ACC%) measures.

In [156], variations of the LSTM are discussed, termed Tied Parallel LSTM with a neural autoregressive distribution estimator (NADE), and Biaxial LSTM. The model was tested on the datasets of JSB Chorales [155], MuseData [154], Nottingham [147], and Piano-MIDI [151], a classical piano dataset. The architectures perform well concerning the Log-likelihood measure. The architectures also have translation invariance.

In [157], an RNN-LSTM architecture is proposed, using the Meier cepstrum coefficients as features. The dataset consists of folk tunes collected by the author. The model achieves an accuracy of 99% and a loss rate of 0.03.

In [158], a model termed Chord conditioned Melody Transformer (CMT) is proposed, which generates rhythm and pitch conditioned on a chord progression. The training has two phases: first, a rhythm decoder is trained, and second, a pitch decoder is trained based on the rhythm decoder. The model was trained on a novel K-Pop dataset. In addition to various measures, like rhythm accuracy, the model was also evaluated by listeners, with respect to rhythm, harmony, creativity, and naturalness. The model outperforms the Explicitly-constrained conditional variational auto-encoder (EC2-VAE) [159], with respect to rhythm, harmony, and naturalness. The model also has a higher score for creativity than the real dataset tracks, meaning that it can indeed generate novel melodies.

In [160], an LSTM specifically for Jazz music was designed, using a novel Jazz music dataset in MIDI format, and the Piano-MIDI [151]. The model can also generate music using only a chosen instrument. The model can achieve a very low final loss value.

In [161], a BLSTM network with attention is considered for Jazz MG. The architecture consists of a BLSTM network, an attention layer, and another LSTM layer. The Jazz ML ready MIDI dataset [162] is considered. The model outperforms simpler architectures, like the LSTM without attention and the attention LSTM without the BLSTM layer.

In [163], a piano composer is designed that uses information from given composers to generate music. The datasets used were Classical Music MIDI [164] and MIDI_classic_music [165], from which tracks of Beethoven, Mozart, Bach, and Chopin were considered. The model was evaluated through a human survey, where participants had to choose the real sample among the computer-generated and composer ones. Around half the time, people mistook the model-generated music for the human-composed track, meaning that the model can generate music that is relatively indistinguishable from real samples. The generated tracks can also be perceived as fairly interesting, pleasing, and realistic.

In [166], an architecture comprising an LSTM paired with a Feed Forward layer can generate drum sequences resembling a learned style, and can also match up to set
constraints. The LSTM part learns drum sequences, while the feed-forward part processes information on guitar, bass, metrical structure, tempo, and grouping. The dataset was collected from 911tabs [167], and broken into three parts, for 80s disco, 70s blues and rock, and progressive rock/metal, with the model being effective in all styles.

Finally, in [168], the MI problem was considered by combining half-toning and steganography, and various methods were compared using a dataset of various instruments, with satisfying results for the considered models.

C. CNNs
For CNN architectures, in [169], the architecture comprises an LSTM as a generator, a CNN as a discriminator, and a control network that introduces restriction rules for a particular style of music generation. The matching subset of the Lakh MIDI dataset (LMD) [82] and the Piano-MIDI dataset [151] were used. The model was evaluated by music experts, with respect to melody, rhythm, chord harmony, musical texture, and emotion. The model is rated higher than other ones in all of the above aspects.

In [170], a CNN with a Bidirectional Gate Recurrent Unit (BiGRU) and attention mechanism is used for folk music generation. The ESAC dataset [171] is used for testing. The results were evaluated by listeners, who gave overall positive ratings, although lower than the real ones. There were also some exceptions of low scores, meaning that the model generation may have some inconsistencies in its performance.

In [172], a Convolution-LSTM for piano track generation is considered. The CNN layer is used for feature extraction, and the output is fed into the LSTM for music generation. Piano tracks from Midiworld [173] were used for training. The model was evaluated by listeners, who were given 10 music segments and had to decide whether they were human-made or computer generated. In most cases, the segments were correctly identified, but the Convolution-LSTM model performed better than the simple LSTM.

D. GANs
Symbolic music is stored using a notation-based format, which makes it an easier-to-use input for training NNs. For symbolic music generation, a GAN model is proposed in [174] for piano roll generation, equipped with LSTM layers in the generator and discriminator. The generated files were evaluated by participants with respect to melody and rhythm, and the proposed model received a higher score than files generated from other architectures.
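The adversarial training scheme shared by the GAN-based works in this subsection can be summarized by the minimal PyTorch sketch below, in which a generator maps random noise to a bar of piano roll and a discriminator is trained to separate real from generated bars. The bar shape, layer sizes, and optimizer settings are placeholders and do not reproduce any specific model reviewed here.

```python
import torch
import torch.nn as nn

BAR = 16 * 128                        # illustrative bar: 16 time steps x 128 pitches
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, BAR), nn.Sigmoid())
D = nn.Sequential(nn.Linear(BAR, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_bars):            # real_bars: (batch, BAR) piano-roll bars in [0, 1]
    b = real_bars.size(0)
    fake = G(torch.randn(b, 100))

    # discriminator step: real bars -> 1, generated bars -> 0
    opt_d.zero_grad()
    loss_d = bce(D(real_bars), torch.ones(b, 1)) + bce(D(fake.detach()), torch.zeros(b, 1))
    loss_d.backward()
    opt_d.step()

    # generator step: try to fool the discriminator
    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(b, 1))
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```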
In [175], an inception model conditional GAN termed INCO-GAN is proposed that can generate variable-length music. This complex architecture consists of two phases, that of training and generation, and each phase is broken into three processes: preprocessing, CVG training, and conditional GAN training for the training stage, and CVG executing, phrase generation, and postprocessing for the generation phase. The Lakh MIDI dataset is used for the experiments [82]. The model achieves high cosine similarity with the human-composed music for the frequency vector.

In [176], the problem of symbolic music GT was studied using CycleGAN, a model consisting of two GANs that exchange data and are trained simultaneously. The model was evaluated using genre classifiers, verifying the successful style transfer.

In [177], DrumGAN is proposed, an architecture for generating drum sounds (kick, snare, and cymbal). The model offers user control over the resulting score, by tuning the timbre features.

In [178], the authors generated log-magnitude spectrograms and phases directly with a GAN to produce more coherent waveforms than directly generating waveforms with strided convolutions. The resulting scores are generated at a much higher speed. The NSynth dataset [179] is used, which contains single notes from many instruments, at different pitches, timbres, and volumes. The human audience rated the audio quality of the tracks, and the model was received as slightly inferior to the real tracks.

In [180], a GAN equipped with a self-attention mechanism is used to generate multi-instrument music. The self-attention mechanism is used to allow the extraction of spatial and temporal features from data. The Lakh MIDI [82] and Million Song [80] datasets were used here.

In [181], a GAN was designed for symbolic MG, along with a conditional mechanism to use available prior information, so that the model can generate melodies either starting from zero, by following a chord sequence, or by conditioning on the melody of previous bars. Pop music tabs from TheoryTab [182] were used. The resulting system, termed MidiNet, is compared to Google's MelodyRNN and performs equally, with the test audience characterizing the results as being more interesting.

In [183], multi-track MG was considered using three different GAN models, termed the Jamming, Composer, and Hybrid. The Jamming model consists of multiple independent generators. The Composer consists of a single generator and discriminator, and a shared random input vector. In the Hybrid model, the independent generators have both an independent and a shared random input vector. The models were trained on a rock music database and used to generate piano rolls for bass, drums, guitar, piano, and strings. The database is termed the Lakh Pianoroll Dataset, as it is created from the Lakh MIDI [82], by converting the MIDI files to multi-track piano rolls. A subset is also used with matched entries from the Million Song dataset [80]. In addition to using the training database, the model can also use as an input a given music track from the user and generate four additional tracks from it. The model was evaluated by professional and casual users and received overall neutral to positive scores.

In [184], Sequence Generative Adversarial Net (SeqGAN) is proposed, which applies policy gradient updates. The Nottingham folk dataset [147] is used in the experiments. The model outperforms a maximum likelihood estimation (MLE)

and overall structure and quality. The proposed model outperforms others in all metrics.

In [197], conditional drum generation is considered, inspired by [166]. A BLSTM encoder receives the conditioning parameter information, and a transformer-based decoder with relative global attention generates the drum sequence. A subset of rock and metal songs from the Lakh MIDI dataset is used [82]. For subjective evaluation, participants were given a set of three tracks, two being the accompanying or condition tracks, and the third being the drum track to be evaluated. They were asked to rate the drum tracks with respect to rhythm, pitch, naturalness, groove, and coherence. The tracks generated from the proposed model outperform another baseline model and are even rated higher than real compositions with respect to naturalness, groove, and coherence. The users were also asked their opinion on whether the given drum tracks each time were real compositions or computer generated. The drum tracks from the model were perceived as computer generated only 39% of the time, indicating the natural feel of the tracks.

In [198], the problem of melody harmonization was considered. The model maps lower-level melody notes into semantic higher-level chords. Three architectures are proposed, using a standard transformer, a variational transformer, and a regularized variational transformer. The Chord Melody [199] and Hooktheory Lead Sheet [200] datasets are used. In the human evaluation conducted, participants, comprising casual music listeners and professionals, were asked to rate samples with respect to harmonicity, unexpectedness, complexity, and preference. The standard model achieved the highest scores in harmony and preference, whereas the variational model achieved the highest in unexpectedness and complexity.

TABLE 2. DL methods for MG.

F. ARCHITECTURE OVERVIEW
As with the case of MIR, it is clear that there is no single architecture that can outperform the rest in MG tasks. Multi-layered architectures, though, can be a path for building better models, especially when additional objectives are set, like conditioning the generated music on desired features.

IV. FUTURE STUDIES IN MDL
In this section, future research directions in MDL are identified and discussed.

A. MIXED ARCHITECTURES
So far there have been multiple approaches and different architectures to address key problems in MDL. However, despite most works reporting positive results, due to the complexity of the applications under study and their peculiarities, there is no dominant method that should be followed for a given task. Thus, there is no overall superior architecture that is guaranteed to outperform all others for any given MDL problem.

On the other hand, results indicate that the best approach to constructing holistically better models, which can consistently yield improved results, is to consider combined architectures, like CRNNs [121], [122] or LRCNs [65], [69]. Such approaches can harness the individual characteristics of each model to surpass their counterparts. Attention mechanism enhanced architectures are one such example [56], [95], [126], [161], [180], with more being developed [201], [202], [203], [204]. Such approaches will surely lead the advances in the MDL field.

Apart from hybrid architectures, MDL will benefit significantly from the fusion of diverse input modalities. This would increase performance, as the conjunction of different modalities can help build connections between different features. For example, in [76], sound signals were extracted from unlabelled video sources. In [205], singing signals were combined with laryngoscope images for voice parts division. In [206], a system that combined heart rate measurements and facial expressions was composed to detect drowsiness in drivers, which is accompanied by a music recommendation system used as a countermeasure to avoid accidents. In [63] and [64], a synchronized music and dance dataset was used for recommendation. In [207], music emotion classification is performed for four emotional classes, combining features from lyrics and acoustics. These are indicative examples of an emerging trend of bridging the gap between different modalities.
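A common way to realize such fusion is late fusion, where each modality is encoded separately and the embeddings are concatenated before a shared classifier, as in the schematic PyTorch sketch below; the encoders, feature dimensions, and the four-class emotion head are illustrative assumptions rather than the pipelines of the cited works.

```python
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    """Concatenate per-modality embeddings (e.g., audio and lyrics) before classifying."""

    def __init__(self, audio_dim=128, text_dim=300, n_classes=4):
        super().__init__()
        self.audio_enc = nn.Sequential(nn.Linear(audio_dim, 64), nn.ReLU())
        self.text_enc = nn.Sequential(nn.Linear(text_dim, 64), nn.ReLU())
        self.classifier = nn.Linear(64 + 64, n_classes)

    def forward(self, audio_feat, text_feat):
        z = torch.cat([self.audio_enc(audio_feat), self.text_enc(text_feat)], dim=-1)
        return self.classifier(z)          # emotion-class logits

logits = LateFusionClassifier()(torch.randn(8, 128), torch.randn(8, 300))  # -> (8, 4)
```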
For the above techniques, an ever-present problem is the computational cost of training [208]. The increase in hardware requirements creates practical issues with energy consumption and environmental footprint, which, under the scope of the global energy and environmental crisis, are mandatory to address. Addressing the above will require the performance improvement of current architectures, or the consideration of different ones [209]. Understandably, any improvements in the computational cost will, by extension, also boost the commercialization of MDL applications.

B. TRADITIONAL MUSIC
Most of the existing works use widely available training databases, which mainly include western music genres, like classical music, pop, rock, metal, jazz, blues, etc. Using widely established music genres makes sense, due to their popularity, but it is highly important to enrich and diversify the training databases by including more genres. So, while it is essential to consider new and emerging genres, especially ones that are computer-based, like electronic, synth-wave, and vaporwave [65], [210], [211], [212], another trend that is gaining popularity is the application of MDL and MG to traditional and regional music. Traditional music refers to music originating from a specific country or region and is closely tied to its culture [213]. Examples include the recitation of religious excerpts like the Holy Quran [214], and traditional music from different regions, like Byzantine [215], Greek [216], [217], [218], Persian [219], Chinese [220], [221], Indian [222], [223], and many more.

In the development of MDL for regional and traditional music, several challenges may appear, as a result of the distinct nature of the topic. One issue is dataset availability: in contrast to western popular music, data are in many cases hard to gather, especially in the large amounts required for optimal training. In most cases, the research groups take it upon themselves to build their own dataset, due to the lack of existing ones, so hopefully, in the future, more authorities will help towards building free databases [77], [78], [142], [196], [221], [224], [225], [226]. For this task, recording difficulties may arise, especially for recordings made outside a music studio, with varying acoustics, for example in religious singing. Coming along with the problem of dataset collection is that of appropriate feature tagging of the tracks. This is strenuous work that requires time, and often the collaboration of music experts, for tasks like the annotation of music features, and testing audiences, for more ambiguous characterizations, like the emotion that a track evokes.

Moreover, many musical instruments, like the guitar and piano, are present in almost all music genres, so it is easier to adapt MG architectures for a specific instrument to many different styles. This may not be the case for regional instruments, which are only used for playing a region's traditional music. So, for preserving and learning musical styles through DL, it is essential to build datasets for specific instruments [221]. Finally, many traditional music styles have a distinct musical notation, like Mensural notation, Chinese Gongche, and Organ tablature, meaning that MDL architectures for transcription, pattern recognition, and symbolic MG would have to be adjusted to fit the characteristics of each genre. This again requires the existence of appropriate databases for different musical notations.

Overall, it seems that there are still several practical challenges to fully developing DL for traditional music. These are steadily addressed by the efforts of several research groups over the world. Table 3 lists the recent works that study Traditional Music Deep Learning (TMDL), categorized by music type. These works offer great service to the preservation of history, culture, and art, as the digitization, study, and generation of traditional music will help open it up to new generations of listeners and also promote thematic (music, religious) tourism. Thus, it is expected that more research groups will contribute to regional MDL in the future, and hopefully, such research endeavors will also receive governmental support and recognition.

TABLE 3. List of DL studies focused on traditional music.

C. MEDICAL APPLICATIONS
The field of Music Therapy (MT) lies at the intersection of Medicine and Music. MT is an evidence-based approach for treating a plethora of pathological conditions, including, among others, anxiety, depression, substance abuse, Alzheimer's, eating disorders, sleep disorders, and more [261], [262], [263]. Naturally, DL can prove a valuable tool to therapists and patients, as a complement to existing treatments. Table 4 summarizes the recent applications of DL in music therapy, categorized by architecture. The conditions that have been addressed include music remixing to

fields. For all of the aforementioned applications, bringing together research groups consisting of heterogeneous and complementing researchers, like computer scientists, physicists, mathematicians, musicians, audio engineers, and medical practitioners, is the key to success. The authors hope that the present work can be of service to these researchers, by providing a clear overview of recent and emerging developments in the field.

APPENDIX A
LIST OF ABBREVIATIONS
Table 5 lists the abbreviations used throughout the text.

REFERENCES
[1] A. Shrestha and A. Mahmood, ‘‘Review of deep learning algorithms and architectures,’’ IEEE Access, vol. 7, pp. 53040–53065, 2019.
[2] J. Chai, H. Zeng, A. Li, and E. W. T. Ngai, ‘‘Deep learning in computer vision: A critical review of emerging techniques and application scenarios,’’ Mach. Learn. Appl., vol. 6, Dec. 2021, Art. no. 100134.
[3] N. Fatima, A. S. Imran, Z. Kastrati, S. M. Daudpota, and A. Soomro, ‘‘A systematic literature review on text generation using deep neural network models,’’ IEEE Access, vol. 10, pp. 53490–53503, 2022.
[4] M. R. Karim, O. Beyan, A. Zappa, I. G. Costa, D. Rebholz-Schuhmann, M. Cochez, and S. Decker, ‘‘Deep learning-based clustering approaches for bioinformatics,’’ Briefings Bioinf., vol. 22, no. 1, pp. 393–415, Jan. 2021.
[5] M. M. Islam, F. Karray, R. Alhajj, and J. Zeng, ‘‘A review on deep learning techniques for the diagnosis of novel coronavirus (COVID-19),’’ IEEE Access, vol. 9, pp. 30551–30572, 2021.
[6] A. B. Nassif, I. Shahin, I. Attili, M. Azzeh, and K. Shaalan, ‘‘Speech recognition using deep neural networks: A systematic review,’’ IEEE Access, vol. 7, pp. 19143–19165, 2019.
[7] S. Minaee, Y. Y. Boykov, F. Porikli, A. J. Plaza, N. Kehtarnavaz, and D. Terzopoulos, ‘‘Image segmentation using deep learning: A survey,’’ IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 7, pp. 3523–3542, Jul. 2021.
[8] L. Ljung, C. Andersson, K. Tiels, and T. B. Schön, ‘‘Deep learning and system identification,’’ IFAC-PapersOnLine, vol. 53, no. 2, pp. 1175–1181, 2020.
[9] G. Gupta and R. Katarya, ‘‘Research on understanding the effect of deep learning on user preferences,’’ Arabian J. Sci. Eng., vol. 46, no. 4, pp. 3247–3286, Apr. 2021.
[10] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[11] H. Purwins, B. Li, T. Virtanen, J. Schlüter, S.-Y. Chang, and T. Sainath, ‘‘Deep learning for audio signal processing,’’ IEEE J. Sel. Topics Signal Process., vol. 13, no. 2, pp. 206–219, Apr. 2019.
[12] J. P. Puig, ‘‘Deep neural networks for music and audio tagging,’’ Ph.D. thesis, Inf. Commun. Technol., Universitat Pompeu Fabra, Barcelona, Spain, 2019.
[13] M. Schedl, ‘‘Deep learning in music recommendation systems,’’ Frontiers Appl. Math. Statist., vol. 5, p. 44, Aug. 2019.
[14] J.-P. Briot, G. Hadjeres, and F.-D. Pachet, ‘‘Deep learning techniques for music generation—A survey,’’ 2017, arXiv:1709.01620.
[15] Y. M. G. Costa, L. S. Oliveira, and C. N. Silla Jr., ‘‘An evaluation of convolutional neural networks for music classification using spectrograms,’’ Appl. Soft Comput., vol. 52, pp. 28–38, Mar. 2017.
[16] C. Senac, T. Pellegrini, F. Mouret, and J. Pinquier, ‘‘Music feature maps with convolutional neural networks for music genre classification,’’ in Proc. 15th Int. Workshop Content-Based Multimedia Indexing, Jun. 2017, pp. 1–5.
[17] M. Lu, D. Pengcheng, and S. Yanfeng, ‘‘Digital music recommendation technology for music teaching based on deep learning,’’ Wireless Commun. Mobile Comput., vol. 2022, pp. 1–8, May 2022.
[18] D. Martín-Gutiérrez, G. H. Penaloza, A. Belmonte-Hernandez, and F. A. Garcia, ‘‘A multimodal end-to-end deep learning architecture for music popularity prediction,’’ IEEE Access, vol. 8, pp. 39361–39374, 2020.
[19] Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, D. FitzGerald, and B. Pardo, ‘‘An overview of lead and accompaniment separation in music,’’ IEEE/ACM Trans. Audio, Speech, Language Process., vol. 26, no. 8, pp. 1307–1335, Aug. 2018.
[20] R. Monir, D. Kostrzewa, and D. Mrozek, ‘‘Singing voice detection: A survey,’’ Entropy, vol. 24, no. 1, p. 114, Jan. 2022.
[21] K. Choi, G. Fazekas, K. Cho, and M. Sandler, ‘‘A tutorial on deep learning for music information retrieval,’’ 2017, arXiv:1709.04396.
[22] D. Han, Y. Kong, J. Han, and G. Wang, ‘‘A survey of music emotion recognition,’’ Frontiers Comput. Sci., vol. 16, no. 6, pp. 1–11, Dec. 2022.
[23] B. L. Sturm, J. Felipe Santos, O. Ben-Tal, and I. Korshunova, ‘‘Music transcription modelling and composition using deep learning,’’ 2016, arXiv:1604.08723.
[24] L. Casini, G. Marfia, and M. Roccetti, ‘‘Some reflections on the potential and limitations of deep learning for automated music generation,’’ in Proc. IEEE 29th Annu. Int. Symp. Pers., Indoor Mobile Radio Commun. (PIMRC), Sep. 2018, pp. 27–31.
[25] M. Kleć and A. Wieczorkowska, ‘‘Music recommendation systems: A survey,’’ in Recommender Systems for Medicine and Music. Cham, Switzerland: Springer, 2021, pp. 107–118.
[26] N. Ndou, R. Ajoodha, and A. Jadhav, ‘‘Music genre classification: A review of deep-learning and traditional machine-learning approaches,’’ in Proc. IEEE Int. IoT, Electron. Mechatronics Conf. (IEMTRONICS), Apr. 2021, pp. 1–6.
[27] C.-W. Wu, C. Dittmar, C. Southall, R. Vogl, G. Widmer, J. Hockman, M. Müller, and A. Lerch, ‘‘A review of automatic drum transcription,’’ IEEE/ACM Trans. Audio, Speech, Language Process., vol. 26, no. 9, pp. 1457–1483, Sep. 2018.
[28] L. Wyse, ‘‘Audio spectrogram representations for processing with convolutional neural networks,’’ 2017, arXiv:1706.09559.
[29] C. Gupta, H. Li, and M. Goto, ‘‘Deep learning approaches in topics of singing information processing,’’ IEEE/ACM Trans. Audio, Speech, Language Process., vol. 30, pp. 2422–2451, 2022.
[30] M. Civit, J. Civit-Masot, F. Cuadrado, and M. J. Escalona, ‘‘A systematic review of artificial intelligence-based music generation: Scope, applications, and future trends,’’ Exp. Syst. Appl., vol. 209, Dec. 2022, Art. no. 118190.
[31] S. Ji, J. Luo, and X. Yang, ‘‘A comprehensive survey on deep music generation: Multi-level representations, algorithms, evaluations, and future directions,’’ 2020, arXiv:2011.06801.
[32] J.-P. Briot and F. Pachet, ‘‘Deep learning for music generation: Challenges and directions,’’ Neural Comput. Appl., vol. 32, no. 4, pp. 981–993, Feb. 2020.
[33] L. A. Iliadis, S. P. Sotiroudis, K. Kokkinidis, P. Sarigiannidis, S. Nikolaidis, and S. K. Goudos, ‘‘Music deep learning: A survey on deep learning methods for music processing,’’ in Proc. 11th Int. Conf. Modern Circuits Syst. Technol. (MOCAST), Jun. 2022, pp. 1–4.
[34] F. Fessahaye, L. Perez, T. Zhan, R. Zhang, C. Fossier, R. Markarian, C. Chiu, J. Zhan, L. Gewali, and P. Oh, ‘‘T-RECSYS: A novel music recommendation system using deep learning,’’ in Proc. IEEE Int. Conf. Consum. Electron. (ICCE), Jan. 2019, pp. 1–6.
[35] (2018). Spotify RecSys Challenge. Accessed: Sep. 30, 2022. [Online]. Available: http://www.recsyschallenge.com/2018/
[36] V. Revathy and A. S. Pillai, ‘‘Binary emotion classification of music using deep neural networks,’’ in Proc. Int. Conf. Soft Comput. Pattern Recognit. Cham, Switzerland: Springer, 2021, pp. 484–492.
[37] I. A. P. Santana, F. Pinhelli, J. Donini, L. Catharin, R. B. Mangolin, V. Delisandra Feltrim, and M. A. Domingues, ‘‘Music4All: A new music database and its applications,’’ in Proc. Int. Conf. Syst., Signals Image Process. (IWSSIP), Jul. 2020, pp. 399–404.
[38] G. Song, Z. Wang, F. Han, S. Ding, and M. A. Iqbal, ‘‘Music auto-tagging using deep recurrent neural networks,’’ Neurocomputing, vol. 292, pp. 104–110, May 2018.
[39] E. Law and L. von Ahn, ‘‘Input-agreement: A new mechanism for collecting data using human computation games,’’ in Proc. SIGCHI Conf. Hum. Factors Comput. Syst., Apr. 2009, pp. 1197–1206.
[40] J. R. Castillo and M. J. Flores, ‘‘Web-based music genre classification for timeline song visualization and analysis,’’ IEEE Access, vol. 9, pp. 18801–18816, 2021.
[41] J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter, ‘‘Audio set: An ontology and human-labeled dataset for audio events,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2017, pp. 776–780.
[42] W. Zhao, Y. Zhou, Y. Tie, and Y. Zhao, ‘‘Recurrent neural network for [68] M. Ramona, G. Richard, and B. David, ‘‘Vocal detection in music with
MIDI music emotion classification,’’ in Proc. IEEE 3rd Adv. Inf. Technol., support vector machines,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal
Electron. Autom. Control Conf. (IAEAC), Oct. 2018, pp. 2596–2600. Process., Mar. 2008, pp. 1885–1888.
[43] S. Rajesh and N. J. Nalini, ‘‘Musical instrument emotion recognition [69] X. Zhang, Y. Yu, Y. Gao, X. Chen, and W. Li, ‘‘Research on singing
using deep recurrent neural network,’’ Proc. Comput. Sci., vol. 167, voice detection based on a long-term recurrent convolutional network
pp. 16–25, Jan. 2020. with vocal separation and temporal smoothing,’’ Electronics, vol. 9, no. 9,
[44] A. Vall, M. Quadrana, M. Schedl, and G. Widmer, ‘‘The importance of p. 1458, Sep. 2020.
song context and song order in automated music playlist generation,’’ [70] (2012). Rwc Pop Music Dataset. Accessed: Sep. 30, 2022. [Online].
2018, arXiv:1807.04690. Available: https://staff.aist.go.jp/m.goto/RWC-MDB/
[45] B. McFee and G. R. Lanckriet, ‘‘Hypergraph models of playlist dialects,’’ [71] R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and
in Proc. ISMIR. Pennsylvania, PA, USA: Citeseer, vol. 12, 2012, J. P. Bello, ‘‘MedleyDB: A multitrack dataset for annotation-intensive
pp. 343–348. MIR research,’’ in Proc. ISMIR, vol. 14, 2014, pp. 155–160.
[46] 8TRACKS. Accessed: Sep. 30, 2022. [Online]. Available: https://8tracks. [72] iKala Dataset. Accessed: Sep. 30, 2022. [Online]. Available:
com/ https://paperswithcode.com/dataset/ikala
[47] S. Kang, J.-S. Park, and G.-J. Jang, ‘‘Improving singing voice separation [73] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, ‘‘A survey of convolutional
using curriculum learning on recurrent neural networks,’’ Appl. Sci., neural networks: Analysis, applications, and prospects,’’ IEEE Trans.
vol. 10, no. 7, p. 2465, Apr. 2020. Neural Netw. Learn. Syst., vol. 33, no. 12, pp. 6999–7019, Dec. 2021.
[48] Mir-1K Dataset. Accessed: Sep. 30, 2022. [Online]. Available: [74] T. Hirvonen, ‘‘Classification of spatial audio location and content using
https://sites.google.com/site/unvoicedsoundseparation/mir-1k convolutional neural networks,’’ in Proc. 138th Audio Eng. Soc. Conv.,
[49] A. Liutkus, D. Fitzgerald, and Z. Rafii, ‘‘Scalable audio separation with 2015, pp. 1–10.
light kernel additive modelling,’’ in Proc. IEEE Int. Conf. Acoust., Speech
[75] S. Hershey, S. Chaudhuri, D. P. W. Ellis, J. F. Gemmeke, A. Jansen,
Signal Process. (ICASSP), Apr. 2015, pp. 76–80.
R. C. Moore, M. Plakal, D. Platt, R. A. Saurous, B. Seybold, M. Slaney,
[50] Z. Rafii, A. Liutkus, F.-R. Stöter, S. I. Mimilakis, and R. Bittner, ‘‘The R. J. Weiss, and K. Wilson, ‘‘CNN architectures for large-scale audio
MUSDB18 corpus for music separation,’’ Dec. 2017, doi: 10.5281/zen- classification,’’ in Proc. IEEE Int. Conf. Acoust., Speech Signal Process.
odo.1117372. (ICASSP), Mar. 2017, pp. 131–135.
[51] J. Li, ‘‘Automatic piano harmony arrangement system based on deep
[76] Y. Aytar, C. Vondrick, and A. Torralba, ‘‘SoundNet: Learning sound
learning,’’ J. Sensors, vol. 2022, pp. 1–13, Jul. 2022.
representations from unlabeled video,’’ in Proc. Adv. Neural Inf. Process.
[52] F. Zhang, ‘‘Research on music classification technology based on deep Syst., vol. 29, 2016, pp. 1–9.
learning,’’ Secur. Commun. Netw., vol. 2021, pp. 1–8, Dec. 2021.
[77] C. N. Silla Jr., A. L. Koerich, and C. A. Kaestner, ‘‘The Latin music
[53] S. Hochreiter and J. Schmidhuber, ‘‘Long short-term memory,’’ Neural database,’’ in Proc. ISMIR, 2008, pp. 451–456.
Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[78] Royal Museum of Central-Africa (RMCA). Accessed: Sep. 30, 2022.
[54] J. Dai, S. Liang, W. Xue, C. Ni, and W. Liu, ‘‘Long short-term memory
[Online]. Available: https://www.africamuseum.be/en
recurrent neural network based segment features for music genre classifi-
cation,’’ in Proc. 10th Int. Symp. Chin. Spoken Lang. Process. (ISCSLP), [79] J. Lee, J. Park, K. Luke Kim, and J. Nam, ‘‘Sample-level deep convo-
Oct. 2016, pp. 1–5. lutional neural networks for music auto-tagging using raw waveforms,’’
2017, arXiv:1703.01789.
[55] (2004). Ismir Genre Dataset. Accessed: Sep. 30, 2022. [Online]. Avail-
able: https://ismir2004.ismir.net [80] T. Bertin-Mahieux, D. P. Ellis, B. Whitman, and P. Lamere, ‘‘The mil-
lion song dataset,’’ in Proc. 12th Int. Soc. Music Inf. Retr. Conf., 2011,
[56] S. K. Prabhakar and S.-W. Lee, ‘‘Holistic approaches to music genre
pp. 591–596.
classification using efficient transfer and deep learning techniques,’’ Exp.
Syst. Appl., vol. 211, Jan. 2023, Art. no. 118636. [81] L. Qiu, S. Li, and Y. Sung, ‘‘3D-DCDAE: Unsupervised music latent rep-
[57] G. Tzanetakis and P. Cook, ‘‘Musical genre classification of audio sig- resentations learning method based on a deep 3D convolutional denoising
nals,’’ IEEE Trans. Speech Audio Process., vol. 10, no. 5, pp. 293–302, autoencoder for music genre classification,’’ Mathematics, vol. 9, no. 18,
Jul. 2002. p. 2274, Sep. 2021.
[58] (2013). The MagnaTagATune Dataset. Accessed: Sep. 30, 2022. [Online]. [82] (2004). The Lakh Midi Dataset. Accessed: Sep. 30, 2022. [Online].
Available: https://mirg.city.ac.uk/codeapps/the-magnatagatune-dataset Available: https://colinraffel.com/projects/lmd
[59] X. Li, H. Xianyu, J. Tian, W. Chen, F. Meng, M. Xu, and L. Cai, ‘‘A deep [83] C. Raffel, ‘‘Learning-based methods for comparing sequences, with
bidirectional long short-term memory based multi-scale approach for applications to audio-to-midi alignment and matching,’’ Ph.D. thesis,
music dynamic emotion prediction,’’ in Proc. IEEE Int. Conf. Acoust., Columbia Univ., New York, NY, USA, 2016, doi: 10.7916/D8N58MHV.
Speech Signal Process. (ICASSP), Mar. 2016, pp. 544–548. [84] B. Stasiak and J. Mońko, ‘‘Analysis of time-frequency representations for
[60] A. Aljanaki, Y.-H. Yang, and M. Soleymani, ‘‘Emotion in music task at musical onset detection with convolutional neural network,’’ in Proc. Ann.
mediaeval 2015,’’ in Proc. MediaEval Workshop, 2015, pp. 1–3. Comput. Sci. Inf. Syst., Oct. 2016, pp. 147–152.
[61] S. Uhlich, M. Porcu, F. Giron, M. Enenkl, T. Kemp, N. Takahashi, and [85] H. Phan, L. Hertel, M. Maass, and A. Mertins, ‘‘Robust audio event
Y. Mitsufuji, ‘‘Improving music source separation based on deep neural recognition with 1-max pooling convolutional neural networks,’’ 2016,
networks through data augmentation and network blending,’’ in Proc. arXiv:1604.06338.
IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2017, [86] S. Nakamura, K. Hiyane, F. Asano, T. Yamada, and T. Endo, ‘‘Data col-
pp. 261–265. lection in real acoustical environments for sound scene understanding and
[62] SiSEC DSD100. Accessed: Sep. 30, 2022. [Online]. Available: hands-free speech recognition,’’ in Proc. 6th Eur. Conf. Speech Commun.
https://sisec.inria.fr/sisec-2016/2016-professionally-produced-music- Technol. (EUROSPEECH), 1999, pp. 1–4.
recordings/ [87] A. Varga and H. J. M. Steeneken, ‘‘Assessment for automatic speech
[63] W. Gong and Q. Yu, ‘‘A deep music recommendation method based on recognition: II. NOISEX-92: A database and an experiment to study the
human motion analysis,’’ IEEE Access, vol. 9, pp. 26290–26300, 2021. effect of additive noise on speech recognition systems,’’ Speech Com-
[64] T. Tang, J. Jia, and H. Mao, ‘‘Dance with melody: An LSTM-autoencoder mun., vol. 12, no. 3, pp. 247–251, Jul. 1993.
approach to music-oriented dance synthesis,’’ in Proc. 26th ACM Int. [88] K. W. E. Lin, B. T. Balamurali, E. Koh, S. Lui, and D. Herremans,
Conf. Multimedia, Oct. 2018, pp. 1598–1606. ‘‘Singing voice separation using a deep convolutional neural network
[65] R. Romero-Arenas, A. Gómez-Espinosa, and B. Valdés-Aguirre, trained by ideal binary mask and cross entropy,’’ Neural Comput. Appl.,
‘‘Singing voice detection in electronic music with a long-term recurrent vol. 32, no. 4, pp. 1037–1050, Feb. 2020.
convolutional network,’’ Appl. Sci., vol. 12, no. 15, p. 7405, Jul. 2022. [89] A. Liutkus, F.-R. Stöter, Z. Rafii, D. Kitamura, B. Rivet, N. Ito, N. Ono,
[66] TheFatRat. (2016). The Arcadium. Accessed: Sep. 30, 2022. [Online]. and J. Fontecave, ‘‘The 2016 signal separation evaluation campaign,’’ in
Available: https://www.youtube.com/c/TheArcadium Proc. Int. Conf. Latent Variable Anal. Signal Separat. Cham, Switzerland:
[67] B. Woodford. (2011). NCS (No Copytight Sounds)—Free Music for Con- Springer, 2017, pp. 323–332.
tent Creators. Accessed: Sep. 30, 2022. [Online]. Available: https://www. [90] Y. Liusong and D. Hui, ‘‘Voice quality evaluation of singing art based on
ncs.io/ 1DCNN model,’’ Math. Problems Eng., vol. 2022, pp. 1–9, Jul. 2022.
[91] P. Li, J. Qian, and T. Wang, ‘‘Automatic instrument recognition [116] N. Li, ‘‘Generative adversarial network for musical notation recognition
in polyphonic music using convolutional neural networks,’’ 2015, during music teaching,’’ Comput. Intell. Neurosci., vol. 2022, pp. 1–9,
arXiv:1511.05520. Jun. 2022.
[92] V. Lostanlen and C.-E. Cella, ‘‘Deep convolutional networks on the pitch [117] G. Hadjeres, F. Pachet, and F. Nielsen, ‘‘DeepBach: A steerable model
spiral for musical instrument recognition,’’ 2016, arXiv:1605.06644. for Bach chorales generation,’’ in Proc. Int. Conf. Mach. Learn., 2017,
[93] D. Mukhedkar, ‘‘Polyphonic music instrument detection on weakly pp. 1362–1371.
labelled data using sequence learning models,’’ School Elect. Eng. Com- [118] C. Hawthorne, A. Stasyuk, A. Roberts, I. Simon, C.-Z. A. Huang,
put. Sci., KTH Roy. Inst. Technol., Stockholm, Sweden, 2020. S. Dieleman, E. Elsen, J. Engel, and D. Eck, ‘‘Enabling factorized piano
[94] E. Humphrey, S. Durand, and B. McFee, ‘‘OpenMIC-2018: An open music modeling and generation with the MAESTRO dataset,’’ 2018,
data-set for multiple instrument recognition,’’ in Proc. ISMIR, 2018, arXiv:1810.12247.
pp. 438–444. [119] C.-Z. Anna Huang, C. Hawthorne, A. Roberts, M. Dinculescu, J. Wexler,
[95] A. Wise, A. S. Maida, and A. Kumar, ‘‘Attention augmented CNNs for L. Hong, and J. Howcroft, ‘‘The bach doodle: Approachable music com-
musical instrument identification,’’ in Proc. 29th Eur. Signal Process. position with machine learning at scale,’’ 2019, arXiv:1907.06637.
Conf. (EUSIPCO), Aug. 2021, pp. 376–380. [120] Z.-C. Fan, Y.-L. Lai, and J.-S.-R. Jang, ‘‘SVSGAN: Singing voice separa-
[96] London Philharmonic Orchestra Dataset. Accessed: Sep. 30, 2022. tion via generative adversarial network,’’ in Proc. IEEE Int. Conf. Acoust.,
[Online]. Available: https://philharmonia.co.uk/resources/sound- Speech Signal Process. (ICASSP), Apr. 2018, pp. 726–730.
samples/ [121] K. Choi, G. Fazekas, M. Sandler, and K. Cho, ‘‘Convolutional recur-
[97] University of Iowa Musical Instrument Samples. Accessed: Sep. 30, 2022. rent neural networks for music classification,’’ in Proc. IEEE Int. Conf.
[Online]. Available: https://theremin.music.uiowa.edu/MIS.html
[98] M. Blaszke and B. Kostek, ''Musical instrument identification using deep learning approach,'' Sensors, vol. 22, no. 8, p. 3033, Apr. 2022.
[99] E. Manilow, G. Wichern, P. Seetharaman, and J. Le Roux, ''Cutting music source separation some Slakh: A dataset to study the impact of training data quality and quantity,'' in Proc. IEEE Workshop Appl. Signal Process. to Audio Acoust. (WASPAA), Oct. 2019, pp. 1–7.
[100] X. Liu, Q. Chen, X. Wu, Y. Liu, and Y. Liu, ''CNN based music emotion classification,'' 2017, arXiv:1704.05665.
[101] S.-Y. Wang, J.-C. Wang, Y.-H. Yang, and H.-M. Wang, ''Towards time-varying music auto-tagging based on CAL500 expansion,'' in Proc. IEEE Int. Conf. Multimedia Expo. (ICME), Jul. 2014, pp. 1–6.
[102] T. Ciborowski, S. Reginis, D. Weber, A. Kurowski, and B. Kostek, ''Classifying emotions in film music—A deep learning approach,'' Electronics, vol. 10, no. 23, p. 2955, Nov. 2021.
[103] Epidemic Sound. Accessed: Sep. 30, 2022. [Online]. Available: https://www.epidemicsound.com/
[104] A. Vall, M. Dorfer, H. Eghbal-zadeh, M. Schedl, K. Burjorjee, and G. Widmer, ''Feature-combination hybrid recommender systems for automated music playlist continuation,'' User Model. User-Adapted Interact., vol. 29, no. 2, pp. 527–572, Apr. 2019.
[105] Art of the Mix. Accessed: Sep. 30, 2022. [Online]. Available: http://www.artofthemix.org/
[106] M. Sheikh Fathollahi and F. Razzazi, ''Music similarity measurement and recommendation system using convolutional neural networks,'' Int. J. Multimedia Inf. Retr., vol. 10, no. 1, pp. 43–53, Mar. 2021.
[107] M. Zentner, D. Grandjean, and K. R. Scherer, ''Emotions evoked by the sound of music: Characterization, classification, and measurement,'' Emotion, vol. 8, no. 4, pp. 494–521, 2008.
[108] H. Homburg, I. Mierswa, B. Möller, K. Morik, and M. Wurst, ''A benchmark dataset for audio classification and clustering,'' in Proc. ISMIR, 2005, pp. 528–531.
[109] H. Gao, ''Automatic recommendation of online music tracks based on deep learning,'' Math. Problems Eng., vol. 2022, pp. 1–8, Jun. 2022.
[110] J. S. Downie, K. West, A. Ehmann, and E. Vincent, ''The 2005 music information retrieval evaluation exchange (MIREX 2005): Preliminary overview,'' in Proc. 6th Int. Conf. Music Inf. Retr. (ISMIR), 2005, pp. 320–323.
[111] K. W. Cheuk, H. Anderson, K. Agres, and D. Herremans, ''NnAudio: An on-the-fly GPU audio to spectrogram conversion toolbox using 1D convolutional neural networks,'' IEEE Access, vol. 8, pp. 161981–162003, 2020.
[112] J. Thickstun, Z. Harchaoui, and S. Kakade, ''Learning features of music from scratch,'' 2016, arXiv:1611.09827.
[113] B. McFee, C. Raffel, D. Liang, D. Ellis, M. McVicar, E. Battenberg, and O. Nieto, ''Librosa: Audio and music signal analysis in Python,'' in Proc. 14th Python Sci. Conf. Pennsylvania, PA, USA: Citeseer, 2015, pp. 18–25.
[114] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, and D. Warde-Farley, ''Generative adversarial nets,'' in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 1–9.
[115] I.-S. Huang, Y.-H. Lu, M. Shafiq, A. Ali Laghari, and R. Yadav, ''A generative adversarial network model based on intelligent data analytics for music emotion recognition under IoT,'' Mobile Inf. Syst., vol. 2021, pp. 1–8, Nov. 2021.
Acoust., Speech Signal Process. (ICASSP), Mar. 2017, pp. 2392–2396.
[122] A. A. S. Gunawan and D. Suhartono, ''Music recommender system based on genre using convolutional recurrent neural networks,'' Proc. Comput. Sci., vol. 157, pp. 99–109, Jan. 2019.
[123] M. Defferrard, K. Benzi, P. Vandergheynst, and X. Bresson, ''FMA: A dataset for music analysis,'' 2016, arXiv:1612.01840.
[124] C. Chen and Q. Li, ''A multimodal music emotion classification method based on multifeature combined network classifier,'' Math. Problems Eng., vol. 2020, pp. 1–11, Aug. 2020.
[125] S. Hizlisoy, S. Yildirim, and Z. Tufekci, ''Music emotion recognition using convolutional long short term memory deep neural networks,'' Eng. Sci. Technol., Int. J., vol. 24, no. 3, pp. 760–767, Jun. 2021.
[126] X. Jia, ''Music emotion classification method based on deep learning and improved attention mechanism,'' Comput. Intell. Neurosci., vol. 2022, pp. 1–8, Jun. 2022.
[127] M. Liang, ''Music score recognition and composition application based on deep learning,'' Math. Problems Eng., vol. 2022, pp. 1–9, Jun. 2022.
[128] (2012). MuseScore. Accessed: Sep. 30, 2022. [Online]. Available: https://musescore.org/en
[129] G. Parascandolo, H. Huttunen, and T. Virtanen, ''Recurrent neural networks for polyphonic sound event detection in real life recordings,'' in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2016, pp. 6440–6444.
[130] T. Heittola, A. Mesaros, A. Eronen, and T. Virtanen, ''Audio context recognition using audio event histograms,'' in Proc. 18th Eur. Signal Process. Conf., 2010, pp. 1272–1276.
[131] O. Bulayenko, J. Quintais, D. J. Gervais, and J. Poort. (2022). AI Music Outputs: Challenges to the Copyright Legal Framework. [Online]. Available: https://ssrn.com/abstract=4072806
[132] R. B. Abbott and E. Rothman, ''Disrupting creativity: Copyright law in the age of generative artificial intelligence,'' Aug. 2022. [Online]. Available: https://ssrn.com/abstract=4185327, doi: 10.2139/ssrn.4185327.
[133] S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. Courville, and Y. Bengio, ''SampleRNN: An unconditional end-to-end neural audio generation model,'' 2016, arXiv:1612.07837.
[134] The Internet Archive. Accessed: Sep. 30, 2022. [Online]. Available: https://archive.org/
[135] M. S. Cuthbert and C. Ariza, ''music21: A toolkit for computer-aided musicology and symbolic music data,'' in Proc. 11th Int. Soc. Music Inf. Retr. Conf. (ISMIR), 2010, pp. 637–642.
[136] G. Hadjeres and F. Nielsen, ''Interactive music generation with positional constraints using anticipation-RNNs,'' 2017, arXiv:1709.06404.
[137] C. Benetatos, J. VanderStel, and Z. Duan, ''BachDuet: A deep learning system for human-machine counterpoint improvisation,'' in Proc. Int. Conf. New Interfaces Musical Expression, 2020, pp. 1–6.
[138] P. Hutchings, ''Talking drums: Generating drum grooves with neural networks,'' 2017, arXiv:1706.09558.
[139] B. Genchel, A. Pati, and A. Lerch, ''Explicitly conditioned melody generation: A case study with interdependent RNNs,'' 2019, arXiv:1907.05208.
[140] Folkdb. Accessed: Sep. 30, 2022. [Online]. Available: https://github.com/IraKorshunova/folk-rnn/tree/master/data
[141] A. Pati, A. Lerch, and G. Hadjeres, ''Learning to traverse latent spaces for musical score inpainting,'' 2019, arXiv:1907.01164.
[142] The Session. Accessed: Sep. 30, 2022. [Online]. Available: https://thesession.org/
[143] S. Agarwal, V. Saxena, V. Singal, and S. Aggarwal, ''LSTM based music generation with dataset preprocessing and reconstruction techniques,'' in Proc. IEEE Symp. Ser. Comput. Intell. (SSCI), Nov. 2018, pp. 455–462.
[144] H. Lim, S. Rhyu, and K. Lee, ''Chord generation from symbolic melody using BLSTM networks,'' 2017, arXiv:1712.01011.
[145] Wikifonia Subset Dataset. Accessed: Sep. 30, 2022. [Online]. Available: http://marg.snu.ac.kr/chord_generation/
[146] H. H. Tan, ''ChordAL: A chord-based approach for music generation using Bi-LSTMs,'' in Proc. ICCC, 2019, pp. 364–365.
[147] Nottingham Dataset. Accessed: Sep. 30, 2022. [Online]. Available: https://ifdo.ca/~seymour/nottingham/nottingham.html
[148] McGill-Billboard Chord Annotations. Accessed: Sep. 30, 2022. [Online]. Available: https://ddmal.music.mcgill.ca/research/SALAMI/
[149] W. Yang, P. Sun, Y. Zhang, and Y. Zhang, ''CLSTMS: A combination of two LSTM models to generate chords accompaniment for symbolic melody,'' in Proc. Int. Conf. High Perform. Big Data Intell. Syst. (HPBDIS), May 2019, pp. 176–180.
[150] H. H. Mao, T. Shin, and G. Cottrell, ''DeepJ: Style-specific music generation,'' in Proc. IEEE 12th Int. Conf. Semantic Comput. (ICSC), Jan. 2018, pp. 377–382.
[151] Classical Piano-Midi Dataset. Accessed: Sep. 30, 2022. [Online]. Available: http://piano-midi.de/
[152] C. D. Boom, S. V. Laere, T. Verbelen, and B. Dhoedt, ''Rhythm, chord and melody generation for lead sheets using recurrent neural networks,'' in Proc. Joint Eur. Conf. Mach. Learn. Knowl. Discovery Databases. Cham, Switzerland: Springer, 2019, pp. 454–461.
[153] Q. Lyu, Z. Wu, and J. Zhu, ''Polyphonic music modelling with LSTM-RTRBM,'' in Proc. 23rd ACM Int. Conf. Multimedia, Oct. 2015, pp. 991–994.
[154] Musedata. Accessed: Sep. 30, 2022. [Online]. Available: https://musedata.org/
[155] Johann Sebastian Bach Chorales Dataset. Accessed: Sep. 30, 2022. [Online]. Available: https://github.com/czhuang/JSB-Chorales-dataset
[156] D. D. Johnson, ''Generating polyphonic music using tied parallel networks,'' in Proc. Int. Conf. Evol. Biologically Inspired Music Art. Cham, Switzerland: Springer, 2017, pp. 128–143.
[157] M. Liang, ''An improved music composing technique based on neural network model,'' Mobile Inf. Syst., vol. 2022, pp. 1–10, Jul. 2022.
[158] K. Choi, J. Park, W. Heo, S. Jeon, and J. Park, ''Chord conditioned melody generation with transformer based decoders,'' IEEE Access, vol. 9, pp. 42071–42080, 2021.
[159] R. Yang, D. Wang, Z. Wang, T. Chen, J. Jiang, and G. Xia, ''Deep music analogy via latent representation disentanglement,'' 2019, arXiv:1906.03626.
[160] P. S. Yadav, S. Khan, Y. V. Singh, P. Garg, and R. S. Singh, ''A lightweight deep learning-based approach for jazz music generation in MIDI format,'' Comput. Intell. Neurosci., vol. 2022, pp. 1–7, Aug. 2022.
[161] G. Keerti, A. Vaishnavi, P. Mukherjee, A. S. Vidya, G. S. Sreenithya, and D. Nayab, ''Attentional networks for music generation,'' Multimedia Tools Appl., vol. 81, no. 4, pp. 5179–5189, 2022.
[162] Jazz ML Ready Midi. Accessed: Sep. 30, 2022. [Online]. Available: https://www.kaggle.com/datasets/saikayala/jazz-ml-ready-midi
[163] O. Yadav, D. Fernandes, V. Dube, and M. D'Souza, ''Apollo: A classical piano composer using long short-term memory,'' IETE J. Educ., vol. 62, no. 2, pp. 60–70, Jul. 2021.
[164] Classical Music Midi—Kaggle. Accessed: Sep. 30, 2022. [Online]. Available: https://www.kaggle.com/datasets/soumikrakshit/classical-music-midi
[165] Midi Classic Music—Kaggle. Accessed: Sep. 30, 2022. [Online]. Available: https://www.kaggle.com/datasets/blanderbuss/midi-classic-music
[166] D. Makris, M. Kaliakatsos-Papakostas, I. Karydis, and K. L. Kermanidis, ''Conditional neural sequence learners for generating drums' rhythms,'' Neural Comput. Appl., vol. 31, no. 6, pp. 1793–1804, Jun. 2019.
[167] 911TABS. Accessed: Sep. 30, 2022. [Online]. Available: https://www.911tabs.com/
[168] Z. Cheddad and A. Cheddad, ''ARMAS: Active reconstruction of missing audio segments,'' 2021, arXiv:2111.10891.
[169] C. Jin, Y. Tie, Y. Bai, X. Lv, and S. Liu, ''A style-specific music composition neural network,'' Neural Process. Lett., vol. 52, no. 3, pp. 1893–1912, Dec. 2020.
[170] Y. Su, R. Han, X. Wu, Y. Zhang, and Y. Li, ''Folk melody generation based on CNN-BiGRU and self-attention,'' in Proc. 4th Int. Conf. Commun., Inf. Syst. Comput. Eng. (CISCE), May 2022, pp. 363–368.
[171] ESAC. Accessed: Sep. 30, 2022. [Online]. Available: http://www.esac-data.org/
[172] Y. Huang, X. Huang, and Q. Cai, ''Music generation based on convolution-LSTM,'' Comput. Inf. Sci., vol. 11, no. 3, pp. 50–56, 2018.
[173] Midiworld. Accessed: Sep. 30, 2022. [Online]. Available: https://www.midiworld.com/
[174] S. M. Tony and S. Sasikumar, ''Generative adversarial network for music generation,'' in High Performance Computing and Networking. Cham, Switzerland: Springer, 2022, pp. 109–119.
[175] S. Li and Y. Sung, ''INCO-GAN: Variable-length music generation method based on inception model-based conditional GAN,'' Mathematics, vol. 9, no. 4, p. 387, Feb. 2021.
[176] G. Brunner, Y. Wang, R. Wattenhofer, and S. Zhao, ''Symbolic music genre transfer with CycleGAN,'' in Proc. IEEE 30th Int. Conf. Tools Artif. Intell. (ICTAI), Nov. 2018, pp. 786–793.
[177] J. Nistal, S. Lattner, and G. Richard, ''DrumGAN: Synthesis of drum sounds with timbral feature conditioning using generative adversarial networks,'' 2020, arXiv:2008.12073.
[178] J. Engel, K. Krishna Agrawal, S. Chen, I. Gulrajani, C. Donahue, and A. Roberts, ''GANSynth: Adversarial neural audio synthesis,'' 2019, arXiv:1902.08710.
[179] J. Engel, C. Resnick, A. Roberts, S. Dieleman, M. Norouzi, D. Eck, and K. Simonyan, ''Neural audio synthesis of musical notes with WaveNet autoencoders,'' in Proc. Int. Conf. Mach. Learn., 2017, pp. 1068–1077.
[180] F. Guan, C. Yu, and S. Yang, ''A GAN model with self-attention mechanism to generate multi-instruments symbolic music,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Jul. 2019, pp. 1–6.
[181] L.-C. Yang, S.-Y. Chou, and Y.-H. Yang, ''MidiNet: A convolutional generative adversarial network for symbolic-domain music generation,'' 2017, arXiv:1703.10847.
[182] Theorytab. Accessed: Sep. 30, 2022. [Online]. Available: https://www.hooktheory.com/theorytab
[183] H.-W. Dong, W.-Y. Hsiao, L.-C. Yang, and Y.-H. Yang, ''MuseGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment,'' in Proc. AAAI Conf. Artif. Intell., vol. 32, 2018, pp. 1–8.
[184] L. Yu, W. Zhang, J. Wang, and Y. Yu, ''SeqGAN: Sequence generative adversarial nets with policy gradient,'' in Proc. AAAI Conf. Artif. Intell., vol. 31, 2017, pp. 1–7.
[185] S.-G. Lee, U. Hwang, S. Min, and S. Yoon, ''Polyphonic music generation with sequence generative adversarial networks,'' 2017, arXiv:1710.11418.
[186] A. Marafioti, P. Majdak, N. Holighaus, and N. Perraudin, ''GACELA: A generative adversarial context encoder for long audio inpainting of music,'' IEEE J. Sel. Topics Signal Process., vol. 15, no. 1, pp. 120–131, Jan. 2020.
[187] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, ''Attention is all you need,'' in Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017, pp. 1–11.
[188] C.-Z. A. Huang, A. Vaswani, J. Uszkoreit, N. Shazeer, I. Simon, C. Hawthorne, A. M. Dai, M. D. Hoffman, M. Dinculescu, and D. Eck, ''Music transformer,'' 2018, arXiv:1809.04281.
[189] Piano-E-Competition Dataset. Accessed: Sep. 30, 2022. [Online]. Available: https://www.piano-e-competition.com/
[190] N. Zhang, ''Learning adversarial transformer for symbolic music generation,'' IEEE Trans. Neural Netw. Learn. Syst., early access, Jul. 2, 2020, doi: 10.1109/TNNLS.2020.2990746.
[191] R. Child, S. Gray, A. Radford, and I. Sutskever, ''Generating long sequences with sparse transformers,'' 2019, arXiv:1904.10509.
[192] S. Dieleman, A. V. D. Oord, and K. Simonyan, ''The challenge of realistic music generation: Modelling raw audio at scale,'' in Proc. Adv. Neural Inf. Process. Syst., vol. 31, 2018, pp. 1–11.
[193] Y.-S. Huang and Y.-H. Yang, ''Pop music transformer: Beat-based modeling and generation of expressive pop piano compositions,'' in Proc. 28th ACM Int. Conf. Multimedia, Oct. 2020, pp. 1180–1188.
[194] P. Dhariwal, H. Jun, C. Payne, J. Wook Kim, A. Radford, and I. Sutskever, ''Jukebox: A generative model for music,'' 2020, arXiv:2005.00341.
[195] Y.-J. Shih, S.-L. Wu, F. Zalkow, M. Müller, and Y.-H. Yang, ''Theme transformer: Symbolic music generation with theme-conditioned transformer,'' IEEE Trans. Multimedia, early access, Mar. 23, 2022, doi: 10.1109/TMM.2022.3161851.
[196] Z. Wang, K. Chen, J. Jiang, Y. Zhang, M. Xu, S. Dai, X. Gu, and G. Xia, ''POP909: A pop-song dataset for music arrangement generation,'' 2020, arXiv:2008.07142.
[197] D. Makris, G. Zixun, M. Kaliakatsos-Papakostas, and D. Herremans, ''Conditional drums generation using compound word representations,'' in Proc. Int. Conf. Comput. Intell. Music, Sound, Art Design (EvoStar). Cham, Switzerland: Springer, 2022, pp. 179–194.
[198] S. Rhyu, H. Choi, S. Kim, and K. Lee, ''Translating melody to chord: Structured and flexible harmonization of melody with transformer,'' IEEE Access, vol. 10, pp. 28261–28273, 2022.
[199] Chord Melody Dataset. Accessed: Sep. 30, 2022. [Online]. Available: https://github.com/shiehn/chord-melody-dataset
[200] Hooktheory Lead Sheet Dataset. Accessed: Sep. 30, 2022. [Online]. Available: https://www.hooktheory.com/
[201] M. Ashraf, G. Geng, X. Wang, F. Ahmad, and F. Abid, ''A globally regularized joint neural architecture for music classification,'' IEEE Access, vol. 8, pp. 220980–220989, 2020.
[202] Y. V. Koteswararao and C. B. Rama Rao, ''An efficient optimal reconstruction based speech separation based on hybrid deep learning technique,'' Defence Sci. J., vol. 72, no. 3, pp. 417–428, Jul. 2022.
[203] H. Zhang, S. Kandadai, H. Rao, M. Kim, T. Pruthi, and T. Kristjansson, ''Deep adaptive AEC: Hybrid of deep learning and adaptive acoustic echo cancellation,'' in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2022, pp. 756–760.
[204] Y. Ghatas, M. Fayek, and M. Hadhoud, ''A hybrid deep learning approach for musical difficulty estimation of piano symbolic music,'' Alexandria Eng. J., vol. 61, no. 12, pp. 10183–10196, Dec. 2022.
[205] L. Chen, C. Zhao, Y. Liu, and P. Zhuang, ''A multi-modal joint voice parts division method based on deep learning,'' in Proc. 15th Int. Symp. Med. Inf. Commun. Technol. (ISMICT), Apr. 2021, pp. 35–40.
[206] J. Lin, ''Integrated intelligent drowsiness detection system based on deep learning,'' in Proc. IEEE Int. Conf. Power, Intell. Comput. Syst. (ICPICS), Jul. 2020, pp. 420–424.
[207] S. Bisht, H. T. Kanakia, and P. Thakur, ''Music emotion prediction based on hybrid approach combining lyrical and acoustic approaches,'' in Proc. 6th Int. Conf. Intell. Comput. Control Syst. (ICICCS), May 2022, pp. 1656–1660.
[208] N. C. Thompson, K. Greenewald, K. Lee, and G. F. Manso, ''The computational limits of deep learning,'' 2020, arXiv:2007.05558.
[209] H. Chen, Y. Wang, C. Xu, B. Shi, C. Xu, Q. Tian, and C. Xu, ''AdderNet: Do we really need multiplications in deep learning?'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2020, pp. 1468–1477.
[210] L. Glitsos, ''Vaporwave, or music optimised for abandoned malls,'' Popular Music, vol. 37, no. 1, pp. 100–118, Jan. 2018.
[211] P. Ballam-Cross, ''Reconstructed nostalgia: Aesthetic commonalities and self-soothing in chillwave, synthwave, and vaporwave,'' J. Popular Music Stud., vol. 33, no. 1, pp. 70–93, 2021.
[212] N. Chauhan, ''Is it possible to programmatically generate Vaporwave?'' IndiaRxiv, Mar. 2020. [Online]. Available: http://indiarxiv.org/9um2r, doi: 10.35543/osf.io/9um2r.
[213] Wikipedia. List of Cultural and Regional Genres of Music. Accessed: Sep. 30, 2022. [Online]. Available: https://en.wikipedia.org/wiki/List_of_cultural_and_regional_genres_of_music
[214] S. Shahriar and U. Tariq, ''Classifying maqams of Qur'anic recitations using deep learning,'' IEEE Access, vol. 9, pp. 117271–117281, 2021.
[215] P. Kritopoulou, A. Stergiaki, and K. Kokkinidis, ''Optimizing human computer interaction for byzantine music learning: Comparing HMMs with RDFs,'' in Proc. 9th Int. Conf. Modern Circuits Syst. Technol. (MOCAST), Sep. 2020, pp. 1–4.
[216] N. Bassiou, C. Kotropoulos, and A. Papazoglou-Chalikias, ''Greek folk music classification into two genres using lyrics and audio via canonical correlation analysis,'' in Proc. 9th Int. Symp. Image Signal Process. Anal. (ISPA), Sep. 2015, pp. 238–243.
[217] E. Fotiadou, N. Bassiou, and C. Kotropoulos, ''Greek folk music classification using auditory cortical representations,'' in Proc. 24th Eur. Signal Process. Conf. (EUSIPCO), Aug. 2016, pp. 1133–1137.
[218] K. Tsoulou, ''Feature-based machine learning techniques towards Greek folk music classification,'' M.S. thesis, School Sci. Technol., Int. Hellenic Univ., Thermi, Greece, 2020.
[219] N. Farajzadeh, N. Sadeghzadeh, and M. Hashemzadeh, ''PMG-Net: Persian music genre classification using deep neural networks,'' Entertainment Comput., vol. 44, Jan. 2023, Art. no. 100518.
[220] Y. Yang and X. Huang, ''Research based on the application and exploration of artificial intelligence in the field of traditional music,'' J. Sensors, vol. 2022, pp. 1–9, Jul. 2022.
[221] X. Liang, Z. Li, J. Liu, W. Li, J. Zhu, and B. Han, ''Constructing a multimedia Chinese musical instrument database,'' in Proc. 6th Conf. Sound Music Technol. (CSMT). Singapore: Springer, 2019, pp. 53–60.
[222] A. K. Sharma, G. Aggarwal, S. Bhardwaj, P. Chakrabarti, T. Chakrabarti, J. H. Abawajy, S. Bhattacharyya, R. Mishra, A. Das, and H. Mahdin, ''Classification of Indian classical music with time-series matching deep learning approach,'' IEEE Access, vol. 9, pp. 102041–102052, 2021.
[223] B. S. Gowrishankar and N. U. Bhajantri, ''Deep learning long short-term memory based automatic music transcription system for carnatic music,'' in Proc. IEEE Int. Conf. Distrib. Comput. Electr. Circuits Electron. (ICDCECE), Apr. 2022, pp. 1–6.
[224] D. Makris, I. Karydis, and S. Sioutas, ''The Greek music dataset,'' in Proc. 16th Int. Conf. Eng. Appl. Neural Netw. (INNS), Sep. 2015, pp. 1–7.
[225] Thrace and Macedonia. Accessed: Sep. 30, 2022. [Online]. Available: http://epth.sfm.gr/
[226] M. K. Karaosmanoğlu, ''A Turkish makam music symbolic database for music information retrieval: SymbTr,'' in Proc. 13th Int. Soc. Music Inf. Retr. Conf. Porto, Portugal: International Society for Music Information Retrieval (ISMIR), Oct. 2012, pp. 223–228.
[227] X. Gong, Y. Zhu, H. Zhu, and H. Wei, ''ChMusic: A traditional Chinese music dataset for evaluation of instrument recognition,'' in Proc. 4th Int. Conf. Big Data Technol., Sep. 2021, pp. 184–189.
[228] P. Cao, ''Identification and classification of Chinese traditional musical instruments based on deep learning algorithm,'' in Proc. 2nd Int. Conf. Comput. Data Sci., Jan. 2021, pp. 1–5.
[229] R. Li and Q. Zhang, ''Audio recognition of Chinese traditional instruments based on machine learning,'' Cognit. Comput. Syst., vol. 4, no. 2, pp. 108–115, Jun. 2022.
[230] K. Xu, ''Recognition and classification model of music genres and Chinese traditional musical instruments based on deep neural networks,'' Sci. Program., vol. 2021, pp. 1–8, Jun. 2021.
[231] J. Li, J. Luo, J. Ding, X. Zhao, and X. Yang, ''Regional classification of Chinese folk songs based on CRF model,'' Multimedia Tools Appl., vol. 78, no. 9, pp. 11563–11584, May 2019.
[232] Q. Chen, W. Zhao, Q. Wang, and Y. Zhao, ''The sustainable development of intangible cultural heritage with AI: Cantonese opera singing genre classification based on CoGCNet model in China,'' Sustainability, vol. 14, no. 5, p. 2923, Mar. 2022.
[233] H. Wang, J. Li, Y. Lin, W. Ru, and J. Wu, ''Generate Xi'an drum music based on compressed coding,'' in Proc. 40th Chin. Control Conf. (CCC), Jul. 2021, pp. 8679–8683.
[234] J. Luo, X. Yang, S. Ji, and J. Li, ''MG-VAE: Deep Chinese folk songs generation with specific regional styles,'' in Proc. 7th Conf. Sound Music Technol. (CSMT). Singapore: Springer, 2020, pp. 93–106.
[235] Z. Xu, ''Construction of intelligent recognition and learning education platform of national music genre under deep learning,'' Frontiers Psychol., vol. 13, May 2022, Art. no. 843427.
[236] A. Skoki, S. Ljubic, J. Lerga, and I. Štajduhar, ''Automatic music transcription for traditional woodwind instruments sopele,'' Pattern Recognit. Lett., vol. 128, pp. 340–347, Dec. 2019.
[237] E. A. Retta, R. Sutcliffe, E. Almekhlafi, Y. K. Enku, E. Alemu, T. D. Gemechu, M. A. Berwo, M. Mhamed, and J. Feng, ''Kinit classification in Ethiopian chants, azmaris and modern music: A new dataset and CNN benchmark,'' 2022, arXiv:2201.08448.
[238] V. S. Pendyala, N. Yadav, C. Kulkarni, and L. Vadlamudi, ''Towards building a deep learning based automated Indian classical music tutor for the masses,'' Syst. Soft Comput., vol. 4, Dec. 2022, Art. no. 200042.
[239] S. John, M. S. Sinith, R. S. Sudheesh, and P. P. Lalu, ''Classification of Indian classical carnatic music based on raga using deep learning,'' in Proc. IEEE Recent Adv. Intell. Comput. Syst. (RAICS), Dec. 2020, pp. 110–113.
[240] S. T. Madhusudhan and G. Chowdhary, ''DeepSRGM—Sequence classification and ranking in Indian classical music with deep learning,'' in Proc. 20th Int. Soc. Music Inf. Retr. Conf., 2019, pp. 533–540.
[241] S. Nag, M. Basu, S. Sanyal, A. Banerjee, and D. Ghosh, ''On the application of deep learning and multifractal techniques to classify emotions and instruments using Indian classical music,'' Phys. A, Stat. Mech. Appl., vol. 597, Jul. 2022, Art. no. 127261.
[242] A. Krishnan, A. Vincent, G. Jos, and R. Rajan, ''Multimodal fusion for segment classification in folk music,'' in Proc. IEEE 18th India Council Int. Conf. (INDICON), Dec. 2021, pp. 1–7.
[243] S. Chowdhuri, ''PhonoNet: Multi-stage deep neural networks for raga identification in Hindustani classical music,'' in Proc. Int. Conf. Multimedia Retr., Jun. 2019, pp. 197–201.
[244] D. P. Shah, N. M. Jagtap, P. T. Talekar, and K. Gawande, ''Raga recognition in Indian classical music using deep learning,'' in Proc. Int. Conf. Comput. Intell. Music, Sound, Art Design (EvoStar). Cham, Switzerland: Springer, 2021, pp. 248–263.
[245] R. Surana, A. Varshney, and V. Pendyala, ''Deep learning for conversions between melodic frameworks of Indian classical music,'' in Proc. 2nd Int. Conf. Adv. Comput. Eng. Commun. Syst. Cham, Switzerland: Springer, 2022, pp. 1–12.
[246] I. Ali-MacLachlan, C. Southall, M. Tomczak, and J. Hockman, ''Player recognition for traditional Irish flute recordings,'' in Proc. 8th Int. Workshop Folk Music Anal., 2018, pp. 3–8.
[247] A. Kolokolova, M. Billard, R. Bishop, M. Elsisy, Z. Northcott, L. Graves, V. Nagisetty, and H. Patey, ''GANs & reels: Creating Irish music using a generative adversarial network,'' 2020, arXiv:2010.15772.
[248] B. L. Sturm and O. Ben-Tal, ''Folk the algorithms: (Mis)applying artificial intelligence to folk music,'' in Handbook of Artificial Intelligence for Music. Cham, Switzerland: Springer, 2021, pp. 423–454.
[249] J. Lee, M. Lee, D. Jang, and K. Yoon, ''Korean traditional music genre classification using sample and MIDI phrases,'' KSII Trans. Internet Inf. Syst., vol. 12, no. 4, pp. 1869–1886, 2018.
[250] M. Ebrahimi, B. Majidi, and M. Eshghi, ''Procedural composition of traditional Persian music using deep neural networks,'' in Proc. 5th Conf. Knowl. Based Eng. Innov. (KBEI), Feb. 2019, pp. 521–525.
[251] S. S. Hashemi, M. Aghabozorgi, and M. T. Sadeghi, ''Persian music source separation in audio-visual data using deep learning,'' in Proc. 6th Iranian Conf. Signal Process. Intell. Syst. (ICSPIS), Dec. 2020, pp. 1–5.
[252] E. Hallström, S. Mossmyr, B. Sturm, V. Vegeborn, and J. Wedin, ''From jigs and reels to schottisar och polskor: Generating Scandinavian-like folk music with deep recurrent networks,'' in Proc. 16th Sound Music Comput. Conf., Malaga, Spain, May 2019, pp. 1–8.
[253] F. Marchetti, C. Wilson, C. Powell, E. Minisci, and A. Riccardi, ''Convolutional generative adversarial network, via transfer learning, for traditional Scottish music generation,'' in Proc. Int. Conf. Comput. Intell. Music, Sound, Art Design (EvoStar). Cham, Switzerland: Springer, 2021, pp. 187–202.
[254] A. Huaysrijan and S. Pongpinigpinyo, ''Automatic music transcription for the Thai xylophone played with soft mallets,'' in Proc. 19th Int. Joint Conf. Comput. Sci. Softw. Eng. (JCSSE), Jun. 2022, pp. 1–6.
[255] A. Aydingun, D. Bagdatlioglu, B. Canbaz, A. Kokbiyik, M. F. Yavuz, N. Bolucu, and B. Can, ''Turkish music generation using deep learning,'' in Proc. 28th Signal Process. Commun. Appl. Conf. (SIU), Oct. 2020, pp. 1–4.
[256] I. H. Parlak, Y. Çebi, C. Işikhan, and D. Birant, ''Deep learning for Turkish makam music composition,'' Turkish J. Electr. Eng. Comput. Sci., vol. 29, no. 7, pp. 3107–3118, Nov. 2021.
[257] S. Tanberk and D. B. Tukel, ''Style-specific Turkish pop music composition with CNN and LSTM network,'' in Proc. IEEE 19th World Symp. Appl. Mach. Intell. Informat. (SAMI), Jan. 2021, pp. 181–185.
[258] M. A. Kızrak and B. Bolat, ''A musical information retrieval system for classical Turkish music makams,'' Simulation, vol. 93, no. 9, pp. 749–757, Sep. 2017.
[259] M. A. Kizrak and B. Bolat, ''Classification of classic Turkish music makams by using deep belief networks,'' in Proc. 23rd Signal Process. Commun. Appl. Conf. (SIU), May 2015, pp. 1–6.
[260] T. P. Van, N. T. N. Quang, and T. M. Thanh, ''Deep learning approach for singer voice classification of Vietnamese popular music,'' in Proc. 10th Int. Symp. Inf. Commun. Technol. (SoICT), 2019, pp. 255–260.
[261] T. Stegemann, M. Geretsegger, E. Phan Quoc, H. Riedl, and M. Smetana, ''Music therapy and other music-based interventions in pediatric health care: An overview,'' Medicines, vol. 6, no. 1, p. 25, Feb. 2019.
[262] M. de Witte, G.-J. Stams, X. Moonen, A. E. R. Bos, and S. van Hooren, ''Music therapy for stress reduction: A systematic review and meta-analysis,'' Health Psychol. Rev., vol. 16, no. 1, pp. 134–159, Nov. 2020.
[263] H. L. Lam, W. T. V. Li, I. Laher, and R. Y. Wong, ''Effects of music therapy on patients with dementia—A systematic review,'' Geriatrics, vol. 5, no. 4, p. 62, 2020.
[264] S. Tahmasebi, T. Gajecki, and W. Nogueira, ''Design and evaluation of a real-time audio source separation algorithm to remix music for cochlear implant users,'' Frontiers Neurosci., vol. 14, p. 434, May 2020.
[265] J. Gauer, A. Nagathil, K. Eckel, D. Belomestny, and R. Martin, ''A versatile deep-neural-network-based music preprocessing and remixing scheme for cochlear implant listeners,'' J. Acoust. Soc. Amer., vol. 151, no. 5, pp. 2975–2986, May 2022.
[266] Y.-J. Hong, J. Han, and H. Ryu, ''The effects of synthesizing music using AI for preoperative management of patients' anxiety,'' Appl. Sci., vol. 12, no. 16, p. 8089, Aug. 2022.
[267] T. Gajęcki and W. Nogueira, ''Deep learning models to remix music for cochlear implant users,'' J. Acoust. Soc. Amer., vol. 143, no. 6, pp. 3602–3615, Jun. 2018.
[268] J. Singh and A. Ratnawat, ''Algorithmic music generation for the stimulation of musical memory in Alzheimer's,'' in Proc. 4th Int. Conf. Comput. Commun. Autom. (ICCCA), Dec. 2018, pp. 1–4.
[269] J. Chen, F. Pan, P. Zhong, T. He, L. Qi, J. Lu, P. He, and Y. Zheng, ''An automatic method to develop music with music segment and long short term memory for tinnitus music therapy,'' IEEE Access, vol. 8, pp. 141860–141871, 2020.
[270] Y. Heping and W. Bin, ''Online music-assisted rehabilitation system for depressed people based on deep learning,'' Prog. Neuro-Psychopharmacology Biol. Psychiatry, vol. 119, Dec. 2022, Art. no. 110607.
[271] Y. Li, X. Li, Z. Lou, and C. Chen, ''Long short-term memory-based music analysis system for music therapy,'' Frontiers Psychol., vol. 13, Jun. 2022, Art. no. 928048.
[272] G. Kruthika, P. Kuruba, and N. Dushyantha, ''A system for anxiety prediction and treatment using Indian classical music therapy with the application of machine learning,'' in Intelligent Data Communication Technologies and Internet of Things. Cham, Switzerland: Springer, 2021, pp. 345–359.
[273] S. Shaila, V. Gurudas, R. Rakshita, and A. Shangloo, ''Music therapy for mood transformation based on deep learning framework,'' in Proc. Comput. Vis. Robot. Singapore: Springer, 2022, pp. 35–47.
[274] S. Shaila, T. Rajesh, S. Lavanya, K. Abhishek, and V. Suma, ''Music therapy for transforming human negative emotions: Deep learning approach,'' in Proc. Int. Conf. Recent Trends Comput. Singapore: Springer, 2022, pp. 99–109.
[275] Q. Ding, ''Evaluation of the efficacy of artificial neural network-based music therapy for depression,'' Comput. Intell. Neurosci., vol. 2022, pp. 1–6, Aug. 2022.
[276] Z. Hu, Y. Liu, G. Chen, S.-H. Zhong, and A. Zhang, ''Make your favorite music curative: Music style transfer for anxiety reduction,'' in Proc. 28th ACM Int. Conf. Multimedia, Oct. 2020, pp. 1189–1197.
[277] E. Idrobo-Ávila, H. Loaiza-Correa, F. Muñoz-Bolaños, L. van Noorden, and R. Vargas-Cañas, ''Development of a biofeedback system using harmonic musical intervals to control heart rate variability with a generative adversarial network,'' Biomed. Signal Process. Control, vol. 71, Jan. 2022, Art. no. 103095.
[278] A. E. Coca, G. O. Tost, and L. Zhao, ''Characterizing chaotic melodies in automatic music composition,'' Chaos, Interdiscipl. J. Nonlinear Sci., vol. 20, no. 3, Sep. 2010, Art. no. 033125.
[279] A. E. Coca, D. C. Corrêa, and L. Zhao, ''Computer-aided music composition with LSTM neural network and chaotic inspiration,'' in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2013, pp. 1–7.
[280] M. A. Kaliakatsos-Papakostas, M. G. Epitropakis, A. Floros, and M. N. Vrahatis, ''Chaos and music: From time series analysis to evolutionary composition,'' Int. J. Bifurcation Chaos, vol. 23, no. 11, Nov. 2013, Art. no. 1350181.
[281] B. Sobota, F. Majcher, M. Sivy, and M. Hudak, ''Chaos simulation and audio output,'' in Proc. IEEE 15th Int. Sci. Conf. Informat., Nov. 2019, pp. 000137–000142.
[282] E. Berdahl, E. Sheffield, A. Pfalz, and A. T. Marasco, ''Widening the razor-thin edge of chaos into a musical highway: Connecting chaotic maps to digital waveguides,'' in Proc. Int. Conf. New Interfaces Musical Expression (NIME), 2018, pp. 390–393.
[283] M. Skarha, V. Cusson, C. Frisson, and M. M. Wanderley, ''Le bâton: A digital musical instrument based on the chaotic triple pendulum,'' in Proc. NIME, 2021, pp. 1–17.
[284] S.-T. Lin and R.-F. Hsu, ''Chaotic signal synthesizer applied on portable devices for tinnitus therapy,'' in Proc. Int. Symp. Intell. Signal Process. Commun. Syst. (ISPACS), Nov. 2021, pp. 1–2.
[285] J.-M. Chen, P.-Y. He, and F. Pan, ''Research on synthesizing music for tinnitus treatment based on chaos,'' in Proc. 12th Int. Conf. Signal Process. (ICSP), Oct. 2014, pp. 2286–2291.
[286] T.-L. Liao, H.-C. Chen, C.-Y. Peng, and Y.-Y. Hou, ''Chaos-based secure communications in biomedical information application,'' Electronics, vol. 10, no. 3, p. 359, Feb. 2021.
[287] E. Bollt, ''On explaining the surprising success of reservoir computing forecaster of chaos? The universal machine learning dynamical system with contrast to VAR and DMD,'' Chaos, Interdiscipl. J. Nonlinear Sci., vol. 31, no. 1, Jan. 2021, Art. no. 013108.
[288] D. Turnbull, L. Barrington, D. Torres, and G. Lanckriet, ''Towards musical query-by-semantic-description using the CAL500 data set,'' in Proc. 30th Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2007, pp. 439–446.

LAZAROS MOYSIS received the B.Sc., M.Sc., and Ph.D. degrees from the Department of Mathematics, Aristotle University of Thessaloniki, Greece, in 2011, 2013, and 2017, respectively. He is currently a Researcher with the Physics Department, Aristotle University of Thessaloniki, and the Laboratory of Nonlinear Systems, Circuits and Complexity. His research interests include the theory of control systems, descriptor systems, chaotic systems, and their applications (notable examples include observer design, synchronization, chaotification, chaos encryption, and chaotic path planning).

SOTIRIOS P. SOTIROUDIS received the B.Sc. degree in physics and the M.Sc. degree in electronics from the Aristotle University of Thessaloniki, in 1999 and 2002, respectively, the B.Sc. degree in informatics from Hellenic Open University, in 2011, and the Ph.D. degree in physics from the Aristotle University of Thessaloniki, in 2018. From 2004 to 2010, he worked with the Telecommunications Center, Aristotle University of Thessaloniki. From 2010 to 2022, he worked as a Teacher of physics and informatics with the Greek Ministry of Education. He joined the Department of Physics, Aristotle University of Thessaloniki, in 2022, where he has been involved in several research projects. His research interests include wireless communications, radio propagation, optimization algorithms, computer vision, and machine learning.

CHRISTOS VOLOS received the Diploma degree in physics, the M.Sc. degree in electronics, and the Ph.D. degree in chaotic electronics from the Physics Department, Aristotle University of Thessaloniki, Greece, in 1999, 2002, and 2008, respectively. He is currently an Associate Professor with the Physics Department, Aristotle University of Thessaloniki. He is also a member of the Laboratory of Nonlinear Systems, Circuits and Complexity, Physics Department, Aristotle University of Thessaloniki. His current research interests include the design and study of analog and mixed signal electronic circuits, chaotic electronics and their applications (secure communications, cryptography, and robotics), experimental chaotic synchronization, chaotic UWB communications, and measurement and instrumentation systems.

ACHILLES D. BOURSIANIS (Member, IEEE) received the B.Sc. degree in physics, the M.Sc. degree in electronic physics (radioelectrology) in the area of electronics telecommunications technology, and the Ph.D. degree in telecommunications from the School of Physics, Aristotle University of Thessaloniki, in 2001, 2005, and 2017, respectively. Since 2019, he has been a Postdoctoral Researcher and an Academic Fellow with the School of Physics, Aristotle University of Thessaloniki. He is currently a member of the ELEDIA@AUTH Research Group. He is the author or coauthor of more than 70 articles in international peer-reviewed journals and conferences. His research interests include wireless sensor networks, the Internet of Things (IoT), antenna design and optimization, 5G and beyond communication networks, radio frequency energy harvesting, and artificial intelligence.
Dr. Boursianis is a member of the Hellenic Physical Society and the Scientific Committee of the National Association of Federation des Ingenieurs des Telecommunications de la Communaute Europeenne (FITCE). He is a member of the Editorial Board of the Telecom journal. He serves as a reviewer for several international journals and conferences and as a member of the technical program committees for various international conferences, which are technically sponsored by IEEE.

KONSTANTINOS-IRAKLIS D. KOKKINIDIS received the B.A. degree from Hellenic Open University and the M.B.A. and Ph.D. degrees from the University of Macedonia, Thessaloniki, Greece. He is currently a Special Teaching/Technical Personnel with the Department of Applied Informatics, University of Macedonia. He has published numerous articles in academic conferences and journals, such as the International Conference on Modern Circuits and Systems Technologies (MOCAST) on Electronics and Communications, the International Conference on Movement and Computing (MOCO), and the International Journal of Mechanical and Mechatronics Engineering (IJMME-IJENS). His research interests include human-centered computing with special interests in human–computer interaction, machine learning, the Internet of Things (IoT), gesture and audio signal processing and identification, and sensorimotor learning, with a focus on sound and image processing.

SPIRIDON NIKOLAIDIS (Senior Member, IEEE) received the Diploma and Ph.D. degrees in electrical engineering from Patras University, Greece, in 1988 and 1994, respectively. Since September 1996, he has been with the Department of Physics, Aristotle University of Thessaloniki, Greece, where he is currently a Full Professor. From 2003 to 2017, he was also a contract teaching staff of Hellenic Open University. He has worked in the areas of digital circuits and system design. He is the author or coauthor of more than 200 scientific articles in international journals and conference proceedings, while his work has more than 2300 references (Google Scholar, H-index=23). Two articles presented at international conferences achieved honorary awards. His current research interests include the design of high-speed and low-power digital circuits and embedded systems, modeling the operations of basic CMOS structures, modeling the power consumption of embedded processors, and development of algorithms for leak detection and localization in pipelines. He was a member of the organization committees of three international conferences. He is the founder and organizer of the Annual International Conference on Modern Circuit and System Technologies (MOCAST) since 2012. He also organized the 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS), in 2017. He contributes or has contributed to a number of research projects funded by the European Union and the Greek Government, for many of which he has scientific responsibility.