Hujun Yin · David Camacho ·
Peter Tino · Antonio J. Tallón-Ballesteros ·
Ronaldo Menezes · Richard Allmendinger (Eds.)
Intelligent Data Engineering and
Automated Learning –
IDEAL 2019
20th International Conference
Manchester, UK, November 14–16, 2019
Proceedings, Part II
LNCS 11872
Lecture Notes in Computer Science 11872
Founding Editors
Gerhard Goos
Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis
Cornell University, Ithaca, NY, USA
Editorial Board Members
Elisa Bertino
Purdue University, West Lafayette, IN, USA
Wen Gao
Peking University, Beijing, China
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Gerhard Woeginger
RWTH Aachen, Aachen, Germany
Moti Yung
Columbia University, New York, NY, USA
More information about this series at http://www.springer.com/series/7409
Editors
Hujun Yin, University of Manchester, Manchester, UK
David Camacho, Technical University of Madrid, Madrid, Spain
Peter Tino, University of Birmingham, Birmingham, UK
Antonio J. Tallón-Ballesteros, University of Huelva, Huelva, Spain
Ronaldo Menezes, University of Exeter, Exeter, UK
Richard Allmendinger, University of Manchester, Manchester, UK
ISSN 0302-9743 ISSN 1611-3349 (electronic)
Lecture Notes in Computer Science
ISBN 978-3-030-33616-5 ISBN 978-3-030-33617-2 (eBook)
https://doi.org/10.1007/978-3-030-33617-2
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI
© Springer Nature Switzerland AG 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This year saw the 20th edition of the International Conference on Intelligent Data
Engineering and Automated Learning (IDEAL 2019), held for the second time in
Manchester, UK – the birthplace of one of the world’s first electronic computers as well
as artificial intelligence (AI) marked by Alan Turing’s seminal and pioneering work.
The IDEAL conference has been serving its unwavering role in data analytics and
machine learning for the last 20 years. It strives to provide an ideal platform for the
scientific communities and researchers from near and far to exchange the latest findings,
disseminate cutting-edge results, and forge alliances on tackling many challenging
real-world problems. The core themes of IDEAL 2019 include big data
challenges, machine learning, data mining, information retrieval and management,
bio-/neuro-informatics, bio-inspired models (including neural networks, evolutionary
computation, and swarm intelligence), agents and hybrid intelligent systems, real-world
applications of intelligent techniques, and AI.
In total, 149 submissions were received and subsequently underwent rigorous peer
reviews by the Program Committee members and experts. Only the papers judged to be
of the highest quality and novelty were accepted and included in the proceedings.
These volumes contain 94 papers (58 for the main track and 36 for special sessions)
accepted and presented at IDEAL 2019, held during November 14–16, 2019, at the
University of Manchester, Manchester, UK. These papers provided a timely snapshot
of the latest topics and advances in data analytics and machine learning, from
methodologies, frameworks, and algorithms to applications. IDEAL 2019 enjoyed
outstanding keynotes from leaders in the field, Thomas Bäck of Leiden University and
Damien Coyle of University of Ulster, and an inspiring tutorial from Peter Tino of
University of Birmingham.
IDEAL 2019 was hosted by the University of Manchester and was co-sponsored by the
Alan Turing Institute and Manchester City Council. It was also technically
co-sponsored by the IEEE Computational Intelligence Society UK and Ireland Chapter.
We would like to thank our sponsors for their financial and technical support. We
would also like to thank all the people who devoted so much time and effort to the
successful running of the conference, in particular the members of the Program
Committee and reviewers, organizers of the special sessions, as well as the authors who
contributed to the conference. We are also very grateful for the hard work of the local
Organizing Committee at the University of Manchester, in particular, Yao Peng and
Jingwen Su for checking through all the camera-ready files. Continued support,
sponsorship, and collaboration from Springer LNCS are also greatly appreciated.
September 2019 Hujun Yin
David Camacho
Peter Tino
Antonio J. Tallón-Ballesteros
Ronaldo Menezes
Richard Allmendinger
Organization
General Chairs
Hujun Yin University of Manchester, UK
David Camacho Universidad Politecnica de Madrid, Spain
Peter Tino University of Birmingham, UK
Programme Co-chairs
Hujun Yin University of Manchester, UK
David Camacho Universidad Politecnica de Madrid, Spain
Antonio J. Tallón-Ballesteros University of Huelva, Spain
Peter Tino University of Birmingham, UK
Ronaldo Menezes University of Exeter, UK
Richard Allmendinger University of Manchester, UK
International Advisory Committee
Lei Xu Chinese University of Hong Kong, Hong Kong, China,
and Shanghai Jiaotong University, China
Yaser Abu-Mostafa CALTECH, USA
Shun-ichi Amari RIKEN, Japan
Michael Dempster University of Cambridge, UK
Francisco Herrera Autonomous University of Madrid, Spain
Nick Jennings University of Southampton, UK
Soo-Young Lee KAIST, South Korea
Erkki Oja Helsinki University of Technology, Finland
Lalit M. Patnaik Indian Institute of Science, India
Burkhard Rost Columbia University, USA
Xin Yao Southern University of Science and Technology,
China, and University of Birmingham, UK
Steering Committee
Hujun Yin (Chair) University of Manchester, UK
Laiwan Chan (Chair) Chinese University of Hong Kong, Hong Kong, China
Guilherme Barreto Federal University of Ceará, Brazil
Yiu-ming Cheung Hong Kong Baptist University, Hong Kong, China
Emilio Corchado University of Salamanca, Spain
Jose A. Costa Federal University of Rio Grande do Norte, Brazil
Marc van Hulle KU Leuven, Belgium
Samuel Kaski Aalto University, Finland
John Keane University of Manchester, UK
Jimmy Lee Chinese University of Hong Kong, Hong Kong, China
Malik Magdon-Ismail Rensselaer Polytechnic Institute, USA
Peter Tino University of Birmingham, UK
Zheng Rong Yang University of Exeter, UK
Ning Zhong Maebashi Institute of Technology, Japan
Publicity Co-chairs/Liaisons
Jose A. Costa Federal University of Rio Grande do Norte, Brazil
Bin Li University of Science and Technology of China, China
Yimin Wen Guilin University of Electronic Technology, China
Local Organizing Committee
Hujun Yin Richard Allmendinger
Richard Hankins Yao Peng
Ananya Gupta Jingwen Su
Mengyu Liu
Program Committee
Ajith Abraham Josep Carmona
Jesus Alcala-Fdez Mercedes Carnero
Richardo Aler Carlos Carrascosa
Davide Anguita Andre de Carvalho
Anastassia Angelopoulou Joao Carvalho
Ángel Arcos-Vargas Pedro Castillo
Romis Attux Luís Cavique
Martin Atzmueller Darryl Charles
Dariusz Barbucha Richard Chbeir
Mahmoud Barhamgi Songcan Chen
Bruno Baruque Xiaohong Chen
Carmelo Bastos Filho Sung-Bae Cho
Lordes Borrajo Stelvio Cimato
Zoran Bosnic Manuel Jesus Cobo Martin
Vicent Botti Leandro Coelho
Edyta Brzychczy Carlos Coello Coello
Fernando Buarque Roberto Confalonieri
Andrea Burattin Rafael Corchuelo
Robert Burduk Francesco Corona
Aleksander Byrski Nuno Correia
Heloisa Camargo Luís Correia
Paulo Cortez Álvaro Herrero
Jose Alfredo F. Costa J. Michael Herrmann
Carlos Cotta Ignacio Hidalgo
Raúl Cruz-Barbosa James Hogan
Alfredo Cuzzocrea Jaakko Hollmén
Bogusław Cyganek Wei-Chiang Samuelson Hong
Ireneusz Czarnowski Vahid Jalali
Ernesto Damiani Dariusz Jankowski
Amit Kumar Das Piotr Jedrzejowicz
Bernard De Baets Vicente Julian
Javier Del Ser Rushed Kanawati
Boris Delibašić Mario Koeppen
Fernando Díaz Mirosław Kordos
Juan Manuel Dodero Marcin Korzeń
Jose Dorronsoro Dariusz Krol
Dinu Dragan Pawel Ksieniewicz
Gérard Dreyfus Raul Lara-Cabrera
Jochen Einbeck Bin Li
Florentino Fdez-Riverola Lei Liu
Joaquim Filipe Wenjian Luo
Juan J. Flores José F. Martínez-Trinidad
Simon James Fong Giancarlo Mauri
Pawel Forczmanski Cristian Mihaescu
Giancarlo Fortino Boris Mirkin
Felipe M. G. França José M. Molina
Dariusz Frejlichowski João Mp Cardoso
Hamido Fujita Grzegorz J. Nalepa
Marcus Gallagher Valery Naranjo
Yang Gao Susana Nascimento
Salvador Garcia Tim Nattkemper
Pablo García Sánchez Antonio Neme
Luis Javier Garcia Villalba Rui Neves Madeira
María José Ginzo Villamayor Ngoc-Thanh Nguyen
Fernando Gomide Paulo Novais
Antonio Gonzalez-Pardo Fernando Nuñez
Anna Gorawska Ivan Olier-Caparroso
Marcin Gorawski Eva Onaindia
Juan Manuel Górriz Sandra Ortega-Martorell
Manuel Graña Vasile Palade
Maciej Grzenda Jose Palma
Jerzy Grzymala-Busse Juan Pavón
Pedro Antonio Gutierrez Yao Peng
Barbara Hammer Carlos Pereira
Julia Handl Barbara Pes
Richard Hankins Marco Platzner
Ioannis Hatzilygeroudis Paulo Quaresma
Cristian Ramírez-Atencia Pawel Trajdos
Ajalmar Rêgo Da Rocha Neto Carlos M. Travieso-González
Izabela Rejer Bogdan Trawinski
Victor Rodriguez Fernandez Milan Tuba
Matilde Santos Turki Turki
Pedro Santos Eiji Uchino
Jose Santos Carlos Usabiaga Ibáñez
Rafal Scherer José Valente de Oliveira
Ivan Silva Alfredo Vellido
Leandro A. Silva Juan G. Victores
Dragan Simic José R. Villar
Anabela Simões Lipo Wang
Marcin Szpyrka Tzai-Der Wang
Jesús Sánchez-Oro Dongqing Wei
Ying Tan Raymond Kwok-Kay Wong
Qing Tian Michal Wozniak
Renato Tinós Xin-She Yang
Stefania Tomasiello Huiyu Zhou
Additional Reviewers
Sabri Allani Hongmei He
Ray Baishkhi Eloy Irigoyen
Samik Banerjee Suman Jana
Avishek Bhattacharjee Ian Jarman
Nurul E’zzati Binti Md Isa Jörg Keller
Daniele Bortoluzzi Bartosz Krawczyk
Anna Burduk Weikai Li
Jose Luis Calvo Rolle Mengyu Liu
Walmir Caminnhas Nicolás Marichal
Meng Cao Wojciech Mazurczyk
Hugo Carneiro Philippa Grace McCabe
Giovanna Castellano Eneko Osaba
Hubert Cecotti Kexin Pei
Carl Chalmers Mark Pieroni
Lei Chen Alice Plebe
Zonghai Chen Girijesh Prasad
Antonio Maria Chiarelli Sergey Prokudin
Bernat Coma-Puig Yu Qiao
Gabriela Czanner Juan Rada-Vilela
Mauro Da Lio Dheeraj Rathee
Sukhendu Das Haider Raza
Klaus Dietmayer Yousef Rezaei Tabar
Paul Fergus Patrick Riley
Róża Goścień Gastone Pietro Rosati Papini
Ugur Halici Salma Sassi
Ivo Siekmann Marley Vellasco
Mônica Ferreira Da Silva Yu Yu
Kacper Sumera Rita Zgheib
Yuchi Tian Shang-Ming Zhou
Mariusz Topolski
Special Session on Fuzzy Systems and Intelligent Data Analysis
Organizers
Susana Nascimento NOVA University, Portugal
Antonio J. Tallón-Ballesteros University of Huelva, Spain
José Valente de Oliveira University of Algarve, Portugal
Boris Mirkin National Research University Moscow, Russia
Special Session on Machine Learning towards Smarter
Multimodal Systems
Organizers
Nuno Correia NOVA University of Lisboa, Portugal
Rui Neves Madeira NOVA University of Lisboa, Polytechnic Institute
of Setúbal, Portugal
Susana Nascimento NOVA University, Portugal
Special Session on Data Selection in Machine Learning
Organizers
Antonio J. Tallón-Ballesteros University of Huelva, Spain
Raymond Kwok-Kay Wong University of New South Wales, Australia
Ireneusz Czarnowski University of Novi Sad, Serbia
Simon James Fong University of Macau, Macau, China
Special Session on Machine Learning in Healthcare
Organizers
Ivan Olier Liverpool John Moores University, UK
Sandra Ortega-Martorell Liverpool John Moores University, UK
Paulo Lisboa Liverpool John Moores University, UK
Alfredo Vellido Universitat Politècnica de Catalunya, Spain
Special Session on Machine Learning in Automatic Control
Organizers
Matilde Santos University Complutense of Madrid, Spain
Juan G. Victores University Carlos III of Madrid, Spain
Special Session on Finance and Data Mining
Organizers
Fernando Núñez Hernández University of Seville, Spain
Antonio J. Tallón-Ballesteros University of Huelva, Spain
Paulo Vasconcelos University of Porto, Portugal
Ángel Arcos-Vargas University of Seville, Spain
Special Session on Knowledge Discovery from Data
Organizers
Barbara Pes University of Cagliari, Italy
Antonio J. Tallón-Ballesteros University of Huelva, Spain
Julia Handl University of Manchester, UK
Special Session on Machine Learning Algorithms
for Hard Problems
Organizers
Pawel Ksieniewicz Wroclaw University of Science and Technology,
Poland
Robert Burduk Wroclaw University of Science and Technology,
Poland
Contents – Part II
Special Session on Fuzzy Systems and Intelligent Data Analysis
Computational Generalization in Taxonomies Applied to:
(1) Analyze Tendencies of Research and (2) Extend User Audiences
    Dmitry Frolov, Susana Nascimento, Trevor Fenner, Zina Taran,
    and Boris Mirkin
Unsupervised Initialization of Archetypal Analysis and Proportional
Membership Fuzzy Clustering
    Susana Nascimento and Nuno Madaleno
Special Session on Machine Learning Towards Smarter
Multimodal Systems
Multimodal Web Based Video Annotator with Real-Time Human
Pose Estimation
    Rui Rodrigues, Rui Neves Madeira, Nuno Correia, Carla Fernandes,
    and Sara Ribeiro
New Interfaces for Classifying Performance Gestures in Music
    Chris Rhodes, Richard Allmendinger, and Ricardo Climent
Special Session on Data Selection in Machine Learning
Classifying Ransomware Using Machine Learning Algorithms
    Samuel Egunjobi, Simon Parkinson, and Andrew Crampton
Artificial Neural Networks in Mathematical Mini-Games for Automatic
Students’ Learning Styles Identification: A First Approach
    Richard Torres-Molina, Jorge Banda-Almeida,
    and Lorena Guachi-Guachi
The Use of Unified Activity Records to Predict Requests Made
by Applications for External Services
    Maciej Grzenda, Robert Kunicki, and Jaroslaw Legierski
Fuzzy Clustering Approach to Data Selection for Computer Usage
in Headache Disorders
    Svetlana Simić, Ljiljana Radmilo, Dragan Simić, Svetislav D. Simić,
    and Antonio J. Tallón-Ballesteros
Multitemporal Aerial Image Registration Using Semantic Features
    Ananya Gupta, Yao Peng, Simon Watson, and Hujun Yin
Special Session on Machine Learning in Healthcare
Brain Tumor Classification Using Principal Component Analysis
and Kernel Support Vector Machine
    Richard Torres-Molina, Carlos Bustamante-Orellana,
    Andrés Riofrío-Valdivieso, Francisco Quinga-Socasi, Robinson Guachi,
    and Lorena Guachi-Guachi
Modelling Survival by Machine Learning Methods in Liver
Transplantation: Application to the UNOS Dataset
    David Guijo-Rubio, Pedro J. Villalón-Vaquero, Pedro A. Gutiérrez,
    Maria Dolores Ayllón, Javier Briceño, and César Hervás-Martínez
Design and Development of an Automatic Blood Detection System
for Capsule Endoscopy Images
    Pedro Pons, Reinier Noorda, Andrea Nevárez, Adrián Colomer,
    Vicente Pons Beltrán, and Valery Naranjo
Comparative Analysis for Computer-Based Decision Support: Case Study
of Knee Osteoarthritis
    Philippa Grace McCabe, Ivan Olier, Sandra Ortega-Martorell,
    Ian Jarman, Vasilios Baltzopoulos, and Paulo Lisboa
A Clustering-Based Patient Grouper for Burn Care
    Chimdimma Noelyn Onah, Richard Allmendinger, Julia Handl,
    Paraskevas Yiapanis, and Ken W. Dunn
A Comparative Assessment of Feed-Forward and Convolutional Neural
Networks for the Classification of Prostate Lesions
    Sabrina Marnell, Patrick Riley, Ivan Olier, Marc Rea,
    and Sandra Ortega-Martorell
Special Session on Machine Learning in Automatic Control
A Method Based on Filter Bank Common Spatial Pattern for Multiclass
Motor Imagery BCI
    Ziqing Xia, Likun Xia, and Ming Ma
Safe Deep Neural Network-Driven Autonomous Vehicles Using Software
Safety Cages
    Sampo Kuutti, Richard Bowden, Harita Joshi, Robert de Temple,
    and Saber Fallah
Wave and Viscous Resistance Estimation by NN
    D. Marón and M. Santos
Neural Controller of UAVs with Inertia Variations
    J. Enrique Sierra-Garcia, Matilde Santos, and Juan G. Victores
Special Session on Finance and Data Mining
A Metric Framework for Quantifying Data Concentration
    Peter Mitic
Adaptive Machine Learning-Based Stock Prediction Using Financial Time
Series Technical Indicators
    Ahmed K. Taha, Mohamed H. Kholief, and Walid AbdelMoez
Special Session on Knowledge Discovery from Data
Exploiting Online Newspaper Articles Metadata for Profiling City Areas
    Livio Cascone, Pietro Ducange, and Francesco Marcelloni
Modelling the Social Interactions in Ant Colony Optimization
    Nishant Gurrapadi, Lydia Taw, Mariana Macedo, Marcos Oliveira,
    Diego Pinheiro, Carmelo Bastos-Filho, and Ronaldo Menezes
An Innovative Deep-Learning Algorithm for Supporting the Approximate
Classification of Workloads in Big Data Environments
    Alfredo Cuzzocrea, Enzo Mumolo, Carson K. Leung,
    and Giorgio Mario Grasso
Control-Flow Business Process Summarization via Activity Contraction
    Valeria Fionda and Gianluigi Greco
Classifying Flies Based on Reconstructed Audio Signals
    Michael Flynn and Anthony Bagnall
Studying the Evolution of the ‘Circular Economy’ Concept Using
Topic Modelling
    Sampriti Mahanty, Frank Boons, Julia Handl, and Riza Batista-Navarro
Mining Frequent Distributions in Time Series
    José Carlos Coutinho, João Mendes Moreira, and Cláudio Rebelo de Sá
Time Series Display for Knowledge Discovery on Selective
Laser Melting Machines
    Ramón Moreno, Juan Carlos Pereira, Alex López, Asif Mohammed,
    and Prasha Pahlevannejad
Special Session on Machine Learning Algorithms for Hard Problems
Using Prior Knowledge to Facilitate Computational Reading
of Arabic Calligraphy
    Seetah ALSalamah, Riza Batista-Navarro, and Ross D. King
SMOTE Algorithm Variations in Balancing Data Streams
    Bogdan Gulowaty and Paweł Ksieniewicz
Multi-class Text Complexity Evaluation via Deep Neural Networks
    Alfredo Cuzzocrea, Giosué Lo Bosco, Giovanni Pilato,
    and Daniele Schicchi
Imbalance Reduction Techniques Applied to ECG Classification Problem
    Jędrzej Kozal and Paweł Ksieniewicz
Machine Learning Methods for Fake News Classification
    Paweł Ksieniewicz, Michał Choraś, Rafał Kozik, and Michał Woźniak
A Genetic-Based Ensemble Learning Applied to Imbalanced Data
Classification
    Jakub Klikowski, Paweł Ksieniewicz, and Michał Woźniak
The Feasibility of Deep Learning Use for Adversarial Model Extraction
in the Cybersecurity Domain
    Michał Choraś, Marek Pawlicki, and Rafał Kozik
Author Index
Contents – Part I
Orchids Classification Using Spatial Transformer Network
with Adaptive Scaling
    Watcharin Sarachai, Jakramate Bootkrajang, Jeerayut Chaijaruwanich,
    and Samerkae Somhom
Scalable Dictionary Classifiers for Time Series Classification
    Matthew Middlehurst, William Vickers, and Anthony Bagnall
Optimization of the Numeric and Categorical Attribute Weights
in KAMILA Mixed Data Clustering Algorithm
    Nádia Junqueira Martarelli and Marcelo Seido Nagano
Meaningful Data Sampling for a Faithful Local Explanation Method
    Peyman Rasouli and Ingrid Chieh Yu
Classifying Prostate Histological Images Using Deep Gaussian Processes
on a New Optical Density Granulometry-Based Descriptor
    Miguel López-Pérez, Adrián Colomer, María A. Sales, Rafael Molina,
    and Valery Naranjo
Adaptive Orthogonal Characteristics of Bio-inspired Neural Networks
    Naohiro Ishii, Toshinori Deguchi, Masashi Kawaguchi, Hiroshi Sasaki,
    and Tokuro Matsuo
Using Deep Learning for Ordinal Classification of Mobile Marketing
User Conversion
    Luís Miguel Matos, Paulo Cortez, Rui Castro Mendes,
    and Antoine Moreau
Modeling Data Driven Interactions on Property Graph
    Worapol Alex Pongpech
Adaptive Dimensionality Adjustment for Online “Principal
Component Analysis”
    Nico Migenda, Ralf Möller, and Wolfram Schenck
Relevance Metric for Counterfactuals Selection in Decision Trees
    Rubén R. Fernández, Isaac Martín de Diego, Víctor Aceña,
    Javier M. Moguerza, and Alberto Fernández-Isabel
Weighted Nearest Centroid Neighbourhood
    Víctor Aceña, Javier M. Moguerza, Isaac Martín de Diego,
    and Rubén R. Fernández
The Prevalence of Errors in Machine Learning Experiments
    Martin Shepperd, Yuchen Guo, Ning Li, Mahir Arzoky,
    Andrea Capiluppi, Steve Counsell, Giuseppe Destefanis,
    Stephen Swift, Allan Tucker, and Leila Yousefi
A Hybrid Model for Fraud Detection on Purchase Orders
    William Ferreira Moreno Oliverio, Allan Barcelos Silva,
    Sandro José Rigo, and Rodolpho Lopes Bezerra da Costa
Users Intention Based on Twitter Features Using Text Analytics
    Qadri Mishael, Aladdin Ayesh, and Iryna Yevseyeva
Mixing Hetero- and Homogeneous Models in Weighted Ensembles
    James Large and Anthony Bagnall
A Hybrid Approach to Time Series Classification with Shapelets
    David Guijo-Rubio, Pedro A. Gutiérrez, Romain Tavenard,
    and Anthony Bagnall
An Ensemble Algorithm Based on Deep Learning
for Tuberculosis Classification
    Alfonso Hernández, Ángel Panizo, and David Camacho
A Data-Driven Approach to Automatic Extraction of Professional
Figure Profiles from Résumés
    Alessandro Bondielli and Francesco Marcelloni
Retrieving and Processing Information from Clinical Algorithm
via Formal Concept Analysis
    Aleksandra Vatian, Anna Tatarinova, Svyatoslav Osipov,
    Nikolai Egorov, Vitalii Boitsov, Elena Ryngach, Tatiana Treshkur,
    Anatoly Shalyto, and Natalia Gusarova
Comparative Analysis of Approaches to Building Medical Dialog
Systems in Russian
    Aleksandra Vatian, Natalia Dobrenko, Nikolai Andreev,
    Aleksandr Nemerovskii, Anastasia Nevochhikova, and Natalia Gusarova
Tracking Position and Status of Electric Control Switches Based
on YOLO Detector
    Xingang Mou, Jian Cui, Hujun Yin, and Xiao Zhou
A Self-generating Prototype Method Based on Information Entropy
Used for Condensing Data in Classification Tasks
    Alberto Manastarla and Leandro A. Silva
Transfer Knowledge Between Sub-regions for Traffic Prediction
Using Deep Learning Method
    Yi Ren and Kunqing Xie
Global Q-Learning Approach for Power Allocation
in Femtocell Networks
    Abdulmajeed M. Alenezi and Khairi Hamdi
Deep Learning and Sensor Fusion Methods for Studying Gait Changes
Under Cognitive Load in Males and Females
    Abdullah S. Alharthi and Krikor B. Ozanyan
Towards a Robotic Personal Trainer for the Elderly
    J. A. Rincon, A. Costa, P. Novais, V. Julian, and C. Carrascosa
Image Quality Constrained GAN for Super-Resolution
    Jingwen Su, Yao Peng, and Hujun Yin
Use Case Prediction Using Product Reviews Text Classification
    Tinashe Wamambo, Cristina Luca, and Arooj Fatima
Convolutional Neural Network for Core Sections Identification
in Scientific Research Publications
    Bello Aliyu Muhammad, Rahat Iqbal, Anne James,
    and Dianabasi Nkantah
Knowledge Inference Through Analysis of Human Activities
    Leandro O. Freitas, Pedro R. Henriques, and Paulo Novais
Representation Learning of Knowledge Graphs with Multi-scale
Capsule Network
    Jingwei Cheng, Zhi Yang, Jinming Dang, Chunguang Pan,
    and Fu Zhang
CNNPSP: Pseudouridine Sites Prediction Based on Deep Learning
    Yongxian Fan, Yongzhen Li, Huihua Yang, and Xiaoyong Pan
A Multimodal Approach to Image Sentiment Analysis
    António Gaspar and Luís A. Alexandre
Joining Items Clustering and Users Clustering for Evidential
Collaborative Filtering
    Raoua Abdelkhalek, Imen Boukhris, and Zied Elouedi
Conditioned Generative Model via Latent Semantic Controlling
for Learning Deep Representation of Data
    Jin-Young Kim and Sung-Bae Cho
Toward a Framework for Seasonal Time Series Forecasting
Using Clustering
    Colin Leverger, Simon Malinowski, Thomas Guyet, Vincent Lemaire,
    Alexis Bondu, and Alexandre Termier
An Evidential Imprecise Answer Aggregation Approach Based
on Worker Clustering
    Lina Abassi and Imen Boukhris
Combining Machine Learning and Classical Optimization Techniques
in Vehicle to Vehicle Communication Network
    Mutasem Hamdan and Khairi Hamdi
Adversarial Edit Attacks for Tree Data
    Benjamin Paaßen
Non-stationary Noise Cancellation Using Deep Autoencoder Based
on Adversarial Learning
    Kyung-Hyun Lim, Jin-Young Kim, and Sung-Bae Cho
A Deep Learning-Based Surface Defect Inspection System
for Smartphone Glass
    Gwang-Myong Go, Seok-Jun Bu, and Sung-Bae Cho
Superlinear Speedup of Parallel Population-Based Metaheuristics:
A Microservices and Container Virtualization Approach
    Hatem Khalloof, Phil Ostheimer, Wilfried Jakob, Shadi Shahoud,
    Clemens Duepmeier, and Veit Hagenmeyer
Active Dataset Generation for Meta-learning System Quality Improvement
    Alexey Zabashta and Andrey Filchenkov
Do You Really Follow Them? Automatic Detection of Credulous
Twitter Users
    Alessandro Balestrucci, Rocco De Nicola, Marinella Petrocchi,
    and Catia Trubiani
User Localization Based on Call Detail Record
    Buddhi Ayesha, Bhagya Jeewanthi, Charith Chitraranjan,
    Amal Shehan Perera, and Amal S. Kumarage
Automatic Ground Truth Dataset Creation for Fake News Detection
in Social Media
    Danae Pla Karidi, Harry Nakos, and Yannis Stavrakas
Artificial Flora Optimization Algorithm for Task Scheduling
in Cloud Computing Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437
Nebojsa Bacanin, Eva Tuba, Timea Bezdan, Ivana Strumberger,
and Milan Tuba
A Significantly Faster Elastic-Ensemble for Time-Series Classification. . . . . . 446
George Oastler and Jason Lines
ALIME: Autoencoder Based Approach for Local Interpretability. . . . . . . . . . 454
Sharath M. Shankaranarayana and Davor Runje
Detection of Abnormal Load Consumption in the Power Grid Using
Clustering and Statistical Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 464
Matúš Cuper, Marek Lóderer, and Viera Rozinajová
Deep Convolutional Neural Networks Based on Image Data Augmentation
for Visual Object Recognition. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 476
Khaoula Jayech
An Efficient Scheme for Prototyping kNN in the Context of Real-Time
Human Activity Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 486
Paulo J. S. Ferreira, Ricardo M. C. Magalhães, Kemilly Dearo Garcia,
João M. P. Cardoso, and João Mendes-Moreira
A Novel Recommendation System for Next Feature in Software . . . . . . . . . . 494
Victor R. Prata, Ronaldo S. Moreira, Luan S. Cordeiro, Átilla N. Maia,
Alan R. Martins, Davi A. Leão, C. H. L. Cavalcante,
Amauri H. Souza Júnior, and Ajalmar R. Rocha Neto
Meta-learning Based Evolutionary Clustering Algorithm . . . . . . . . . . . . . . . 502
Dmitry Tomp, Sergey Muravyov, Andrey Filchenkov,
and Vladimir Parfenov
Fast Tree-Based Classification via Homogeneous Clustering . . . . . . . . . . . . . 514
George Pardis, Konstantinos I. Diamantaras, Stefanos Ougiaroglou,
and Georgios Evangelidis
Ordinal Equivalence Classes for Parallel Coordinates. . . . . . . . . . . . . . . . . . 525
Alexey Myachin and Boris Mirkin
New Internal Clustering Evaluation Index Based on Line Segments. . . . . . . . 534
Juan Carlos Rojas Thomas and Matilde Santos Peñas
Threat Identification in Humanitarian Demining Using Machine
Learning and Spectroscopic Metal Detection. . . . . . . . . . . . . . . . . . . . . . . . 542
Wouter van Verre, Toykan Özdeǧer, Ananya Gupta, Frank J. W. Podd,
and Anthony J. Peyton
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 551
Special Session on Fuzzy Systems and
Intelligent Data Analysis
Computational Generalization in
Taxonomies Applied to: (1) Analyze
Tendencies of Research and (2) Extend
User Audiences
Dmitry Frolov1,5(B), Susana Nascimento2, Trevor Fenner3,
Zina Taran4, and Boris Mirkin1,3
1 National Research University Higher School of Economics, Moscow, Russia
[email protected]
2 Universidade Nova de Lisboa, Caparica, Portugal
3 Birkbeck, University of London, London, UK
4 Delta State University, Cleveland, MS, USA
5 Natimatica, Ltd., Moscow, Russia
Abstract. We define a most specific generalization of a fuzzy set of
topics assigned to leaves of the rooted tree of a domain taxonomy. This
generalization lifts the set to its “head subject” node in the higher ranks
of the taxonomy tree. The head subject is supposed to “tightly” cover
the query set, possibly bringing in some errors referred to as “gaps” and
“offshoots”. Our method, ParGenFS, globally minimizes a penalty function
combining the numbers of head subjects, gaps, and offshoots,
differently weighted. Two applications are considered: (1) analysis of
tendencies of research in Data Science; (2) audience extension for
programmatic targeted advertising online. The former involves a taxonomy
of Data Science derived from the celebrated ACM Computing Classification
System 2012. Based on a collection of research papers published
by Springer in 1998–2017, and applying in-house methods for text analysis
and fuzzy clustering, we derive fuzzy clusters of leaf topics in learning,
retrieval and clustering. The head subjects of these clusters inform us
of some general tendencies of the research. The latter involves the publicly
available IAB Tech Lab Content Taxonomy. Each of about 25 million users
is assigned a fuzzy profile within this taxonomy, which is generalized
offline using ParGenFS. Our experiments show that these head subjects
effectively extend the size of targeted audiences at least twofold without
losing quality.
Keywords: Generalization · Fuzzy thematic cluster · Annotated suffix
tree · Research tendencies · Targeted advertising
© Springer Nature Switzerland AG 2019
H. Yin et al. (Eds.): IDEAL 2019, LNCS 11872, pp. 3–11, 2019.
https://doi.org/10.1007/978-3-030-33617-2_1
1 Introduction
The notion of generalization is not absent from the current developments in
knowledge engineering and artificial intelligence. Just the opposite. For example,
building a supervised classifier fits exactly into the concept of generalization:
a classifier generalizes given instances of “yes”-objects into a decision rule to
separate the “yes”-class from the rest. This, however, relates to a very special case
in which all the objects are elements of the same variable space. We are going to
tackle the case in which we are presented with a crisp or fuzzy subset of different
concepts, and one wishes to generalize this subset into a coarser concept tightly
embracing the subset. This is, partly, the meaning of the term “generalization”
which, according to the Merriam-Webster dictionary, refers to deriving a general
conception from particulars. We assume that a most straightforward medium for
such a derivation, a taxonomy of the field, is given to us.
Currently, taxonomic constructions mostly concentrate on developing
taxonomies, especially those involving what are referred to in linguistics as
hyponymic/hypernymic relations (see, for example, [6,8]). Also, some activities
go in the direction of “operational” generalization: generalized case descriptions
involving taxonomic relations between generalized states and their parts are used
to achieve a tangible goal such as improving characteristics of text retrieval [7].
This paper does not attempt to develop or change any taxonomy, but rather
uses an existing taxonomy. The situation of our concern is a case at which we
are to generalize a fuzzy set of taxonomy leaves representing the essence of an
empirically observed phenomenon. The rest of the paper is organized accordingly.
Section 2 presents a mathematical formalization of the generalization problem
as that of parsimoniously lifting a given fuzzy leaf set to nodes in higher ranks of
the taxonomy and provides a recursive algorithm leading to a globally optimal
solution to the problem. Section 3 describes an application of this approach to
deriving tendencies in the development of Data Science, as discerned from
a set of about 18,000 research papers published by Springer in 17
journals related to Data Science over the past 20 years. Its subsections describe
our approach to finding and generalizing fuzzy clusters of research topics. In
the end, we point to tendencies in the development of the corresponding parts of
Data Science, as drawn from the lifting results. Section 4 describes an application
of the parsimonious generalization method to efficiently extend the audience of
targeted advertising over the Internet. A more detailed description can be found
in [3].
2 Generalization: Parsimoniously Lifting a Fuzzy
Thematic Set in Taxonomy
We consider the following problem. Given a rooted taxonomy tree and fuzzy set
S of taxonomy leaves, find a node h(S) of higher rank in the taxonomy, that
tightly covers the set S.
The problem is not as simple as it may seem to be. Consider, for the sake
of simplicity, a hard set S shown with five black leaf boxes on a fragment of a
tree in Fig. 1 illustrating the situation at which the set of black boxes is lifted
to the root. If we accept that set S may be generalized by the root, this would
lead to a number, four, of white boxes to be covered by the root and, thus, in
this way, falling in the same concept as S even though they do not belong to S. Such
a situation will be referred to as a gap. Lifting with gaps should be penalized.
Altogether, the number of conceptual elements introduced to generalize S here
is five: 1 head subject, that is, the root to which we have assigned S, and the 4 gaps
that occur just because of the topology of the tree, which imposes this penalty.
Another lifting decision is illustrated in Fig. 2: here the set is lifted just to the
root of the left branch of the tree. The number of gaps here has decreased, to
just 1. However, another oddity emerged: a black box on the right, belonging to
S but not covered by the node at which the set S is mapped. This type of error
will be referred to as an offshoot. At this lifting, three new items emerge: one
head subject, one offshoot, and one gap. This is fewer than the number of items
that emerged when lifting the set to the root (one head subject and four gaps, that is,
five), which makes it preferable. Of course, this conclusion holds only if the
relative weight of an offshoot is less than the total relative weight of three gaps.
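The comparison of the two lifts can be written out as a short worked inequality. The symbols w_g (weight of one gap) and w_o (weight of one offshoot) are introduced here only for this illustration; a head subject is assumed to cost 1.

```latex
\begin{align*}
p_{\text{root}} &= 1 + 4\,w_g, \\
p_{\text{left}} &= 1 + w_g + w_o, \\
p_{\text{left}} < p_{\text{root}} &\iff w_o < 3\,w_g .
\end{align*}
```

For instance, with the hypothetical values w_g = 0.4 and w_o = 0.9, the root lift costs 2.6 and the left-branch lift costs 2.3, so the left-branch lift is preferred.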
Fig. 1. Generalization of the black box query set by mapping it to the root, with the
price of four gaps emerged at the lift.
Fig. 2. Generalization of the black box query set by mapping it to the root of the left
branch, with the price of one gap and one offshoot emerged at this lift.
We are interested to see whether a fuzzy set S can be generalized by a node h
from higher ranks of the taxonomy, so that S can be thought of as falling within
the framework covered by the node h. The goal of finding an interpretable pigeon-
hole for S within the taxonomy can be formalized as that of finding one or more
“head subjects” h to cover S with the minimum number of all the elements
introduced at the generalization: head subjects, gaps, and offshoots. This goal
realizes the principle of Maximum Parsimony.
Consider a rooted tree T representing a hierarchical taxonomy so that its
nodes are annotated with key phrases signifying various concepts. We denote
the set of all its leaves by I. The relationship between nodes in the hierarchy is
conventionally expressed using genealogical terms: each node t ∈ T is said to be
the parent of the nodes immediately descending from t in T , its children. We use
χ(t) to denote the set of children of t. Each interior node t ∈ T − I is assumed to
correspond to a concept that generalizes the topics corresponding to the leaves
I(t) descending from t, viz. the leaves of the subtree T (t) rooted at t, which is
conventionally referred to as the leaf cluster of t.
A fuzzy set on I is a mapping u of I to the non-negative real numbers that
assigns a membership value u(i) ≥ 0 to each i ∈ I. We refer to the set Su ⊂ I,
where Su = {i ∈ I : u(i) > 0}, as the base of u.
Given a fuzzy set u defined on the leaves I of the tree T , one can consider u
to be a (possibly noisy) projection of a higher rank concept, u’s “head subject”,
onto the corresponding leaf cluster. Under this assumption, there should exist
a head subject node h among the interior nodes of the tree T such that its leaf
cluster I(h) more or less coincides (up to small errors) with Su . This head subject
is the generalization of u to be found. The two types of possible errors associated
with the head subject, if it does not cover the base precisely, are false positives
and false negatives, referred to in this paper, as gaps and offshoots, respectively
(see Figs. 1 and 2). Given a head subject node h, a gap is a node t covered by
h but not belonging to u, so that u(t) = 0. In contrast, an offshoot is a node
t belonging to u so that u(t) > 0 but not covered by h. Altogether, the total
number of head subjects, gaps, and offshoots has to be as small as possible. For
each of these elements a penalty is defined: 1 is the penalty for a head subject,
λ is the penalty for a gap, and γ is the penalty for an offshoot.
Consider a candidate node h in T and its meaning relative to fuzzy set u.
An h-gap is a node g of T (h), other than h, at which a loss of the meaning has
occurred, that is, g is a maximal u-irrelevant node in the sense that its parent
is not u-irrelevant. Conversely, establishing a node h as a head subject can be
considered as a gain of the meaning of u at the node. The set of all h-gaps will
be denoted by G(h). A node t ∈ T is referred to as u-irrelevant if its leaf-cluster
I(t) is disjoint from the base Su . Obviously, if a node is u-irrelevant, all of its
descendants are also u-irrelevant.
An h-offshoot is a leaf i ∈ Su which is not covered by h, i.e., i ∉ I(h). The
set of all h-offshoots is Su − I(h). Given a fuzzy topic set u over I, a set of nodes
H will be referred to as a u-cover if: (a) H covers Su, that is, $S_u \subseteq \bigcup_{h \in H} I(h)$,
and (b) the nodes in H are unrelated, i.e. I(h) ∩ I(h′) = ∅ for all h, h′ ∈ H such
that h ≠ h′. The interior nodes of H will be referred to as head subjects and the
leaf nodes as offshoots, so the set of offshoots in H is H ∩ I. The set of gaps in
H is the union of G(h) over all head subjects h ∈ H − I.
The penalty function p(H) for a u-cover H is:
$$p(H) = \sum_{h \in H - I} u(h) + \sum_{h \in H - I} \sum_{g \in G(h)} \lambda v(g) + \sum_{h \in H \cap I} \gamma u(h), \tag{1}$$
and we are to find a u-cover H that globally minimizes the penalty p(H). Such
a u-cover is the parsimonious generalization of the set u.
First, the tree is pruned of all the non-maximal u-irrelevant nodes. Simultaneously,
the sets of gaps G(t) and the internal summary gap importance
$V(t) = \sum_{g \in G(t)} v(g)$ in Eq. (1) are computed for each interior node t. After
this, our lifting algorithm ParGenFS applies. For each node t, the algorithm
ParGenFS computes two sets, H(t) and L(t), containing those nodes in T (t) at
which respectively gains and losses of head subjects occur (including offshoots).
The associated penalty p(t) is computed too.
Sets H(t) and L(t) are defined assuming that the head subject has not been
gained (nor therefore lost) at any of t’s ancestors. The algorithm ParGenFS
recursively computes H(t), L(t) and p(t) from the corresponding values for the
child nodes in χ(t).
Specifically, for each leaf node that is not in Su , we set both L(·) and H(·)
to be empty and the penalty to be zero. For each leaf node that is in Su , L(·) is
set to be empty, whereas H(·), to contain just the leaf node, and the penalty is
defined as its membership value multiplied by the offshoot penalty weight γ. To
compute L(t) and H(t) for any interior node t, we analyze two possible cases:
(a) when the head subject has been gained at t and (b) when the head subject
has not been gained at t.
In case (a), the sets H(·) and L(·) at its children are not needed. In this case,
H(t), L(t) and p(t) are defined by:
H(t) = {t}, L(t) = G(t), p(t) = u(t) + λV (t). (2)
In case (b), the sets H(t) and L(t) are the unions of those of its children, and
p(t) is the sum of their penalties. To obtain a parsimonious lift, whichever case
gives the smaller value of p(t) is chosen.
The output of the algorithm consists of the values at the root, namely, H –
the set of head subjects and offshoots, L – the set of gaps, and p – the associated
penalty.
The algorithm ParGenFS is proven to lead to an optimal lifting indeed [3].
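The recursion described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation. The `Node` class, the function names, and the example tree are hypothetical. The membership values u(t) of interior nodes and the gap importances v(g) are assumed to be supplied by the caller, since their computation is not specified in this section. Ties between cases (a) and (b) are resolved in favour of case (b), which is an arbitrary choice. For brevity, u-irrelevance is recomputed on the fly instead of pruning the tree beforehand.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Node:
    name: str
    u: float = 0.0   # membership value; for interior nodes assumed precomputed
    v: float = 1.0   # gap importance v(g), used only if this node becomes a gap
    children: List["Node"] = field(default_factory=list)


def is_relevant(t: Node) -> bool:
    """A node is u-relevant if its leaf cluster meets the base S_u."""
    if not t.children:
        return t.u > 0
    return any(is_relevant(c) for c in t.children)


def gaps(t: Node) -> List[Node]:
    """G(t): maximal u-irrelevant nodes of T(t), other than t itself."""
    found = []
    for c in t.children:
        if is_relevant(c):
            found.extend(gaps(c))
        else:
            found.append(c)
    return found


def pargenfs(t: Node, lam: float, gam: float) -> Tuple[Set[str], Set[str], float]:
    """Return (H(t), L(t), p(t)) for gap weight lam and offshoot weight gam."""
    if not is_relevant(t):
        return set(), set(), 0.0
    if not t.children:                      # leaf in S_u: a potential offshoot
        return {t.name}, set(), gam * t.u
    g = gaps(t)
    p_gain = t.u + lam * sum(x.v for x in g)        # case (a), Eq. (2)
    H: Set[str] = set()
    L: Set[str] = set()
    p_pass = 0.0
    for c in t.children:                            # case (b): combine children
        Hc, Lc, pc = pargenfs(c, lam, gam)
        H |= Hc
        L |= Lc
        p_pass += pc
    if p_gain < p_pass:
        return {t.name}, {x.name for x in g}, p_gain
    return H, L, p_pass


def example_tree() -> Node:
    """Hypothetical hard set: four leaves of S under A, one under B; u = 1 on interior nodes."""
    a = Node("A", 1.0, children=[Node(f"a{k}", 1.0) for k in range(1, 5)] + [Node("a5")])
    b = Node("B", 1.0, children=[Node("b1", 1.0), Node("b2"), Node("b3"), Node("b4")])
    return Node("root", 1.0, children=[a, b])
```

On this hypothetical tree, calling `pargenfs(example_tree(), lam=0.4, gam=0.9)` selects A as the head subject with one gap and one offshoot, whereas lowering the gap weight to `lam=0.2` makes lifting to the root cheaper.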
3 Highlighting Tendencies in Research
Being confronted with the problem of structuring and interpreting a set of
research publications in a domain, one can think of either of the following two
pathways to take. The first pathway tries to discern main categories from the
texts, the other, from knowledge of the domain. The first approach is exempli-
fied by clustering and topic modeling; the second approach, by using an expert-
driven taxonomy. The main difference between these approaches lies in the level
of granularity: the former pathway uses concepts of the same level of granularity
as those in texts, whereas the latter approach may bring forth coarser concepts
from the higher ranks of a taxonomy.
This paper follows the second pathway by moving, in sequence, through the
stages covered in separate Subsects. 3.1 to 3.6.
3.1 Scholarly Text Collection
We downloaded a collection of 17,685 research papers, together with their
abstracts, published in 17 journals related to Data Science over the 20 years
1998–2017. We take the abstracts of these papers as a representative collection.
3.2 DST Taxonomy
The subdomain of our choice is Data Science, comprising such areas as machine
learning, data mining, data analysis, etc. We take that part of the six-layer
ACM-CCS 2012 taxonomy of computing subjects [1], which is related to Data
Science, and add a few leaves related to more recent Data Science developments.
The taxonomy itself, with all its 317 leaves, can be found in [3].
3.3 Relevance Topic-to-Text Score and Co-relevance Topic-to-Topic
Similarity Index
We first obtain a keyphrase-to-document matrix R of relevance scores by using
the Annotated Suffix Tree approach [2]. This matrix R is converted to a
keyphrase-to-keyphrase similarity matrix A for scoring the “co-relevance” of
keyphrases according to the text collection structure. The similarity score $a_{ii'}$
between topics i and i′ is computed as the inner product of the vectors of scores
$r_i = (r_{iv})$ and $r_{i'} = (r_{i'v})$. The inner product is moderated by a natural weighting
factor assigned to texts in the collection. The weight of text v is defined as
the ratio of the number of topics $n_v$ relevant to it and $n_{\max}$, the maximum $n_v$
over all v = 1, 2, ..., V. A topic is considered relevant to v if its relevance score
is greater than 0.2 (a threshold found experimentally, see [2]).
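The weighted inner product described above can be illustrated with a short Python sketch. It assumes that the text weight enters each term of the inner product as a multiplier, which is one plausible reading of "moderated by"; the function name `co_relevance` and the toy matrix in the usage note are hypothetical. The computation of the relevance scores themselves (the Annotated Suffix Tree step) is not covered here.

```python
from typing import List, Tuple


def co_relevance(R: List[List[float]], threshold: float = 0.2
                 ) -> Tuple[List[List[float]], List[float]]:
    """R[i][v] is the relevance score of topic i to text v.

    Returns the topic-to-topic matrix A and the text weights w, where
    w[v] = n_v / n_max and n_v counts the topics whose score for text v
    exceeds the threshold.
    """
    n_topics = len(R)
    n_texts = len(R[0]) if R else 0
    # n_v: number of topics relevant to text v
    n = [sum(1 for i in range(n_topics) if R[i][v] > threshold)
         for v in range(n_texts)]
    n_max = max(n, default=0)
    # Guard against a collection with no relevant topics at all
    w = [nv / n_max if n_max else 0.0 for nv in n]
    # Weighted inner product of the score vectors of topics i and j
    A = [[sum(w[v] * R[i][v] * R[j][v] for v in range(n_texts))
          for j in range(n_topics)]
         for i in range(n_topics)]
    return A, w
```

For the toy matrix `[[1.0, 0.5], [1.0, 0.0]]`, the first text has two relevant topics and the second has one, so the weights are 1.0 and 0.5 and the off-diagonal similarity equals 1.0.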
3.4 Fuzzy Thematic Clusters of Taxonomy Topics
Clusters of topics should reflect co-occurrence of topics: the greater the number
of texts to which both topics t and t′ are relevant, the greater the interrelation
between t and t′, and the greater the chance for topics t and t′ to fall in the
same cluster. We have tried several popular clustering algorithms on our data.
Unfortunately, no satisfactory results have been found. Therefore, we present
here results obtained with the Fuzzy ADDItive Spectral (FADDIS) clustering
algorithm developed in [5] specifically for finding thematic clusters.
After computing the 317 × 317 topic-to-topic co-relevance matrix, converting
it to a topic-to-topic Laplace-transformed similarity matrix [5], and applying
FADDIS clustering, we sequentially obtained 6 clusters, of which three clusters
are obviously homogeneous. They relate to “Learning”, “Retrieval”, and “Clustering”.