Intelligent Data Engineering and Automated Learning – IDEAL 2019: 20th International Conference, Manchester, UK, November 14–16, 2019, Proceedings, Part II (Hujun Yin et al., Eds.)

This document contains Part II of the proceedings of the 20th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2019), held in Manchester, UK, from November 14 to 16, 2019. The two proceedings volumes together contain 94 accepted papers on topics such as big data challenges, machine learning, data mining, and AI. The conference was co-sponsored by the Alan Turing Institute and Manchester City Council and featured keynotes from Thomas Bäck and Damien Coyle.

Hujun Yin · David Camacho · Peter Tino · Antonio J. Tallón-Ballesteros · Ronaldo Menezes · Richard Allmendinger (Eds.)

Intelligent Data Engineering and Automated Learning – IDEAL 2019
20th International Conference
Manchester, UK, November 14–16, 2019
Proceedings, Part II

Lecture Notes in Computer Science (LNCS) 11872

Founding Editors
Gerhard Goos
Karlsruhe Institute of Technology, Karlsruhe, Germany
Juris Hartmanis
Cornell University, Ithaca, NY, USA

Editorial Board Members


Elisa Bertino
Purdue University, West Lafayette, IN, USA
Wen Gao
Peking University, Beijing, China
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Gerhard Woeginger
RWTH Aachen, Aachen, Germany
Moti Yung
Columbia University, New York, NY, USA
More information about this series at http://www.springer.com/series/7409
Editors
Hujun Yin, University of Manchester, Manchester, UK
David Camacho, Technical University of Madrid, Madrid, Spain
Peter Tino, University of Birmingham, Birmingham, UK
Antonio J. Tallón-Ballesteros, University of Huelva, Huelva, Spain
Ronaldo Menezes, University of Exeter, Exeter, UK
Richard Allmendinger, University of Manchester, Manchester, UK

ISSN 0302-9743 ISSN 1611-3349 (electronic)


Lecture Notes in Computer Science
ISBN 978-3-030-33616-5 ISBN 978-3-030-33617-2 (eBook)
https://doi.org/10.1007/978-3-030-33617-2
LNCS Sublibrary: SL3 – Information Systems and Applications, incl. Internet/Web, and HCI

© Springer Nature Switzerland AG 2019


This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, expressed or implied, with respect to the material contained herein or for any errors or
omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface

This year saw the 20th edition of the International Conference on Intelligent Data
Engineering and Automated Learning (IDEAL 2019), held for the second time in
Manchester, UK – the birthplace of one of the world’s first electronic computers as well
as artificial intelligence (AI) marked by Alan Turing’s seminal and pioneering work.
The IDEAL conference has played a steady role in data analytics and
machine learning for the last 20 years. It strives to provide an ideal platform for
scientific communities and researchers from near and far to exchange the latest findings,
disseminate cutting-edge results, and forge alliances for tackling challenging
real-world problems. The core themes of IDEAL 2019 include big data
challenges, machine learning, data mining, information retrieval and management,
bio-/neuro-informatics, bio-inspired models (including neural networks, evolutionary
computation, and swarm intelligence), agents and hybrid intelligent systems, real-world
applications of intelligent techniques, and AI.
In total, 149 submissions were received and subsequently underwent rigorous peer
reviews by the Program Committee members and experts. Only the papers judged to be
of the highest quality and novelty were accepted and included in the proceedings.
These volumes contain 94 papers (58 for the main track and 36 for special sessions)
accepted and presented at IDEAL 2019, held during November 14–16, 2019, at the
University of Manchester, Manchester, UK. These papers provided a timely snapshot
of the latest topics and advances in data analytics and machine learning, from
methodologies, frameworks, and algorithms to applications. IDEAL 2019 enjoyed
outstanding keynotes from leaders in the field, Thomas Bäck of Leiden University and
Damien Coyle of University of Ulster, and an inspiring tutorial from Peter Tino of
University of Birmingham.
IDEAL 2019 was hosted by the University of Manchester and was co-sponsored by the
Alan Turing Institute and Manchester City Council. It was also technically
co-sponsored by the IEEE Computational Intelligence Society UK and Ireland Chapter.
We would like to thank our sponsors for their financial and technical support. We
would also like to thank all the people who devoted so much time and effort to the
successful running of the conference, in particular the members of the Program
Committee and reviewers, organizers of the special sessions, as well as the authors who
contributed to the conference. We are also very grateful for the hard work of the local
Organizing Committee at the University of Manchester, in particular Yao Peng and
Jingwen Su, for checking through all the camera-ready files. Continued support,
sponsorship, and collaboration from Springer LNCS are also greatly appreciated.

September 2019 Hujun Yin


David Camacho
Peter Tino
Antonio J. Tallón-Ballesteros
Ronaldo Menezes
Richard Allmendinger
Organization

General Chairs
Hujun Yin University of Manchester, UK
David Camacho Universidad Politecnica de Madrid, Spain
Peter Tino University of Birmingham, UK

Programme Co-chairs
Hujun Yin University of Manchester, UK
David Camacho Universidad Politecnica de Madrid, Spain
Antonio J. Tallón-Ballesteros University of Huelva, Spain
Peter Tino University of Birmingham, UK
Ronaldo Menezes University of Exeter, UK
Richard Allmendinger University of Manchester, UK

International Advisory Committee


Lei Xu Chinese University of Hong Kong, Hong Kong, China,
and Shanghai Jiaotong University, China
Yaser Abu-Mostafa CALTECH, USA
Shun-ichi Amari RIKEN, Japan
Michael Dempster University of Cambridge, UK
Francisco Herrera Autonomous University of Madrid, Spain
Nick Jennings University of Southampton, UK
Soo-Young Lee KAIST, South Korea
Erkki Oja Helsinki University of Technology, Finland
Lalit M. Patnaik Indian Institute of Science, India
Burkhard Rost Columbia University, USA
Xin Yao Southern University of Science and Technology,
China, and University of Birmingham, UK

Steering Committee
Hujun Yin (Chair) University of Manchester, UK
Laiwan Chan (Chair) Chinese University of Hong Kong, Hong Kong, China
Guilherme Barreto Federal University of Ceará, Brazil
Yiu-ming Cheung Hong Kong Baptist University, Hong Kong, China
Emilio Corchado University of Salamanca, Spain
Jose A. Costa Federal University of Rio Grande do Norte, Brazil
Marc van Hulle KU Leuven, Belgium
Samuel Kaski Aalto University, Finland


John Keane University of Manchester, UK
Jimmy Lee Chinese University of Hong Kong, Hong Kong, China
Malik Magdon-Ismail Rensselaer Polytechnic Institute, USA
Peter Tino University of Birmingham, UK
Zheng Rong Yang University of Exeter, UK
Ning Zhong Maebashi Institute of Technology, Japan

Publicity Co-chairs/Liaisons
Jose A. Costa Federal University of Rio Grande do Norte, Brazil
Bin Li University of Science and Technology of China, China
Yimin Wen Guilin University of Electronic Technology, China

Local Organizing Committee

Hujun Yin
Richard Hankins
Ananya Gupta
Mengyu Liu
Richard Allmendinger
Yao Peng
Jingwen Su

Program Committee

Ajith Abraham
Jesus Alcala-Fdez
Richardo Aler
Davide Anguita
Anastassia Angelopoulou
Ángel Arcos-Vargas
Romis Attux
Martin Atzmueller
Dariusz Barbucha
Mahmoud Barhamgi
Bruno Baruque
Carmelo Bastos Filho
Lordes Borrajo
Zoran Bosnic
Vicent Botti
Edyta Brzychczy
Fernando Buarque
Andrea Burattin
Robert Burduk
Aleksander Byrski
Heloisa Camargo
Josep Carmona
Mercedes Carnero
Carlos Carrascosa
Andre de Carvalho
Joao Carvalho
Pedro Castillo
Luís Cavique
Darryl Charles
Richard Chbeir
Songcan Chen
Xiaohong Chen
Sung-Bae Cho
Stelvio Cimato
Manuel Jesus Cobo Martin
Leandro Coelho
Carlos Coello Coello
Roberto Confalonieri
Rafael Corchuelo
Francesco Corona
Nuno Correia
Luís Correia
Paulo Cortez
Jose Alfredo F. Costa
Carlos Cotta
Raúl Cruz-Barbosa
Alfredo Cuzzocrea
Bogusław Cyganek
Ireneusz Czarnowski
Ernesto Damiani
Amit Kumar Das
Bernard De Baets
Javier Del Ser
Boris Delibašić
Fernando Díaz
Juan Manuel Dodero
Jose Dorronsoro
Dinu Dragan
Gérard Dreyfus
Jochen Einbeck
Florentino Fdez-Riverola
Joaquim Filipe
Juan J. Flores
Simon James Fong
Pawel Forczmanski
Giancarlo Fortino
Felipe M. G. França
Dariusz Frejlichowski
Hamido Fujita
Marcus Gallagher
Yang Gao
Salvador Garcia
Pablo García Sánchez
Luis Javier Garcia Villalba
María José Ginzo Villamayor
Fernando Gomide
Antonio Gonzalez-Pardo
Anna Gorawska
Marcin Gorawski
Juan Manuel Górriz
Manuel Graña
Maciej Grzenda
Jerzy Grzymala-Busse
Pedro Antonio Gutierrez
Barbara Hammer
Julia Handl
Richard Hankins
Ioannis Hatzilygeroudis
Álvaro Herrero
J. Michael Herrmann
Ignacio Hidalgo
James Hogan
Jaakko Hollmén
Wei-Chiang Samuelson Hong
Vahid Jalali
Dariusz Jankowski
Piotr Jedrzejowicz
Vicente Julian
Rushed Kanawati
Mario Koeppen
Mirosław Kordos
Marcin Korzeń
Dariusz Krol
Pawel Ksieniewicz
Raul Lara-Cabrera
Bin Li
Lei Liu
Wenjian Luo
José F. Martínez-Trinidad
Giancarlo Mauri
Cristian Mihaescu
Boris Mirkin
José M. Molina
João Mp Cardoso
Grzegorz J. Nalepa
Valery Naranjo
Susana Nascimento
Tim Nattkemper
Antonio Neme
Rui Neves Madeira
Ngoc-Thanh Nguyen
Paulo Novais
Fernando Nuñez
Ivan Olier-Caparroso
Eva Onaindia
Sandra Ortega-Martorell
Vasile Palade
Jose Palma
Juan Pavón
Yao Peng
Carlos Pereira
Barbara Pes
Marco Platzner
Paulo Quaresma
Cristian Ramírez-Atencia
Ajalmar Rêgo Da Rocha Neto
Izabela Rejer
Victor Rodriguez Fernandez
Matilde Santos
Pedro Santos
Jose Santos
Rafal Scherer
Ivan Silva
Leandro A. Silva
Dragan Simic
Anabela Simões
Marcin Szpyrka
Jesús Sánchez-Oro
Ying Tan
Qing Tian
Renato Tinós
Stefania Tomasiello
Pawel Trajdos
Carlos M. Travieso-González
Bogdan Trawinski
Milan Tuba
Turki Turki
Eiji Uchino
Carlos Usabiaga Ibáñez
José Valente de Oliveira
Alfredo Vellido
Juan G. Victores
José R. Villar
Lipo Wang
Tzai-Der Wang
Dongqing Wei
Raymond Kwok-Kay Wong
Michal Wozniak
Xin-She Yang
Huiyu Zhou

Additional Reviewers

Sabri Allani
Ray Baishkhi
Samik Banerjee
Avishek Bhattacharjee
Nurul E’zzati Binti Md Isa
Daniele Bortoluzzi
Anna Burduk
Jose Luis Calvo Rolle
Walmir Caminnhas
Meng Cao
Hugo Carneiro
Giovanna Castellano
Hubert Cecotti
Carl Chalmers
Lei Chen
Zonghai Chen
Antonio Maria Chiarelli
Bernat Coma-Puig
Gabriela Czanner
Mauro Da Lio
Sukhendu Das
Klaus Dietmayer
Paul Fergus
Róża Goścień
Ugur Halici
Hongmei He
Eloy Irigoyen
Suman Jana
Ian Jarman
Jörg Keller
Bartosz Krawczyk
Weikai Li
Mengyu Liu
Nicolás Marichal
Wojciech Mazurczyk
Philippa Grace McCabe
Eneko Osaba
Kexin Pei
Mark Pieroni
Alice Plebe
Girijesh Prasad
Sergey Prokudin
Yu Qiao
Juan Rada-Vilela
Dheeraj Rathee
Haider Raza
Yousef Rezaei Tabar
Patrick Riley
Gastone Pietro Rosati Papini
Salma Sassi
Ivo Siekmann
Mônica Ferreira Da Silva
Kacper Sumera
Yuchi Tian
Mariusz Topolski
Marley Vellasco
Yu Yu
Rita Zgheib
Shang-Ming Zhou

Special Session on Fuzzy Systems and Intelligent Data Analysis


Organizers
Susana Nascimento NOVA University, Portugal
Antonio J. Tallón-Ballesteros University of Huelva, Spain
José Valente de Oliveira University of Algarve, Portugal
Boris Mirkin National Research University Moscow, Russian

Special Session on Machine Learning towards Smarter Multimodal Systems

Organizers
Nuno Correia NOVA University of Lisboa, Portugal
Rui Neves Madeira NOVA University of Lisboa, Polytechnic Institute
of Setúbal, Portugal
Susana Nascimento NOVA University, Portugal

Special Session on Data Selection in Machine Learning


Organizers
Antonio J. Tallón-Ballesteros University of Huelva, Spain
Raymond Kwok-Kay Wong University of New South Wales, Australia
Ireneusz Czarnowski University of Novi Sad, Serbia
Simon James Fong University of Macau, Macau, China

Special Session on Machine Learning in Healthcare


Organizers
Ivan Olier Liverpool John Moores University, UK
Sandra Ortega-Martorell Liverpool John Moores University, UK
Paulo Lisboa Liverpool John Moores University, UK
Alfredo Vellido Universitat Politècnica de Catalunya, Spain

Special Session on Machine Learning in Automatic Control


Organizers
Matilde Santos University Complutense of Madrid, Spain
Juan G. Victores University Carlos III of Madrid, Spain
Special Session on Finance and Data Mining


Organizers
Fernando Núñez Hernández University of Seville, Spain
Antonio J. Tallón-Ballesteros University of Huelva, Spain
Paulo Vasconcelos University of Porto, Portugal
Ángel Arcos-Vargas University of Seville, Spain

Special Session on Knowledge Discovery from Data


Organizers
Barbara Pes University of Cagliari, Italy
Antonio J. Tallón-Ballesteros University of Huelva, Spain
Julia Handl University of Manchester, UK

Special Session on Machine Learning Algorithms for Hard Problems

Organizers
Pawel Ksieniewicz Wroclaw University of Science and Technology,
Poland
Robert Burduk Wroclaw University of Science and Technology,
Poland
Contents – Part II

Special Session on Fuzzy Systems and Intelligent Data Analysis

Computational Generalization in Taxonomies Applied to: (1) Analyze Tendencies of Research and (2) Extend User Audiences
Dmitry Frolov, Susana Nascimento, Trevor Fenner, Zina Taran, and Boris Mirkin

Unsupervised Initialization of Archetypal Analysis and Proportional Membership Fuzzy Clustering
Susana Nascimento and Nuno Madaleno

Special Session on Machine Learning Towards Smarter Multimodal Systems

Multimodal Web Based Video Annotator with Real-Time Human Pose Estimation
Rui Rodrigues, Rui Neves Madeira, Nuno Correia, Carla Fernandes, and Sara Ribeiro

New Interfaces for Classifying Performance Gestures in Music
Chris Rhodes, Richard Allmendinger, and Ricardo Climent

Special Session on Data Selection in Machine Learning

Classifying Ransomware Using Machine Learning Algorithms
Samuel Egunjobi, Simon Parkinson, and Andrew Crampton

Artificial Neural Networks in Mathematical Mini-Games for Automatic Students’ Learning Styles Identification: A First Approach
Richard Torres-Molina, Jorge Banda-Almeida, and Lorena Guachi-Guachi

The Use of Unified Activity Records to Predict Requests Made by Applications for External Services
Maciej Grzenda, Robert Kunicki, and Jaroslaw Legierski

Fuzzy Clustering Approach to Data Selection for Computer Usage in Headache Disorders
Svetlana Simić, Ljiljana Radmilo, Dragan Simić, Svetislav D. Simić, and Antonio J. Tallón-Ballesteros

Multitemporal Aerial Image Registration Using Semantic Features
Ananya Gupta, Yao Peng, Simon Watson, and Hujun Yin

Special Session on Machine Learning in Healthcare

Brain Tumor Classification Using Principal Component Analysis and Kernel Support Vector Machine
Richard Torres-Molina, Carlos Bustamante-Orellana, Andrés Riofrío-Valdivieso, Francisco Quinga-Socasi, Robinson Guachi, and Lorena Guachi-Guachi

Modelling Survival by Machine Learning Methods in Liver Transplantation: Application to the UNOS Dataset
David Guijo-Rubio, Pedro J. Villalón-Vaquero, Pedro A. Gutiérrez, Maria Dolores Ayllón, Javier Briceño, and César Hervás-Martínez

Design and Development of an Automatic Blood Detection System for Capsule Endoscopy Images
Pedro Pons, Reinier Noorda, Andrea Nevárez, Adrián Colomer, Vicente Pons Beltrán, and Valery Naranjo

Comparative Analysis for Computer-Based Decision Support: Case Study of Knee Osteoarthritis
Philippa Grace McCabe, Ivan Olier, Sandra Ortega-Martorell, Ian Jarman, Vasilios Baltzopoulos, and Paulo Lisboa

A Clustering-Based Patient Grouper for Burn Care
Chimdimma Noelyn Onah, Richard Allmendinger, Julia Handl, Paraskevas Yiapanis, and Ken W. Dunn

A Comparative Assessment of Feed-Forward and Convolutional Neural Networks for the Classification of Prostate Lesions
Sabrina Marnell, Patrick Riley, Ivan Olier, Marc Rea, and Sandra Ortega-Martorell

Special Session on Machine Learning in Automatic Control

A Method Based on Filter Bank Common Spatial Pattern for Multiclass Motor Imagery BCI
Ziqing Xia, Likun Xia, and Ming Ma

Safe Deep Neural Network-Driven Autonomous Vehicles Using Software Safety Cages
Sampo Kuutti, Richard Bowden, Harita Joshi, Robert de Temple, and Saber Fallah

Wave and Viscous Resistance Estimation by NN
D. Marón and M. Santos

Neural Controller of UAVs with Inertia Variations
J. Enrique Sierra-Garcia, Matilde Santos, and Juan G. Victores

Special Session on Finance and Data Mining

A Metric Framework for Quantifying Data Concentration
Peter Mitic

Adaptive Machine Learning-Based Stock Prediction Using Financial Time Series Technical Indicators
Ahmed K. Taha, Mohamed H. Kholief, and Walid AbdelMoez

Special Session on Knowledge Discovery from Data

Exploiting Online Newspaper Articles Metadata for Profiling City Areas
Livio Cascone, Pietro Ducange, and Francesco Marcelloni

Modelling the Social Interactions in Ant Colony Optimization
Nishant Gurrapadi, Lydia Taw, Mariana Macedo, Marcos Oliveira, Diego Pinheiro, Carmelo Bastos-Filho, and Ronaldo Menezes

An Innovative Deep-Learning Algorithm for Supporting the Approximate Classification of Workloads in Big Data Environments
Alfredo Cuzzocrea, Enzo Mumolo, Carson K. Leung, and Giorgio Mario Grasso

Control-Flow Business Process Summarization via Activity Contraction
Valeria Fionda and Gianluigi Greco

Classifying Flies Based on Reconstructed Audio Signals
Michael Flynn and Anthony Bagnall

Studying the Evolution of the ‘Circular Economy’ Concept Using Topic Modelling
Sampriti Mahanty, Frank Boons, Julia Handl, and Riza Batista-Navarro

Mining Frequent Distributions in Time Series
José Carlos Coutinho, João Mendes Moreira, and Cláudio Rebelo de Sá

Time Series Display for Knowledge Discovery on Selective Laser Melting Machines
Ramón Moreno, Juan Carlos Pereira, Alex López, Asif Mohammed, and Prasha Pahlevannejad

Special Session on Machine Learning Algorithms for Hard Problems

Using Prior Knowledge to Facilitate Computational Reading of Arabic Calligraphy
Seetah ALSalamah, Riza Batista-Navarro, and Ross D. King

SMOTE Algorithm Variations in Balancing Data Streams
Bogdan Gulowaty and Paweł Ksieniewicz

Multi-class Text Complexity Evaluation via Deep Neural Networks
Alfredo Cuzzocrea, Giosué Lo Bosco, Giovanni Pilato, and Daniele Schicchi

Imbalance Reduction Techniques Applied to ECG Classification Problem
Jędrzej Kozal and Paweł Ksieniewicz

Machine Learning Methods for Fake News Classification
Paweł Ksieniewicz, Michał Choraś, Rafał Kozik, and Michał Woźniak

A Genetic-Based Ensemble Learning Applied to Imbalanced Data Classification
Jakub Klikowski, Paweł Ksieniewicz, and Michał Woźniak

The Feasibility of Deep Learning Use for Adversarial Model Extraction in the Cybersecurity Domain
Michał Choraś, Marek Pawlicki, and Rafał Kozik

Author Index


Contents – Part I

Orchids Classification Using Spatial Transformer Network with Adaptive Scaling
Watcharin Sarachai, Jakramate Bootkrajang, Jeerayut Chaijaruwanich, and Samerkae Somhom

Scalable Dictionary Classifiers for Time Series Classification
Matthew Middlehurst, William Vickers, and Anthony Bagnall

Optimization of the Numeric and Categorical Attribute Weights in KAMILA Mixed Data Clustering Algorithm
Nádia Junqueira Martarelli and Marcelo Seido Nagano

Meaningful Data Sampling for a Faithful Local Explanation Method
Peyman Rasouli and Ingrid Chieh Yu

Classifying Prostate Histological Images Using Deep Gaussian Processes on a New Optical Density Granulometry-Based Descriptor
Miguel López-Pérez, Adrián Colomer, María A. Sales, Rafael Molina, and Valery Naranjo

Adaptive Orthogonal Characteristics of Bio-inspired Neural Networks
Naohiro Ishii, Toshinori Deguchi, Masashi Kawaguchi, Hiroshi Sasaki, and Tokuro Matsuo

Using Deep Learning for Ordinal Classification of Mobile Marketing User Conversion
Luís Miguel Matos, Paulo Cortez, Rui Castro Mendes, and Antoine Moreau

Modeling Data Driven Interactions on Property Graph
Worapol Alex Pongpech

Adaptive Dimensionality Adjustment for Online “Principal Component Analysis”
Nico Migenda, Ralf Möller, and Wolfram Schenck

Relevance Metric for Counterfactuals Selection in Decision Trees
Rubén R. Fernández, Isaac Martín de Diego, Víctor Aceña, Javier M. Moguerza, and Alberto Fernández-Isabel

Weighted Nearest Centroid Neighbourhood
Víctor Aceña, Javier M. Moguerza, Isaac Martín de Diego, and Rubén R. Fernández

The Prevalence of Errors in Machine Learning Experiments
Martin Shepperd, Yuchen Guo, Ning Li, Mahir Arzoky, Andrea Capiluppi, Steve Counsell, Giuseppe Destefanis, Stephen Swift, Allan Tucker, and Leila Yousefi

A Hybrid Model for Fraud Detection on Purchase Orders
William Ferreira Moreno Oliverio, Allan Barcelos Silva, Sandro José Rigo, and Rodolpho Lopes Bezerra da Costa

Users Intention Based on Twitter Features Using Text Analytics
Qadri Mishael, Aladdin Ayesh, and Iryna Yevseyeva

Mixing Hetero- and Homogeneous Models in Weighted Ensembles
James Large and Anthony Bagnall

A Hybrid Approach to Time Series Classification with Shapelets
David Guijo-Rubio, Pedro A. Gutiérrez, Romain Tavenard, and Anthony Bagnall

An Ensemble Algorithm Based on Deep Learning for Tuberculosis Classification
Alfonso Hernández, Ángel Panizo, and David Camacho

A Data-Driven Approach to Automatic Extraction of Professional Figure Profiles from Résumés
Alessandro Bondielli and Francesco Marcelloni

Retrieving and Processing Information from Clinical Algorithm via Formal Concept Analysis
Aleksandra Vatian, Anna Tatarinova, Svyatoslav Osipov, Nikolai Egorov, Vitalii Boitsov, Elena Ryngach, Tatiana Treshkur, Anatoly Shalyto, and Natalia Gusarova

Comparative Analysis of Approaches to Building Medical Dialog Systems in Russian
Aleksandra Vatian, Natalia Dobrenko, Nikolai Andreev, Aleksandr Nemerovskii, Anastasia Nevochhikova, and Natalia Gusarova

Tracking Position and Status of Electric Control Switches Based on YOLO Detector
Xingang Mou, Jian Cui, Hujun Yin, and Xiao Zhou

A Self-generating Prototype Method Based on Information Entropy Used for Condensing Data in Classification Tasks
Alberto Manastarla and Leandro A. Silva

Transfer Knowledge Between Sub-regions for Traffic Prediction
Using Deep Learning Method . . . 208
Yi Ren and Kunqing Xie

Global Q-Learning Approach for Power Allocation
in Femtocell Networks . . . 220
Abdulmajeed M. Alenezi and Khairi Hamdi

Deep Learning and Sensor Fusion Methods for Studying Gait Changes
Under Cognitive Load in Males and Females . . . 229
Abdullah S. Alharthi and Krikor B. Ozanyan

Towards a Robotic Personal Trainer for the Elderly . . . 238
J. A. Rincon, A. Costa, P. Novais, V. Julian, and C. Carrascosa

Image Quality Constrained GAN for Super-Resolution . . . 247
Jingwen Su, Yao Peng, and Hujun Yin

Use Case Prediction Using Product Reviews Text Classification . . . 257
Tinashe Wamambo, Cristina Luca, and Arooj Fatima

Convolutional Neural Network for Core Sections Identification
in Scientific Research Publications . . . 265
Bello Aliyu Muhammad, Rahat Iqbal, Anne James,
and Dianabasi Nkantah

Knowledge Inference Through Analysis of Human Activities . . . 274
Leandro O. Freitas, Pedro R. Henriques, and Paulo Novais

Representation Learning of Knowledge Graphs with Multi-scale
Capsule Network . . . 282
Jingwei Cheng, Zhi Yang, Jinming Dang, Chunguang Pan,
and Fu Zhang

CNNPSP: Pseudouridine Sites Prediction Based on Deep Learning . . . 291
Yongxian Fan, Yongzhen Li, Huihua Yang, and Xiaoyong Pan

A Multimodal Approach to Image Sentiment Analysis . . . 302
António Gaspar and Luís A. Alexandre

Joining Items Clustering and Users Clustering for Evidential
Collaborative Filtering . . . 310
Raoua Abdelkhalek, Imen Boukhris, and Zied Elouedi

Conditioned Generative Model via Latent Semantic Controlling
for Learning Deep Representation of Data . . . 319
Jin-Young Kim and Sung-Bae Cho

Toward a Framework for Seasonal Time Series Forecasting
Using Clustering . . . 328
Colin Leverger, Simon Malinowski, Thomas Guyet, Vincent Lemaire,
Alexis Bondu, and Alexandre Termier

An Evidential Imprecise Answer Aggregation Approach Based
on Worker Clustering . . . 341
Lina Abassi and Imen Boukhris

Combining Machine Learning and Classical Optimization Techniques
in Vehicle to Vehicle Communication Network . . . 350
Mutasem Hamdan and Khairi Hamdi

Adversarial Edit Attacks for Tree Data . . . 359
Benjamin Paaßen

Non-stationary Noise Cancellation Using Deep Autoencoder Based
on Adversarial Learning . . . 367
Kyung-Hyun Lim, Jin-Young Kim, and Sung-Bae Cho

A Deep Learning-Based Surface Defect Inspection System
for Smartphone Glass . . . 375
Gwang-Myong Go, Seok-Jun Bu, and Sung-Bae Cho

Superlinear Speedup of Parallel Population-Based Metaheuristics:
A Microservices and Container Virtualization Approach . . . 386
Hatem Khalloof, Phil Ostheimer, Wilfried Jakob, Shadi Shahoud,
Clemens Duepmeier, and Veit Hagenmeyer

Active Dataset Generation for Meta-learning System Quality Improvement . . . 394
Alexey Zabashta and Andrey Filchenkov

Do You Really Follow Them? Automatic Detection of Credulous
Twitter Users . . . 402
Alessandro Balestrucci, Rocco De Nicola, Marinella Petrocchi,
and Catia Trubiani

User Localization Based on Call Detail Record . . . 411
Buddhi Ayesha, Bhagya Jeewanthi, Charith Chitraranjan,
Amal Shehan Perera, and Amal S. Kumarage

Automatic Ground Truth Dataset Creation for Fake News Detection
in Social Media . . . 424
Danae Pla Karidi, Harry Nakos, and Yannis Stavrakas

Artificial Flora Optimization Algorithm for Task Scheduling
in Cloud Computing Environment . . . 437
Nebojsa Bacanin, Eva Tuba, Timea Bezdan, Ivana Strumberger,
and Milan Tuba

A Significantly Faster Elastic-Ensemble for Time-Series Classification . . . 446
George Oastler and Jason Lines

ALIME: Autoencoder Based Approach for Local Interpretability . . . 454
Sharath M. Shankaranarayana and Davor Runje

Detection of Abnormal Load Consumption in the Power Grid Using
Clustering and Statistical Analysis . . . 464
Matúš Cuper, Marek Lóderer, and Viera Rozinajová

Deep Convolutional Neural Networks Based on Image Data Augmentation
for Visual Object Recognition . . . 476
Khaoula Jayech

An Efficient Scheme for Prototyping kNN in the Context of Real-Time
Human Activity Recognition . . . 486
Paulo J. S. Ferreira, Ricardo M. C. Magalhães, Kemilly Dearo Garcia,
João M. P. Cardoso, and João Mendes-Moreira

A Novel Recommendation System for Next Feature in Software . . . 494
Victor R. Prata, Ronaldo S. Moreira, Luan S. Cordeiro, Átilla N. Maia,
Alan R. Martins, Davi A. Leão, C. H. L. Cavalcante,
Amauri H. Souza Júnior, and Ajalmar R. Rocha Neto

Meta-learning Based Evolutionary Clustering Algorithm . . . 502
Dmitry Tomp, Sergey Muravyov, Andrey Filchenkov,
and Vladimir Parfenov

Fast Tree-Based Classification via Homogeneous Clustering . . . 514
George Pardis, Konstantinos I. Diamantaras, Stefanos Ougiaroglou,
and Georgios Evangelidis

Ordinal Equivalence Classes for Parallel Coordinates . . . 525
Alexey Myachin and Boris Mirkin

New Internal Clustering Evaluation Index Based on Line Segments . . . 534
Juan Carlos Rojas Thomas and Matilde Santos Peñas

Threat Identification in Humanitarian Demining Using Machine
Learning and Spectroscopic Metal Detection . . . 542
Wouter van Verre, Toykan Özdeǧer, Ananya Gupta, Frank J. W. Podd,
and Anthony J. Peyton

Author Index . . . 551


Special Session on Fuzzy Systems and
Intelligent Data Analysis
Computational Generalization in
Taxonomies Applied to: (1) Analyze
Tendencies of Research and (2) Extend
User Audiences

Dmitry Frolov¹,⁵ (✉), Susana Nascimento², Trevor Fenner³,
Zina Taran⁴, and Boris Mirkin¹,³

¹ National Research University Higher School of Economics, Moscow, Russia
[email protected]
² Universidade Nova de Lisboa, Caparica, Portugal
³ Birkbeck, University of London, London, UK
⁴ Delta State University, Cleveland, MS, USA
⁵ Natimatica, Ltd., Moscow, Russia

Abstract. We define a most specific generalization of a fuzzy set of
topics assigned to leaves of the rooted tree of a domain taxonomy. This
generalization lifts the set to its “head subject” node in the higher ranks
of the taxonomy tree. The head subject is supposed to “tightly” cover
the query set, possibly bringing in some errors referred to as “gaps” and
“offshoots”. Our method, ParGenFS, globally minimizes a penalty function
combining the numbers of head subjects, gaps, and offshoots,
differently weighted. Two applications are considered: (1) analysis of
tendencies of research in Data Science; (2) audience extension for
programmatic targeted advertising online. The former involves a taxonomy
of Data Science derived from the celebrated ACM Computing Classification
System 2012. Based on a collection of research papers published
by Springer in 1998–2017, and applying in-house methods for text analysis
and fuzzy clustering, we derive fuzzy clusters of leaf topics in learning,
retrieval and clustering. The head subjects of these clusters inform us
of some general tendencies of the research. The latter involves the publicly
available IAB Tech Lab Content Taxonomy. Each of about 25 million users
is assigned a fuzzy profile within this taxonomy, which is generalized
offline using ParGenFS. Our experiments show that these head subjects
effectively extend the size of targeted audiences at least twofold without
losing quality.

Keywords: Generalization · Fuzzy thematic cluster · Annotated suffix
tree · Research tendencies · Targeted advertising

1 Introduction
The notion of generalization is not absent from the current developments in
knowledge engineering and artificial intelligence. Just the opposite. For example,
building a supervised classifier fits exactly into the concept of generalization:
a classifier generalizes given instances of “yes”-objects into a decision rule to
separate the “yes”-class from the rest. This, however, relates to a very special case
in which all the objects are elements of the same variable space. We are going to
tackle the case in which we are presented with a crisp or fuzzy subset of different
concepts, and one wishes to generalize this subset into a coarser concept tightly
embracing the subset. This is, partly, the meaning of the term “generalization”
which, according to the Merriam-Webster dictionary, refers to deriving a general
conception from particulars. We assume that a most straightforward medium for
such a derivation, a taxonomy of the field, is given to us.

© Springer Nature Switzerland AG 2019
H. Yin et al. (Eds.): IDEAL 2019, LNCS 11872, pp. 3–11, 2019.
https://doi.org/10.1007/978-3-030-33617-2_1

Currently, taxonomic constructions mostly concentrate on developing
taxonomies, especially those involving what linguistics refers to as
hyponymic/hypernymic relations (see, for example, [6,8]). Also, some activities
go in the direction of “operational” generalization: generalized case descriptions
involving taxonomic relations between generalized states and their parts are used
to achieve a tangible goal such as improving characteristics of text retrieval [7].
This paper does not attempt to develop or change any taxonomy, but rather
uses an existing taxonomy. The situation of our concern is a case in which we
are to generalize a fuzzy set of taxonomy leaves representing the essence of an
empirically observed phenomenon. The rest of the paper is organized accordingly.
Section 2 presents a mathematical formalization of the generalization problem
as that of parsimoniously lifting a given fuzzy leaf set to nodes in higher ranks of
the taxonomy, and provides a recursive algorithm leading to a globally optimal
solution to the problem. Section 3 describes an application of this approach to
deriving tendencies in the development of Data Science, as discerned from
a set of about 18,000 research papers published by Springer in 17
journals related to Data Science over the past 20 years. Its subsections describe
our approach to finding and generalizing fuzzy clusters of research topics. In
the end, we point to tendencies in the development of the corresponding parts of
Data Science, as drawn from the lifting results. Section 4 describes an application
of the parsimonious generalization method to efficiently extend the audience of
targeted advertising over the Internet. A more detailed description can be found
in [3].

2 Generalization: Parsimoniously Lifting a Fuzzy Thematic Set in Taxonomy

We consider the following problem. Given a rooted taxonomy tree and a fuzzy set
S of taxonomy leaves, find a node h(S) of higher rank in the taxonomy that
tightly covers the set S.
The problem is not as simple as it may seem. Consider, for the sake
of simplicity, a hard set S shown with five black leaf boxes on a fragment of a
tree in Fig. 1, which illustrates the situation in which the set of black boxes is
lifted to the root. If we accept that set S may be generalized by the root, then
four white boxes are also covered by the root and thus fall under the same
concept as S even though they do not belong to S. Such
a situation will be referred to as a gap. Lifting with gaps should be penalized.
Altogether, the number of conceptual elements introduced to generalize S here
is five: one head subject, that is, the root to which we have assigned S, and the
four gaps that occur just because of the topology of the tree, which imposes this
penalty.
Another lifting decision is illustrated in Fig. 2: here the set is lifted just to the
root of the left branch of the tree. The number of gaps has decreased to
just one. However, another oddity has emerged: a black box on the right, belonging
to S but not covered by the node to which the set S is mapped. This type of error
will be referred to as an offshoot. With this lifting, three new items emerge: one
head subject, one offshoot, and one gap. This is fewer than the five items that
emerge when lifting the set to the root (one head subject and four gaps),
which makes it preferable. Of course, this conclusion holds only if the
relative weight of an offshoot is less than the total relative weight of three gaps.
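
The comparison above can be written out as a small worked inequality. This is only a sketch under simplifying assumptions that the text does not state at this point: every black box has unit membership, a head subject costs 1, and each gap and each offshoot costs a flat weight, written here as λ and γ to match the symbols of Eq. (1).

```latex
\begin{aligned}
p_{\text{root}} &= 1 + 4\lambda
  && \text{(Fig. 1: one head subject, four gaps)} \\
p_{\text{left}} &= 1 + \lambda + \gamma
  && \text{(Fig. 2: one head subject, one gap, one offshoot)} \\
p_{\text{left}} < p_{\text{root}} &\iff \gamma < 3\lambda
\end{aligned}
```

Under these assumptions, lifting to the left branch is cheaper exactly when one offshoot weighs less than three gaps, which is the condition stated in the text.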

Fig. 1. Generalization of the black box query set by mapping it to the root, with the
price of four gaps emerged at the lift.

Fig. 2. Generalization of the black box query set by mapping it to the root of the left
branch, with the price of one gap and one offshoot emerged at this lift.

We are interested to see whether a fuzzy set S can be generalized by a node h
from higher ranks of the taxonomy, so that S can be thought of as falling within
the framework covered by the node h. The goal of finding an interpretable
pigeonhole for S within the taxonomy can be formalized as that of finding one or more
“head subjects” h to cover S with the minimum number of all the elements
introduced at the generalization: head subjects, gaps, and offshoots. This goal
realizes the principle of Maximum Parsimony.
Consider a rooted tree T representing a hierarchical taxonomy so that its
nodes are annotated with key phrases signifying various concepts. We denote
the set of all its leaves by I. The relationship between nodes in the hierarchy is
conventionally expressed using genealogical terms: each node t ∈ T is said to be
the parent of the nodes immediately descending from t in T, its children. We use
χ(t) to denote the set of children of t. Each interior node t ∈ T − I is assumed to
correspond to a concept that generalizes the topics corresponding to the leaves
I(t) descending from t, viz. the leaves of the subtree T(t) rooted at t; the set
I(t) is conventionally referred to as the leaf cluster of t.
A fuzzy set on I is a mapping u of I to the non-negative real numbers that
assigns a membership value u(i) ≥ 0 to each i ∈ I. We refer to the set S_u ⊂ I,
where S_u = {i ∈ I : u(i) > 0}, as the base of u.
Given a fuzzy set u defined on the leaves I of the tree T, one can consider u
to be a (possibly noisy) projection of a higher rank concept, u’s “head subject”,
onto the corresponding leaf cluster. Under this assumption, there should exist
a head subject node h among the interior nodes of the tree T such that its leaf
cluster I(h) more or less coincides (up to small errors) with S_u. This head subject
is the generalization of u to be found. The two types of possible errors associated
with the head subject, if it does not cover the base precisely, are false positives
and false negatives, referred to in this paper as gaps and offshoots, respectively
(see Figs. 1 and 2). Given a head subject node h, a gap is a node t covered by
h but not belonging to u, so that u(t) = 0. In contrast, an offshoot is a node
t belonging to u, so that u(t) > 0, but not covered by h. Altogether, the total
number of head subjects, gaps, and offshoots has to be as small as possible. For
each of these elements a penalty is defined: 1 is the penalty for a head subject,
λ is the penalty for a gap, and γ is the penalty for an offshoot.
Consider a candidate node h in T and its meaning relative to fuzzy set u.
A node t ∈ T is referred to as u-irrelevant if its leaf cluster I(t) is disjoint
from the base S_u. Obviously, if a node is u-irrelevant, all of its descendants
are also u-irrelevant. An h-gap is a node g of T(h), other than h, at which a loss
of the meaning has occurred, that is, g is a maximal u-irrelevant node in the
sense that its parent is not u-irrelevant. Conversely, establishing a node h as a
head subject can be considered as a gain of the meaning of u at the node. The set
of all h-gaps will be denoted by G(h).
An h-offshoot is a leaf i ∈ S_u which is not covered by h, i.e., i ∉ I(h). The
set of all h-offshoots is S_u − I(h). Given a fuzzy topic set u over I, a set of nodes
H will be referred to as a u-cover if: (a) H covers S_u, that is, S_u ⊆ ∪_{h∈H} I(h),
and (b) the nodes in H are unrelated, i.e., I(h) ∩ I(h′) = ∅ for all h, h′ ∈ H such
that h ≠ h′. The interior nodes of H will be referred to as head subjects and the
leaf nodes as offshoots, so the set of offshoots in H is H ∩ I. The set of gaps in
H is the union of G(h) over all head subjects h ∈ H − I.
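
The definitions of leaf cluster, h-gap, and h-offshoot can be sketched in a few lines of Python. The tree encoding (a dictionary from interior nodes to child lists), the function names, and the toy taxonomy are all illustrative assumptions. The toy tree only reproduces the counts discussed for Figs. 1 and 2, not the actual shape of the trees drawn there.

```python
def leaf_cluster(children, t):
    """Return I(t): the set of leaves of the subtree rooted at t."""
    kids = children.get(t, [])
    if not kids:                      # t is itself a leaf
        return {t}
    leaves = set()
    for child in kids:
        leaves |= leaf_cluster(children, child)
    return leaves


def h_gaps(children, base, h):
    """Return G(h): the maximal u-irrelevant nodes strictly below h.

    Assumes h itself is not u-irrelevant, i.e. I(h) intersects the base S_u.
    """
    found = []
    for child in children.get(h, []):
        if leaf_cluster(children, child).isdisjoint(base):
            found.append(child)       # u-irrelevant, while its parent is not
        else:
            found.extend(h_gaps(children, base, child))
    return found


def h_offshoots(children, base, h):
    """Return S_u - I(h): the base leaves that h fails to cover."""
    return sorted(base - leaf_cluster(children, h))


# Hypothetical toy taxonomy: five "black" leaves (the base), four "white" ones.
children = {
    "root": ["L", "R"],
    "L": ["l1", "l2", "l3", "l4", "l5"],
    "R": ["r1", "r2", "r3", "r4"],
}
base = {"l1", "l2", "l3", "l4", "r1"}

for h in ("root", "L"):
    print(h, h_gaps(children, base, h), h_offshoots(children, base, h))
```

On this toy tree the root yields four gaps and no offshoots, whereas the left-branch node yields one gap and one offshoot, mirroring the two liftings compared earlier.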
The penalty function p(H) for a u-cover H is:

$$p(H) = \sum_{h \in H - I} u(h) + \sum_{h \in H - I} \sum_{g \in G(h)} \lambda v(g) + \sum_{h \in H \cap I} \gamma u(h), \tag{1}$$


and we are to find a u-cover H that globally minimizes the penalty p(H). Such
a u-cover is the parsimonious generalization of the set u.
First, all the non-maximal u-irrelevant nodes are pruned from the tree.
Simultaneously, the sets of gaps G(t) and the internal summary gap importance
V(t) = Σ_{g∈G(t)} v(g) in Eq. (1) are computed for each interior node t. After
this, our lifting algorithm ParGenFS applies. For each node t, the algorithm
ParGenFS computes two sets, H(t) and L(t), containing those nodes in T(t) at
which respectively gains and losses of head subjects occur (including offshoots).
The associated penalty p(t) is computed too.
Sets H(t) and L(t) are defined assuming that the head subject has not been
gained (nor therefore lost) at any of t’s ancestors. The algorithm ParGenFS
recursively computes H(t), L(t) and p(t) from the corresponding values for the
child nodes in χ(t).
Specifically, for each leaf node that is not in S_u, we set both L(·) and H(·)
to be empty and the penalty to be zero. For each leaf node that is in S_u, L(·) is
set to be empty, whereas H(·) is set to contain just the leaf node, and the penalty
is defined as its membership value multiplied by the offshoot penalty weight γ. To
compute L(t) and H(t) for any interior node t, we analyze two possible cases:
(a) when the head subject has been gained at t and (b) when the head subject
has not been gained at t.
In case (a), the sets H(·) and L(·) at its children are not needed. In this case,
H(t), L(t) and p(t) are defined by:

$$H(t) = \{t\}, \quad L(t) = G(t), \quad p(t) = u(t) + \lambda V(t). \tag{2}$$

In case (b), the sets H(t) and L(t) are the unions of those of its children, and
p(t) is the sum of their penalties. To obtain a parsimonious lift, whichever case
gives the smaller value of p(t) is chosen.
The output of the algorithm consists of the values at the root, namely, H –
the set of head subjects and offshoots, L – the set of gaps, and p – the associated
penalty.
The algorithm ParGenFS is proven to lead to an optimal lifting indeed [3].
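
The recursion can be sketched in Python as follows. This is not the authors' implementation. The text here does not say how u is extended to interior nodes or how the gap importance v(g) is defined, so the sketch assumes u(t) is the Euclidean norm of the memberships of the leaves under t (after normalising the leaf memberships to unit norm) and v(g) = u(parent of g). The tree encoding, names, and parameter values are hypothetical.

```python
from math import sqrt


def pargenfs(children, root, u_leaf, lam, gamma):
    """Sketch of the recursive lifting; returns (H, L, p) at the root.

    children: dict mapping each interior node to the list of its children.
    u_leaf:   dict mapping leaves to membership values (missing means 0).
    lam, gamma: gap and offshoot penalty weights.
    """
    # Assumption: memberships are normalised so their squares sum to 1.
    norm = sqrt(sum(value ** 2 for value in u_leaf.values()))
    if norm == 0:
        return [], [], 0.0                 # empty fuzzy set: nothing to lift
    u_cache = {}

    def u(t):
        # Assumption: u(t) at an interior node is the Euclidean norm of the
        # memberships of the leaves below it.
        if t not in u_cache:
            kids = children.get(t, [])
            if kids:
                u_cache[t] = sqrt(sum(u(c) ** 2 for c in kids))
            else:
                u_cache[t] = u_leaf.get(t, 0.0) / norm
        return u_cache[t]

    def gaps(t):
        # G(t) and V(t); assumption: gap importance v(g) = u(parent of g).
        found, total = [], 0.0
        for c in children.get(t, []):
            if u(c) == 0:
                found.append(c)
                total += u(t)
            else:
                sub_found, sub_total = gaps(c)
                found += sub_found
                total += sub_total
        return found, total

    def lift(t):
        if u(t) == 0:                      # u-irrelevant: empty sets, zero penalty
            return [], [], 0.0
        kids = children.get(t, [])
        if not kids:                       # leaf in the base: candidate offshoot
            return [t], [], gamma * u(t)
        # Case (b): no head subject at t; merge the children's results.
        H_b, L_b, p_b = [], [], 0.0
        for c in kids:
            H_c, L_c, p_c = lift(c)
            H_b, L_b, p_b = H_b + H_c, L_b + L_c, p_b + p_c
        # Case (a): head subject gained at t, as in Eq. (2).
        G_t, V_t = gaps(t)
        p_a = u(t) + lam * V_t
        return ([t], G_t, p_a) if p_a < p_b else (H_b, L_b, p_b)

    return lift(root)


# Hypothetical toy taxonomy with five base leaves of equal membership.
children = {
    "root": ["L", "R"],
    "L": ["l1", "l2", "l3", "l4", "l5"],
    "R": ["r1", "r2", "r3", "r4"],
}
u_leaf = {leaf: 1.0 for leaf in ("l1", "l2", "l3", "l4", "r1")}
H, L, p = pargenfs(children, "root", u_leaf, lam=0.3, gamma=0.9)
print(H, L, round(p, 3))
```

With these illustrative weights the sketch keeps the left-branch node as the head subject, with l5 as its gap and r1 as an offshoot. Raising the root to head subject would cost more because of its four gaps.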

3 Highlighting Tendencies in Research


Being confronted with the problem of structuring and interpreting a set of
research publications in a domain, one can think of either of the following two
pathways to take. The first pathway tries to discern main categories from the
texts, the other, from knowledge of the domain. The first approach is exempli-
fied by clustering and topic modeling; the second approach, by using an expert-
driven taxonomy. The main difference between these approaches lies in the level
of granularity: the former pathway uses concepts of the same level of granularity
as those in texts, whereas the latter approach may bring forth coarser concepts
from the higher ranks of a taxonomy.
This paper follows the second pathway by moving, in sequence, through the
stages covered in separate Subsects. 3.1 to 3.6.

3.1 Scholarly Text Collection

We downloaded a collection of 17,685 research papers, together with their
abstracts, published in 17 journals related to Data Science over the 20 years
1998–2017. We take the abstracts of these papers as a representative collection.

3.2 DST Taxonomy

The subdomain of our choice is Data Science, comprising such areas as machine
learning, data mining, data analysis, etc. We take that part of the six-layer
ACM-CCS 2012 taxonomy of computing subjects [1] which is related to Data
Science, and add a few leaves related to more recent Data Science developments.
The taxonomy itself, with all its 317 leaves, can be found in [3].

3.3 Relevance Topic-to-Text Score and Co-relevance Topic-to-Topic Similarity Index

We first obtain a keyphrase-to-document matrix R of relevance scores by using
the Annotated Suffix Tree approach [2]. This matrix R is converted to a
keyphrase-to-keyphrase similarity matrix A for scoring the “co-relevance” of
keyphrases according to the text collection structure. The similarity score a_{ii′}
between topics i and i′ is computed as the inner product of the score vectors
r_i = (r_{iv}) and r_{i′} = (r_{i′v}). The inner product is moderated by a natural
weighting factor assigned to texts in the collection. The weight of text v is defined as
the ratio of the number of topics n_v relevant to it and n_max, the maximum n_v
over all v = 1, 2, ..., V. A topic is considered relevant to v if its relevance score
is greater than 0.2 (a threshold found experimentally, see [2]).
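
A minimal pure-Python sketch of this conversion is given below. It reads "moderated by a weighting factor" as a weighted inner product, a_{ii′} = Σ_v w_v r_{iv} r_{i′v} with w_v = n_v / n_max. That reading, the function name, and the small matrix R are assumptions for illustration only. The relevance scores themselves would come from the Annotated Suffix Tree method, which is not reproduced here.

```python
def co_relevance(R, threshold=0.2):
    """Return the topic-to-topic matrix A from topic-to-text scores R[i][v]."""
    n_topics, n_texts = len(R), len(R[0])
    # n_v: number of topics whose relevance to text v exceeds the threshold.
    n_v = [sum(1 for i in range(n_topics) if R[i][v] > threshold)
           for v in range(n_texts)]
    n_max = max(n_v)
    # Text weights n_v / n_max; all zero if no topic is relevant to any text.
    w = [n / n_max if n_max else 0.0 for n in n_v]
    # Assumed form: weighted inner product of the score vectors of two topics.
    return [[sum(w[v] * R[i][v] * R[j][v] for v in range(n_texts))
             for j in range(n_topics)]
            for i in range(n_topics)]


# Hypothetical scores for 3 topics (rows) and 2 texts (columns).
R = [[0.5, 0.0],
     [0.4, 0.3],
     [0.1, 0.1]]
A = co_relevance(R)
for row in A:
    print([round(a, 3) for a in row])
```

Here the first text has two relevant topics and the second has one, so the second text enters every inner product with half the weight of the first.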

3.4 Fuzzy Thematic Clusters of Taxonomy Topics

Clusters of topics should reflect co-occurrence of topics: the greater the number
of texts to which both topics t and t′ are relevant, the greater the interrelation
between t and t′, and the greater the chance for topics t and t′ to fall in the
same cluster. We have tried several popular clustering algorithms on our data.
Unfortunately, no satisfactory results have been found. Therefore, we present
here results obtained with the Fuzzy ADDItive Spectral (FADDIS) clustering
algorithm developed in [5] specifically for finding thematic clusters.
After computing the 317 × 317 topic-to-topic co-relevance matrix, converting
it to a topic-to-topic Laplace-transformed similarity matrix [5], and applying
FADDIS clustering, we sequentially obtained 6 clusters, of which three clusters
are obviously homogeneous. They relate to “Learning”, “Retrieval”, and
“Clustering”.