Jason J. Jung
Costin Badica
Attila Kiss (Eds.)


Scalable Information
Systems
5th International Conference, INFOSCALE 2014
Seoul, South Korea, September 25–26, 2014
Revised Selected Papers

Lecture Notes of the Institute
for Computer Sciences, Social Informatics
and Telecommunications Engineering 139

Editorial Board
Ozgur Akan
Middle East Technical University, Ankara, Turkey
Paolo Bellavista
University of Bologna, Bologna, Italy
Jiannong Cao
Hong Kong Polytechnic University, Hong Kong, Hong Kong
Falko Dressler
University of Erlangen, Erlangen, Germany
Domenico Ferrari
Università Cattolica Piacenza, Piacenza, Italy
Mario Gerla
UCLA, Los Angeles, USA
Hisashi Kobayashi
Princeton University, Princeton, USA
Sergio Palazzo
University of Catania, Catania, Italy
Sartaj Sahni
University of Florida, Florida, USA
Xuemin (Sherman) Shen
University of Waterloo, Waterloo, Canada
Mircea Stan
University of Virginia, Charlottesville, USA
Jia Xiaohua
City University of Hong Kong, Kowloon, Hong Kong
Albert Zomaya
University of Sydney, Sydney, Australia
Geoffrey Coulson
Lancaster University, Lancaster, UK
More information about this series at http://www.springer.com/series/8197
Editors

Jason J. Jung
Chung-Ang University
Seoul
Korea, Republic of (South Korea)

Attila Kiss
Eötvös Loránd University
Budapest
Hungary

Costin Badica
University of Craiova
Craiova
Romania

ISSN 1867-8211 ISSN 1867-822X (electronic)


Lecture Notes of the Institute for Computer Sciences, Social Informatics
and Telecommunications Engineering
ISBN 978-3-319-16867-8 ISBN 978-3-319-16868-5 (eBook)
DOI 10.1007/978-3-319-16868-5

Library of Congress Control Number: 2015937351

Springer Cham Heidelberg New York Dordrecht London


© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2015
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the
material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now
known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are
believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors
give a warranty, express or implied, with respect to the material contained herein or for any errors or
omissions that may have been made.

Printed on acid-free paper

Springer International Publishing AG Switzerland is part of Springer Science+Business Media (www.springer.com)
Preface

As data and knowledge volumes keep increasing while global means for information dissemination continue to diversify, new methods, modeling paradigms, and structures are needed to efficiently meet scalability requirements. In recent years, we have seen the proliferation of heterogeneous distributed systems, ranging from simple Networks of Workstations to highly complex grid computing environments. Such computational paradigms have been preferred due to their reduced costs and inherent scalability, which pose many challenges to scalable systems and applications in terms of information access, storage, and retrieval. Grid computing, P2P technology, data and knowledge bases, distributed information retrieval technology, and networking technology should all converge to address the scalability concern. Furthermore, with the advent of emerging computing architectures (e.g., SMTs, GPUs, and Multicores), designing techniques that explicitly target these systems is becoming more and more important. The 5th International Conference on Scalable Information Systems will focus on a wide array of scalability issues and investigate new approaches to tackle problems arising from the ever-growing size and complexity of information of all kinds.

Particularly, in the era of big data, the scalability of information systems has been the most important issue. The aim of this conference is to provide an internationally respected forum for scientific research in the computer-based methods of collective intelligence and their applications in (but not limited to) such fields as Scalable Processing (and Architecture) for Big Data and Scalable Systems and Conceptual Modeling.

December 2014 Jason J. Jung


Organization

InfoScale 2014 is organized by the Department of Computer Science, Chung-Ang University and Wrocław University of Technology in cooperation with The European Alliance for Innovation (EAI).

Executive Committee
General Chair
Jason J. Jung Chung-Ang University, Korea
Costin Badica University of Craiova, Romania

Program Chair
Jason J. Jung Chung-Ang University, Korea
Ngoc Thanh Nguyen Wrocław University of Technology, Poland
Attila Kiss Eötvös Loránd University, Hungary

Workshop Chair
David Camacho Universidad Autónoma de Madrid, Spain

Publicity Chair
Le Anh Vu Nguyen Tat Thanh University, Vietnam

Publication Chair
Yue-Shan Chang National Taipei University, Taiwan

Local Chair
Seung-Bo Park Inha University, Korea

Web Chair
Xuan Hau Pham Quang Binh University, Vietnam

Conference Coordinator
Sinziana Vieriu EAI, Italy

Program Committee
G.A. Aranda-Corral Universidad de Sevilla, Spain
Costin Badica University of Craiova, Romania
David Camacho Universidad Autónoma de Madrid, Spain
Yue-Shan Chang National Taipei University, Taiwan
F. Freitas Universidade Federal de Pernambuco, Brazil


D. Godoy UNICEN University, Argentina
T. Herawan University of Malaya, Malaysia
T.-P. Hong National University of Kaohsiung, Taiwan
A. Jatowt Kyoto University, Japan
Jason J. Jung Chung-Ang University, Korea
K. Juszczyszyn Wrocław University of Technology, Poland
C. Kartsaklis Oak Ridge National Laboratory, USA
Attila Kiss Eötvös Loránd University, Hungary
D. Krol Wrocław University of Technology, Poland
M. Lanzenberger Vienna University of Technology, Austria
J. Li University of Technology, Sydney, Australia
V. Milea Erasmus University Rotterdam, The Netherlands
G. Nalepa AGH University of Science and Technology, Poland
T.B. Nguyen International Institute for Applied Systems Analysis, Austria
Hong-Linh Truong Vienna University of Technology, Austria
Xinhua Zhu University of Technology, Sydney, Australia
Michael Sheng The University of Adelaide, Australia
Tzung-Shi Chen National University of Tainan, Taiwan
Rajkumar Buyya University of Melbourne, Australia

Sponsoring Institutions

EAI (European Alliance for Innovation)


Chung-Ang University, Korea
Wrocław University of Technology, Poland
Contents

Scalable Data Analytics

Scalable Similarity Search for Big Data: Challenges and Research Objectives
Pavel Zezula

Content-Based Analytics of Diffusion on Social Big Data: A Case Study on Korean Telecommunication Companies
Namhee Lee and Jason J. Jung

Multi-modal Similarity Retrieval with a Shared Distributed Data Store
David Novak

An Efficient Approach for Complex Data Summarization Using Multiview Clustering
Mohiuddin Ahmed, Abdun Naser Mahmood, and Michael J. Maher

Big Data Applications

A Novel Approach for Network Traffic Summarization
Mohiuddin Ahmed, Abdun Naser Mahmood, and Michael J. Maher

Heart Disease Diagnosis Using Co-clustering
Mohiuddin Ahmed, Abdun Naser Mahmood, and Michael J. Maher

An Investigation of Scalable Anomaly Detection Techniques for a Large Network of Wi-Fi Hotspots
Pheeha Machaka and Antoine Bagula

Link Scheduling for Data Collection in Multichannel Wireless Sensor Networks
Meng-Shiuan Pan and Yi-Hsun Lee

A Design of Sensor Data Ontology for a Large Scale Crop Growth Environment System
Eunji Lee, Byeongkyu Ko, Chang Choi, and Pankoo Kim

Real-Time Data Flow Language Processing System for Handling Streams of Data
Choon Seo Park, Jin-Hwan Jeong, Myungcheol Lee, Yong-Ju Lee, Miyoung Lee, and Sung Jin Hur

Author Index


Scalable Data Analytics
Scalable Similarity Search for Big Data
Challenges and Research Objectives

Pavel Zezula

Masaryk University, Brno, Czech Republic

Abstract. Analysis of contemporary Big Data collections requires effective and efficient content-based access to data, which is usually unstructured. This first implies a necessity to uncover descriptive knowledge of complex and heterogeneous objects to make them findable. Second, multimodal search structures are needed to efficiently execute complex similarity queries, possibly in outsourced environments, while preserving privacy. Four specific research objectives to tackle the challenges are outlined and discussed. It is believed that a relevant solution of these problems is necessary for a scalable similarity search operating on Big Data.

Keywords: Big data · Scalability · Information retrieval · Similarity search · Findability · Data outsourcing · Data privacy · Information extraction

1 The Big Data Problem


Many organizations today are increasingly unable to process or analyze the data produced by numerous sources. This situation has given rise to the Big Data problem. In practice, organizations have potential access to a wealth of information, but they do not know how to get value out of it. This is especially true when the predominantly semi-structured or unstructured data is only stored in its raw form. According to [25], the typical characteristic is that the volume of data available to organizations today is rising sharply, while the percentage of data they can analyze or otherwise selectively use is declining. In general, it is the volume, variety and velocity of current data which together define the Big Data phenomenon.
Unlike traditional databases, optimized for fast access and summarization of structured data and well-defined queries, Big Data is believed to serve as a raw material for the creation of new knowledge. Big Data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations and other useful information [6]. To allow this, the data needs to be primarily accessed using the similarity of the data content. The white paper [1] of a community of leading researchers across the United States analyzed the problem from the technical point of view. Specifically, they see the heterogeneity, scale, timeliness, complexity, and privacy aspects of Big Data as the main obstacles of the process that can create value from data. They believe that the problem starts right away during data acquisition, when the massive amounts of data produced require making decisions about what data to keep and what to discard, and how to store the kept data reliably along with proper search-enabling metadata.

© Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2015
J. Jung et al. (Eds.): INFOSCALE 2014, LNICST 139, pp. 3–12, 2015.
DOI: 10.1007/978-3-319-16868-5_1

Typical examples of the current data are blogs and tweets, which are weakly structured texts, while the more bulky images and video data are only structured for storage and display, but totally unstructured according to semantic content. As it is the content which makes retrieval possible, its extraction into a searchable form is the major challenge. Furthermore, it is necessary to specify how the similarity of data should be evaluated. Contemporary content surrogates (features, descriptors) are only comparable according to specific forms of similarity, which is, from the user's point of view, subjective and context dependent. Accordingly, scalable and secure data analysis, organization, retrieval, and modeling are other foundational technological challenges of Big Data in general.
Future data processing tools will have to manage the similarity paradigm for searching. Though other alternatives exist, in the following we will assume the metric space model of similarity [23], which has already proved useful mainly for its high extensibility, which allows a single search system implementation to cover a large range of applications. The underlying property of any future search-related technology is scalability. There are then two principal directions that future research effort should follow:
– First, it is necessary to concentrate on the problem of data findability, which
is a general concept that covers technologies for effective and efficient data
content acquisition, recording, information extraction and cleaning, as well as
the data annotation, integration, and categorization.
– The second direction concerns similarity searching, which is not entirely a new
problem, and the future emphasis should be put on efficiency of multi-aspect
similarity and on privacy of search in outsourced data environments.
The seemingly independent sub-problems of findability and searching are actually strictly complementary. No search is possible without content-revealing features produced by findability processes on raw data objects. At the same time, unorganized multiple features of objects have little value without multimodal, scalable and secure search mechanisms. These problems are not only timely, but also foundational, as they require rethinking current data processing approaches in fundamental ways. The expectation is to move current search capabilities from processing of small collections to much larger dimensions, from precise to approximate similarity searching, and from using customized infrastructures and services to outsourced processing in secure cloud-like environments. These problems and their relationships are sketched in Fig. 1.

2 Similarity Searching
The ability to perceive similarity is one of the most fundamental aspects of human cognition. Besides being crucial for recognition, classification, and learning, it plays an important role in scientific discovery and creativity. In recent years, similarity and analogy have received increasing attention from cognitive scientists [14]. Successful learning mostly depends on the ability to identify the most relevant bodies of knowledge that already exist in memory, so that this knowledge can be used as the starting point for learning something new.

[Figure 1: diagram with the labels Application Areas (web, enterprise, mobile, social), Cloud Services, findability, stimuli, operators, retrieval, effectiveness, efficiency, and Similarity Search Computing.]
Fig. 1. Similarity search computing services.

As any kind of fact can nowadays become a digital part of the networked media (whatever we see, say, measure, observe, test, or otherwise experience is, or at least can be, in digital form), computers must provide access to required data through operations based on similarity (proximity, resemblance, psychological distance, etc.), because “it is the similarity that is in the world revealing” [21]. There are many application areas that inherently require similarity data management, e.g. multimedia retrieval, processing of biometric data, medical information systems, biological (chemical) data processing, geographic systems, electronic commerce, forensics, etc. To uncover hidden knowledge in Big Data, similarity access is indispensable.
In the digital world, similarity is determined by stimuli (features, descriptors,
properties, etc.) extracted from raw objects and by the way we pursue processes
of assessing similarity of related objects, i.e. a sequence of operations from a
specific data-processing algebra. We can calibrate the stimuli and operations
from two different points of view: (1) effectiveness concerns the way similarity
is defined, including quality assessment of the results, and (2) efficiency regards
the processing speed, costs, or the amount of effort needed to get the results –
for convenience, see again Fig. 1.
The effectiveness of similarity search necessarily depends on a specific application and data domain. The key tasks are thus the selection of an appropriate similarity model and, typically, extraction of convenient feature descriptors from the raw data. In practice, groups of domain experts develop application-specific similarity models and descriptors. For instance, in the domain of images, there exists a wide portfolio of features, ranging from various global descriptors (colors, texture, shapes) and local descriptors (e.g. the popular SIFT) to many domain-specific characteristics, such as face descriptors. From the application point of view, it is important to select and extract suitable descriptors and effectively combine several types of them.
As far as the efficiency of similarity data management is concerned, it would be very time-consuming and expensive to develop a specialized management system for each of the many application areas and a practically endless list of different similarity criteria applicable to the current scale of digital data. Therefore, a lot of research effort has been invested in the last decade into a generic indexing and searching approach that adopts the metric space as its data similarity model [23]. This research stream seeks new efficient ways to locate user-relevant information in collections of objects where the relationships are quantified using pair-wise distance (dissimilarity) measures between stimuli of individual objects. So far, many fundamental principles, indexing and searching techniques, implementation paradigms, and analytic tools have been developed. Similarity searching in very large data collections is inherently an infrastructure-demanding and time-consuming task, so many approximate approaches [24] as well as parallel and distributed indexes [18] have also been developed. There are also pioneering works that transfer the computationally intensive tasks to new massively parallel hardware infrastructures like GPUs [12]. However, the Big Data problem introduces qualitatively new challenges, in particular the need to process enormous data volumes with respect to multiple complementary views on the complex data similarity.
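
To make the appeal of the metric space model concrete, the following is a minimal sketch of pivot-based filtering, a standard idea in metric indexing that relies only on the triangle inequality. It is not the method of [23] or of any particular index cited here; the class name `PivotIndex`, the use of a single pivot, and the Euclidean distance are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, a standard metric on real vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class PivotIndex:
    """Toy index (hypothetical): stores each object's distance to one pivot."""

    def __init__(self, objects: Sequence[Point], pivot: Point,
                 dist: Callable[[Point, Point], float] = euclidean) -> None:
        self.dist = dist
        self.pivot = pivot
        # Precompute d(pivot, o) once, at indexing time.
        self.entries: List[Tuple[float, Point]] = [
            (dist(pivot, o), o) for o in objects
        ]
        self.distance_computations = 0

    def range_query(self, query: Point, radius: float) -> List[Point]:
        """Return all objects o with d(query, o) <= radius."""
        d_qp = self.dist(query, self.pivot)
        self.distance_computations = 1  # the query-to-pivot distance
        result: List[Point] = []
        for d_po, obj in self.entries:
            # Triangle inequality: |d(q, p) - d(p, o)| <= d(q, o).
            # If this lower bound exceeds the radius, o cannot qualify,
            # so the (possibly expensive) distance d(q, o) is skipped.
            if abs(d_qp - d_po) > radius:
                continue
            self.distance_computations += 1
            if self.dist(query, obj) <= radius:
                result.append(obj)
        return result
```

Because the filter uses only the metric postulates, the same code works unchanged for any distance function that satisfies them, which is the extensibility argument made above.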
In summary, though numerous indexing structures have been proposed and used in practice [23], similarity searching is not ready for Big Data processing. The choice of the most suitable descriptors and the process of their efficient extraction have always been underestimated. Existing retrieval algorithms are able to efficiently process only a single form of similarity, and the desired combination of different modalities is typically applied as an a posteriori, time-consuming process. The data privacy issues which naturally arise in outsourced environments have been considered only marginally in the context of similarity searching. All such deficiencies are even more significant for Big Data and form new challenges for research.
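
The a posteriori combination of modalities mentioned above can be sketched as a two-step procedure: search by one modality, then re-rank the candidates by a weighted sum of per-modality distances. This is only an illustrative sketch under stated assumptions; the `MediaObject` type, the L1 distance, the modality names, and the weights are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class MediaObject:
    """Hypothetical object carrying one feature vector per modality."""
    name: str
    features: Dict[str, Tuple[float, ...]]


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 (Manhattan) distance between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def rerank_a_posteriori(query: MediaObject,
                        collection: Sequence[MediaObject],
                        primary: str,
                        weights: Dict[str, float],
                        k: int,
                        candidates: int) -> List[MediaObject]:
    """Search by one modality, then re-rank the candidates by all of them."""
    # Step 1: single-modality search (a full scan here, for simplicity).
    first_pass = sorted(
        collection,
        key=lambda o: l1(query.features[primary], o.features[primary]),
    )[:candidates]

    # Step 2: re-rank only the candidates by the weighted combined distance.
    def combined(o: MediaObject) -> float:
        return sum(w * l1(query.features[m], o.features[m])
                   for m, w in weights.items())

    return sorted(first_pass, key=combined)[:k]
```

A weakness of this scheme is visible even on tiny inputs: an object that is far from the query in the primary modality never reaches the re-ranking step, even if its combined distance would place it in the final answer.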

3 Challenges and Research Objectives

Seemingly, Big Data analytics could be done with software tools that are commonly used in advanced analytics disciplines such as predictive analytics and data mining. However, the unstructured data used in Big Data analytics typically do not fit in traditional data warehouses, and these tools are often not able to handle the processing demands posed by such data. Current technologies associated with Big Data analytics are therefore based on NoSQL databases and MapReduce-like systems that form the core of an open-source software toolkit for processing large structured data sets across distributed systems. However, new technologies are needed to deal with massive swaths of unstructured, mostly multimedia, data. So the two principal challenges are to:

Challenge 1: bring up descriptive knowledge or content of raw data to increase findability of complex (unstructured) digital data,
Challenge 2: apply such knowledge for efficient multimodal and secure similarity searching in outsourced infrastructure environments.

Accordingly, the principal research directions are: (1) Processing Raw Data for Findability and (2) Hybrid similarity search index structures. In the following, we discuss both of them in more detail.

3.1 Processing Raw Data for Object Findability


Indexing and retrieval of multimedia data requires the raw data to be preprocessed and represented in some structured way. There are two kinds of problems to be considered: what to extract and how to extract. As a large portion of the data currently produced is of no interest (redundant, noisy, or otherwise irrelevant), it can be filtered out and thus compressed by orders of magnitude. However, a big challenge is to find filters that would not discard useful information; these filters could also serve for profile-specific classification of the data. In a wider context, the objective is to propose theories and techniques that would enable (semi-)automatic feature selection/extraction and classification of unstructured objects stored in heterogeneous Big Data collections. This would make it possible to automate the arduous task that now has to be done by highly qualified experts whenever a new collection is to be uploaded to a similarity management system. To achieve this, huge volumes of data have to be processed and complex computational tasks applied, which inherently need sophisticated techniques able to exploit the massive cloud-like infrastructures available nowadays. The following two specific objectives concern the effectiveness and efficiency of the problem.

Objective 1: Effectiveness of Findability in Heterogeneous Big Data Collections. After two decades of development, similarity search technologies are only hesitantly finding their place in mainstream database management systems. This is mainly because similarity management does not provide such comfort to end users as, for example, relational databases or web search engines do. The crucial limitation of applicability of current similarity-based systems is the inevitable participation of domain-specific data analysis able to produce a particular similarity model effective for given data; it also determines the feature extraction processes necessary for retrieval. The role of a domain expert is extremely important and determines the success of the whole effort. Even if we restrict ourselves to just image data, we find that many sub-domains exist, ranging from more general ones to very narrow ones. Let us mention a few examples. Architectural images (e.g., pictures of a city) are usually matched locally and the SIFT descriptors work very well here [5], but for general photography the SIFT-like approaches completely fail. In this case, the feature signatures or color descriptors defined by the MPEG-7 standard [15] serve better, provided that the distribution of color patches in images is relevant for the user. For cartoons or sketches the shape-based MPEG-7 descriptors could be successfully applied [19], while for pictures capturing social events the face descriptor is quite useful (e.g., in the Facebook galleries).
The task of finding an appropriate similarity model becomes even more complex for highly specific domains, e.g. in medical or industrial imagery [3]. In the context of Big Data repositories hosted on cloud infrastructures, where the volumes, heterogeneity and velocity of data uploaded are simply “big”, the problem of domain specificity of content gets critical. Without suitable data models, the stored data cannot be found unless an army of domain experts is employed to do the analysis manually. The goal of this task is therefore to establish a framework of algorithms for automatic determination of various domain-specific similarity models. In general, the framework would assume an extensible repository of profile-specific filters that would make it possible to classify collections or individual objects and select a suitable similarity/feature extraction model. Such a framework should completely bypass the need for a human domain expert. A simplified analogy can be found in pattern matching tasks, where a positive response to a particular pattern results in classifying an object by the class associated with the pattern. However, in the context of similarity models, the “pattern matching idea” is much more complicated, as the framework must cope with additional issues, such as the very large volumes of data, large heterogeneity of data (e.g., mixed social and architectural pictures), user preferences, social-networking context, privacy (encrypted data), and many others.
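
The "extensible repository of profile-specific filters" can be pictured with a small sketch of the simplified pattern matching analogy. Everything here is hypothetical: the `ProfileFilter` and `FilterRepository` names, the tag-based predicates, and the profile-to-descriptor pairs (loosely following the image examples given earlier) are assumptions for illustration, and a real framework would need learned classifiers rather than hand-written tag checks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

ObjectMeta = Dict[str, Sequence[str]]  # hypothetical metadata, e.g. {"tags": [...]}


@dataclass
class ProfileFilter:
    """Hypothetical filter: a profile name, a test, and the model it selects."""
    profile: str
    matches: Callable[[ObjectMeta], bool]
    descriptor: str


class FilterRepository:
    """Extensible list of filters; the first matching filter wins."""

    def __init__(self) -> None:
        self._filters: List[ProfileFilter] = []

    def register(self, profile_filter: ProfileFilter) -> None:
        self._filters.append(profile_filter)

    def select_descriptor(self, obj: ObjectMeta) -> Optional[str]:
        """Return the descriptor of the first matching profile, if any."""
        for f in self._filters:
            if f.matches(obj):
                return f.descriptor
        return None  # no profile matched; the object stays unclassified


def has_tag(tag: str) -> Callable[[ObjectMeta], bool]:
    """Build a toy predicate that checks for a tag (placeholder logic)."""
    return lambda obj: tag in obj.get("tags", ())


repo = FilterRepository()
repo.register(ProfileFilter("architecture", has_tag("city"), "SIFT"))
repo.register(ProfileFilter("sketch", has_tag("cartoon"), "MPEG-7 shape"))
repo.register(ProfileFilter("social event", has_tag("party"), "face descriptor"))
```

New profiles are added by registering another filter, without touching the selection logic; the hard research problem described above is how to obtain such filters automatically.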

Objective 2: Scalable Content Extraction in Big Data Collections. In order to
increase the findability of objects, a user-specified workload must be applied
to every piece of the data. The workload can be of various kinds, ranging from
feature extraction from complex data to exhaustive information searching or
filtering. We can consider the data as a continuously transmitted stream
delivered to the processing infrastructure. Specific examples include:
extraction of visual descriptors from a large collection of images stored on a
disk; classification (annotation) of images for a specific social network user;
event detection in a video stream produced by a surveillance camera; re-ranking
of outputs from a text search by different similarity metrics, etc.
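
The idea of applying a user-specified workload to every item of a data stream can be sketched in a few lines of Python. This is a single-process illustration only, not the distributed system the text calls for; the function names (apply_workload, intensity_histogram) and the toy histogram descriptor are assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def apply_workload(stream: Iterable[T],
                   workload: Callable[[T], R]) -> Iterator[Tuple[T, R]]:
    """Lazily apply a user-specified workload to every item of a stream.

    Items are processed one at a time as they arrive, so the stream may be
    a stored collection or an unbounded generator.
    """
    for item in stream:
        yield item, workload(item)


def intensity_histogram(pixels: List[int], bins: int = 4) -> List[int]:
    """Toy feature extractor (assumed): histogram of 8-bit intensities."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return hist


if __name__ == "__main__":
    images = [[0, 64, 128, 255], [10, 20, 30, 40]]  # stand-ins for images
    for image, descriptor in apply_workload(images, intensity_histogram):
        print(descriptor)
```

Because apply_workload is a generator, the same call works for a batch read from disk and for a live feed, which mirrors the requirement that one system should cover both kinds of processing.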
According to [16], current tools and systems for distributed processing are
typically designed either for (1) batch processing of a large data volume that
is already stored in a distributed hardware infrastructure (Hadoop-like systems
based on the MapReduce processing model), or for (2) parallel and distributed
processing of a given complex computational task (systems like Storm or S4).
Neither of these approaches is fully sufficient for our problem, since the desired
system must cover both these tasks at the same time. Moreover, we need to deal