Academia.edu

Knowledge Discovery in Databases

1,082 papers
25,279 followers
About this topic
Knowledge Discovery in Databases (KDD) is the process of identifying valid, novel, and potentially useful patterns from large datasets through data mining techniques, statistical analysis, and machine learning, aimed at extracting meaningful information and insights to support decision-making.

Key research themes

1. How can human-centered process modeling improve the effectiveness of knowledge discovery in databases (KDD)?

This research area focuses on understanding and characterizing the iterative, knowledge-intensive interactions between humans and data during the KDD process. Recognizing that learning algorithms are only a segment of the overall KDD workflow, it emphasizes the design of systems that assist human analysts across all phases, from data selection and cleaning to interpretation and deployment. The goal is to create integrated environments that support users’ tasks holistically, rather than solely providing standalone data mining algorithms. This emphasis on user and task-centered perspectives addresses the complexity and real-world application challenges in knowledge discovery.

Key finding: This foundational paper delineates the major phases of the KDD process as complex interactions between human analysts and large databases, illustrating that rule induction algorithms represent only a portion of real-world KDD...
Key finding: This comprehensive introduction defines KDD as an iterative and interactive process involving nine steps, starting from understanding domain goals to deployment and feedback cycles. It highlights human decisions at each...
Key finding: This thesis addresses the limitations of fully automated data mining by incorporating expert involvement and incremental learning into the knowledge discovery lifecycle. It presents a methodology derived from multiple...

2. What are effective methodologies and process models for applying data mining and knowledge discovery in healthcare settings?

This theme investigates domain-specific adaptations of KDD and data mining tailored for healthcare’s complex, voluminous, and heterogeneous data. It explores methodological considerations like integrating clinical, financial, demographic, and socioeconomic data, along with handling data types such as patient records and genomic data. The literature evaluates application areas including diagnosis, treatment optimization, patient risk prediction, and healthcare management, alongside discussing challenges around data quality, system integration, and domain knowledge incorporation to improve decision making and patient outcomes.

Key finding: This study categorizes healthcare KDD applications into patient, market, and system views, elucidating that mining healthcare data supports a wide array of goals from direct patient data mining to decision support system...
Key finding: This review synthesizes healthcare KDD process models and applications, underscoring gains in clinical decision support and cost reduction via rapid and accurate data analysis. It identifies common challenges such as data...
Key finding: This research develops predictive models that utilize integrated datasets combining claims, enrollment, medical, and pharmacy data to identify individuals likely to become high utilizers of healthcare services. Employing...

3. How can data mining methods address challenges of continuously arriving data streams and temporal event analysis?

This theme examines methodological frameworks and algorithms designed to extract knowledge from data streams characterized by high volume, velocity, and continuous generation. It addresses the constraints of conventional static data mining on streaming data, necessitating one-pass, real-time, and resource-efficient processing. Furthermore, it delves into mining temporal relationships between interval events and detection of associations that unfold over time, providing insights into dynamic behaviors in applications like network monitoring and financial transactions.
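As a rough illustration of the one-pass, resource-efficient processing described above, the sketch below maintains a running mean and variance over a stream using Welford's update rule. It is not taken from any paper listed on this page; the class name `RunningStats` and its methods are hypothetical names chosen for the example.

```python
class RunningStats:
    """Constant-memory summary of a numeric stream (Welford's method)."""

    def __init__(self) -> None:
        self.n = 0        # number of items seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Each item is seen exactly once and then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Population variance of the items seen so far.
        return self._m2 / self.n if self.n else 0.0


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:  # stands in for an unbounded stream
    stats.update(value)
```

Memory use stays fixed no matter how many items arrive, which is the property that distinguishes stream summaries from batch methods that need the whole dataset in memory.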

Key finding: This chapter conceptualizes data stream mining as a process that must handle infinite, rapid, and continuous data with limited memory and computational resources. It introduces algorithms adapted to handle the constraints of...
Key finding: This research proposes a novel technique to mine temporal containment relationships in series of interval events, defining an event X containing Y if Y occurs entirely within X's time interval. The method advances beyond...
Key finding: This work advances statistical pattern discovery by defining event association patterns as compound events whose observed occurrence frequencies deviate significantly from expectations under independence. It introduces...
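The containment definition in the second finding above (X contains Y if Y occurs entirely within X's time interval) can be sketched as a simple predicate. This is only an illustration of the definition, not the mining technique from the paper; the names `IntervalEvent`, `contains`, and `containment_pairs` are hypothetical, and treating the interval endpoints as inclusive is an assumption.

```python
from typing import List, NamedTuple, Tuple


class IntervalEvent(NamedTuple):
    label: str
    start: float
    end: float


def contains(x: IntervalEvent, y: IntervalEvent) -> bool:
    """True if y lies entirely within x's interval (endpoints inclusive, by assumption)."""
    return x.start <= y.start and y.end <= x.end


def containment_pairs(events: List[IntervalEvent]) -> List[Tuple[str, str]]:
    """Brute-force list of (container, contained) label pairs."""
    return [
        (x.label, y.label)
        for x in events
        for y in events
        if x is not y and contains(x, y)
    ]


events = [
    IntervalEvent("A", 0, 10),
    IntervalEvent("B", 2, 5),
    IntervalEvent("C", 4, 12),
]
pairs = containment_pairs(events)
```

The pairwise check is quadratic in the number of events, so it only serves to make the relationship concrete; a practical miner would need indexing or sorting to scale.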

All papers in Knowledge Discovery in Databases

This work describes the development of an application for the semantic analysis of a document collection. The tool can interact with other document management systems that provide a specific knowledge organization, ...
In recent years, digital marketing has transformed the way in which companies communicate with their customers around the world. The increase in the use of social networks and how users communicate with companies on the Internet has given...
Executive Summary: The Vision. Imagine every human thought, sentence, image, video, code snippet, or creative work having a unique "DNA fingerprint" that can be traced across languages, domains, and time. Like Bitcoin created decentralized...
This paper focuses on the development of quality-assured software. It shows the importance of software reviews in the software development process and how indispensable security is to software development; any software with no...
Clinical trials for drug repositioning aim at evaluating the effectiveness and safety of existing drugs as new treatments. This involves managing and semantically correlating many interdependent parameters and details in order to clearly...
This system also ensures safety and efficient use of parking space. As the system is automatic, the time consumed is reduced.
SQL supports a wide range of both technical and business-oriented roles within various organizations, reflecting its versatility and importance in the modern data-driven landscape. For business users, it is crucial to grasp the nuances...
This study estimates the factors affecting socially vulnerable groups’ demand for and accessibility levels to green public spaces in Dhaka City, Bangladesh. Dhaka is a high-density city with one of the lowest levels of green space per...
A distributed system is one in which components located at networked computers communicate and coordinate their actions only by passing messages. Nowadays, distributed systems face different types of problems. One of those problems is load...
by Dan Deac and 1 more
The paper presents a new interpretation regarding a funerary monument, namely a profiled funerary stela, discovered at Porolissum, Roman Dacia. Helped by the Highlight Reflectance Transformation Imaging (H-RTI) method, improvements on...
Terrorist attacks are among the most challenging problems facing humanity worldwide and demand the full attention of researchers and practitioners. To predict the terrorist group responsible for...
This study proposes a new method of identifying possible kidney disease biomarkers using Kidney Diseases Biomarker Ontology (KDBO). This study classically merges clinical imaging, biopsy data, proteomics, and genomic data derived from...
"The real voyage of discovery consists not in seeking new landscapes, but in having new eyes."
- Marcel Proust 
"In Search of Lost Time"
The functional stability of ecosystems depends greatly on interspecific differences in responses to environmental perturbation. However, responses to perturbation are not necessarily invariant among populations of the same species, so...
In many real-world tasks of image classification, limited amounts of labeled data are available to train automatic classifiers. Consequently, extensive human expert involvement is required for verification. A novel solution is presented...
Certified AI Cybersecurity Analysts are not just professionals; they are the frontline architects of digital trust. Their hybrid expertise allows organizations to combat threats with unmatched intelligence and foresight. Learn more in...
The availability of Arabic text documents on the Internet entails the use of convenient Arabic text classification (TC) techniques. Arabic TC requires extensive work in analyzing the content of valuable Arabic documents. Its rich...
This article uses specific queries to show the behavior of the relational and object-relational models, and presents a study that measures and compares their efficiency (time, use of resources of the...
Gauss's law is one of the main pillars of electromagnetism and classical mechanics, providing deep insight into how the flux of an electric or gravitational field is distributed in space. The law is stated in the form of...
The Meta-Unsolvability Function: Recursive Limits of Mathematical Knowledge and the Paradox of Proving Unprovability. We introduce the Meta-Unsolvability Function (MUF), proving that some mathematical problems cannot even be proven...
This paper presents an AI-powered personal fitness coaching system utilizing deep learning and real-time computer vision to assist users in exercise recognition and personalized workout planning. Leveraging YOLOv11 convolutional neural...
Prepared by: Group 1, Undergraduate Electrical Engineering, UPN Veteran Jakarta: Raffi Akbar (2310314036), Sion Lukas Elkana (2310314068), Dandi Ardian (2310314079), Wanda Ramadhan (2310314087), Gracia Aurora (2310314090). Nikola Tesla is an important figure...
Gauss's law is one of the fundamental principles of electromagnetism and gravitation, stating that the flux of an electric or gravitational field through a closed surface is proportional to the amount of charge or mass enclosed within it....
People often value the sensual, celebratory, and health aspects of food, but behind this experience exist many other value-laden agricultural production, distribution, manufacturing, and physiological processes that support or undermine...
Reviewers: Doctor of Economic Sciences, Doctor of Technical Sciences, Professor, Head of the Department of Economic Cybernetics at the Volodymyr Dahl East Ukrainian National University, Honored Worker of Science and Technology of Ukraine Ramazanov S. K.; Doctor of Economic Sciences,...
The VoH image database reflects the spatial and temporal breadth envisioned in the project by collecting thousands of visual forms and formats of astral knowledge from almost 6,000 years of Eurasian history. The collection includes...
Using a web-based survey methodology, the accessibility and completeness of library information on the Web sites of 20 major theological seminaries representing various Christian denominations were appraised. Ten seminaries were chosen...
The problem of finding connected components in undirected graphs has been well studied. It is an essential pre-processing step to many graph computations, and a fundamental task in graph analytics applications, such as social network...
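For readers unfamiliar with the task named in the entry above, here is a minimal sequential sketch of finding connected components with a union-find (disjoint-set) structure. The abstract is truncated, so this is a generic textbook approach and not necessarily the method the paper proposes; the function name `connected_components` and the vertex numbering 0..n-1 are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def connected_components(num_vertices: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Group vertices 0..num_vertices-1 of an undirected graph into components."""
    parent = list(range(num_vertices))  # each vertex starts as its own root

    def find(v: int) -> int:
        # Follow parent links to the root, halving the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups: Dict[int, List[int]] = {}
    for v in range(num_vertices):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


components = connected_components(6, [(0, 1), (1, 2), (3, 4)])
```

Vertex 5 has no edges, so it ends up as a component of its own.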
This study proposes a technology network architecture and explains a new algorithm that boosts the effectiveness of the present cloud-based smart parking system. Finding open parking places is difficult for users at busy...
One of the "holy grails" of computational linguistics has been to have a machine carry out a conversation, and to have some idea of what it is talking about. Loglan's (Brown, 1975) machine grammar was a first attempt to carry out such...
This master's thesis focuses on the analysis of trajectories of discrete processes, implementing a diagram that visually represents the behavior of a trajectory generated by classification based on rules of...
In this paper we propose a novel parallel algorithm for frequent itemset mining. The algorithm is based on the filter-stream programming model, in which the frequent itemset mining process is represented as a data flow controlled by a...
self-evolving, that is, able to adapt to each user's writing style and habits without requiring a tedious training period. We use an incremental learning approach for classifiers...
Abstract: Bayesian graphical models are commonly used to build student models from data. A number of standard algorithms are available to train Bayesian models from student skills assessment data. These models can assess student knowledge...
Many quality improvement (QI) programs including six sigma, design for six sigma, and kaizen require collection and analysis of data to solve quality problems. Due to advances in data collection systems and analysis tools, data mining...
This report provides insights into the challenges, emerging topics, and opportunities related to human–data interaction and visual analytics in the AI era. The BigVis 2024 organizing committee conducted a survey among experts in the...
Over the last decade we have witnessed the explosion of digital photography. The possibilities for capturing images have multiplied almost without limit, at no added cost to the photographer. This forces us to rethink...
In software or program ontologies described in the literature, the separation between abstract and implementation views is evident. On the other hand, the ontological entities involved in such views are not very well defined in the...
A feature selection algorithm is employed to remove irrelevant and redundant information from the data. Among feature subset selection algorithms, filter methods are used because of their generality and are usually a good choice when the number of...
We have a set $S \subset \{0, 1\}^n$, together with, for each $x \in S$, the result of some unknown function $F : \{0, 1\}^n \to \{0, 1\}$ applied to $x$, and a method for generating a hypothesis $h \in H$ about $F$ given $S$. We present theoretical and experimental...