Clustering The Results of PKL Collection Object Images As Data Mining Learning Digital Libraries Using Orange

Author: Yucke Aulia. Using Orange, I grouped images of PKL collection objects as part of data mining learning for digital libraries. This technique helps identify patterns and relationships in the data, enabling the development of digital library systems that are more efficient and relevant to user needs.


Clustering the Results of PKL Collection Object Images as Data Mining Learning Digital Libraries Using Orange


Yucke Aulia
Department of Information and Library Science
Universitas Airlangga
Surabaya, Indonesia
[email protected]

Abstract— Digital learning brings Orange, which is based on open-source components for data development. Orange data mining is offered to users in data development who want to create or test algorithms, or to build Python content libraries for data analysis. The Orange software creates a classification model through an intuitive visual programming approach. Its graphical components support digital-based communication and can be combined into an application with visual programming tools. Since image pixels are not usually labeled, the commonly used approach for this task is clustering.

Keywords— orange, clustering, visualization, image analytics, machine learning.

1. INTRODUCTION

Because human activity continuously generates large amounts of data, data mining is needed so that the recorded data is not wasted and so that insights about information, beliefs, or trends can be gathered. Data mining is the discovery and interpretation of previously unknown patterns in data to solve business problems. In this study, Orange Data Mining is used to analyze cluster data with hierarchical clustering, a method that divides data into a hierarchy of clusters and displays them as a dendrogram. Clustering is a versatile machine-learning technique that serves as a precursor to other learning techniques such as classification. It groups data with similar distinguishing characteristics, which is useful for image segmentation.

An alternative approach is to use an algorithm to examine pictures of items. Each photo can be treated as unlabeled data that the model then labels as either inadequate or satisfactory. Image segmentation is a key component of many vision applications. The goal is to partition a given image into perceptual regions, where the pixels in each region belong to the same visual object and differ only slightly in their features. Fine art photography is an art category that is concerned with aesthetic value and beauty rather than functional value. The images considered here are made with artistic techniques and belong to collections that include archaeology, archives, and cultural heritage. A frequent challenge in efforts to build collections for Indonesian libraries and museums is deciding what kind of collection to maintain, because there is no tool that assists the collection selection process quickly and accurately. Therefore, there is a need for a suitable method that classifies collection artifacts automatically, in a short time and with high accuracy.

2. LITERATURE REVIEW

Data mining is the extraction of important or interesting information or patterns from existing data in a database. In scientific journals, data mining is also known as Knowledge Discovery in Databases. The steps are as follows: (1) Data cleaning (removing noise and inconsistent data); (2) Data integration (combining separate data sources); (3) Data selection (retrieving the data relevant to the analysis task from the database); (4) Data transformation (transforming or consolidating data into a form suitable for mining, for example by summary or aggregation operations); (5) Data mining (the essential process in which appropriate methods are applied to extract data patterns); (6) Pattern evaluation (identifying the truly interesting patterns that represent knowledge, based on interestingness measures); (7) Knowledge presentation (using visualization and knowledge representation techniques to present the mined knowledge to users).
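The seven steps listed above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the toy records, the field names, and all function names are hypothetical, and the "mining" step is a simple count per image type standing in for a real pattern-extraction method.

```python
from collections import Counter

# Hypothetical toy records from two separate sources; None marks a missing value.
SOURCE_A = [
    {"file": "a1.jpg", "type": "ARKEOLOGI", "width": 640},
    {"file": "a2.jpg", "type": None, "width": 800},  # noisy record
]
SOURCE_B = [
    {"file": "h1.jpg", "type": "HERITAGE", "width": 1024},
    {"file": "s1.jpg", "type": "ARSIP", "width": 512},
]

def clean(records):
    """(1) Data cleaning: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    """(2) Data integration: combine separate sources into one list."""
    return [r for source in sources for r in source]

def select(records, fields):
    """(3) Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    """(4) Data transformation: scale widths to the range (0, 1]."""
    max_width = max(r["width"] for r in records)
    return [{**r, "width": r["width"] / max_width} for r in records]

def mine(records):
    """(5) Data mining: count records per type (a stand-in for a real method)."""
    return Counter(r["type"] for r in records)

def evaluate(patterns, min_count=1):
    """(6) Pattern evaluation: keep patterns that meet a minimum count."""
    return {k: v for k, v in patterns.items() if v >= min_count}

def present(patterns):
    """(7) Knowledge presentation: format the patterns as readable lines."""
    return [f"{k}: {v}" for k, v in sorted(patterns.items())]

def run_pipeline():
    """Run the seven steps in the order given in the text."""
    data = integrate(clean(SOURCE_A), clean(SOURCE_B))
    data = select(data, ["type", "width"])
    data = transform(data)
    return present(evaluate(mine(data)))
```

Keeping each step as a separate function makes it easy to swap the counting step for an actual clustering method without touching the cleaning or presentation code.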
In image mining, clustering techniques are used to group shot data into several clusters based on the similarity of their content, so that each cluster contains shot image data with similar content. Clusters were detected using the agglomerative method. Unsupervised clustering is the identification of natural groupings in data without prior knowledge of labels or classes. For a cluster to be well defined mathematically, the variance of the samples within a cluster must be small (within-cluster variance), while the variance between clusters must be large (between-cluster variance). Image analysis is a key component of the task of analyzing visual techniques, as both are used to persuade the reader of a particular viewpoint, as visual weather photographs also do.

3. RESEARCH METHODOLOGY

This study uses action research methods. Clustering is understood here as natural grouping: the images in the folder are first viewed without being grouped. The dataset is semi-objective, since some of its records were downloaded from the web or obtained from third-party sources. The following configuration is used to create clusters: natural clustering results are obtained by selecting different clusters in hierarchical clustering, in an Orange data mining workflow built with widgets from the Image Analytics add-on.

The data are processed and the information to be clustered is extracted as follows:

IMAGE TYPE                 TOTAL
ARKEOLOGI (archaeology)    44
ARSIP (archive)            18
HERITAGE                   38
TOTAL                      100

Consider the general case of dissimilarity methods working on dissimilarity matrices. The traditional way of grouping proximity data is to use standard hierarchical cluster analysis. Next, the cosine distance (or cosine similarity, angular cosine distance, angular cosine similarity) between two variables is calculated. Under the name dissimilarity clustering, the majority of data analysis methods have been adapted, or defined directly, to handle dissimilarity matrices.

The cosine distance is then determined as follows: Cosine distance = 1 − Cosine similarity. The cosine distance above is specified for positive values only. It is also not a true distance metric, because it does not satisfy the triangle inequality. The initial step in image preprocessing is filtering, that is, deciding which images to discard and which to keep. Filtering removes the images that are not needed in the next process.

Figure 1. Image clustering network structure

Image embedding is an important part of data science, and an analytics report would not be complete without it. Orange offers several options for saving and editing. The embedding process represents images as feature vectors, which enables Orange's standard set of widgets to be used for clustering, classification, or other types of feature-based analysis. The Image Analytics add-on also provides a widget for viewing images and makes it possible to save them.

4. RESULT AND DISCUSSION

Figure 2. Import images from a directory

The imported directory contains a number of subdirectories. In this case, Orange data mining treats each subdirectory as a class value.
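The folder-to-class convention described for Figure 2 can be illustrated outside Orange. The sketch below is not Orange's implementation; it is a minimal standard-library example in which the function names `label_images_by_folder` and `class_counts` and the set of accepted file extensions are assumptions made for illustration. It walks a root directory and uses the name of each first-level subdirectory as the class value of the images inside it.

```python
from collections import Counter
from pathlib import Path

# Assumed set of image file extensions; adjust to the collection at hand.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".tif", ".tiff"}

def label_images_by_folder(root):
    """Return sorted (file path, class value) pairs for images under root.

    The class value is the name of the first-level subdirectory that
    contains the image, mirroring the convention described in the text.
    """
    root = Path(root)
    labeled = []
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
            relative = path.relative_to(root)
            if len(relative.parts) > 1:  # skip files lying directly in root
                labeled.append((path, relative.parts[0]))
    return sorted(labeled)

def class_counts(labeled):
    """Count how many images carry each class value."""
    return Counter(label for _, label in labeled)
```

Applied to a root folder organized like the collection in this study, `class_counts` would report one entry per subdirectory, which gives a quick check that the per-type totals match the table in the methodology.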
In the example shown in Figure 2, an image upload containing 100 images is used, and the subdirectory names serve as the class values.

Figure 3. Image Viewer display

The Image Viewer widget can display images from datasets stored locally or on the Internet. There are 100 images in the collection.

Raw images are converted into a vector representation of extracted features by a step called image embedding. Here the Image Embedding widget sends each image to a server, where the embedding is computed remotely.

Figure 4. Classification of data table clustering

The data table is just a set of meta information with image name, image format, size, width, height, and numbers that describe the content of the image.

Figure 5. Classification of data table technique

Looking at the data table after embedding, it contains not only the meta information but also 2048 additional features for each image. With these embedding features, images can be compared and their similarities calculated.

The enhanced output data table contains the image descriptors, represented as attributes n0, n1, n2, n3… for each source image processed by the Image Analytics widget; only a partial extract of the descriptors is shown.

Figure 6. The distance categories involved

The data are then passed on for distance calculation. The linkage criterion, together with the distance function, is also an important factor in the formation of hierarchical clusters. Since clusters in this method are composed of many elements, the distance calculation between clusters involves many elements.

a. Results on Clustered Collections

In the test phase with archaeological objects, the results show that the archaeology images are clustered together, with cosine similarity determining the similarity between images or vectors.

Figure 7. Hierarchical structure of dataset well clustered

Cosine similarity and cosine distance determine the final result. Cosine similarity states that the similarity between two points or vectors is found from the angle between them. In the results above, the archaeological images are grouped together. On opening the archaeology cluster, it turned out that the two collections were images of the same object, which gives rise to many forms of archaeology images. The two objects have different characteristics, from name and shadow to size, length, angle, width, and light. However,
when expanding the result group, the results appear as in Figure 7 above.

b. Results on Unclustered Collections

Figure 8. Hierarchical structure of dataset not well clustered

The results of the image comparison do not show anything similar, and the images do not look cleanly clustered: two archaeological photos, 37 and 39, should not be classified as heritage types. The similarity of these two documents is only 7.6%, which makes sense since they are "archaeological" images, a type that does not appear even once among the images belonging to the "heritage" category. This makes the final result not well clustered.

Figure 9. Hierarchical structure of dataset not well clustered

The result is similar to Figure 8. As expected, the images in this figure are rated as less similar than the first two. A similarity of 40% is still relatively high, but this is likely because both images have a large highlight in the center, while they have little in common in color. As with text samples, a threshold can be set for what counts as "similar enough", which makes cosine similarity well suited for grouping and other sorting methods. This, too, makes the final result not well clustered.

The hierarchical clustering used here is mainly of the agglomerative type: it starts with the individual elements as separate clusters and merges them until a stopping criterion is reached. Divisive hierarchical clustering, by contrast, begins with the entire dataset regarded as one cluster and divides it into smaller clusters.

The results are shown in the dendrogram, which has grouped the collection types according to the folders they were placed in. The pictures can be checked in the Image Viewer to confirm the match. The image embeddings appear to align well: all collections are grouped together.

5. CONCLUSION

This research applied and evaluated hierarchical clustering of data using a variety of distance metrics and linkage methods, and interpreted the corresponding dendrograms. Further steps are to apply density-based clustering to the data and observe the effect of different parameters, and to create, evaluate, and interpret association rules based on event data in images. Experimenting with different settings and different models can be treated as a separate activity from the data set itself. The study illustrates possible ways to find images that are similar to each other based on land use patterns, structures, and shapes. Some of the results were surprising and could not have been confirmed by human visual inspection alone, without machine learning.

REFERENCES

Li, C.T., Lin, X. A fast source-oriented image clustering method for digital forensics. J Image Video Proc. 2017, 69 (2017). https://doi.org/10.1186/s13640-017-0217-y

Wikipedia. (n.d.). Orange (software). Retrieved June 12, 2021, from https://en.wikipedia.org/wiki/Orange_(software)

Zhu Q, Hu L, Wang R. Image Clustering Algorithm Based on Predefined Evenly-Distributed Class Centroids and Composite Cosine Distance. Entropy. 2022; 24(11):1533. https://doi.org/10.3390/e24111533

Abd Elaziz M, Abo Zaid EO, Al-qaness MAA, Ibrahim RA. Automatic Superpixel-Based Clustering for Color Image Segmentation Using q-Generalized Pareto Distribution under Linear Normalization and Hunger Games Search. Mathematics. 2021; 9(19):2383. https://doi.org/10.3390/math9192383
Mittal, H., Pandey, A.C., Saraswat, M. et al. A comprehensive survey of image segmentation: clustering methods, performance parameters, and benchmark datasets. Multimed Tools Appl 81, 35001–35026 (2022). https://doi.org/10.1007/s11042-021-10594-9

Godec, P., Pančur, M., Ilenič, N. et al. Democratized image analytics by visual programming through integration of deep models and small-scale machine learning. Nat Commun 10, 4551 (2019). https://doi.org/10.1038/s41467-019-12397-x

Asanka, Dinesh. 2020. "Image Classification in Orange". http://dbfriend.blogspot.com/2020/11/image-classification-in-orange.html. Retrieved 22 June 2023.

Li, H. (2009). Text Clustering. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-39940-9_415

Taskesen, Erdogan. 2021. "A step-by-step-guide for clustering images". https://towardsdatascience.com/a-step-by-step-guide-for-clustering-images-4b45f9906128. Retrieved 22 June 2023.

Suroyo, H. (2019). Penerapan Machine Learning dengan Aplikasi Orange Data Mining Untuk Menentukan Jenis Buah Mangga.

C. Kurniawan, S. Rizki Kusumaningrum, E. Surahman and Z. Zakaria, "Clustering of Fine Art-Images as Digital Learning Content using Data Mining-Image Analysis Techniques," 2022 2nd International Conference on Information Technology and Education (ICIT&E), Malang, Indonesia, 2022, pp. 37-42, doi: 10.1109/ICITE54466.2022.9759840.

A. Malakar and J. Mukherjee, "Image Clustering using Color Moments, Histogram, Edge and K-means Clustering," International Journal of Science and Research (IJSR), vol. 2, no. 1, pp. 532-537, 2013.

B. I. Khaleel, "Image Clustering based on Artificial Intelligence Techniques," AL-Rafidain Journal of Computer Sciences and Mathematics, vol. 11, no. 1, pp. 99-112, 2014, doi: 10.33899/csmj.2014.163735.

Ahmed M, Seraj R, Islam SMS. The k-means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics. 2020; 9(8):1295. https://doi.org/10.3390/electronics9081295
