TextTech - Final Report - ACunningham

The document discusses using a graph database instead of a relational database for storing and querying course data. It describes how the current project uses a relational database to organize course information into tables but explores how a graph database could model the relationships between courses and their attributes. The graph structure may improve query performance and allow a larger dataset by reducing complexity compared to the relational structure.


Text Technology - Summer 2023 Andrea Cunningham (3594623)

Text Encoding for Course Data

Project Report

The objective of our project, Text Encoding for Course Data, was to give students the ability to query for information about university courses. Such functionality is useful for students when planning their class schedules and sorting through candidate courses. Although we used a relational database, implemented with SQLite via Python, this paper explores how using a graph database instead might affect the data structure and functionality of our search engine. I will begin with an overview of our project architecture and then discuss this hypothetical question.

Collect

We began by choosing a small dataset of course data with various parameters that would be relevant to students, such as the title, course number, and location. Our data comes from “Courses from Reed College”, which we found in the University of Washington’s XML Data Repository [1] (a collection of publicly available datasets used for research purposes). The dataset specifies the following parameters for each course: registration number, subject, course number, section, title, units, instructor, days, place, and start and end times. However, we narrowed this set of parameters during the encoding process.

[1] http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/data/courses/reed.xml

Prepare

In order to read the dataset and create a database, we first converted the XML dataset to a dictionary, whose entries we then inserted into the database using SQLite functionality (passing INSERT statements as a parameter to the execute() method). The database was divided into three tables: courses, time, and place. This is because the original data structure had already separated course information from metadata like time and place. Because the original dataset contained eleven parameters per course, we decided to encode only the eight most relevant ones (registration number, subject, title, units, instructor, days, and start and end times).
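The conversion and insertion steps described above might look roughly like the following sketch. It is not the project's actual code: the element names (reg_num, subj, start_time, and so on), the table layouts, and the two sample courses are assumptions made for illustration, and only the courses and time tables are shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample data; the element names are assumed, not taken from the report.
SAMPLE_XML = """
<root>
  <course>
    <reg_num>10001</reg_num><subj>BIOL</subj><title>Example Course A</title>
    <units>1.0</units><instructor>Doe</instructor><days>M-W</days>
    <time><start_time>09:00AM</start_time><end_time>09:50AM</end_time></time>
  </course>
  <course>
    <reg_num>10002</reg_num><subj>CHEM</subj><title>Example Course B</title>
    <units>0.5</units><instructor>Roe</instructor><days>T-Th</days>
    <time><start_time>01:10PM</start_time><end_time>02:30PM</end_time></time>
  </course>
</root>
"""


def xml_to_dicts(xml_text):
    """Convert each <course> element into a dictionary of the eight parameters."""
    root = ET.fromstring(xml_text)
    records = []
    for course in root.iter("course"):
        records.append({
            "reg_num": course.findtext("reg_num"),
            "subj": course.findtext("subj"),
            "title": course.findtext("title"),
            "units": course.findtext("units"),
            "instructor": course.findtext("instructor"),
            "days": course.findtext("days"),
            "start_time": course.findtext("time/start_time"),
            "end_time": course.findtext("time/end_time"),
        })
    return records


def build_database(records):
    """Create an in-memory SQLite database and insert the records via execute()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE courses (reg_num TEXT PRIMARY KEY, subj TEXT, "
        "title TEXT, units TEXT, instructor TEXT)"
    )
    conn.execute(
        'CREATE TABLE "time" (reg_num TEXT REFERENCES courses(reg_num), '
        "days TEXT, start_time TEXT, end_time TEXT)"
    )
    for record in records:
        # Named placeholders are filled from the dictionary; extra keys are ignored.
        conn.execute(
            "INSERT INTO courses VALUES (:reg_num, :subj, :title, :units, :instructor)",
            record,
        )
        conn.execute(
            'INSERT INTO "time" VALUES (:reg_num, :days, :start_time, :end_time)',
            record,
        )
    conn.commit()
    return conn


records = xml_to_dicts(SAMPLE_XML)
conn = build_database(records)
```

Keeping time in its own table mirrors the nesting of the original data, at the cost of a join whenever a query needs both course and time information.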

Access

Finally, we used XPath expressions to retrieve specific information from the XML-encoded course data. To give users an interface for entering their queries, we built a web application with the Django framework. A user can type a query into the search bar and retrieve any of the eight parameters from the XML-encoded course data file. Another useful feature of the application is the ability to filter courses by the time window in which they take place, which preserves something of the original dataset’s metadata structure. The web application then displays the results. A query could simply be an instructor’s name or a course subject, and the result would be a list of matching courses, each with its eight attributes. Figure 1 shows an example of a search query containing an instructor’s name and the courses taught by that instructor.

Figure 1: An example of a user query and the respective search results.
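As a rough illustration of the kind of XPath lookup described above, the sketch below uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module. The element names, the helper find_courses, and the sample courses are hypothetical; the project's actual expressions and its Django views are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the element names are assumed for illustration.
SAMPLE_XML = """
<root>
  <course>
    <subj>BIOL</subj><title>Example Course A</title><instructor>Doe</instructor>
  </course>
  <course>
    <subj>BIOL</subj><title>Example Course B</title><instructor>Roe</instructor>
  </course>
  <course>
    <subj>CHEM</subj><title>Example Course C</title><instructor>Doe</instructor>
  </course>
</root>
"""


def find_courses(root, field, value):
    """Return the titles of all courses whose child element `field` equals `value`.

    The predicate [tag='text'] matches elements that have a child `tag` with
    exactly that text. Values containing a single quote would need escaping,
    which this sketch does not handle.
    """
    matches = root.findall(f".//course[{field}='{value}']")
    return [course.findtext("title") for course in matches]


root = ET.fromstring(SAMPLE_XML)
by_instructor = find_courses(root, "instructor", "Doe")
by_subject = find_courses(root, "subj", "BIOL")
```

In a web application, a search view could call a helper like this with the user's input and pass the resulting list to a template for display.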


What if one had used a graph database?

A graph database would be constructed around the relations between entities (in this case, courses), whereas our relational database simply organizes entities and their parameters into tables of rows and columns. A graph database might be conducive to our objective because many entities have overlapping features, and the parameters of our dataset range from general to unique. For example, the subject ‘CHEM’ would be considered a general parameter, in that many courses in the database could share this attribute, while the course title “Field Biology of Amphibians” would be considered a unique parameter, as titles are a more exclusive category. One could therefore build a constellation of entities (courses) and their relations (for example, “Bacterial Pathogenesis” and “Seminar in Biology” both belong under “BIOL”). A possible benefit of such a data structure is reduced complexity: connecting entities through explicit relations would likely require less code than linking them across relational tables. This could in turn speed up the querying process and make it feasible to use a much larger dataset. A downside of this structure, however, may be sparseness of data. Some entities may not have enough relations with other entities and would end up as outliers in the graph.
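To make the idea of a constellation of courses and relations more concrete, here is a minimal sketch of a graph held in plain Python data structures rather than in an actual graph database. The class CourseGraph, the relation name BELONGS_TO, and the chemistry course title are invented for illustration; the two biology titles and the subject codes are the examples mentioned above.

```python
from collections import defaultdict


class CourseGraph:
    """A tiny undirected graph with labelled edges, stored as adjacency sets."""

    def __init__(self):
        # Maps each node to a set of (relation, neighbouring node) pairs.
        self.edges = defaultdict(set)

    def add_edge(self, source, relation, target):
        self.edges[source].add((relation, target))
        self.edges[target].add((relation, source))

    def neighbors(self, node, relation):
        """Return the nodes connected to `node` by the given relation, sorted."""
        return sorted(t for rel, t in self.edges.get(node, ()) if rel == relation)


def related_courses(graph, course):
    """Find other courses that share a subject with `course` (a two-hop traversal)."""
    related = set()
    for subject in graph.neighbors(course, "BELONGS_TO"):
        related.update(graph.neighbors(subject, "BELONGS_TO"))
    related.discard(course)
    return sorted(related)


graph = CourseGraph()
graph.add_edge(("course", "Bacterial Pathogenesis"), "BELONGS_TO", ("subject", "BIOL"))
graph.add_edge(("course", "Seminar in Biology"), "BELONGS_TO", ("subject", "BIOL"))
# Invented course: the only node under CHEM, so it has no related courses.
graph.add_edge(("course", "Example Chemistry Course"), "BELONGS_TO", ("subject", "CHEM"))

biology_peers = related_courses(graph, ("course", "Bacterial Pathogenesis"))
chemistry_peers = related_courses(graph, ("course", "Example Chemistry Course"))
```

The lone chemistry course illustrates the sparseness concern: a node with few relations yields nothing when the query relies on traversal.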
