
TextTech - Final Report - ACunningham

The document discusses using a graph database instead of a relational database for storing and querying course data. It describes how the current project uses a relational database to organize course information into tables but explores how a graph database could model the relationships between courses and their attributes. The graph structure may improve query performance and allow a larger dataset by reducing complexity compared to the relational structure.


Text Technology - Summer 2023 Andrea Cunningham (3594623)

Text Encoding for Course Data

Project Report

The objective of our project, Text Encoding for Course Data, was to give students the ability to query for information about university courses. Such functions are useful for students when planning their class schedules and comparing candidate courses. Though we use a relational database built with SQLite via Python, this paper explores how using a graph database might affect the data structure and functionality of our search engine. I will begin with an overview of our project architecture, followed by a discussion of this hypothetical question.

Collect

We began by choosing a small dataset of course data with various parameters that would be relevant to students, such as the title, course number, and location. Our database consists of course data from “Courses from Reed College”, which we found in the University of Washington’s XML Data Repository [1] (a collection of publicly available datasets used for research purposes). The dataset specifies the following course parameters: registration number, subject, course number, section, title, units, instructor, days, place, and start and end times. However, we narrowed this set of parameters during the encoding process.

Prepare

To read the dataset and create a database, we first converted our XML dataset to a dictionary, which we then inserted into our database using SQLite functionality (passing insert statements as a parameter to the execute() method). The database was divided into three tables: courses, time, and place. This is because the dataset’s original structure had already separated course information from metadata like time and place. Because the original dataset contained eleven parameters per course, we decided to encode only the eight most relevant ones (registration number, subject, title, units, instructor, days, and start and end times).

[1] http://aiweb.cs.washington.edu/research/projects/xmltk/xmldata/data/courses/reed.xml
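The conversion step described above can be sketched roughly as follows. This is a minimal illustration, not our actual project code: the element names (reg_num, subj, time/start_time, and so on), the table columns, and the sample record are assumptions made for the example, and only the courses and time tables are shown.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample record; element names and values are assumptions.
SAMPLE_XML = """
<root>
  <course>
    <reg_num>10577</reg_num>
    <subj>BIOL</subj>
    <title>Seminar in Biology</title>
    <units>0.5</units>
    <instructor>Example</instructor>
    <days>T</days>
    <time><start_time>13:10</start_time><end_time>14:30</end_time></time>
  </course>
</root>
"""


def course_to_dict(course):
    """Flatten one <course> element into a dictionary of eight parameters."""
    return {
        "reg_num": course.findtext("reg_num"),
        "subj": course.findtext("subj"),
        "title": course.findtext("title"),
        "units": course.findtext("units"),
        "instructor": course.findtext("instructor"),
        "days": course.findtext("days"),
        "start_time": course.findtext("time/start_time"),
        "end_time": course.findtext("time/end_time"),
    }


def build_database(xml_text):
    """Create an in-memory SQLite database and insert every course."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE courses (reg_num TEXT PRIMARY KEY, subj TEXT, "
        "title TEXT, units TEXT, instructor TEXT)"
    )
    conn.execute(
        'CREATE TABLE "time" (reg_num TEXT, days TEXT, '
        "start_time TEXT, end_time TEXT)"
    )
    for course in ET.fromstring(xml_text).iter("course"):
        row = course_to_dict(course)
        # Insert statements are passed to execute() with named placeholders;
        # keys in the dictionary that a statement does not use are ignored.
        conn.execute(
            "INSERT INTO courses VALUES "
            "(:reg_num, :subj, :title, :units, :instructor)",
            row,
        )
        conn.execute(
            'INSERT INTO "time" VALUES '
            "(:reg_num, :days, :start_time, :end_time)",
            row,
        )
    conn.commit()
    return conn
```

Calling build_database(SAMPLE_XML) returns a connection whose tables can be joined on reg_num. A place table would presumably follow the same pattern.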

Access

Finally, we used XPath expressions to retrieve specific information from the XML-encoded course data. To provide an interface for users to enter their queries, we built a web application with the Django framework. A user can type a query into the search bar and retrieve any of the eight parameters from the XML-encoded course data file. Another useful feature of the application is the ability to filter courses by the time boundaries in which they occur, a sort of preservation of the dataset’s original metadata structure. The web application then displays the results. A query could simply be an instructor’s name or a course subject, and the result would be a list of matching courses, each with its eight attributes. Figure 1 shows an example of a search query containing an instructor’s name and the courses taught by said instructor.

Figure 1: An example of a user query and the respective search results.
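As a rough sketch of the kind of lookup described in this section, the following uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module. The element names, the sample records, the instructor names, and the 24-hour "HH:MM" time format are all assumptions made for illustration; the Django layer is omitted.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical sample data; names, titles, and time format are assumptions.
SAMPLE_XML = """
<root>
  <course>
    <title>Morning Seminar</title>
    <instructor>Rivera</instructor>
    <time><start_time>09:00</start_time><end_time>09:50</end_time></time>
  </course>
  <course>
    <title>Afternoon Lab</title>
    <instructor>Rivera</instructor>
    <time><start_time>13:10</start_time><end_time>16:00</end_time></time>
  </course>
  <course>
    <title>Evening Lecture</title>
    <instructor>Okafor</instructor>
    <time><start_time>18:00</start_time><end_time>19:20</end_time></time>
  </course>
</root>
"""

TIME_FORMAT = "%H:%M"


def courses_by_instructor(root, name):
    """Return the <course> elements whose <instructor> text equals name.

    The predicate [instructor='...'] compares the child's complete text.
    A name containing a quote character would need extra handling.
    """
    return root.findall(f".//course[instructor='{name}']")


def within_window(course, earliest, latest):
    """Check whether a course starts and ends inside the given time window."""
    start = datetime.strptime(course.findtext("time/start_time"), TIME_FORMAT)
    end = datetime.strptime(course.findtext("time/end_time"), TIME_FORMAT)
    lower = datetime.strptime(earliest, TIME_FORMAT)
    upper = datetime.strptime(latest, TIME_FORMAT)
    return lower <= start and end <= upper


def search(root, instructor, earliest="00:00", latest="23:59"):
    """Combine the instructor query with the time-boundary filter."""
    return [
        course.findtext("title")
        for course in courses_by_instructor(root, instructor)
        if within_window(course, earliest, latest)
    ]
```

For instance, search(ET.fromstring(SAMPLE_XML), "Rivera", "08:00", "12:00") would keep only the courses by that instructor that fall entirely within the morning window.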



What if one had used a graph database?

A graph database would be constructed based on the relations between entities (in this case, courses), whereas our relational database simply organizes entities and their parameters into the rows and columns of tables. A graph database might be conducive to our objective because many entities have overlapping features, and the parameters of our dataset range from general to unique. For example, the subject ‘CHEM’ would be considered a general parameter, in that many courses in the database could share this attribute, while the course title “Field Biology of Amphibians” would be considered a unique parameter, as titles are a more exclusive category. Therefore, one could build a constellation of entities (courses) and their relations (“Bacterial Pathogenesis” and “Seminar in Biology” belong under “BIOL”). A benefit of such a data structure is that it could reduce complexity: relations are stored directly, so connecting further entities would likely require less code than relating them across tables. This could speed up the querying process and might also allow one to use a much larger dataset. Yet a downside of this structure may be sparseness of data: some entities may not have enough relations with other entities, resulting in outliers.
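To make the idea concrete, here is a minimal sketch of such a constellation using plain Python dictionaries rather than an actual graph database. The CourseGraph class, its method names, and the CHEM course title are hypothetical; the two BIOL courses are the ones mentioned above.

```python
from collections import defaultdict


class CourseGraph:
    """Undirected graph linking course nodes to shared attribute nodes."""

    def __init__(self):
        self.edges = defaultdict(set)

    def relate(self, course, kind, value):
        """Connect a course node to an attribute node such as ('subject', 'BIOL')."""
        course_node = ("course", course)
        attribute_node = (kind, value)
        self.edges[course_node].add(attribute_node)
        self.edges[attribute_node].add(course_node)

    def courses_with(self, kind, value):
        """Return the titles of all courses linked to one attribute node."""
        neighbors = self.edges.get((kind, value), set())
        return sorted(title for _, title in neighbors)

    def sparse_attributes(self, min_courses=2):
        """Return attribute nodes linked to fewer than min_courses courses."""
        return sorted(
            node
            for node, neighbors in self.edges.items()
            if node[0] != "course" and len(neighbors) < min_courses
        )


graph = CourseGraph()
graph.relate("Bacterial Pathogenesis", "subject", "BIOL")
graph.relate("Seminar in Biology", "subject", "BIOL")
# Hypothetical course, added only to illustrate a sparsely connected node.
graph.relate("Hypothetical Chemistry Course", "subject", "CHEM")
```

Here graph.courses_with("subject", "BIOL") follows the stored relations directly instead of scanning a table, while graph.sparse_attributes() lists attribute nodes such as ("subject", "CHEM") that link to only one course, which is the kind of sparseness noted above.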
