SIT772 Database and Information Retrieval

Name of the Student

Name of the University

Authors note
Table of Contents

Question 1.1
Question 1.2
Question 1.3
Question 1.4
Question 2
    Question 2.1
    Question 2.2
    Question 2.3
    Question 2.4
    Question 2.5
Question 3
    Question 3.1
    Question 3.2
    Question 3.3
    Question 3.4
References

Question 1.1

No  Term         Doc
1   data         1, 2, 3
2   science      1
3   field        1
4   scientific   1
5   method       1, 2
6   process      1, 2, 3
7   algorithm    1
8   system       1, 2, 3
9   extract      1
10  knowledge    1
11  mining       2
12  discover     2
13  pattern      2
14  large        2
15  involve      2
16  database     2
17  information  3
18  study        3
19  network      3
20  hardware     3
21  software     3
22  people       3

ID  Term         Doc1  Doc2  Doc3
1   algorithm    1     0     0
2   data         1     2     1
3   database     0     1     0
4   discover     0     1     0
5   extract      1     0     0
6   field        1     0     0
7   hardware     0     0     1
8   information  0     0     1
9   involve      0     1     0
10  knowledge    1     0     0
11  large        0     1     0
12  method       1     1     0
13  mining       0     1     0
14  network      0     0     1
15  pattern      0     1     0
16  people       0     0     1
17  process      1     1     1
18  science      1     0     0
19  scientific   1     0     0
20  software     0     0     1
21  study        0     0     1
22  system       1     1     1

Question 1.2:

Term         DocID  Frequencies
Algorithm    1      1
Data         1      4
Data         2      4
Data         2      4
Data         3      4
Database     2      1
Discover     2      1
Extract      1      1
Field        1      1
Hardware     3      1
Information  3      1
Involve      2      1
Knowledge    1      1
Large        2      1
Method       1      2
Method       2      2
Mining       2      1
Network      3      1
Pattern      2      1
People       3      1
Process      1      3
Process      2      3
Process      3      3
Science      1      1
Scientific   1      1
Software     3      1
Study        3      1
System       1      3
System       2      3
System       3      3
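
As a rough illustration of how the postings in Question 1.1 and the frequencies above relate to the term-document count matrix, the following Python sketch derives both from a few rows of that matrix. The function names `build_postings` and `collection_frequency` are hypothetical, and only a subset of the 22 terms is included to keep the example short.

```python
# Term counts per document (Doc1, Doc2, Doc3), copied from a few rows
# of the Question 1.1 term-document matrix.
counts = {
    "data":    [1, 2, 1],
    "method":  [1, 1, 0],
    "mining":  [0, 1, 0],
    "process": [1, 1, 1],
    "science": [1, 0, 0],
    "system":  [1, 1, 1],
}


def build_postings(term_counts):
    """Map each term to the sorted list of document IDs that contain it."""
    return {
        term: [doc_id for doc_id, n in enumerate(row, start=1) if n > 0]
        for term, row in term_counts.items()
    }


def collection_frequency(term_counts):
    """Map each term to its total number of occurrences in the collection."""
    return {term: sum(row) for term, row in term_counts.items()}


postings = build_postings(counts)
frequencies = collection_frequency(counts)

for term in sorted(counts):
    print(term, postings[term], frequencies[term])
```

Under this reading, the frequency column is the collection frequency of the term (for example, "data" occurs four times in total), and each row of the table corresponds to one occurrence of the term in a document.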

Question 1.3:

Boolean Query 1:

data AND (system AND method):

Term    Postings
data    doc1, doc2, doc3
system  doc1, doc2, doc3
method  doc1, doc2
Result  doc1, doc2

Boolean Query 2:

data AND method:

Term    Postings
data    doc1, doc2, doc3
method  doc1, doc2
Result  doc1, doc2

Boolean Query 3:

system OR (process AND data):

Term     Postings
system   doc1, doc2, doc3
process  doc1, doc2, doc3
data     doc1, doc2, doc3
Result   doc1, doc2, doc3
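
The three Boolean queries can be checked with set operations on the postings lists. This is a minimal sketch in Python, assuming the postings from the Question 1.1 index; AND maps to set intersection and OR to set union.

```python
# Postings lists for the query terms, taken from the Question 1.1 index.
index = {
    "data":    {1, 2, 3},
    "system":  {1, 2, 3},
    "method":  {1, 2},
    "process": {1, 2, 3},
}

# Query 1: data AND (system AND method)
q1 = index["data"] & (index["system"] & index["method"])

# Query 2: data AND method
q2 = index["data"] & index["method"]

# Query 3: system OR (process AND data)
q3 = index["system"] | (index["process"] & index["data"])

for name, result in (("Query 1", q1), ("Query 2", q2), ("Query 3", q3)):
    print(name, sorted(f"doc{d}" for d in result))
```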

Question 1.4:

Terms        Query  Doc1  Doc2  Doc3
algorithm    0      1     0     0
data         1      1     2     1
database     0      0     1     0
discover     0      0     1     0
extract      0      1     0     0
field        0      1     0     0
hardware     0      0     0     1
information  0      0     0     1
involve      0      0     1     0
knowledge    0      1     0     0
large        0      0     1     0
method       1      1     1     0
mining       0      0     1     0
network      0      0     0     1
pattern      0      0     1     0
people       0      0     0     1
process      0      1     1     1
science      0      1     0     0
scientific   0      1     0     0

term     query                        document           product
         tf  df     idf  weight       tf  wf  weight
is       0   5000   2.3  0            1   1   0.41       0
to       1   50000  1.3  1.3          0   0   0          0
data     1   10000  2.0  2.0          1   1   0.41       0.82
science  1   1000   3.0  3.0          2   2   0.82       2.46
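
The weights in the table above can be reproduced with a short Python sketch. It rests on assumptions that are not stated in the assignment: the idf values are consistent with idf = log10(N / df) for a collection of N = 1,000,000 documents, the query weight is tf × idf, and the document weight is the raw term frequency divided by the Euclidean length of the document vector.

```python
import math

N = 1_000_000  # assumed collection size; it matches the idf column of the table

terms = ["is", "to", "data", "science"]
df = {"is": 5000, "to": 50000, "data": 10000, "science": 1000}
query_tf = {"is": 0, "to": 1, "data": 1, "science": 1}
doc_tf = {"is": 1, "to": 0, "data": 1, "science": 2}

# Inverse document frequency and tf-idf weight of each query term.
idf = {t: math.log10(N / df[t]) for t in terms}
query_w = {t: query_tf[t] * idf[t] for t in terms}

# Document weights: raw term frequency, cosine (length) normalised.
norm = math.sqrt(sum(doc_tf[t] ** 2 for t in terms))
doc_w = {t: doc_tf[t] / norm for t in terms}

# Per-term products and the final query-document score.
products = {t: query_w[t] * doc_w[t] for t in terms}
score = sum(products.values())

for t in terms:
    print(f"{t:8} idf={idf[t]:.2f} wq={query_w[t]:.2f} "
          f"wd={doc_w[t]:.2f} product={products[t]:.2f}")
print(f"score = {score:.2f}")
```

The table multiplies weights that have already been rounded to two decimals, so its products can differ from the unrounded values in the last digit.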

Question 2:

The selected search engines are:

1. www.google.com.au

2. https://au.yahoo.com/

Question 2.1:

Google:

Yahoo:
The screenshots show that the two search engines returned largely different results: only one link is common to both result lists. Of the two, Google is the better engine; its algorithm is widely known and outperforms that of other search engines. The main reason is that Google ranks the quality of page content above how well established a page or link is, whereas Yahoo still prefers older, well-established sites. Google therefore returns relevant results that users can rely on (Bult et al., 2019). Google also offers Google Instant, which shows results as the query is typed, without pressing Enter. Yahoo and Google each provide different benefits, and the two engines use different algorithms and ranking systems, but for these queries Google gave better results than Yahoo.

Results for Query 1:

Result 1: Relevant
URL: https://in.reuters.com/finance

Result 2: Irrelevant
URL: https://in.reuters.com/article/us-global-oil/oil-slips-as-surge-in-virus-cases-cloud-demand-recovery-idINKBN26J01K

Result 3: Relevant
URL: https://in.reuters.com/finance/markets/us

Result 4: Irrelevant
URL: https://in.reuters.com/quote/.BSESN

Result 5: Relevant
URL: https://www.reuters.com/article/us-usa-education-technology-idUSN2547885520080707

Result 6: Irrelevant
URL: https://books.google.com.au/books?id=0ayMDwAAQBAJ&pg=PA160&lpg=PA160&dq=Online+education+and+innovation+site+www.reuters.com+technologies+leadershipi&source=bl&ots=te1gW5rY1-&sig=ACfU3U0_ljTYOQg_Kd4t-KqDgt2luceogQ&hl=en&sa=X&ved=2ahUKEwjdptL9m-_pAhWRILcAHcLwA2MQ6AEwAHoECAgQAQ#v=onepage&q=Online%20education%20and%20innovation%20site%20www.reuters.com%20technologies%20leadershipi&f=false

Result 7: Relevant
URL: https://in.reuters.com/article/a2-milk-company-outlook/update-2-nzs-a2-milk-forecasts-weaker-revenue-on-disruption-to-chinese-sales-shares-plunge-idINL4N2GO0BN

Result 8: Irrelevant
URL: https://in.reuters.com/video/technology

Result 9: Irrelevant
URL: https://in.reuters.com/article/afghanistan-women-education/coal-miners-daughter-comes-top-in-afghan-university-entrance-exam-idINKCN26G2FI

Result 10: Irrelevant
URL: https://www.reuters.com/innovative-universities-2019

Query 2 in Search Engine 1:

Result 1: Relevant
URL: https://in.reuters.com/article/afghanistan-women-mothers/in-the-name-of-the-mother-afghan-woman-wins-recognition-sparks-taliban-opposition-idINKCN26E0VU

Result 2: Relevant
URL: https://www.reuters.com/article/us-education-courses-online/getting-the-most-out-of-an-online-education-idUSBRE89I17120121019

Result 3: Irrelevant
URL: https://in.reuters.com/video/watch/ginsburg-through-the-eyes-of-her-former-idRCV008RVZ?chan=37foz4nc

Result 4: Relevant
URL: https://www.reuters.com/article/us-usa-education-technology-idUSN2547885520080707

Result 5: Irrelevant
URL: https://www.reuters.com/innovative-universities-2019

Result 6: Irrelevant
URL: https://www.reuters.com/article/us-amers-reuters-ranking-innovative-univ/reuters-top-100-the-worlds-most-innovative-universities-2018-idUSKCN1ML0AZ

Result 7: Irrelevant
URL: https://in.reuters.com/article/us-russia-pollution-nornickel/nornickel-co-owner-says-thawing-permafrost-one-reason-behind-spill-idINKBN26H0OD

Result 8: Irrelevant
URL: https://in.reuters.com/article/us-climate-change-protests/worlds-youth-rallies-against-climate-change-idINKCN26G10M

Result 9: Irrelevant
URL: https://blogs.thomsonreuters.com/legal-uk/2019/12/12/celebrating-innovative-law-teachers-teaching-law-with-technology-prize-2020/

Result 10: Irrelevant
URL: https://in.reuters.com/article/us-climate-change-arctic-veteran-activis/inspired-by-thunberg-veteran-climate-activist-logs-arctic-meltdown-idINKCN26G1LS

Question 2.2:

Search Engine 2

Result 1: Relevant
URL: https://www.reuters.com/article/us-usa-education-online-idUSTRE59047Z20091001

Result 2: Relevant
URL: https://www.reuters.com/article/us-health-coronavirus-education-pearson-

Result 3: Relevant
URL: https://www.reuters.com/article/usa-education-online-idUSN1721594720091001

Result 4: Irrelevant
URL: https://www.reuters.com/article/us-china-edtech-yuanfudao-idUSKBN21I07Y

Result 5: Irrelevant
URL: https://www.reuters.com/article/us-heath-coronavirus-usa-education-idUSKBN21408T

Result 6: Irrelevant
URL: https://www.reuters.com/article/us-health-coronavirus-philippines-educat-idUSKBN21A0YC

Result 7: Irrelevant
URL: https://uk.reuters.com/article/us-healthcare-coronavirus-technology/children-at-risk-as-pandemic-pushes-them-online-warns-u-n-agency-idUKKBN22H2EZ

Result 8: Irrelevant
URL: https://www.reuters.com/article/us-amers-reuters-ranking-innovative-univ-idUSKCN1ML0AZ

Result 9: Irrelevant
URL: https://www.reuters.com/

Result 10: Relevant
URL: https://www.forbes.com/sites/brandonbusteed/2019/03/05/online-education-from-good-to-better-to-best/#2dc49b646912

Search Engine 2

Result 1: Relevant
URL: https://www.reuters.com/article/us-usa-education-online-idUSTRE59047Z20091001

Result 2: Relevant
URL: https://www.reuters.com/article/us-health-coronavirus-education-pearson-idUSKBN21D384

Result 3: Irrelevant
URL: https://www.reuters.com/article/us-heath-coronavirus-usa-education-idUSKBN21408T

Result 4: Relevant
URL: https://uk.reuters.com/article/us-healthcare-coronavirus-technology/children-at-risk-as-pandemic-pushes-them-online-warns-u-n-agency-idUKKBN22H2EZ

Result 5: Relevant
URL: https://news.cgtn.com/news/2020-04-25/Increased-demand-motivates-online-education-innovation-PYVdW9pnuo/index.html

Result 6: Irrelevant
URL: https://innovation.thomsonreuters.com/en.html

Result 7: Irrelevant
URL: https://www.reuters.com/companies/VVV

Result 8: Relevant
URL: https://www.thetechedvocate.org/edtech-innovation-in-online-education/

Result 9: Irrelevant
URL: https://www.reuters.com/innovative-universities-europe-2017/methodology

Result 10: Irrelevant
URL: https://in.reuters.com/

Question 2.3:

Question 2.4:

Question 2.5:

In today's Internet, the search engine is the most valuable tool, because it gives users the most relevant results for their queries. Of the two selected search engines, Google is the better one: its algorithm is widely known and outperforms that of other search engines. The main reason is that Google ranks the quality of page content above how well established a page or link is, whereas Yahoo still prefers older, well-established sites (Cochrane et al., 2016). Google returns relevant results that users can rely on. It also offers Google Instant, which shows results as the query is typed, without pressing Enter. However, Yahoo and Google each provide different benefits, and the two engines use different algorithms and ranking systems. For SEO, it is therefore important to pay attention to all search engines rather than to only one.

Search Engine 1:

Hits 1-10
Rank       1     2     3     4     5     6     7     8     9     10
Precision  1/1   2/2   2/3   3/4   3/5   3/6   3/7   3/8   3/9   3/10
Recall     1/12  2/12  2/12  3/12  3/12  3/12  3/12  3/12  3/12  3/12

Hits 11-20
Rank       11    12    13    14    15    16    17    18    19    20
Precision  3/11  4/12  5/13  6/14  6/15  6/16  6/17  6/18  7/19  7/20
Recall     3/12  4/12  5/12  6/12  6/12  6/12  6/12  6/12  7/12  7/12

Search Engine 2:

Hits 1-10
Rank       1     2     3     4     5     6     7     8     9     10
Precision  1/1   2/2   2/3   3/4   4/5   4/6   4/7   5/8   5/9   5/10
Recall     1/12  2/12  2/12  3/12  4/12  4/12  4/12  5/12  5/12  5/12

Hits 11-20
Rank       11    12    13    14    15    16    17    18    19    20
Precision  5/11  5/12  6/13  6/14  6/15  7/16  7/17  7/18  7/19  7/20
Recall     5/12  5/12  6/12  6/12  6/12  7/12  7/12  7/12  7/12  7/12
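
The precision and recall rows can be recomputed from a list of relevance judgements. The sketch below is a minimal Python example, assuming the judgements of the second Search Engine 2 result list (hits 1-10) and a total of 12 relevant documents, which is the denominator used in the recall rows. The function name `precision_recall_at_k` is hypothetical.

```python
# Relevance judgements for hits 1-10 of the second Search Engine 2 list
# (True = Relevant, False = Irrelevant).
judgements = [True, True, False, True, True, False, False, True, False, False]

TOTAL_RELEVANT = 12  # denominator used in the recall rows of the tables


def precision_recall_at_k(relevance, total_relevant):
    """Return (precision, recall) fraction strings for every rank k."""
    precision, recall = [], []
    relevant_so_far = 0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            relevant_so_far += 1
        precision.append(f"{relevant_so_far}/{k}")
        recall.append(f"{relevant_so_far}/{total_relevant}")
    return precision, recall


precision, recall = precision_recall_at_k(judgements, TOTAL_RELEVANT)
print("Precision", " ".join(precision))
print("Recall   ", " ".join(recall))
```

Precision at rank k is the number of relevant hits in the top k divided by k, while recall divides the same count by the total number of relevant documents.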

Question 3:

Question 3.1:

An index speeds up queries on a column by creating pointers to where the data is stored in the database (Gaulton et al., 2017). If a table is ordered alphabetically, searching for a particular value is much faster, because the lookup can skip most of the rows instead of scanning all of them. An index is a structure that holds the indexed field in sorted order, together with a pointer from each index entry to the corresponding record in the original table, where the data is actually stored. A contact list is a familiar example: the contact details of each person are stored physically in one place, and entries are much easier to find when they are kept in alphabetical order (Gordon et al., 2017). A clustered index is a unique index, one per table, that uses the primary key to organise the data within the table. It ensures that the rows are stored in increasing order of the primary key, which is also the order in which the table is held in memory.

- Clustered indexes do not have to be explicitly declared.

- They are created when the table is created.

- They use the primary key, sorted in ascending order.

Example:
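
As a minimal sketch of the idea described above, the following Python code builds a sorted index over one field of a small table and uses binary search to follow the pointers back to the records. The contact data and the function names `build_index` and `lookup` are hypothetical and chosen only for illustration; a real database engine would typically use a B-tree rather than a sorted list.

```python
from bisect import bisect_left

# Hypothetical contact table; the row position acts as the record pointer.
table = [
    {"id": 1, "name": "Mia",  "phone": "555-0101"},
    {"id": 2, "name": "Arun", "phone": "555-0102"},
    {"id": 3, "name": "Liam", "phone": "555-0103"},
    {"id": 4, "name": "Zoe",  "phone": "555-0104"},
]


def build_index(rows, field):
    """Return a sorted list of (field value, row position) pairs."""
    return sorted((row[field], pos) for pos, row in enumerate(rows))


def lookup(index, rows, value):
    """Binary-search the index, then follow the pointers into the table."""
    i = bisect_left(index, (value, -1))
    matches = []
    while i < len(index) and index[i][0] == value:
        matches.append(rows[index[i][1]])
        i += 1
    return matches


name_index = build_index(table, "name")
result = lookup(name_index, table, "Liam")
print(result)
```

The lookup inspects only a logarithmic number of index entries instead of scanning every row, which is the speed-up the paragraph above describes.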

Question 3.2:

Querying relational databases with schema-cognizant languages such as SQL and querying document collections by typing arbitrary keywords are the two extreme ends of the spectrum between structured and unstructured data access. An index is a structure that holds the indexed field in sorted order, together with a pointer from each index entry to the corresponding record in the original table, where the data is actually stored. A contact list is a familiar example: the contact details of each person are stored physically in one place, and entries are much easier to find when they are kept in alphabetical order (Groom et al., 2016). Two forces bridge the extremes. First, relational databases are increasingly web-enabled, so they need to be accessed and manipulated by non-experts who do not know enough about the schema. Second, Web documents are evolving from flat text files through HTML and SGML to XML, adding markup and embedded schema information.

Question 3.3:

Web search engines have their ancestors in information retrieval systems, and recall and precision are the measures most often used to evaluate such systems. Search engines aim to return the most relevant results for a query. Of the two selected search engines, Google is the better one: its algorithm is widely known and outperforms that of other search engines. The main reason is that Google ranks the quality of page content above how well established a page or link is, whereas Yahoo still prefers older, well-established sites. Google returns relevant results that users can rely on. It also offers Google Instant, which shows results as the query is typed, without pressing Enter (Oughtred et al., 2019). Accordingly, published comparisons of search engines use precision as the main evaluation measure [Winship95, ChuRos96, ClaWil97, DingMarch96, LeiSri97], evaluating only the highest-ranked hits.

Question 3.4:

A system structure concept design diagram is the formal name for a pictorial representation of the system architecture. It is a drawing, map or rendering that describes, at a very high level, the parts of the system in question. Structure diagrams model the static aspects of a system: the data may change, but the structure of the elements and their relationships with each other do not.

The architecture of the web application has several layers:

Users: make requests to the web server and receive responses rendered with JavaServer Pages.

Web server: hosts the layers of the application, which conform to the MVC pattern.

- Presentation layer: users interact with the application through HTTP requests and responses, which are rendered in a browser (Robinson et al., 2020).

- Data layer: handles domain data and provides persistence and retrieval services for the database.

Database: where data is persisted and retrieved.

Web services: interact with other applications.
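
The separation between the presentation layer, the data layer and the database can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `DataLayer`, the function `handle_request`, the `documents` table and the response strings are hypothetical, and an in-memory SQLite database stands in for the real database server.

```python
import sqlite3


class DataLayer:
    """Data layer: persists and retrieves domain data."""

    def __init__(self):
        # An in-memory SQLite database stands in for the database tier.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)"
        )

    def save(self, title):
        cur = self.conn.execute(
            "INSERT INTO documents (title) VALUES (?)", (title,)
        )
        self.conn.commit()
        return cur.lastrowid

    def find(self, doc_id):
        row = self.conn.execute(
            "SELECT title FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        return row[0] if row else None


def handle_request(data_layer, doc_id):
    """Presentation layer: turns a request into a rendered response."""
    title = data_layer.find(doc_id)
    if title is None:
        return "404 Not Found"
    return f"200 OK: {title}"


data_layer = DataLayer()
doc_id = data_layer.save("Data science")
response = handle_request(data_layer, doc_id)
missing = handle_request(data_layer, doc_id + 1)
print(response)
print(missing)
```

The presentation code never issues SQL itself; it only calls the data layer, so the storage technology could be replaced without changing how requests are handled.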


References:

Bult, C.J., Blake, J.A., Smith, C.L., Kadin, J.A. and Richardson, J.E., 2019. Mouse genome database (MGD) 2019. Nucleic acids research, 47(D1), pp.D801-D806.

Cochrane, G., Karsch-Mizrachi, I., Takagi, T. and International Nucleotide Sequence Database Collaboration, 2016. The international nucleotide sequence database collaboration. Nucleic acids research, 44(D1), pp.D48-D50.

Gaulton, A., Hersey, A., Nowotka, M., Bento, A.P., Chambers, J., Mendez, D., Mutowo, P., Atkinson, F., Bellis, L.J., Cibrián-Uhalte, E. and Davies, M., 2017. The ChEMBL database in 2017. Nucleic acids research, 45(D1), pp.D945-D954.

Gordon, I.E., Rothman, L.S., Hill, C., Kochanov, R.V., Tan, Y., Bernath, P.F., Birk, M., Boudon, V., Campargue, A., Chance, K.V. and Drouin, B.J., 2017. The HITRAN2016 molecular spectroscopic database. Journal of Quantitative Spectroscopy and Radiative Transfer, 203, pp.3-69.

Groom, C.R., Bruno, I.J., Lightfoot, M.P. and Ward, S.C., 2016. The Cambridge structural database. Acta Crystallographica Section B: Structural Science, Crystal Engineering and Materials, 72(2), pp.171-179.

Oughtred, R., Stark, C., Breitkreutz, B.J., Rust, J., Boucher, L., Chang, C., Kolas, N., O'Donnell, L., Leung, G., McAdam, R. and Zhang, F., 2019. The BioGRID interaction database: 2019 update. Nucleic acids research, 47(D1), pp.D529-D541.

Robinson, J., Barker, D.J., Georgiou, X., Cooper, M.A., Flicek, P. and Marsh, S.G., 2020. IPD-IMGT/HLA Database. Nucleic acids research, 48(D1), pp.D948-D955.

