SIT772 Database and Information Retrieval
Name of the Student
Name of the University
Author's note
Table of Contents
Question 1.1
Question 1.2
Question 1.3
Question 1.4
Question 2
Question 2.1
Question 2.2
Question 2.3
Question 2.4
Question 2.5
Question 3
Question 3.1
Question 3.2
Question 3.3
Question 3.4
References
Question 1.1
No.  Term         Documents
1    data         1, 2, 3
2    science      1
3    field        1
4    scientific   1
5    method       1, 2
6    process      1, 2, 3
7    algorithm    1
8    system       1, 2, 3
9    extract      1
10   knowledge    1
11   mining       2
12   discover     2
13   pattern      2
14   large        2
15   involve      2
16   database     2
17   information  3
18   study        3
19   network      3
20   hardware     3
21   software     3
22   people       3
ID   Term         Doc1  Doc2  Doc3
1    algorithm    1     0     0
2    data         1     2     1
3    database     0     1     0
4    discover     0     1     0
5    extract      1     0     0
6    field        1     0     0
7    hardware     0     0     1
8    information  0     0     1
9    involve      0     1     0
10   knowledge    1     0     0
11   large        0     1     0
12   method       1     1     0
13   mining       0     1     0
14   network      0     0     1
15   pattern      0     1     0
16   people       0     0     1
17   process      1     1     1
18   science      1     0     0
19   scientific   1     0     0
20   software     0     0     1
21   study        0     0     1
22   system       1     1     1
Question 1.2:
Term         DocID  Total frequency
algorithm    1      1
data         1      4
data         2      4
data         2      4
data         3      4
database     2      1
discover     2      1
extract      1      1
field        1      1
hardware     3      1
information  3      1
involve      2      1
knowledge    1      1
large        2      1
method       1      2
method       2      2
mining       2      1
network      3      1
pattern      2      1
people       3      1
process      1      3
process      2      3
process      3      3
science      1      1
scientific   1      1
software     3      1
study        3      1
system       1      3
system       2      3
system       3      3
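The dictionary, the postings and the sorted (term, DocID) list above can all be produced mechanically from the term counts. The following is a minimal Python sketch. It assumes that the per-document counts are exactly those in the Question 1.1 term-document matrix, because the original documents are not reproduced in this report. The names `COUNTS`, `sorted_pairs` and `build_index` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Term counts per document (Doc1, Doc2, Doc3), copied from the Question 1.1 matrix.
# ASSUMPTION: these counts stand in for the original documents, which are not shown.
COUNTS: Dict[str, Tuple[int, int, int]] = {
    "algorithm": (1, 0, 0), "data": (1, 2, 1), "database": (0, 1, 0),
    "discover": (0, 1, 0), "extract": (1, 0, 0), "field": (1, 0, 0),
    "hardware": (0, 0, 1), "information": (0, 0, 1), "involve": (0, 1, 0),
    "knowledge": (1, 0, 0), "large": (0, 1, 0), "method": (1, 1, 0),
    "mining": (0, 1, 0), "network": (0, 0, 1), "pattern": (0, 1, 0),
    "people": (0, 0, 1), "process": (1, 1, 1), "science": (1, 0, 0),
    "scientific": (1, 0, 0), "software": (0, 0, 1), "study": (0, 0, 1),
    "system": (1, 1, 1),
}

def sorted_pairs(counts: Dict[str, Tuple[int, int, int]]) -> List[Tuple[str, int]]:
    """Emit one (term, docID) pair per occurrence, sorted by term and then docID."""
    pairs: List[Tuple[str, int]] = []
    for term, per_doc in counts.items():
        for doc_id, tf in enumerate(per_doc, start=1):
            pairs.extend([(term, doc_id)] * tf)
    return sorted(pairs)

def build_index(pairs: List[Tuple[str, int]]):
    """Merge sorted pairs into postings lists (distinct docIDs) and total frequencies."""
    postings: Dict[str, List[int]] = defaultdict(list)
    total: Dict[str, int] = defaultdict(int)
    for term, doc_id in pairs:
        total[term] += 1
        if not postings[term] or postings[term][-1] != doc_id:
            postings[term].append(doc_id)
    return dict(postings), dict(total)

pairs = sorted_pairs(COUNTS)
postings, total = build_index(pairs)
for term, doc_id in pairs[:6]:
    print(term, doc_id, total[term])
```

Printing the first six pairs reproduces the top of the table above: algorithm 1 1, data 1 4, data 2 4, data 2 4, data 3 4, database 2 1. "data 2" appears twice because the sketch keeps one pair per occurrence. Merging the pairs then yields the document lists of Question 1.1.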
Question 1.3:
Boolean Query 1: data AND (system AND method)
data: doc1, doc2, doc3
system: doc1, doc2, doc3
method: doc1, doc2
Result: doc1, doc2

Boolean Query 2: data AND method
data: doc1, doc2, doc3
method: doc1, doc2
Result: doc1, doc2

Boolean Query 3: system OR (process AND data)
system: doc1, doc2, doc3
process: doc1, doc2, doc3
data: doc1, doc2, doc3
Result: doc1, doc2, doc3
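The Boolean queries above are answered by merging sorted postings lists. Below is a minimal sketch of the standard linear-merge intersection (AND) and union (OR). It assumes the postings lists from Question 1.1. The helper names `intersect` and `union` are hypothetical.

```python
from typing import Dict, List

# Postings lists (sorted docIDs) taken from the Question 1.1 table.
POSTINGS: Dict[str, List[int]] = {
    "data": [1, 2, 3],
    "system": [1, 2, 3],
    "method": [1, 2],
    "process": [1, 2, 3],
}

def intersect(p1: List[int], p2: List[int]) -> List[int]:
    """AND: walk both sorted lists once, keeping docIDs present in both."""
    i = j = 0
    result: List[int] = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result

def union(p1: List[int], p2: List[int]) -> List[int]:
    """OR: merge two sorted postings lists, keeping each docID once."""
    i = j = 0
    result: List[int] = []
    while i < len(p1) or j < len(p2):
        if j == len(p2) or (i < len(p1) and p1[i] < p2[j]):
            result.append(p1[i])
            i += 1
        elif i == len(p1) or p2[j] < p1[i]:
            result.append(p2[j])
            j += 1
        else:  # the same docID is at the head of both lists
            result.append(p1[i])
            i += 1
            j += 1
    return result

q1 = intersect(POSTINGS["data"], intersect(POSTINGS["system"], POSTINGS["method"]))
q2 = intersect(POSTINGS["data"], POSTINGS["method"])
q3 = union(POSTINGS["system"], intersect(POSTINGS["process"], POSTINGS["data"]))
print(q1, q2, q3)  # [1, 2] [1, 2] [1, 2, 3]
```

The innermost parentheses are evaluated first. Each merge walks both lists once, so its cost is linear in their combined length.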
Question 1.4:
Terms Query Doc1 Doc2 Doc3
algorithm 1
data 1 1 2 1
database 1
discover 1
extract 1
field 1
hardware 1
information 1
involve 1
knowledge 1
large 1
method 1 1 1
mining 1
network 1
pattern 1
people 1
process 1 1 1
science 1
scientific 1
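Term     Query                       Document              Product
         tf   df     idf   wt        tf   wf   norm. wt
is       0    5000   2.3   0         1    1    0.41        0
to       1    50000  1.3   1.3       0    0    0           0
data     1    10000  2.0   2.0       1    1    0.41        0.82
science  1    1000   3.0   3.0       2    2    0.82        2.46
Query wt = tf × idf; document norm. wt = wf / sqrt(1² + 0² + 1² + 2²) = wf / 2.45; product = query wt × document norm. wt.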
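The arithmetic in the table can be checked with a short script. This is a minimal sketch resting on three assumptions:
- N = 1,000,000 documents. N is not stated above, but idf = log10(N/df) with this N reproduces the idf column.
- Raw term frequency is used as the document weight, which is what the wf column shows.
- The document vector is cosine-normalised, while the query weights are left unnormalised, as in the table.

```python
import math
from typing import Dict, Tuple

N = 1_000_000  # ASSUMED collection size: log10(N / df) reproduces the idf column

# term -> (query tf, df, document tf), taken from the table above
ROWS: Dict[str, Tuple[int, int, int]] = {
    "is": (0, 5000, 1),
    "to": (1, 50000, 0),
    "data": (1, 10000, 1),
    "science": (1, 1000, 2),
}

def tfidf_score(rows: Dict[str, Tuple[int, int, int]], n_docs: int) -> float:
    """Sum over terms of (query tf x idf) * (document tf / document vector length)."""
    doc_len = math.sqrt(sum(d_tf ** 2 for _, _, d_tf in rows.values()))
    total = 0.0
    for term, (q_tf, df, d_tf) in rows.items():
        idf = math.log10(n_docs / df)
        q_wt = q_tf * idf      # query weight (not normalised)
        d_wt = d_tf / doc_len  # cosine-normalised document weight
        print(f"{term:8s} idf={idf:.1f} q_wt={q_wt:.1f} d_wt={d_wt:.2f} product={q_wt * d_wt:.2f}")
        total += q_wt * d_wt
    return total

score = tfidf_score(ROWS, N)
print(f"score = {score:.2f}")
```

Without intermediate rounding, the science product is about 2.45 and the score about 3.27. The table's 2.46 comes from multiplying the rounded weight 0.82 by 3.0, and summing its rounded products (0.82 + 2.46) gives 3.28.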
Question 2:
The selected search engines are:
1. www.google.com.au
2. https://au.yahoo.com/
Question 2.1:
Google:
Yahoo:
The screenshots show that the two search engines returned different results: only one link is common to both lists, and all the other results differ. Of the two selected search engines, Google gave the better results. Its ranking algorithm is widely used and well understood, and it tends to surface good-quality content from well-established pages and links, whereas Yahoo still tends to favour older, long-established sites. Google returns relevant results and is generally considered reliable by its users (Bult et al., 2019). It has also offered features such as Google Instant, which displayed results while the query was being typed, without the user pressing Enter. Yahoo and Google each offer different benefits and use different algorithms and ranking systems; however, for these queries Google gave better results than Yahoo.
Query 1 in Search Engine 1:
Result 1: Relevant
URL: https://in.reuters.com/finance
Result 2: Irrelevant
URL: https://in.reuters.com/article/us-global-oil/oil-slips-as-surge-in-virus-cases-cloud-demand-recovery-idINKBN26J01K
Result 3: Relevant
URL: https://in.reuters.com/finance/markets/us
Result 4: Irrelevant
URL: https://in.reuters.com/quote/.BSESN
Result 5: Relevant
URL: https://www.reuters.com/article/us-usa-education-technology-idUSN2547885520080707
Result 6: Irrelevant
URL: https://books.google.com.au/books?id=0ayMDwAAQBAJ&pg=PA160&lpg=PA160&dq=Online+education+and+innovation+site+www.reuters.com+technologies+leadershipi&source=bl&ots=te1gW5rY1-&sig=ACfU3U0_ljTYOQg_Kd4t-KqDgt2luceogQ&hl=en&sa=X&ved=2ahUKEwjdptL9m-_pAhWRILcAHcLwA2MQ6AEwAHoECAgQAQ#v=onepage&q=Online%20education%20and%20innovation%20site%20www.reuters.com%20technologies%20leadershipi&f=false
Result 7: Relevant
URL: https://in.reuters.com/article/a2-milk-company-outlook/update-2-nzs-a2-milk-forecasts-weaker-revenue-on-disruption-to-chinese-sales-shares-plunge-idINL4N2GO0BN
Result 8: Irrelevant
URL: https://in.reuters.com/video/technology
Result 9: Irrelevant
URL: https://in.reuters.com/article/afghanistan-women-education/coal-miners-daughter-comes-top-in-afghan-university-entrance-exam-idINKCN26G2FI
Result 10: Irrelevant
URL: https://www.reuters.com/innovative-universities-2019
Query 2 in Search Engine 1:
Result 1: Relevant
URL: https://in.reuters.com/article/afghanistan-women-mothers/in-the-name-of-the-mother-afghan-woman-wins-recognition-sparks-taliban-opposition-idINKCN26E0VU
Result 2: Relevant
URL: https://www.reuters.com/article/us-education-courses-online/getting-the-most-out-of-an-online-education-idUSBRE89I17120121019
Result 3: Irrelevant
URL: https://in.reuters.com/video/watch/ginsburg-through-the-eyes-of-her-former-idRCV008RVZ?chan=37foz4nc
Result 4: Relevant
URL: https://www.reuters.com/article/us-usa-education-technology-idUSN2547885520080707
Result 5: Irrelevant
URL: https://www.reuters.com/innovative-universities-2019
Result 6: Irrelevant
URL: https://www.reuters.com/article/us-amers-reuters-ranking-innovative-univ/reuters-top-100-the-worlds-most-innovative-universities-2018-idUSKCN1ML0AZ
Result 7: Irrelevant
URL: https://in.reuters.com/article/us-russia-pollution-nornickel/nornickel-co-owner-says-thawing-permafrost-one-reason-behind-spill-idINKBN26H0OD
Result 8: Irrelevant
URL: https://in.reuters.com/article/us-climate-change-protests/worlds-youth-rallies-against-climate-change-idINKCN26G10M
Result 9: Irrelevant
URL: https://blogs.thomsonreuters.com/legal-uk/2019/12/12/celebrating-innovative-law-teachers-teaching-law-with-technology-prize-2020/
Result 10: Irrelevant
URL: https://in.reuters.com/article/us-climate-change-arctic-veteran-activis/inspired-by-thunberg-veteran-climate-activist-logs-arctic-meltdown-idINKCN26G1LS
Question 2.2:
Query 1 in Search Engine 2:
Result 1: Relevant
URL: https://www.reuters.com/article/us-usa-education-online-idUSTRE59047Z20091001
Result 2: Relevant
URL: https://www.reuters.com/article/us-health-coronavirus-education-pearson-
Result 3: Relevant
URL: https://www.reuters.com/article/usa-education-online-idUSN1721594720091001
Result 4: Irrelevant
URL: https://www.reuters.com/article/us-china-edtech-yuanfudao-idUSKBN21I07Y
Result 5: Irrelevant
URL: https://www.reuters.com/article/us-heath-coronavirus-usa-education-idUSKBN21408T
Result 6: Irrelevant
URL: https://www.reuters.com/article/us-health-coronavirus-philippines-educat-idUSKBN21A0YC
Result 7: Irrelevant
URL: https://uk.reuters.com/article/us-healthcare-coronavirus-technology/children-at-risk-as-pandemic-pushes-them-online-warns-u-n-agency-idUKKBN22H2EZ
Result 8: Irrelevant
URL: https://www.reuters.com/article/us-amers-reuters-ranking-innovative-univ-idUSKCN1ML0AZ
Result 9: Irrelevant
URL: https://www.reuters.com/
Result 10: Relevant
URL: https://www.forbes.com/sites/brandonbusteed/2019/03/05/online-education-from-good-to-better-to-best/#2dc49b646912
Query 2 in Search Engine 2:
Result 1: Relevant
URL: https://www.reuters.com/article/us-usa-education-online-idUSTRE59047Z20091001
Result 2: Relevant
URL: https://www.reuters.com/article/us-health-coronavirus-education-pearson-idUSKBN21D384
Result 3: Irrelevant
URL: https://www.reuters.com/article/us-heath-coronavirus-usa-education-idUSKBN21408T
Result 4: Relevant
URL: https://uk.reuters.com/article/us-healthcare-coronavirus-technology/children-at-risk-as-pandemic-pushes-them-online-warns-u-n-agency-idUKKBN22H2EZ
Result 5: Relevant
URL: https://news.cgtn.com/news/2020-04-25/Increased-demand-motivates-online-education-innovation-PYVdW9pnuo/index.html
Result 6: Irrelevant
URL: https://innovation.thomsonreuters.com/en.html
Result 7: Irrelevant
URL: https://www.reuters.com/companies/VVV
Result 8: Relevant
URL: https://www.thetechedvocate.org/edtech-innovation-in-online-education/
Result 9: Irrelevant
URL: https://www.reuters.com/innovative-universities-europe-2017/methodology
Result 10: Irrelevant
URL: https://in.reuters.com/
Question 2.3:
Question 2.4:
Question 2.5:
In today's Internet, the search engine is one of the most valuable tools, because it returns the results most relevant to what the user is looking for. Of the two selected search engines, Google performed best. Its ranking algorithm is widely used and well understood, and it tends to surface good-quality content from well-established pages and links, whereas Yahoo still tends to favour older, long-established sites (Cochrane et al., 2016). Google returns relevant results and is generally considered reliable by its users. It has also offered features such as Google Instant, which displayed results while the query was being typed, without the user pressing Enter. Yahoo and Google each offer different benefits and use different algorithms and ranking systems. For search engine optimisation (SEO), therefore, it is important to target all search engines rather than Google alone.
Search Engine 1:
Hits 1-10
Rank       1     2     3     4     5     6     7     8     9     10
Precision  1/1   2/2   2/3   3/4   3/5   3/6   3/7   3/8   3/9   3/10
Recall     1/12  2/12  2/12  3/12  3/12  3/12  3/12  3/12  3/12  3/12
Hits 11-20
Rank       11    12    13    14    15    16    17    18    19    20
Precision  3/11  4/12  5/13  6/14  6/15  6/16  6/17  6/18  7/19  7/20
Recall     3/12  4/12  5/12  6/12  6/12  6/12  6/12  6/12  7/12  7/12
Search Engine 2:
Hits 1-10
Rank       1     2     3     4     5     6     7     8     9     10
Precision  1/1   2/2   2/3   3/4   4/5   4/6   4/7   5/8   5/9   5/10
Recall     1/12  2/12  2/12  3/12  4/12  4/12  4/12  5/12  5/12  5/12
Hits 11-20
Rank       11    12    13    14    15    16    17    18    19    20
Precision  5/11  5/12  6/13  6/14  6/15  7/16  7/17  7/18  7/19  7/20
Recall     5/12  5/12  6/12  6/12  6/12  7/12  7/12  7/12  7/12  7/12
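The precision and recall rows follow directly from the relevance judgement at each rank. The following is a minimal sketch with two assumptions:
- There are 12 relevant documents in total, the denominator used in the recall rows.
- The 0/1 relevance list for Search Engine 2 is hypothetical. It was read back from the ranks where the precision numerator increases (1, 2, 4, 5, 8, 13 and 16).

```python
from typing import List, Tuple

def precision_recall(relevant: List[int], total_relevant: int) -> List[Tuple[str, str]]:
    """Precision (hits/k) and recall (hits/total_relevant) at every rank k, as unreduced fractions."""
    rows: List[Tuple[str, str]] = []
    hits = 0
    for k, rel in enumerate(relevant, start=1):
        hits += rel
        rows.append((f"{hits}/{k}", f"{hits}/{total_relevant}"))
    return rows

# HYPOTHETICAL judgements for Search Engine 2 (1 = relevant), reconstructed from the
# ranks where the precision numerator increases: 1, 2, 4, 5, 8, 13 and 16.
se2 = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0,
       0, 0, 1, 0, 0, 1, 0, 0, 0, 0]

rows = precision_recall(se2, total_relevant=12)
print(rows[9])   # ('5/10', '5/12')
print(rows[19])  # ('7/20', '7/12')
```

Recall can only stay level or rise as k grows. Precision drops whenever a non-relevant hit is retrieved.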
Question 3:
Question 3.1:
Indexing makes queries on a column faster by creating pointers to where the data is stored in the database (Gaulton et al., 2017). Searching for a particular value in a table ordered alphabetically on that field is much faster, because the database can skip scanning many rows and columns. An index is a separate structure that holds the indexed field in sorted order, together with a pointer from each entry to the corresponding record in the original table, where the data is actually stored. A contact list is a familiar example. Each person's contact information is stored physically, and a contact is much easier to find when the entries are kept in alphabetical order (Gordon et al., 2017). A clustered index is unique per table, in the sense that a table can have only one, and it typically uses the primary key to organise the data within the table. The clustered index keeps the rows in increasing primary-key order, so the index order is also the physical order of the table's data.
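A separate, sorted index that holds pointers back to the table can be sketched in a few lines. This is a minimal Python illustration. It assumes a hypothetical in-memory contact list and uses row positions as the pointers. Real DBMSs use on-disk B+-trees rather than Python lists.

```python
import bisect
from typing import List, Optional, Tuple

Row = Tuple[str, str]  # (name, phone)

# Hypothetical contacts table, stored in insertion order (not sorted).
contacts: List[Row] = [
    ("Priya", "555-0103"),
    ("Adam", "555-0101"),
    ("Zoe", "555-0104"),
    ("Maria", "555-0102"),
]

class SortedIndex:
    """Non-clustered index: the indexed field in sorted order, plus a pointer
    (here, the row's position) back to the record in the original table."""

    def __init__(self, table: List[Row], key_col: int) -> None:
        entries = sorted((row[key_col], pos) for pos, row in enumerate(table))
        self.keys = [key for key, _ in entries]
        self.pointers = [pos for _, pos in entries]

    def find(self, key: str) -> Optional[int]:
        """Binary search over the sorted keys: O(log n) comparisons, no full scan."""
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.pointers[i]
        return None

name_index = SortedIndex(contacts, key_col=0)
row_pos = name_index.find("Maria")
print(name_index.keys)    # ['Adam', 'Maria', 'Priya', 'Zoe']
print(contacts[row_pos])  # ('Maria', '555-0102')
```

The table itself stays in insertion order. Only the index is sorted, which is why the lookup needs the extra pointer step.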
Clustered indexes do not have to be declared explicitly (for example, in SQL Server and MySQL's InnoDB engine, declaring a primary key is enough).
They are created when the table is created.
They use the primary key, sorted in ascending order.
Example:
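A clustered index differs from an ordinary index in that the table rows themselves are kept in primary-key order. The following is a toy Python sketch built around a hypothetical `ClusteredTable` class that stores its rows in a sorted list. Production systems such as SQL Server or MySQL's InnoDB engine keep the rows in a B+-tree keyed on the primary key instead.

```python
import bisect
from typing import List, Optional, Tuple

Row = Tuple[int, str]  # (primary key, name)

class ClusteredTable:
    """Toy clustered organisation: the rows are stored in ascending primary-key
    order, so no separate pointer list is needed and only one such ordering
    can exist per table."""

    def __init__(self) -> None:
        self.rows: List[Row] = []

    def _position(self, pk: int) -> int:
        # (pk,) sorts before any (pk, name), so this finds the first row with key >= pk.
        return bisect.bisect_left(self.rows, (pk,))

    def insert(self, pk: int, name: str) -> None:
        i = self._position(pk)
        if i < len(self.rows) and self.rows[i][0] == pk:
            raise ValueError(f"duplicate primary key {pk}")
        self.rows.insert(i, (pk, name))

    def get(self, pk: int) -> Optional[Row]:
        i = self._position(pk)
        if i < len(self.rows) and self.rows[i][0] == pk:
            return self.rows[i]
        return None

t = ClusteredTable()
for pk, name in [(30, "Zoe"), (10, "Adam"), (20, "Maria")]:
    t.insert(pk, name)
print(t.rows)  # [(10, 'Adam'), (20, 'Maria'), (30, 'Zoe')]
```

Rows are inserted in any order, but they always end up stored in ascending key order. A key lookup is therefore a binary search over the rows themselves.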
Question 3.2:
Querying relational databases with schema-aware languages such as SQL, and querying document collections by typing arbitrary keywords, are the two extreme ends of the spectrum between structured and unstructured data access. As with the contact list discussed in Question 3.1, an index holds the indexed field in sorted order, together with a pointer from each entry to the corresponding record in the original table where the data is actually stored; items are much easier to find when they are stored in alphabetical order (Groom et al., 2016). Two forces are bridging these extremes. First, relational databases are increasingly web-enabled, so they must be accessed and manipulated by non-experts who do not know enough about the schema. Second, web documents are evolving from flat text files through HTML and SGML to XML, adding markup and embedded schema information.
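The two ends of the spectrum can be contrasted on a single table. The following is a minimal sketch using Python's built-in `sqlite3` module. It assumes a hypothetical `articles` table whose sample rows loosely echo some of the search results in Question 2. The `keyword_search` helper is illustrative and is not a feature of SQLite.

```python
import sqlite3
from typing import List

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, topic TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO articles (title, topic, year) VALUES (?, ?, ?)",
    [  # hypothetical sample rows
        ("Getting the most out of an online education", "education", 2012),
        ("Oil prices slip as virus cases rise", "markets", 2020),
        ("Increased demand motivates online education innovation", "education", 2020),
    ],
)

# Structured end: the user must know the table name, column names and types.
structured = conn.execute(
    "SELECT title FROM articles WHERE topic = ? AND year = ?", ("education", 2020)
).fetchall()

def keyword_search(conn: sqlite3.Connection, table: str, keywords: List[str]) -> List[int]:
    """Unstructured end: match keywords against every TEXT column, so the user
    needs no knowledge of the schema. ASSUMES an integer 'id' column; the table
    name is interpolated into SQL, so it must not come from untrusted input."""
    text_cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})") if r[2].upper() == "TEXT"]
    wanted = [k.lower() for k in keywords]
    hits: List[int] = []
    for row in conn.execute(f"SELECT id, {', '.join(text_cols)} FROM {table} ORDER BY id"):
        tokens = set(" ".join(str(v) for v in row[1:]).lower().split())
        if all(k in tokens for k in wanted):
            hits.append(row[0])
    return hits

print(structured)                                             # one title, from 2020
print(keyword_search(conn, "articles", ["online", "education"]))  # [1, 3]
```

The SQL query is precise but only usable by someone who knows the schema. The keyword search needs no schema knowledge but cannot express conditions such as `year = 2020`.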
Question 3.3:
Web search engines have their roots in information retrieval (IR), and recall and precision are the most widely used measures for evaluating IR systems. A search engine aims to return the results most relevant to the user's query. Of the two selected search engines, Google performed best. Its ranking algorithm is widely used and well understood, and it tends to surface good-quality content from well-established pages and links, whereas Yahoo still tends to favour older, long-established sites. Google returns relevant results and is generally considered reliable by its users. It has also offered features such as Google Instant, which displayed results while the query was being typed, without the user pressing Enter (Oughtred et al., 2019). Recall is hard to measure for a web search engine, because the full set of relevant pages on the Web is unknown. Published comparisons of search engines have therefore used precision as the main evaluation measure [Winship95, ChuRos96, ClaWil97, DingMarch96, LeiSri97], evaluating only the highest-ranked hits.
Question 3.4:
A system architecture diagram is the formal name given to a pictorial representation of a system's architecture. It is a drawing, map or rendering that visually describes, at a very high level, the particulars of the system in question. Structure diagrams are the diagrams that model the static components of a system: the data may change, but the structure of the elements and their relationships with each other does not.
The architecture of the web application has several layers:
Users: make requests to the web server and receive responses rendered using JavaServer Pages (JSP).
Web server: hosts the layers of the application, which follow the Model-View-Controller (MVC) pattern.
- Presentation layer: users interact with the application through HTTP requests and responses, which are rendered in a browser (Robinson et al., 2020).
- Data layer: handles the domain data and provides persistence, so that the services can store data in, and retrieve it from, the database.
Database: where data is persisted and from which it is retrieved.
Web services: interact with other applications.
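The layering described above can be sketched without a real web server. The document's stack uses JavaServer Pages, so the following is only a language-neutral illustration written in Python. All of the components are hypothetical:
- `ArticleRepository` plays the data layer.
- `render_article` is the view.
- `handle_get` is the controller.
- An in-memory SQLite database stands in for the Database tier.

```python
import sqlite3
from html import escape
from typing import Dict, Optional, Tuple

# --- Data layer: persistence and retrieval of domain data (hypothetical names) ---
class ArticleRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS article (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
        )

    def add(self, title: str) -> int:
        cur = self.conn.execute("INSERT INTO article (title) VALUES (?)", (title,))
        return cur.lastrowid

    def find(self, article_id: int) -> Optional[Dict[str, object]]:
        row = self.conn.execute(
            "SELECT id, title FROM article WHERE id = ?", (article_id,)
        ).fetchone()
        return None if row is None else {"id": row[0], "title": row[1]}

# --- Presentation layer: view (V) and controller (C) of MVC ---
def render_article(article: Optional[Dict[str, object]]) -> str:
    """View: turn the model into HTML for the browser, escaping stored text."""
    if article is None:
        return "<p>Not found</p>"
    return f"<h1>{escape(str(article['title']))}</h1>"

def handle_get(repo: ArticleRepository, path: str) -> Tuple[int, str]:
    """Controller: map a request path such as /articles/1 to a (status, HTML body) pair."""
    prefix = "/articles/"
    if not path.startswith(prefix) or not path[len(prefix):].isdigit():
        return 404, "<p>Not found</p>"
    article = repo.find(int(path[len(prefix):]))
    return (200 if article is not None else 404), render_article(article)

# --- Wiring: stands in for the web server; ":memory:" stands in for the Database tier ---
repo = ArticleRepository(sqlite3.connect(":memory:"))
new_id = repo.add("Online education & innovation")
print(handle_get(repo, f"/articles/{new_id}"))  # (200, '<h1>Online education &amp; innovation</h1>')
```

Each layer talks only to its neighbour. Swapping SQLite for another database, for example, would touch only the repository class.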
References:
Bult, C.J., Blake, J.A., Smith, C.L., Kadin, J.A. and Richardson, J.E., 2019. Mouse genome
database (MGD) 2019. Nucleic acids research, 47(D1), pp.D801-D806.
Cochrane, G., Karsch-Mizrachi, I., Takagi, T. and International Nucleotide Sequence Database
Collaboration, 2016. The international nucleotide sequence database collaboration. Nucleic acids
research, 44(D1), pp.D48-D50.
Gaulton, A., Hersey, A., Nowotka, M., Bento, A.P., Chambers, J., Mendez, D., Mutowo, P.,
Atkinson, F., Bellis, L.J., Cibrián-Uhalte, E. and Davies, M., 2017. The ChEMBL database in
2017. Nucleic acids research, 45(D1), pp.D945-D954.
Gordon, I.E., Rothman, L.S., Hill, C., Kochanov, R.V., Tan, Y., Bernath, P.F., Birk, M.,
Boudon, V., Campargue, A., Chance, K.V. and Drouin, B.J., 2017. The HITRAN2016
molecular spectroscopic database. Journal of Quantitative Spectroscopy and Radiative
Transfer, 203, pp.3-69.
Groom, C.R., Bruno, I.J., Lightfoot, M.P. and Ward, S.C., 2016. The Cambridge structural
database. Acta Crystallographica Section B: Structural Science, Crystal Engineering and
Materials, 72(2), pp.171-179.
Oughtred, R., Stark, C., Breitkreutz, B.J., Rust, J., Boucher, L., Chang, C., Kolas, N.,
O’Donnell, L., Leung, G., McAdam, R. and Zhang, F., 2019. The BioGRID interaction
database: 2019 update. Nucleic acids research, 47(D1), pp.D529-D541.
Robinson, J., Barker, D.J., Georgiou, X., Cooper, M.A., Flicek, P. and Marsh, S.G., 2020.
IPD-IMGT/HLA Database. Nucleic acids research, 48(D1), pp.D948-D955.