BD0 Introduction 2per

The document outlines a course on biological databases, focusing on their structure, querying methods, and practical applications in biological analysis. It covers various database types, programming languages like SQL, R, and Python, and includes assessments and reading materials. The course aims to equip students with the skills to effectively utilize and understand biological databases across different domains.

19/09/2023

Introduction to
Biological Databases
Simon Tomlinson

Introduction
Biological databases are organized collections of biological data,
typically accessible by computational means. They are reservoirs
of biological knowledge, generally stable across time and often focused
on a particular biological domain.

• This course is about those databases: how data is stored, accessed,
retrieved and converted to a form that can be used in biological
analysis.


Course In Detail
The course content covers the following areas:

• Different types of database from flat files to relational formats

• Searching and querying databases, SQL and XML interchange formats
• Advanced interfaces- web, software, mart, REST etc
• Correctness, database normalisation, performance, versioning etc

A survey of biological databases

• Example databases from 10 different biological domains
• Design principles for each database
• Query and data retrieval using SQL, R and Python
• Advanced topics- eg meta databases, big data

Databases Used on this Course

Databases from 10 different biological domains
• Genomic databases [eg Ensembl]
• Nucleic acid databases [eg GenBank]
• Pathway and metabolic databases [eg Reactome, KEGG]
• Taxonomic databases [eg Gene Ontology]
• Protein interaction databases [eg BioGRID]
• High-throughput sequencing databases [eg GEO, ArrayExpress]
• Imaging databases [eg OMERO]
• Protein and proteomic databases [eg Swiss-Prot]
• Model organism databases [eg FlyBase]
• Genomic feature databases [eg JASPAR]
• Meta databases, large-scale databases and big data [eg InterMine]


Programming Languages Used

• Databases can generally be explored through web interfaces, but this
works best for small queries, eg single gene queries
• Bioinformaticians, data scientists and others performing data analysis
often need to work at a much larger scale
• So we also use programming languages to access databases. This
offers the ability to perform more complex and larger-scale queries
and to integrate the results directly into the environment being used
• Examples will use SQL, R and Python
• We will also discuss web technologies such as REST, which act as interfaces
between the database and the programs that query it
• No prior programming experience is required!

Reading List
Textbook
• No single textbook covers the whole course content. But SQL will be an important technology and
for this a useful book is
• Learning SQL: Generate, Manipulate, and Retrieve Data, Alan Beaulieu

General Reading
• Thessen, Anne E., and David J. Patterson. "Data issues in the life sciences." ZooKeys 150 (2011): 15.
• Sharma, Parva Kumar, and Inderjit Singh Yadav. "Biological databases and their application."
Bioinformatics. Academic Press, 2022. 17-31.
• Hassani-Pak, Keywan, and Christopher Rawlings. "Knowledge discovery in biological databases for
revealing candidate genes linked to complex phenotypes." Journal of integrative bioinformatics 14.1
(2017).

Specific, weekly, reading lists will be provided for every topic covered.


Assessment
• In-course assessment (50%) and exam (50%)
• This is the second year of this course so there is one past exam.
Example exam questions will be provided later in the course and
there will also be a revision session
• You will have at least four weeks to complete the in-course
assessment. Detailed guidance will be provided in later weeks.

What Will You Learn On This Course?

• You will learn about the different databases
• You will learn about the different designs of these databases and how
to use this knowledge to exploit databases
• You will learn to query biological databases

On this course we try to explore the general concepts of biological
databases using the ten chosen examples. But we do this in such a way
that the skills learned on the course can be adapted to explore the many
thousands of other databases that are also available online.


Overlap with Existing ‘Database’ Courses

The focus of this course is very much on biological databases- their common design principles and
how we can extract knowledge given the design. Given this focus, the expected overlap with existing
‘database’ courses will be small, a maximum of 5-10%. A modest amount of course overlap is
desirable as it allows integration of knowledge between courses.

• Introduction to web site and database design for drug discovery (BICH11007, SBS)
Design a database and query it for drug data
• Molecular Modelling and Database Mining (PGBI11023, SBS)
Query a molecular modelling database
General database design
• Using R for Data Science, Functional Genomic Technologies
Querying and accessing data within databases using R
• Bioinformatics Programming and System Management
Building simple databases in SQL, query using Python

General Course Design

• Each week there will be a lecture and a practical session
• In the lecture we will review a topic and then this will be followed by
a practical session on this topic
• There will be a summary of each topic at the end of the topic
• Teaching material will be available on Learn


Setup
• In the Introductory week(s) we will use web examples
• After that we will switch to using Unix and R. Server accounts will be
provided for this, but you can also use your own laptop.
• We will go through setup procedures next week.
• This week we will only use a web browser...

Attendance & Advice

• A register will be taken for attendance each week
• It is important to attend each class, but if for some reason you cannot
attend, make sure you catch up as soon as possible afterwards
• Learn the course contents as you go along week by week
• Remember that we have 30 hours of taught time but 70 hours set
aside for personal study on this course. Make use of this study time!
• Coursework deadlines can be difficult to manage. Make sure you give
an appropriate amount of time to each piece of coursework. If you
are struggling with coursework, let someone know.


My Contact details
• I am the course organizer and the lecturer
• If you’d like to contact me please use email as I’m often not in my office. Please
put “BD” at the start of your email subject line so that emails relating
to this course can easily be identified.

Dr Simon Tomlinson
Senior Lecturer/Group Leader
Centre for Regenerative Medicine
Institute for Regeneration and Repair
School of Biological Sciences
University of Edinburgh
email: [email protected]

An Example Database -Ensembl

• Available at www.ensembl.org


The Ensembl Genomic Database

• This database is a collection of genomic information

• Basically genomic sequence is used to generate
an assembly
• Annotation is then mapped onto this assembly
• This information can then be queried...

Searching for a Gene

Mouse [an example organism]

Trp53 [an example gene]


Pick a Match...

Match (click)

Ensembl Record for Trp53 (top)


Lower part

Gene model (exons, introns etc)

Assembly

Click on Any Gene

More detailed information


Ensembl Has Extensive Help Available

Help from here

Seems Simple?
• Ensembl is actually an extremely complicated data resource
• It is probably the most complex data resource we will use on the course
• But the complexity is a bit hidden- you can perform simple queries
relatively easily and help is available
• The problem for us in bioinformatics or data science
• We need to work at genome scale- not look at single genes
• The complexities are very important!!

• Our purpose this week is not to fully understand Ensembl but to use
Ensembl to map out the challenges to understanding any database


First detail- what is Ensembl?

• It is what you obtain when you access www.ensembl.org
• But is the page we access “the Ensembl database”? No!
• The web page is an interface to the underlying database
• So can we say we “accessed Ensembl from www.ensembl.org”? Yes!
• Ensembl is also a project that builds interfaces such as the web page
as well as maintaining the underlying database.

Simplified Ensembl Overall Design

• Query Page and Result Page: web pages served by the interface (client-side web: HTML, JavaScript etc)
• Ensembl Web Interface: the overall web interface/web site
• SQL goes to the database and results come back
• MySQL Ensembl: this contains all the Ensembl data, organized in several MySQL databases. We will return to MySQL in a later class!


Design is Modular- Adding Other Interfaces

[Diagram: Web Interface, Direct SQL, Programming Languages and BioMart Interface, each connected to the MySQL Ensembl Database]

• Not every system offers such a rich set of interfaces, but Ensembl can be accessed in all of these ways
• In this design all the interfaces “see” exactly the same versions of the data stored in the database

Copies, Mirrors, Versions and Archives

• New versions of Ensembl are released periodically- today we are using
Ensembl Release 110 (July 2023)
• Releases update the annotation, but may also bring in a new genome assembly
(fragments are assembled into genomic sequence and co-ordinates) and new
software
• The underlying MySQL database can be copied to different locations and, as long as
the versions match, queries against a copy will return the same results as queries
against the main database
• Mirrors of the whole Ensembl site are available, which duplicate all of Ensembl at
another location: https://www.ensembl.org/info/about/mirrors.html
• Ensembl has a range of archives:
https://www.ensembl.org/info/website/archives/index.html
These are working copies of earlier releases, so old annotation and software can
still be accessed if required


But What About the Data?

• We cannot explore every possible source of data in Ensembl- this
would take a whole course in itself
• However, the complexity and richness of this resource is what makes
it worth the effort to be able to query in the first place
• So on the course we extract general principles from examples. If we
can understand how the system works for one query, we can make
similar queries. This approach is more powerful if we also have some idea
of how the overall system works.

Searching for a Gene -Revisited

• We put in “Mouse” as a species
• But what is “Mouse”?
• Obviously it refers to the species, but it
is not an exact scientific species
name.


Ensembl Mouse- Strains & Similar Names

• “Mouse” is a short name for the
precise species name, Mus musculus
• There is actually a reference strain,
C57BL/6J, that is used for the assembly
• Annotations for other strains are then imported
onto the reference
• Note that our search would not have covered
“Mouse Lemur”, as this is not from
the species Mus musculus.

So precise names and their meanings are
very important if you want to get the correct
results!

We searched for the gene “Trp53”

• Trp53 is a standard gene name defined
by the mouse nomenclature committee
(http://www.informatics.jax.org/mgihome/nomen/)
• This committee standardized the naming, so one gene
has one single name and only one gene has that name
• But in practice old names are still used, and so gene
names also have ‘aliases’
• Trp53 is actually the orthologue of the human
cancer gene TP53, which still gets called by the alias p53
• p53 refers to the protein for both mouse and human
• Note that nomenclature is set by different committees in
mouse and human
• Note that if you search for the protein/alias name as
a gene name, you find lots of related genes


So in this simple query...

• I used a gene name that I knew matched the formal gene nomenclature for
mouse (note that most labs still call this gene p53)
• I used a simple species name, knowing that it mapped to the
Mus musculus reference strain used by Ensembl (C57BL/6J)

• All this seems ‘trivial’ because we could search through the list of matches
and pick out the “correct” one
• But suppose we have 10,000 queries to make automatically. To get the
required results back we need to use the correct query name for each one,
otherwise we risk pulling back the wrong gene information
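
To make the point about automated queries concrete, here is a minimal Python sketch of checking names before any query is sent. The two lookup tables are hypothetical and hold only the names mentioned in these slides; in practice they would be loaded from the nomenclature source or from Ensembl itself. The function names are also illustrative.

```python
# Hypothetical lookup tables for illustration only; a real pipeline would
# load these from the nomenclature source (eg MGI) or from Ensembl.
SPECIES = {"mouse": "Mus musculus", "mus musculus": "Mus musculus"}

# Name or alias (lower case) -> official mouse gene symbols it could mean.
SYMBOLS = {
    "trp53": ["Trp53"],
    "p53": ["Trp53"],  # alias mentioned in the slides
}


def resolve_species(name):
    """Return the scientific species name, or None if it is not recognised."""
    return SPECIES.get(name.strip().lower())


def resolve_gene(name):
    """Return (symbol, status); status is 'ok', 'ambiguous' or 'unknown'."""
    hits = SYMBOLS.get(name.strip().lower(), [])
    if len(hits) == 1:
        return hits[0], "ok"
    if not hits:
        return None, "unknown"
    return None, "ambiguous"


# Check every query name first, and report anything that cannot be
# resolved to exactly one official symbol instead of guessing.
for query in ["Trp53", "p53", "TP53"]:
    symbol, status = resolve_gene(query)
    print(query, "->", symbol, status)
```

In this sketch the human symbol TP53 is reported as unknown rather than silently matched to a mouse gene, which is the behaviour we want when thousands of queries run without anyone inspecting the list of matches.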

Stable Identifiers
• You may notice that Ensembl identifies the mouse Trp53 gene as
ENSMUSG00000059552. This identifier is unique within Ensembl (although it
may carry different version numbers).
• In a way we are thinking about this annotation in reverse to Ensembl’s design.
Ensembl takes the assembly and maps annotation to it, creating gene
identifiers. Then these genes are mapped to known genes.
• Identifiers are fixed to “gene” sequences in the genome, but the gene names they
map to might change as knowledge grows or the nomenclature changes.
• So these stable gene IDs (related to accession numbers) are constant between
versions of Ensembl and unique within the database. Ensembl has other IDs to
represent proteins or transcripts, for example.
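
The structure of a stable identifier can be illustrated with a short Python sketch. It assumes the commonly documented layout: the prefix ENS, an optional species code (MUS for mouse), a feature letter, eleven digits and an optional ".version" suffix. The regular expression and the feature table are simplifications written for this example, and the version number used below is purely illustrative.

```python
import re

# Assumed layout: "ENS" + optional species code + feature letter
# + 11 digits + optional ".version". This is a simplification.
STABLE_ID = re.compile(r"^ENS([A-Z]{3,4})?([GTPE])(\d{11})(?:\.(\d+))?$")
FEATURES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


def parse_stable_id(text):
    """Split a stable ID into species code, feature type and version."""
    match = STABLE_ID.match(text)
    if match is None:
        raise ValueError(f"not a recognised stable ID: {text!r}")
    species, feature, digits, version = match.groups()
    return {
        # The ID without its version is what stays constant between releases.
        "stable_id": f"ENS{species or ''}{feature}{digits}",
        "species_code": species,
        "feature": FEATURES[feature],
        "version": int(version) if version is not None else None,
    }


# The ".5" suffix is an illustrative version number, not a real one.
parsed = parse_stable_id("ENSMUSG00000059552.5")
print(parsed)
```

Stripping the version before joining tables or comparing releases keeps the comparison on the part of the identifier that is meant to be stable.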


Unique Identifiers and SQL queries

• Remember that Ensembl is built from a MySQL database
• Accessions in this case can act as SQL primary keys
• So we can uniquely identify records using the Gene ID primary key
• Query something like

select * from genetable where GeneID = 'ENSMUSG00000059552'

• The gene name is a foreign key in the record with the value of
“Trp53”
• We can use this key to go to the MGI nomenclature to get other
useful information such as aliases
• Of course, we can make these queries ourselves or they can be
made through the web interface, but they all work in the same way
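
The primary key and foreign key idea can be tried out with a small Python sketch. It uses an in-memory SQLite database as a stand-in for MySQL, and the two tables are hypothetical: `genetable` and `GeneID` follow the example query in the slide, while the `nomenclature` table and the remaining column names are invented for illustration and do not reflect the real Ensembl schema.

```python
import sqlite3

# SQLite in memory stands in for MySQL; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nomenclature (
    GeneName TEXT PRIMARY KEY,
    Alias    TEXT
);
CREATE TABLE genetable (
    GeneID     TEXT PRIMARY KEY,
    GeneName   TEXT REFERENCES nomenclature(GeneName),
    Chromosome TEXT,
    GeneStart  INTEGER,
    GeneEnd    INTEGER
);
INSERT INTO nomenclature VALUES ('Trp53', 'p53');
INSERT INTO genetable VALUES
    ('ENSMUSG00000059552', 'Trp53', '11', 69471185, 69482699);
""")

# Look up exactly one record through the primary key.
row = conn.execute(
    "SELECT * FROM genetable WHERE GeneID = ?", ("ENSMUSG00000059552",)
).fetchone()

# Follow the gene name (the foreign key) to the nomenclature table.
alias = conn.execute(
    """SELECT n.Alias
       FROM genetable AS g
       JOIN nomenclature AS n ON n.GeneName = g.GeneName
       WHERE g.GeneID = ?""",
    ("ENSMUSG00000059552",),
).fetchone()[0]

print(row)
print(alias)
```

Passing the identifier as a query parameter rather than pasting it into the SQL string avoids quoting mistakes, which matters once the same query is run for thousands of IDs.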

Ensembl-Gene Locations
• ENSMUSG00000059552 mapping to the Trp53 gene also has a
location in the genome 11:69471185-69482699:1
• This is on chromosome 11, starting 69471185 and ending 69482699
on the positive chromosomal strand
• Almost all genes can be mapped to the assembly to a unique location
• Other Ensembl features (promoters, enhancers, genes, transcripts etc) can
be similarly mapped

[Figure: chr11, gene drawn from 69471185 to 69482699 (simplified model)]
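
A location string of this form is easy to take apart in code. The Python sketch below splits it into chromosome, start, end and strand. The `Location` type and the `parse_location` function are names made up for this example, and the length calculation assumes that both the start and end coordinates are included in the gene.

```python
from typing import NamedTuple


class Location(NamedTuple):
    chromosome: str
    start: int
    end: int
    strand: int  # 1 for the positive strand, -1 for the negative strand

    @property
    def length(self):
        # Assumes start and end are both inclusive.
        return self.end - self.start + 1


def parse_location(text):
    """Parse 'chromosome:start-end:strand' into a Location."""
    chromosome, span, strand = text.split(":")
    start, end = span.split("-")
    return Location(chromosome, int(start), int(end), int(strand))


loc = parse_location("11:69471185-69482699:1")
print(loc, loc.length)
```

The chromosome is kept as text rather than a number because chromosome names such as X, Y and MT are not numeric.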


BioMart Ensembl Interface

• BioMart offers an interface to the Ensembl database that allows
detailed queries using multiple search terms, so we can search with a
list of genes, for example. See
http://www.ensembl.org/biomart/martview/

• Here I have selected mouse genes and
the latest version of Ensembl
• Filters are used to restrict matches
to a list of IDs
• Attributes are what you’d like back,
eg gene names
• Set the Filters and Attributes and then
click the Results button to get a results
file to download and load into a
spreadsheet
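
The same choice of dataset, filters and attributes can also be written down as a query document rather than clicked through. The Python sketch below only builds such a document and does not contact any server. Every name in it (the Query settings, the dataset `mmusculus_gene_ensembl`, the filter `ensembl_gene_id` and the attribute names) is an assumption about BioMart's XML query format and should be checked against the XML that the martview page itself generates.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, gene_ids, attributes):
    """Build a BioMart-style XML query as text; nothing is sent anywhere."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",  # assumed settings, check in martview
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    # Filter: restrict matches to the listed IDs.
    ET.SubElement(ds, "Filter",
                  {"name": "ensembl_gene_id", "value": ",".join(gene_ids)})
    # Attributes: the columns we would like back.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


xml_text = build_biomart_query(
    "mmusculus_gene_ensembl",  # assumed dataset name for mouse genes
    ["ENSMUSG00000059552", "ENSMUSG00000074637"],
    ["ensembl_gene_id", "external_gene_name", "chromosome_name",
     "start_position", "end_position", "gene_biotype"],
)
print(xml_text)
```

Keeping the query as a document means the exact filters and attributes used for an analysis can be saved alongside the results.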

Using BioMart to Annotate Genes

• Search with the IDs listed below
• Obtain their gene name, Ensembl ID, start, stop, chromosome and whether they are protein or RNA
coding
• Compare to my tables that follow- are there any differences and why do you think this is?

ENSMUSG00000047751
ENSMUSG00000074637
ENSMUSG00000024406
ENSMUSG00000055148
ENSMUSG00000003032
ENSMUSG00000022346
ENSMUSG00000037169
Trp53
ENSMUSG00000105265


1st Attempt
Ensembl Gene ID     Chromosome  Gene Start (bp)  Gene End (bp)  Gene Biotype    Gene Name
ENSMUSG00000003032  4           55527143         55532466       protein_coding  Klf4
ENSMUSG00000022346  15          61985391         61990374       protein_coding  Myc
ENSMUSG00000024406  17          35506018         35510776       protein_coding  Pou5f1
ENSMUSG00000037169  12          12936096         12941914       protein_coding  Mycn
ENSMUSG00000047751  7           139943789        139945112      protein_coding  Utf1
ENSMUSG00000055148  8           72319033         72321656       protein_coding  Klf2
ENSMUSG00000059552  11          69580359         69591873       protein_coding  Trp53
ENSMUSG00000074637  3           34650005         34652461       protein_coding  Sox2
2nd Attempt
Gene stable ID      Gene start (bp)  Gene end (bp)  Chromosome  Gene name  Gene type
ENSMUSG00000003032  55527143         55532466       4           Klf4       protein_coding
ENSMUSG00000022346  61857240         61862223       15          Myc        protein_coding
ENSMUSG00000024406  35816915         35821669       17          Pou5f1     protein_coding
ENSMUSG00000037169  12986094         12991915       12          Mycn       protein_coding
ENSMUSG00000047751  139523702        139525025      7           Utf1       protein_coding
ENSMUSG00000055148  73072877         73075500       8           Klf2       protein_coding
ENSMUSG00000059552  69471185         69482699       11          Trp53      protein_coding
ENSMUSG00000074637  34704554         34706610       3           Sox2       protein_coding
ENSMUSG00000105265  34158419         34736768       3           Sox2ot     lncRNA
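
Spotting the differences between the two attempts by eye is error-prone, so here is a minimal Python sketch that compares them by stable ID. The coordinates are copied from the two tables; the variable names are illustrative. The sketch only reports what differs. It does not explain why, although the earlier notes on releases and assemblies suggest one place to look.

```python
# Stable ID -> (chromosome, start, end), copied from the two tables.
first = {
    "ENSMUSG00000003032": ("4", 55527143, 55532466),
    "ENSMUSG00000022346": ("15", 61985391, 61990374),
    "ENSMUSG00000024406": ("17", 35506018, 35510776),
    "ENSMUSG00000037169": ("12", 12936096, 12941914),
    "ENSMUSG00000047751": ("7", 139943789, 139945112),
    "ENSMUSG00000055148": ("8", 72319033, 72321656),
    "ENSMUSG00000059552": ("11", 69580359, 69591873),
    "ENSMUSG00000074637": ("3", 34650005, 34652461),
}
second = {
    "ENSMUSG00000003032": ("4", 55527143, 55532466),
    "ENSMUSG00000022346": ("15", 61857240, 61862223),
    "ENSMUSG00000024406": ("17", 35816915, 35821669),
    "ENSMUSG00000037169": ("12", 12986094, 12991915),
    "ENSMUSG00000047751": ("7", 139523702, 139525025),
    "ENSMUSG00000055148": ("8", 73072877, 73075500),
    "ENSMUSG00000059552": ("11", 69471185, 69482699),
    "ENSMUSG00000074637": ("3", 34704554, 34706610),
    "ENSMUSG00000105265": ("3", 34158419, 34736768),
}

# IDs present in both attempts, split by whether the location changed.
unchanged = sorted(g for g in first if second.get(g) == first[g])
changed = sorted(g for g in first if g in second and second[g] != first[g])
# IDs returned by only one of the attempts.
only_second = sorted(set(second) - set(first))

# How far each changed gene's start coordinate moved between attempts.
start_shift = {g: second[g][1] - first[g][1] for g in changed}

print("unchanged:", unchanged)
print("only in 2nd attempt:", only_second)
for gene, shift in start_shift.items():
    print(gene, "start moved by", shift)
```

Because the comparison is keyed on the stable ID, the genes line up correctly even though the columns and row order of the two tables differ.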
