
Computational Biology

Introduction to Computational Biology

2024/2025

LAB#1 – Biological Databases

The goal of this Practical Lesson is to introduce the different types of biological information
present in several resources and databases, namely (A) NCBI, (B) GenBank, (C) UniProt, (D)
BRENDA, and (E) KEGG.
These exercises should help you browse the available information and guide you through the
main procedures. You are encouraged to explore them further and check additional features.

Discussion and Reflection:

• Students should discuss how these databases complement each other.
• Students should reflect on how these resources might be used in future computational
biology research.

NCBI
NCBI (National Center for Biotechnology Information), at https://www.ncbi.nlm.nih.gov/,
offers resources that are central to research in bioinformatics, genomics, and other areas
of computational biology. Together they support tasks ranging from literature searches to
complex genomic analyses.

Here is a brief overview of some key databases:

1. GenBank:
o Purpose: A comprehensive genetic sequence database and an annotated
collection of all publicly available DNA sequences. GenBank is part of the
International Nucleotide Sequence Database Collaboration, which comprises
the DNA DataBank of Japan (DDBJ), the European Molecular Biology
Laboratory (EMBL), and GenBank at NCBI.
o Use: Widely used by researchers for genetic analysis and comparison.
2. Gene:
o Purpose: A database focusing on genomes that have been completely
sequenced and that have an active research community to contribute gene-
specific data.
o Use: Explore gene-specific data, such as sequences, phenotypes, pathways, and
related genetic disorders.
3. Nucleotide:
o Purpose: A repository of nucleotide sequences from a variety of sources,
including GenBank, RefSeq, and others.
o Use: Retrieve DNA and RNA sequences for genes, genomes, and transcripts
across different species.
4. Protein:
o Purpose: A database that includes protein sequence records from a variety of
sources, including GenPept, RefSeq, Swiss-Prot, PIR, PRF, and PDB.

o Use: Access protein sequences, domain structures, functional annotations, and
links to related literature.
5. SNP (Single Nucleotide Polymorphism):
o Purpose: A database of genetic variation, including single nucleotide
polymorphisms and other variations.
o Use: Study genetic diversity, associations with diseases, and population
genetics.
6. dbGaP (Database of Genotypes and Phenotypes):
o Purpose: Contains data from studies that investigate the interaction between
genotypes and phenotypes in humans.
o Use: Access data for genetic association studies, including clinical information
and genotypic data.
7. Taxonomy:
o Purpose: A database that provides information on the classification and
nomenclature of organisms.
o Use: Explore taxonomic data for various species, including evolutionary
relationships and hierarchy.
8. OMIM (Online Mendelian Inheritance in Man):
o Purpose: A catalog of human genes and genetic disorders.
o Use: Research information on the relationship between genes and diseases,
particularly for Mendelian disorders.
9. PubMed:
o Purpose: A comprehensive database of biomedical literature, including
articles from life sciences journals and online books.
o Use: Search for scientific papers, review articles, and clinical studies relevant
to a particular topic or gene.
10. Bookshelf:
o Purpose: A collection of freely accessible, full-text books and documents in
life science and healthcare.
o Use: Reference textbooks, reports, and guidelines for in-depth understanding
of biological concepts.

A Key tool:
1. BLAST (Basic Local Alignment Search Tool):
o Purpose: A tool for comparing nucleotide or protein sequences against
databases to find regions of similarity.
o Use: Identify homologous sequences, annotate genes, and find evolutionary
relationships.

Genomes
The GenBank database is available at https://www.ncbi.nlm.nih.gov/genbank/ and is hosted by the
National Center for Biotechnology Information (NCBI), which also hosts other relevant
databases.
I) GenBank data
Search for a specific accession number, e.g., AB001981:
a. How many genes are contained in this entry?
b. For which organism?
c. What information is available in sections HEADER and FEATURES?

HEADER contains general information about the entry, such as the organism and publications;
FEATURES provides a detailed description of the DNA sequence, for example the CDS (coding
sequence) features; and ORIGIN contains the actual nucleotide sequence.
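To make these sections concrete, here is a minimal pure-Python sketch that reads the organism name and counts the CDS features of one GenBank flat file. It assumes the standard flat-file layout (ORGANISM indented by two spaces, feature keys starting in column 6, qualifiers indented further) and uses an abbreviated, hypothetical record; the function name `summarize_genbank` is our own, and for real work a dedicated parser such as Biopython's `SeqIO` is the safer choice.

```python
def summarize_genbank(text: str) -> dict:
    """Return the organism name and the number of CDS features in one GenBank record."""
    organism = None
    cds_count = 0
    in_features = False
    for line in text.splitlines():
        if line.startswith("  ORGANISM"):
            # The species name follows the ORGANISM keyword; the lineage is on the next lines.
            organism = line[len("  ORGANISM"):].strip()
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.startswith("ORIGIN"):
            in_features = False
        elif in_features and line.startswith("     ") and len(line) > 5 and line[5] != " ":
            # Feature keys start in column 6; qualifier lines (/gene=...) are indented further.
            if line[5:21].strip() == "CDS":
                cds_count += 1
    return {"organism": organism, "cds": cds_count}


# Abbreviated, hypothetical record that follows the flat-file layout.
SAMPLE = """LOCUS       EXAMPLE01                 30 bp    DNA     linear   VRT 01-JAN-2000
DEFINITION  Hypothetical example record.
SOURCE      Columba livia
  ORGANISM  Columba livia
            Eukaryota; Metazoa; Chordata.
FEATURES             Location/Qualifiers
     source          1..30
                     /organism="Columba livia"
     CDS             1..12
                     /gene="geneA"
     CDS             16..30
                     /gene="geneB"
ORIGIN
        1 atgaaaaaaa aaaaaaaaaa aaaaaaataa
//
"""

print(summarize_genbank(SAMPLE))  # {'organism': 'Columba livia', 'cds': 2}
```

Running the same function on the saved AB001981 record should report the organism and CDS count discussed in the answer that follows.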

ORGANISM Columba livia
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Archosauria; Dinosauria;
Saurischia; Theropoda; Coelurosauria; Aves; Neognathae; Columbiformes; Columbidae; Columba.
There are two CDS features, so the entry contains two genes: the alpha-A and alpha-D globin genes.
Click on the first CDS, which (if complete) starts with a start codon and ends with a stop codon
(see the genetic code). What happens? Which interval is highlighted? Check the exon-intron
structure (what are exons and introns?).

Source: Wikipedia.org
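The highlighted interval of a CDS comes from its location string; for a gene with introns this is a `join(...)` of exon intervals. The sketch below splices the exons and checks the start and stop codons. It is a simplification that assumes forward-strand locations with 1-based inclusive coordinates (no `complement`, no partial `<`/`>` markers); the helper names and the toy sequence are hypothetical.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}

def parse_location(location: str) -> list[tuple[int, int]]:
    """Turn '12..20' or 'join(1..6,13..18)' into a list of (start, end) pairs."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def splice(sequence: str, location: str) -> str:
    """Concatenate the exons named in a location (1-based, inclusive coordinates)."""
    return "".join(sequence[start - 1:end] for start, end in parse_location(location))

def looks_complete(cds: str) -> bool:
    """True if the CDS has whole codons, starts with ATG and ends with a stop codon."""
    cds = cds.upper()
    return len(cds) % 3 == 0 and cds.startswith("ATG") and cds[-3:] in STOP_CODONS

genomic = "ATGGCCgtaagtAAATAG"  # hypothetical: exon 1, intron (lowercase), exon 2
cds = splice(genomic, "join(1..6,13..18)")
print(cds, looks_complete(cds))  # ATGGCCAAATAG True
```

Removing the intron is exactly what makes the reading frame continuous from the start codon to the stop codon.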

Click on “FASTA” at the top of the page. What is the structure of this file?

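What happened to the genes? Which part of the original entry was converted?
The ORIGIN block was converted to FASTA format (a header line starting with “>”, followed by the
sequence); the feature annotations are dropped, so the gene positions can no longer be identified
without further external information.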
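The conversion can be mimicked in a few lines: drop the position numbers and spaces from the ORIGIN block, then write one header line followed by fixed-width sequence lines. This is a sketch; the header text is up to you, and the 70-character line width (what NCBI typically uses) is a default, not a rule of the format.

```python
def origin_to_fasta(origin_block: str, header: str, width: int = 70) -> str:
    """Convert the lines of a GenBank ORIGIN block into a single FASTA record."""
    chunks = []
    for line in origin_block.splitlines():
        fields = line.split()
        # Skip the ORIGIN keyword, the '//' terminator and blank lines.
        if not fields or fields[0] in ("ORIGIN", "//"):
            continue
        # Each data line is '<position> <10-base block> <10-base block> ...'; drop the position.
        chunks.extend(fields[1:])
    sequence = "".join(chunks).upper()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + header, *lines]) + "\n"


block = """ORIGIN
        1 atggccaaat ag
//"""
print(origin_to_fasta(block, "EXAMPLE01 hypothetical record"), end="")
```

Note that nothing from FEATURES survives: the output carries only the header and the bases.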
Back on the previous page, click on Send to:

Save the FASTA file to your desktop (or the full record, if you chose the GenBank format). Hint: change
the filename to AB001981.fasta.
Open it with any text editor (be aware of possible problems with line and paragraph breaks!)
Finally, click on Graphics for an interactive visualization of the CDS. Explore the available
features such as zoom, and links for external information and tools.
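The warning about line breaks also matters when a program reads the file: a file saved on Windows may end its lines with CRLF, and a stray carriage return would otherwise end up inside the sequence. A minimal reader, assuming the file name AB001981.fasta from the hint (the function names are our own), can rely on `str.splitlines()`, which accepts LF, CRLF and CR endings:

```python
from pathlib import Path

def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; later duplicate headers overwrite earlier ones."""
    chunks: dict[str, list[str]] = {}
    header = None
    for raw in text.splitlines():  # handles LF, CRLF and CR line endings
        line = raw.strip()
        if line.startswith(">"):
            header = line[1:].strip()
            chunks[header] = []
        elif line and header is not None:
            chunks[header].append(line)
    return {h: "".join(parts) for h, parts in chunks.items()}

def read_fasta(path: str) -> dict[str, str]:
    """Read a FASTA file from disk, e.g. read_fasta("AB001981.fasta")."""
    return parse_fasta(Path(path).read_text())

print(parse_fasta(">seq1 test\r\nACGT\r\nTTGA\r\n>seq2\r\nGG\r\n"))
# {'seq1 test': 'ACGTTTGA', 'seq2': 'GG'}
```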

II) GenBank search
Search for entries that contain the terms “human” and “insulin”.
How many results do you obtain? (Solution: 18111) Check the retrieved entries: do they all
correspond to human sequences? Or to insulin? Give examples.
By default, each term is searched in all fields of every entry. To filter the results efficiently,
use Advanced Search.

How many entries do you obtain now? (Solution: 5548)

Try changing the field for “insulin”, e.g., to “Keyword” or “Protein Name”.
Which entry corresponds to human insulin?

In FEATURES we can check additional information:
• SOURCE: /map="11p15.5" indicates that the sequence is located on chromosome 11, on the
short arm (p), in region 1, band 5, sub-band 5 (read as position 15.5).

https://ghr.nlm.nih.gov/primer/howgeneswork/genelocation
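Because the map notation has a fixed order (chromosome, arm, region, band, optional sub-band), it can be split mechanically. The regular expression below is a simplified sketch that assumes a single band such as 11p15.5 or Xq28; it does not handle ranges (e.g. 11p15.5-p15.4) or other special notations, and `parse_band` is a hypothetical helper name.

```python
import re

# chromosome, arm (p = short, q = long), region digit, band digit, optional sub-band
BAND = re.compile(
    r"^(?P<chromosome>\d{1,2}|X|Y)(?P<arm>[pq])(?P<region>\d)(?P<band>\d)(?:\.(?P<subband>\d+))?$"
)

def parse_band(location: str) -> dict:
    """Split a single cytogenetic band such as '11p15.5' into its components."""
    match = BAND.match(location.strip())
    if match is None:
        raise ValueError(f"unsupported band notation: {location!r}")
    return match.groupdict()

print(parse_band("11p15.5"))
# {'chromosome': '11', 'arm': 'p', 'region': '1', 'band': '5', 'subband': '5'}
```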

We can also type the field codes directly in the “Search Builder”; they are listed at:
https://www.ncbi.nlm.nih.gov/entrez/query/static/help/Summary_Matrices.html#Search_Fields_and_Qualifiers
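The same field-tagged queries can also be sent programmatically through NCBI's E-utilities. The sketch below only builds an `esearch` URL for the Nucleotide database (`db=nuccore`) with Python's standard library and sends nothing; the helper names `tagged` and `esearch_url` and the example query are our own illustrative choices.

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def tagged(term: str, field: str) -> str:
    """Attach an Entrez field tag such as [organism], quoting multi-word terms."""
    return (f'"{term}"' if " " in term else term) + f"[{field}]"

def esearch_url(query: str, db: str = "nuccore", retmax: int = 20) -> str:
    """Build an esearch URL; fetching it returns the matching record IDs as XML."""
    return EUTILS + "esearch.fcgi?" + urlencode({"db": db, "term": query, "retmax": retmax})

query = f'{tagged("human", "organism")} AND {tagged("insulin", "title")}'
print(query)  # human[organism] AND insulin[title]
print(esearch_url(query))
```

Quoting multi-word terms keeps phrases like "alpha globin" together, just as in the search box.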

We can combine the logical operators AND, OR, and NOT (think of Venn diagrams), but be careful not to
exclude potentially relevant hits.
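The Venn-diagram view maps directly onto set operations: AND is an intersection, OR a union and NOT a difference. The ID sets below are made-up placeholders, not real search results; the point is to show how a NOT can silently drop hits.

```python
# Hypothetical result sets of three separate searches (IDs are made up).
human = {"ID1", "ID2", "ID3", "ID4"}
insulin = {"ID2", "ID3", "ID5"}
receptor = {"ID3", "ID6"}

human_and_insulin = human & insulin               # AND: present in both sets
human_or_insulin = human | insulin                # OR: present in either set
without_receptor = human_and_insulin - receptor   # NOT: remove anything matching "receptor"

print(sorted(human_and_insulin))  # ['ID2', 'ID3']
print(sorted(human_or_insulin))   # ['ID1', 'ID2', 'ID3', 'ID4', 'ID5']
print(sorted(without_receptor))   # ['ID2']
```

If ID3 were an insulin record that merely mentions its receptor, the NOT would have excluded a relevant hit.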

III) Additional exercises

Find the following genes
1. Insulin of the Rat and of the Mouse
Hint: (rat [organism] OR mouse [organism]) AND insulin [keyword]
Select the following 4 genes and save the files:
1. Rat insulin-I (ins-1) gene
5,425 bp linear DNA
J00747.1 GI:204956
2. Rat insulin II gene (ins-2) with two introns
2,852 bp linear DNA
J00748.1 GI:204958
3. Mouse preproinsulin gene I
1,384 bp linear DNA
X04725.1 GI:52712
4. Mouse preproinsulin gene II
2,408 bp linear DNA
X04724.1 GI:52714
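Instead of saving each record by hand, the four accessions listed above can be requested through the E-utilities `efetch` endpoint, where `rettype=fasta` or `rettype=gb` together with `retmode=text` selects FASTA or GenBank flat-file output. This sketch only builds the URLs (no network access); `efetch_url` and the local file names are our own choices.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
ACCESSIONS = ["J00747.1", "J00748.1", "X04725.1", "X04724.1"]  # rat ins-1, ins-2; mouse genes I, II

def efetch_url(accession: str, rettype: str = "fasta") -> str:
    """URL that returns one Nucleotide record as plain text ('fasta' or 'gb')."""
    params = {"db": "nuccore", "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"

for acc in ACCESSIONS:
    # The response body could be saved as e.g. J00747.1.fasta (download step not shown).
    print(f"{acc}.fasta", efetch_url(acc))
```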

2. alpha-globin of organism Capra hircus

Hint: "capra hircus" [organism] AND "alpha globin" [title]
HBAI and HBAII

3. alpha-globin of all ruminants

ORGANISM Capra hircus
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria;
Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae; Caprinae; Capra.
Hint: Ruminantia [organism] AND “alpha globin” [title]
(Solution: 16 entries)
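Searching Ruminantia [organism] works because an Entrez organism search also retrieves records from taxa nested under the given name, as the goat lineage shows. A rough local analogue, assuming lineage strings copied from the ORGANISM block (`lineage_taxa` is a hypothetical helper), splits the lineage on semicolons and tests membership:

```python
def lineage_taxa(lineage: str) -> list[str]:
    """Split a GenBank lineage such as 'Eukaryota; Metazoa; ... Capra.' into taxon names."""
    return [taxon.strip().rstrip(".") for taxon in lineage.split(";") if taxon.strip()]

capra = ("Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; "
         "Eutheria; Laurasiatheria; Cetartiodactyla; Ruminantia; Pecora; Bovidae; Caprinae; Capra.")

print("Ruminantia" in lineage_taxa(capra))  # True
print(lineage_taxa(capra)[-1])              # Capra
```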

4. Normal p53
Hint: p53 [protein name] AND human [organism]
Still too many? Add … NOT isolate [title]
Find only complete sequences: … AND complete [title]
Human p53 (TP53) gene, complete cds
20,303 bp linear DNA
Accession: U94788.1 GI: 3041866
Why is this gene important?
Note – You can also explore the Taxonomy Browser by clicking on the species.

IV) External tools
Use the Sequence Manipulation Suite (http://www.bioinformatics.org/sms2/) to manipulate the entry
you saved to a file earlier. Explore tools such as GenBank to FASTA, GenBank Feature Extractor, and
GenBank Trans Extractor.
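Tools such as GenBank Trans Extractor translate each CDS using the genetic code. A compact sketch of that step, assuming the standard genetic code (NCBI translation table 1) and a CDS that is already spliced and in frame, is shown below; it does not treat alternative start codons specially, and `translate` is our own helper.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(cds: str, to_stop: bool = True) -> str:
    """Translate an in-frame CDS; '*' marks a stop codon and 'X' an ambiguous codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*" and to_stop:
            break  # stop translating at the first stop codon
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCAAATAG"))                 # MAK
print(translate("ATGGCCAAATAG", to_stop=False))  # MAK*
```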

Proteins and Metabolism
This part will guide you through the databases UniProt (http://www.uniprot.org/), BRENDA
(http://www.brenda-enzymes.info/), and KEGG (http://www.genome.jp/kegg/), which allow
searching for proteins (including enzymes) and metabolic pathways.

I) UniProt database
We will work with the UniProt Knowledgebase (UniProtKB), in particular with Swiss-Prot,
whose entries are manually annotated and curated.
Search for “human insulin”. How many entries do you find? How many correspond to Swiss-Prot
(“show only reviewed”)? Which one is the “correct” one?
As before, we should narrow the search:
restrict the term “human” to Organism and the term “insulin” to Protein name (this reduces the
results to ~203 entries).
In the “Query” box, exclude terms such as insulin-like and receptor using Boolean operators
as before: (protein_name:insulin) AND (organism_id:9606) NOT
(protein_name:receptor) NOT (protein_name:insulin-like) AND (reviewed:true)
(this further reduces the results to ~16 entries).
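The same query string can be sent to the UniProt REST API. The sketch below builds a URL for the `uniprotkb/search` endpoint of rest.uniprot.org with `format=fasta` and sends no request; the endpoint and parameter names reflect the current REST service as we understand it, so check them against the UniProt API documentation before relying on them. `uniprot_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

# Query copied from the lab text: reviewed human insulin, excluding receptors and insulin-like proteins.
QUERY = ("(protein_name:insulin) AND (organism_id:9606) NOT (protein_name:receptor) "
         "NOT (protein_name:insulin-like) AND (reviewed:true)")

def uniprot_search_url(query: str, fmt: str = "fasta") -> str:
    """Build a UniProtKB search URL; fmt could also be e.g. 'tsv' or 'json'."""
    return f"{UNIPROT_SEARCH}?{urlencode({'query': query, 'format': fmt})}"

print(uniprot_search_url(QUERY))
```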
Click on P01308 and explore that entry.
How many references do you find? Why is this protein extensively studied?
In the Function section, look at the Gene Ontology (GO) annotations, in particular the Molecular
Function and Biological Process entries. See also the Subcellular location section.
Where do you find this protein in the cell? Is there any relationship with its function?

Click on positions 1-24 (the signal peptide): that region will be highlighted:

Mature insulin consists of two chains (A and B); the other peptides (the signal peptide and the
C-peptide) are cleaved off.
There is also secondary-structure information: Helix (alpha-helix), Beta strand (beta-sheet)
and Turn (click on a structure element to highlight it, and move the mouse over it).
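UniProt feature positions, such as the signal peptide at 1-24, are 1-based and inclusive, whereas Python slices are 0-based and exclude the end. The small helpers below (names are our own) make the conversion explicit; the peptide is a made-up placeholder, so paste the P01308 sequence to try them on insulin.

```python
def region(sequence: str, start: int, end: int) -> str:
    """Return residues start..end using UniProt's 1-based, inclusive positions."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions out of range")
    return sequence[start - 1:end]

def remove_region(sequence: str, start: int, end: int) -> str:
    """Return the sequence with positions start..end cut out (e.g. a cleaved signal peptide)."""
    region(sequence, start, end)  # reuse the range check
    return sequence[:start - 1] + sequence[end:]

toy = "MKLLAAGGSS"  # hypothetical 10-residue peptide, not a real protein
print(region(toy, 1, 4))         # MKLL
print(remove_region(toy, 1, 4))  # AAGGSS
```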

The entry also provides links to other databases (Cross-references), such as GenBank (e.g., click on
J00265) and PDBe (e.g., click on 1B9E).
Advanced Search
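Find proteins secreted by cells (subcellular location with experimental evidence), with protein
existence at the experimental (protein) level, and further limit the search (human, non-fragment,
1-80 residues), either through the Advanced Search form or by typing the query directly:
(cc_scl_term_exp:SL-0243) AND (existence:1) AND (length:[1 TO 80]) AND
(fragment:false) AND (organism_id:9606)
Download the results in FASTA format: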
>sp|A0A0C5B5G6|MOTSC_HUMAN Mitochondrial-derived peptide MOTS-c OS=Homo sapiens OX=9606 GN=MT-RNR1 PE=1 SV=1
MRWQEMGYIFYPRKLR
>sp|O15263|DFB4A_HUMAN Defensin beta 4A OS=Homo sapiens OX=9606 GN=DEFB4A PE=1 SV=1
MRVLYLLFSFLFIFLMPLPGVFGGIGDPVTCLKSGAICHPVFCPRRYKQIGTCGLPGTKC
CKKP
>sp|P00995|ISK1_HUMAN Serine protease inhibitor Kazal-type 1 OS=Homo sapiens OX=9606 GN=SPINK1 PE=1 SV=2
MKVTGIFLLSALALLSLSGNTGADSLGREAKCYNELNGCTKIYDPVCGTDGNTYPNECVL
CFENRKRQTSILIQKSGPC
>sp|P02808|STAT_HUMAN Statherin OS=Homo sapiens OX=9606 GN=STATH PE=1 SV=2
MKFLVFAFILALMVSMIGADSSEEKFLRRIGRFGYGYGPYQPVPEQPLYPQPYQPQYQQY
TF
>sp|P02814|SMR3B_HUMAN Submaxillary gland androgen-regulated protein 3B OS=Homo sapiens OX=9606 GN=SMR3B PE=1 SV=2
MKSLTWILGLWALAACFTPGESQRGPRGPYPPGPLAPPQPFGPGFVPPPPPPPYGPGRIP
PPPPAPYGPGIFPPPPPQP
>sp|P0DMC3|ELA_HUMAN Apelin receptor early endogenous ligand OS=Homo sapiens OX=9606 GN=APELA PE=1 SV=1
MRFQQFLFAFFIFIMSLLLISGQRPVNLTMRRKLRKHNCLQRRCMPLHSRVPFP
(…)
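UniProtKB FASTA headers follow a documented pattern, `>db|accession|entry_name description OS=... OX=... GN=... PE=... SV=...`, in which GN may be absent. The regular-expression sketch below splits a header into those fields; it assumes OS, OX, PE and SV are always present, as in the entries shown, and `parse_uniprot_header` is a hypothetical helper. The sequence length, which the `length:[1 TO 80]` filter constrains, is simply `len()` of the sequence.

```python
import re

HEADER = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<accession>[^|]+)\|(?P<entry_name>\S+) (?P<description>.+?)"
    r" OS=(?P<organism>.+?) OX=(?P<taxon_id>\d+)(?: GN=(?P<gene>\S+))? PE=(?P<pe>\d) SV=(?P<sv>\d+)$"
)

def parse_uniprot_header(header: str) -> dict:
    """Split a UniProtKB FASTA header into named fields (gene is None when GN is absent)."""
    match = HEADER.match(header.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB header: {header!r}")
    return match.groupdict()

record = parse_uniprot_header(
    ">sp|A0A0C5B5G6|MOTSC_HUMAN Mitochondrial-derived peptide MOTS-c "
    "OS=Homo sapiens OX=9606 GN=MT-RNR1 PE=1 SV=1"
)
print(record["accession"], record["gene"], record["taxon_id"])  # A0A0C5B5G6 MT-RNR1 9606
print(len("MRWQEMGYIFYPRKLR"))  # 16
```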

II) BRENDA
Search for pyruvate decarboxylase.
Click on the EC number (format EC A.B.C.D, a hierarchical classification) and on the symbols:
https://www.brenda-enzymes.org/enzyme.php?ecno=4.1.1.1

Search by EC number, e.g., EC 1.2.3.4.
Explore the full hierarchy through Home → Explorer → Enzyme Classification.

Search for 2.7.1.1 – what type of enzymes have you found and what is their role? (Hint:
hexokinase)
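An EC number is read from left to right, with each field narrowing the previous one. The sketch below decodes only the top level (the seven IUBMB classes, including Translocases, added as class 7 in 2018) and lists the successive prefixes that the Enzyme Classification explorer walks through; deeper level names are not hard-coded, and the helper names are our own.

```python
EC_CLASSES = {
    1: "Oxidoreductases", 2: "Transferases", 3: "Hydrolases", 4: "Lyases",
    5: "Isomerases", 6: "Ligases", 7: "Translocases",
}

def parse_ec(ec: str) -> list[str]:
    """Accept '4.1.1.1', 'EC 4.1.1.1' or 'EC-1.2.3.4' and return the four fields."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(" -:")
    parts = text.split(".")
    if len(parts) != 4 or not parts[0].isdigit():
        raise ValueError(f"expected an EC number like 4.1.1.1, got {ec!r}")
    return parts

def ec_prefixes(ec: str) -> list[str]:
    """Class, subclass, sub-subclass and full number, e.g. ['4', '4.1', '4.1.1', '4.1.1.1']."""
    parts = parse_ec(ec)
    return [".".join(parts[:i]) for i in range(1, 5)]

def ec_class(ec: str) -> str:
    return EC_CLASSES[int(parse_ec(ec)[0])]

print(ec_class("4.1.1.1"), ec_prefixes("4.1.1.1"))  # Lyases ['4', '4.1', '4.1.1', '4.1.1.1']
print(ec_class("EC-1.2.3.4"))                        # Oxidoreductases
print(ec_class("2.7.1.1"))                           # Transferases
```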

III) KEGG database

Search for “glycolysis” at http://www.genome.jp/kegg/
In KEGG PATHWAY, the full description and the metabolic network (pathway map) appear.
Click on the map and choose an organism (using the change pathway type button), e.g., Homo sapiens:

Each reaction has information about the corresponding enzyme(s), shown as EC numbers on the
edges, for example 2.7.1.40:

…and also about the corresponding metabolites, for example Phosphoenolpyruvate:
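KEGG identifiers are regular enough to script. Reference maps use the prefix `map` (glycolysis is map00010), organism-specific versions replace it with the KEGG organism code (hsa for Homo sapiens), and enzymes and compounds use the `ec:` and `cpd:` prefixes. The sketch below builds KEGG REST `get` URLs for this section's examples without sending any request; the compound ID C00074 for phosphoenolpyruvate is quoted from memory of KEGG COMPOUND and should be verified, and the helper names are our own.

```python
KEGG_REST = "https://rest.kegg.jp"

def organism_pathway(map_id: str, org_code: str) -> str:
    """'map00010' + 'hsa' -> 'hsa00010' (the same map drawn for one organism)."""
    if not (map_id.startswith("map") and map_id[3:].isdigit()):
        raise ValueError(f"expected a reference map ID like map00010, got {map_id!r}")
    return org_code + map_id[3:]

def kegg_get_url(entry: str) -> str:
    """URL of the KEGG REST 'get' operation for one database entry."""
    return f"{KEGG_REST}/get/{entry}"

pathway = organism_pathway("map00010", "hsa")  # glycolysis / gluconeogenesis, human
for entry in (pathway, "ec:2.7.1.40", "cpd:C00074"):  # pathway, pyruvate kinase, phosphoenolpyruvate
    print(kegg_get_url(entry))
```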

At http://www.genome.jp/kegg/pathway.html, click on 6. Human Diseases.

For colorectal cancer, the full interaction network is described:

The pathway entry has links to Disease:
Information about drugs is also available in 7. Drug Development, for example, Penicillins
(http://www.genome.jp/kegg/pathway/map/map07011.html):
