Overview of Major Nucleic Acid Databases

Uploaded by

utkarsh Gupta

Primary database

Nucleic acid sequence databases

EMBL
GenBank/NCBI
DDBJ

Protein sequence databases

UniProt/Swiss-Prot
TrEMBL
iProClass
EMBL – European Molecular Biology Laboratory
Established in 1980 at the EMBL laboratories in Heidelberg, Germany, it was the first DNA sequence database
The nucleotide sequence database is maintained by the European Bioinformatics Institute (EBI)
It includes sequences from direct author submissions, genome sequencing groups, the scientific literature and patent applications
The database is produced in an international collaboration with DDBJ and GenBank
Each of the three groups collects sequence data worldwide, and all new database entries are exchanged between the groups on a daily basis
Taxonomic Division
Each entry belongs to exactly one taxonomic division
Code   Division
PHG    Bacteriophage
ENV    Environmental sample
FUN    Fungal
HUM    Human
INV    Invertebrate
MAM    Other mammal
VRT    Other vertebrate
MUS    Mus musculus
PLN    Plant
PRO    Prokaryote
ROD    Other rodent
SYN    Synthetic
TGN    Transgenic
UNC    Unclassified
VRL    Viral
Structure of an entry
Each entry in the database is composed of lines
Each line begins with a two-character line code, which indicates the type of information contained in the line
EMBL Structure
ID – Identification
AC – Accession number
DT – Date
DE – Description
KW – Keywords
OS – Organism species
OC – Organism classification
OG – Organelle
RN – Reference number
RP – Reference position
RA – Reference author(s)
RT – Reference title
RL – Reference location
DR – Database cross-reference
CC – Comments
FH – Feature header
FT – Feature table
XX – Spacer line
SQ – Sequence header
// – Termination line
Line structure
Each line begins with a two-character line type code
This code is always followed by three blanks, so the actual information in each line begins at character position 6
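The column rule above can be turned into a very small parser. The following is a minimal sketch, assuming each line is a plain string from an EMBL flat file; the function name split_embl_line is hypothetical and not part of any EMBL tool.

```python
from typing import Tuple


def split_embl_line(line: str) -> Tuple[str, str]:
    """Split one EMBL flat-file line into (line_code, content).

    Characters 1-2 hold the line code, characters 3-5 are blanks,
    and the information starts at character position 6.
    """
    line = line.rstrip("\n")
    code = line[:2]      # two-character line type code
    content = line[5:]   # everything from position 6 onwards (may be empty)
    return code, content


if __name__ == "__main__":
    example = "ID   M85050; SV 1; linear; mRNA; STD; INV; 1353 BP."
    print(split_embl_line(example))
```

Lines such as XX and // carry no information after the code, so the content part is simply an empty string for them.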
ID – Identification
First line of the entry
Format of the ID line is
ID   <1>; SV <2>; <3>; <4>; <5>; <6>; <7> BP.
<1> Primary accession number
<2> Sequence version number
<3> Topology: 'circular' or 'linear'
<4> Molecule type
<5> Data class
<6> Taxonomic division
<7> Sequence length
E.g. ID   M85050; SV 1; linear; mRNA; STD; INV; 1353 BP.
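As an illustration, the seven semicolon-separated fields of the ID line can be pulled apart as follows. This is only a sketch under the assumption that the line follows the layout "ID   accession; SV version; topology; molecule type; data class; division; length BP."; the function name and the dictionary keys are hypothetical.

```python
from typing import Dict, Union

# Field names are illustrative labels for the seven positions of the ID line.
ID_FIELDS = ["accession", "version", "topology", "molecule_type",
             "data_class", "division", "length"]


def parse_id_line(line: str) -> Dict[str, Union[str, int]]:
    """Parse an EMBL ID line into a dictionary of its seven fields."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    # Information starts at character position 6; fields are split on ';'.
    parts = [part.strip() for part in line[5:].split(";")]
    if len(parts) != len(ID_FIELDS):
        raise ValueError("expected 7 fields, found %d" % len(parts))
    record: Dict[str, Union[str, int]] = dict(zip(ID_FIELDS, parts))
    record["version"] = int(parts[1].split()[-1])  # "SV 1"     -> 1
    record["length"] = int(parts[6].split()[0])    # "1353 BP." -> 1353
    return record


if __name__ == "__main__":
    print(parse_id_line("ID   M85050; SV 1; linear; mRNA; STD; INV; 1353 BP."))
```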
AC – Accession number
The accession number line lists the accession numbers associated with the entry
E.g. AC   M85050; S46826;
The first number is the primary accession number; secondary accession numbers allow the data to be tracked
DT – Date
The date lines show when an entry first appeared in the database and when it was last updated
Each entry contains two DT lines
DT   DD-MON-YYYY (Rel. #, Created)
DT   DD-MON-YYYY (Rel. #, Last updated, Version #)
E.g. DT   20-DEC-1990 (Rel. 26, Created)
E.g. DT   25-MAR-2001 (Rel. 67, Last updated, Version 33)
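The two DT lines can be read with a single regular expression. The sketch below assumes the layout "DD-MON-YYYY (Rel. N, Created)" or "DD-MON-YYYY (Rel. N, Last updated, Version N)"; the function name and the returned keys are hypothetical.

```python
import re
from datetime import date

# Month abbreviations as they appear in DT lines.
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

DT_PATTERN = re.compile(
    r"^DT\s+(\d{2})-([A-Z]{3})-(\d{4}) "
    r"\(Rel\. (\d+), (?:Created|Last updated, Version (\d+))\)$"
)


def parse_dt_line(line: str) -> dict:
    """Parse one DT line into its date, release number and version."""
    match = DT_PATTERN.match(line.rstrip())
    if match is None:
        raise ValueError("not a recognised DT line: %r" % line)
    day, month, year, release, version = match.groups()
    return {
        "date": date(int(year), MONTHS[month], int(day)),
        "release": int(release),
        # The version group is only present on the "Last updated" line.
        "event": "created" if version is None else "updated",
        "version": None if version is None else int(version),
    }


if __name__ == "__main__":
    print(parse_dt_line("DT   20-DEC-1990 (Rel. 26, Created)"))
    print(parse_dt_line("DT   25-MAR-2001 (Rel. 67, Last updated, Version 33)"))
```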
DE – Description
These lines contain general descriptive information about the sequence stored
It includes
The designation of the genes for which the sequence codes
The region of the genome from which it is derived
E.g. DE   Human hemoglobin DNA with a deletion causing Indian delta-beta thalassemia.
KW – Keywords
Used to generate cross-reference indexes of the sequences based on functional, structural and other categories
E.g. KW   hemoglobin.
OS – Organism species
This line specifies the preferred scientific name of the organism, followed by the common name in parentheses
OS   Genus species (name)
E.g. OS   Pseudoterranova decipiens (cod worm)
E.g. OS   Homo sapiens (human)
OC – Organism classification
This line contains the taxonomic classification of the source organism
The classification is listed top-down as nodes in a taxonomic tree
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
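Because the classification usually spans several OC lines, a reader of the file has to join them before splitting on semicolons. A minimal sketch, assuming each OC line carries its data from character position 6 and the last line ends with a full stop (the function name is hypothetical):

```python
from typing import List


def parse_lineage(oc_lines: List[str]) -> List[str]:
    """Join consecutive OC lines and return the taxonomic nodes, top-down."""
    text = " ".join(line[5:].strip() for line in oc_lines
                    if line.startswith("OC"))
    text = text.rstrip(".")  # the final OC line ends with a full stop
    return [node.strip() for node in text.split(";") if node.strip()]


if __name__ == "__main__":
    lines = [
        "OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;",
        "OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;",
        "OC   Homo.",
    ]
    lineage = parse_lineage(lines)
    print(" > ".join(lineage))
```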
OG – Organelle
This line indicates the sub-cellular location of non-nuclear sequences
E.g. OG   Mitochondrion
The reference
The RN, RP, RA, RT and RL lines together describe one literature citation
DR – Database cross-reference
This line cross-references other databases that contain information related to the entry
FH – Feature header
Column headings: Key and Location
FT – Feature table
Examples of feature keys
source – organism name
CDS – coding sequence
mRNA – messenger RNA
SQ – Sequence header
This line marks the beginning of the sequence and gives a summary of its composition
E.g. SQ   Sequence 2337 BP; 942 A; 462 C; 401 G; 529 T; 3 other;
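The base counts on the SQ line should add up to the stated sequence length (in the example, 942 + 462 + 401 + 529 + 3 = 2337). The sketch below parses the line and performs that check; the function names are hypothetical and the pattern assumes the exact layout shown in the example.

```python
import re
from typing import Dict, Tuple

SQ_PATTERN = re.compile(
    r"^SQ\s+Sequence (\d+) BP; (\d+) A; (\d+) C; (\d+) G; (\d+) T; (\d+) other;$"
)


def parse_sq_line(line: str) -> Tuple[int, Dict[str, int]]:
    """Return (sequence_length, base_counts) from an SQ line."""
    match = SQ_PATTERN.match(line.rstrip())
    if match is None:
        raise ValueError("not a recognised SQ line: %r" % line)
    length, a, c, g, t, other = (int(value) for value in match.groups())
    return length, {"A": a, "C": c, "G": g, "T": t, "other": other}


def counts_match_length(length: int, counts: Dict[str, int]) -> bool:
    """Check that the individual base counts sum to the sequence length."""
    return sum(counts.values()) == length


if __name__ == "__main__":
    length, counts = parse_sq_line(
        "SQ   Sequence 2337 BP; 942 A; 462 C; 401 G; 529 T; 3 other;")
    print(length, counts, counts_match_length(length, counts))
```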
// – Terminator
Marks the end of the entry
NCBI – National Center for Biotechnology Information
NCBI was established on November 4, 1988, under legislation sponsored by Senator Claude Pepper, as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH)
Nucleotide database – GenBank
GenBank, the DNA database from NCBI, incorporates sequences from publicly available sources, mainly direct author submissions and large-scale sequencing projects
Sequence data are submitted to GenBank by individual scientists around the world and by large centers involved in the Human Genome Project
GenBank is an international collaborative project with partners located at
the European Bioinformatics Institute in the United Kingdom and
the National Institute of Genetics in Japan
The increasing size of the database has made it convenient to split GenBank into smaller, discrete divisions
Division   Sequence subset
PRI        Primate
ROD        Rodent
MAM        Mammalian
VRT        Vertebrate
INV        Invertebrate
PLN        Plant, fungal, algal
BCT        Bacterial
RNA        Structural RNA
VRL        Viral
PHG        Bacteriophage
SYN        Synthetic
UNA        Unannotated
EST        Expressed sequence tag
PAT        Patent
STS        Sequence tagged site
GSS        Genome survey sequence
HTG        High-throughput genomic sequence
ENV        Environmental sampling sequence
The structure of GenBank entries
Each entry consists of
a number of keywords,
relevant associated sub-keywords and
an optional feature table
Its end is indicated by a "//" terminator
The positioning of these elements on any given line is important
Keywords begin in column 1,
sub-keywords begin in column 3 and
a code defining part of the feature table begins in column 6
Any other line beginning with a blank character is considered a continuation from the keyword or sub-keyword
Keywords include LOCUS, DEFINITION, ACCESSION, NID, KEYWORDS, SOURCE, REFERENCE, FEATURES, BASE COUNT and ORIGIN
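The column rules above can be sketched as a small line classifier. This is only an illustration of the positioning rule as stated here (column 1, column 3, column 6), not a full GenBank parser; the function name and the returned labels are hypothetical, and the numbered sequence lines that follow ORIGIN are not treated specially.

```python
def classify_genbank_line(line: str) -> str:
    """Classify a GenBank flat-file line by the column where its text starts."""
    line = line.rstrip("\n")
    if line.startswith("//"):
        return "terminator"
    if not line.strip():
        return "blank"
    indent = len(line) - len(line.lstrip(" "))  # number of leading blanks
    if indent == 0:
        return "keyword"        # text begins in column 1
    if indent == 2:
        return "sub-keyword"    # text begins in column 3
    if indent == 5:
        return "feature-key"    # text begins in column 6
    return "continuation"       # any other line starting with blanks


if __name__ == "__main__":
    # Sample lines are made up for illustration; only the indentation matters.
    sample = [
        "LOCUS       HUMCYCLOX",
        "SOURCE      human",
        "  ORGANISM  Homo sapiens",
        "            Eukaryota",
        "FEATURES             Location/Qualifiers",
        "     CDS             1..10",
        "//",
    ]
    for text in sample:
        print("%-14s %s" % (classify_genbank_line(text), text))
```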
Locus
Includes a short label for the entry that may suggest the function of the sequence
E.g. HUMCYCLOX suggests a human cyclooxygenase
Cyclooxygenase (COX) is an enzyme responsible for the formation of important biological mediators called prostanoids
Other relevant facts on this line
Number of bases
Source of sequence data (mRNA)
Section of database (PRI) and
Date of submission
Definition
Contains a concise description of the sequence (in this example, Homo sapiens cyclooxygenase)
Accession
Gives an accession number, a unique constant code assigned to each entry
NID
Supplies a nucleotide identifier (g181253)
Keywords
Introduces a list of short phrases, assigned by the author, describing gene products and other relevant information about the entry
In this example: cyclooxygenase, prostaglandin
Source
Provides information on the tissue from which the data have been derived (here, umbilical vein)
The sub-keyword ORGANISM gives the biological classification of the source organism
Here: Homo sapiens, Eukaryota, etc.
Reference
Indicates the portion of the sequence data to which the cited literature refers
The sub-keywords AUTHORS, TITLE and JOURNAL provide a structure for the citation
MEDLINE is a pointer to an online medical literature information resource, which allows the abstract of the given article to be viewed
Features
Describes properties of the sequence in detail
'db_xref' links to other databases
taxon:9606 - a taxonomic database
PID:g181254 - a protein sequence database
5'UTR – 5' untranslated region
CDS – coding sequence
3'UTR – 3' untranslated region
polyA_signal – polyadenylation signal
Base count
Provides the frequency of occurrence of the different base types in the sequence
E.g. 1010 A, 712 C, 633 G and 1032 T
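A base count of this kind can be recomputed from the sequence itself. The following is a minimal sketch using a short made-up sequence rather than the cyclooxygenase entry; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict


def base_count(sequence: str) -> Dict[str, int]:
    """Count A, C, G and T in a nucleotide sequence; anything else is 'other'."""
    counts = Counter(sequence.upper())
    result = {base: counts.get(base, 0) for base in "ACGT"}
    # Ambiguity codes such as N are grouped under "other".
    result["other"] = len(sequence) - sum(result.values())
    return result


if __name__ == "__main__":
    print(base_count("gattaca"))
```

Comparing these counts with the BASE COUNT line of an entry is a quick consistency check on a downloaded record.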
Origin
Location of the first base of the sequence within the genome
The entry is terminated by the // marker
DDBJ – DNA Data Bank of Japan
The DNA Data Bank of Japan began in 1986 at the National Institute of Genetics (NIG) with the endorsement of the Ministry of Education, Science, Sports and Culture
DDBJ functions as one of the international DNA databases, alongside EBI in Europe and NCBI in the USA
DDBJ collaborates with the other two databanks by exchanging data and information over the Internet
and by regularly holding two meetings
The International DNA Databanks Advisory Meeting
The International DNA Databanks Collaborative Meeting
The structure of a DDBJ file is exactly the same as the GenBank file format
It contains keywords, sub-keywords, a feature table and a terminator
SAKURA is a nucleotide sequence data submission system running on the WWW server at DDBJ
Using this system you can interactively enter and submit nucleotide sequences together with their functions and features
MGS – Mass Genome Submission for genome sequences
[Figure: data exchange between the three databases. Submissions and updates flow into GenBank (NCBI, NIH; retrieval through Entrez), EMBL (EBI; retrieval through SRS) and DDBJ (CIB, NIG; retrieval through getentry).]
