Bioinfo U2 KD 2

The document provides an overview of biological databases, which are organized collections of biological data essential for research in fields like genomics and proteomics. It classifies these databases into types such as sequence, genomic, protein structure, pathway, and literature databases, and discusses their retrieval systems that facilitate data access and analysis. Key features of retrieval systems include advanced search capabilities, data annotation, integration with other tools, user-friendly interfaces, and data visualization.

Uploaded by

simidas653

Unit 2.

Databases in Bioinformatics: Introduction, Biological Databases, Classification Format
of Biological Databases, Biological Database Retrieval System.

Biological Databases: A collection of biological data on a computer that can be
manipulated to appear in varying arrangements and subsets is regarded as a database.
Biological information is stored across many different databases, and each database has its
own website with its own navigation tools. A biological database is a large, organized body of
persistent data, usually associated with software designed to update, query, and retrieve
components of the data stored within the system. A simple database might be a single file
containing many records, each of which includes the same set of information. The chief
objective of developing a database is to organize data in a set of structured records to
enable easy retrieval of information.
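
As a minimal sketch of the "single file containing many records" idea, the Python snippet below reads a small comma-separated file in which every record has the same fields, then produces one subset and one arrangement of the same data. The accessions, organisms, and lengths are invented for illustration and do not correspond to entries in any real database; the function names are likewise hypothetical.

```python
import csv
import io

# Hypothetical flat file: every record has the same three fields.
# The values are made up and are not real database entries.
FLAT_FILE = """accession,organism,length
X0001,Homo sapiens,1200
X0002,Mus musculus,950
X0003,Homo sapiens,300
"""

def load_records(text):
    """Read each row into a dict keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))

def subset(records, organism):
    """Return only the records for one organism (a 'subset' of the data)."""
    return [r for r in records if r["organism"] == organism]

def arranged_by_length(records):
    """Return the records sorted by sequence length (an 'arrangement')."""
    return sorted(records, key=lambda r: int(r["length"]))

records = load_records(FLAT_FILE)
human = subset(records, "Homo sapiens")
shortest_first = arranged_by_length(records)

print([r["accession"] for r in human])
print([r["accession"] for r in shortest_first])
```

The stored data never changes here; only the view of it does, which is the sense in which a database can "appear in varying arrangements and subsets".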

Thus, biological databases are organized collections of biological information that are
stored in a digital format. They are essential resources for researchers and scientists to access,
analyze, and share data in areas such as genomics, proteomics, molecular biology, and more.
These databases facilitate the management of large volumes of biological data and are widely
used to store sequences of DNA, RNA, proteins, metabolic pathways, and other biological data
types. Biological databases come in various types, each with its own purpose and scope, and
can be accessed and queried to retrieve relevant biological information for research and
analysis.

Biological databases can be classified based on the type of data they store or the purpose they
serve. The major types include:

1. Sequence Databases: These contain nucleotide and protein sequence information.
Examples include GenBank (for DNA sequences) and UniProt (for protein sequences).
2. Genomic Databases: They focus on the study of genomes and their annotations.
Annotation is the process of adding biological information to a genome sequence. This
process helps researchers understand the sequence and its contents. Examples of
genomic databases include:
o GenBank: An NIH database that contains all publicly available DNA sequences
o dbGaP: A database that stores data and results from studies that investigate the
interaction between genotype and phenotype in humans
o NCBI Genome: A collection of genome sequences, assemblies, and mapped
annotations
o UCSC Genome Browser: A database that contains genetic information for
vertebrate model organisms
o RefSeq: A curated, non-redundant collection of reference genomic, transcript,
and protein sequences
o Ensembl: A genome browser for vertebrate genomes that supports research in
evolution, comparative genomics, and more
3. Protein Structure Databases: These store 3D structural information of proteins. The
Protein Data Bank (PDB) is a widely used example.
4. Pathway and Interaction Databases: These collect information about the
molecular interactions and relationships between different biological components
within a cell, mapping out the pathways that govern cellular processes, often with
a focus on specific biological functions. An example is KEGG (Kyoto Encyclopedia
of Genes and Genomes), a knowledge base for systematic analysis of gene functions
that links genomic information with higher-order functional information.
5. Literature Databases: These focus on storing scientific literature, publications, and
annotations. PubMed is a well-known example for biomedical research.

Classification Format of Biological Databases: Biological databases can be classified in
several ways, depending on their structure, type of data, and use cases.

1. Based on Data Types:
o Primary Databases: Contain raw, experimentally derived data such as
nucleotide sequences, protein sequences, or macromolecular structures (e.g.,
GenBank). Primary databases are also called archival databases. Experimental
results are submitted directly into the database by researchers, and the data
are essentially archival in nature. Once given a database accession number,
the data in primary databases are never changed: they form part of the
scientific record.
o Secondary Databases: Comprise data derived from the results of analysing
primary data (e.g., UniProt). Secondary databases often draw upon
information from numerous sources, including other databases (primary and
secondary), controlled vocabularies, and the scientific literature. They are
highly curated, often using a complex combination of computational algorithms
and manual analysis and interpretation to derive new knowledge from the public
record of science.
2. Based on Data Representation:
o Flat File Databases: Store data in a plain text file. Typically each line
represents a single record with fields separated by delimiters such as commas
or tabs, so the file acts like a single table of rows and columns that is easy
to read and manipulate with basic text editors (e.g., CSV files; FASTA is a
related plain-text format for sequences).
o Relational Databases: Use tables and relationships between them (e.g.,
MySQL, Oracle) to store complex data in structured formats.
o NoSQL Databases: Store large amounts of data in unstructured or
semi-structured formats. They are designed to be scalable and to prioritize
performance and availability (e.g., MongoDB, Neo4j).
3. Based on the Access Model:
o Public Databases: Open access to data, freely available to everyone (e.g.,
PubMed, GenBank).
o Private Databases: Restricted access, often for proprietary or confidential data.
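
The difference between the flat-file and relational representations described above can be sketched in a few lines of Python. The example parses a small FASTA-style text (a flat file) and loads it into an in-memory SQLite database with two related tables. The sequence identifiers, the sequences, the annotation rows, and the table and column names are all assumptions made for illustration; real resources use their own, far richer schemas.

```python
import sqlite3

# Hypothetical FASTA text; identifiers and sequences are invented.
FASTA_TEXT = """>seq1 example gene A
ATGGCCATTG
TAATGGGCC
>seq2 example gene B
ATGCGT
"""

def split_header(header):
    """Split a FASTA header into (identifier, description)."""
    ident, _, desc = header.partition(" ")
    return ident, desc

def parse_fasta(text):
    """Yield (identifier, description, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield (*split_header(header), "".join(chunks))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield (*split_header(header), "".join(chunks))

# Relational form: one table for sequences, one for annotations,
# linked through the sequence identifier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sequence (id TEXT PRIMARY KEY, description TEXT, residues TEXT)"
)
conn.execute(
    "CREATE TABLE annotation (seq_id TEXT REFERENCES sequence(id), feature TEXT)"
)
conn.executemany("INSERT INTO sequence VALUES (?, ?, ?)", parse_fasta(FASTA_TEXT))
conn.executemany(
    "INSERT INTO annotation VALUES (?, ?)",
    [("seq1", "start codon"), ("seq2", "start codon"), ("seq1", "hypothetical domain")],
)

# A join answers a question that spans both tables:
# sequence length and number of annotated features per sequence.
rows = conn.execute(
    "SELECT s.id, LENGTH(s.residues), COUNT(a.feature) "
    "FROM sequence AS s JOIN annotation AS a ON a.seq_id = s.id "
    "GROUP BY s.id ORDER BY s.id"
).fetchall()
print(rows)
```

The flat file is easy to read and edit by hand, while the relational form makes it straightforward to combine sequences with separately stored annotations in a single query.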

Biological Database Retrieval System: A Biological Database Retrieval System (BDRS) is a
tool or framework designed to retrieve biological information from a database. These systems
allow users to query databases, search for relevant data, and extract specific information to
meet their research needs. Key features of a Biological Database Retrieval System include:

1. Search Capabilities: These systems provide advanced search functionalities to retrieve
data based on keywords, sequence similarity, gene/protein names, and other biological
parameters.
2. Data Annotation: BDRS often include annotation tools that provide detailed
explanations and interpretations of the data, such as the functional role of genes or
proteins.
3. Integration with Other Tools: A good retrieval system can be integrated with other
bioinformatics tools for further analysis, such as sequence alignment, pathway
mapping, or structural modeling.
4. User Interface: BDRS should offer a user-friendly interface that can accommodate
both experienced researchers and those new to bioinformatics.
5. Data Visualization: Some systems also include visualization tools that help in
interpreting complex data, such as gene expression profiles or protein structures.

Popular Biological Database Retrieval Systems include tools like BLAST (Basic Local
Alignment Search Tool), Ensembl, UCSC Genome Browser, and Gene Ontology.

The advantage of these retrieval systems is that they not only return matches to a query but
also provide handy pointers to additional important information in related databases. In using
any of these systems, queries can be as simple as entering the accession number of a newly
published sequence or as complex as searching multiple database fields for specific terms.
Depending on the type of data at hand, there are two basic ways of searching: using
descriptive words to search text databases, and using a nucleotide or protein sequence to
search sequence databases.
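
The two ways of searching described above can be illustrated with a toy example in Python. The records, accessions, and function names below are hypothetical. The sequence search ranks records by the number of short substrings (k-mers) they share with the query, which is only a crude stand-in for similarity search and is not how BLAST or any real retrieval system is implemented.

```python
# Hypothetical records; accessions, descriptions, and sequences are invented.
RECORDS = [
    {"accession": "T0001", "description": "kinase involved in cell cycle",
     "sequence": "ATGGCGTACGTTAGC"},
    {"accession": "T0002", "description": "heat shock protein",
     "sequence": "ATGCCGTTAGCAAAT"},
    {"accession": "T0003", "description": "cell wall hydrolase",
     "sequence": "GGGTACGTTAGCCCA"},
]

def text_search(records, *terms):
    """Text search: accessions whose description contains every term."""
    wanted = [t.lower() for t in terms]
    return [
        r["accession"]
        for r in records
        if all(t in r["description"].lower() for t in wanted)
    ]

def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def sequence_search(records, query, k=4):
    """Sequence search: rank records by k-mers shared with the query.

    Records sharing no k-mer with the query are dropped; ties are
    broken by accession so the order is deterministic.
    """
    query_kmers = kmers(query, k)
    scored = [
        (r["accession"], len(query_kmers & kmers(r["sequence"], k)))
        for r in records
    ]
    hits = [(acc, score) for acc, score in scored if score > 0]
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))

print(text_search(RECORDS, "cell"))
print(text_search(RECORDS, "cell", "kinase"))
print(sequence_search(RECORDS, "TACGTTAGC"))
```

A text query works on the descriptive fields of a record, whereas a sequence query works on the sequence itself and returns ranked, possibly partial, matches.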
