02-B-Sequence Presentation and File Formats

The document provides an overview of various sequence presentation and file formats, focusing on GenBank and SwissProt records. It details the structure of sequence records, including fields such as locus name, accession numbers, definitions, and features like genes and coding sequences. Additionally, it outlines the format and components of PDB records, emphasizing the importance of unique identifiers and the organization of sequence data.

Uploaded by

wasilicharles

Sequence presentation and file formats

Wilson Nandolo
[email protected]
+265993375505
GenBank sequence format
• 1 The LOCUS field consists of five different subfields, described below (1a - 1e).

1a Locus Name
• The locus name is a tag for grouping similar sequences.
• The first two or three letters usually designate the organism.
• In this case, HS stands for Homo sapiens.
• The last several characters are associated with another group designation, such as the gene product.
• In this example, the last three characters represent the gene symbol, HFE.
• Currently, the only requirement for assigning a locus name to a record is that it is unique.

1b Number of base pairs

1c Molecule Type - Type of molecule that was sequenced
• All sequence data in an entry must be of the same type.

1d GenBank Division
• There are different GenBank divisions.
• In this example, PRI stands for primate sequences.
• Some other divisions include ROD (rodent sequences), MAM (other mammal sequences), PLN (plant, fungal, and algal sequences), and BCT (bacterial sequences).

1e Modification Date - Date of the most recent modification made to the record
• The date of first public release is not available in the sequence record.
• This information can be obtained only by contacting NCBI at [email protected].
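The five LOCUS subfields (1a - 1e) can be pulled apart with a few lines of Python. This is only a minimal sketch: it assumes the subfields are separated by whitespace, and the `Locus` type, the `parse_locus` helper, and the length and date in the sample line are made up for illustration (only the locus name HSHFE and the PRI division come from the slides).

```python
from typing import NamedTuple


class Locus(NamedTuple):
    name: str               # 1a locus name
    length_bp: int          # 1b number of base pairs
    molecule_type: str      # 1c molecule type
    division: str           # 1d GenBank division
    modification_date: str  # 1e modification date


def parse_locus(line: str) -> Locus:
    """Split a LOCUS line into the five subfields described above."""
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    if tokens[3] != "bp":
        raise ValueError("expected a length given in base pairs")
    # Division and date are taken from the end of the line, so an extra
    # token between molecule type and division does not break the parse.
    return Locus(tokens[1], int(tokens[2]), tokens[4], tokens[-2], tokens[-1])


# Hypothetical example line: length and date are placeholders.
record = parse_locus("LOCUS       HSHFE       1000 bp    DNA     PRI       01-JAN-2000")
print(record.name, record.division)
```

Real GenBank flat files use fixed column positions, so a production parser would normally rely on an established library rather than on whitespace splitting.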
2 DEFINITION - Brief description of the sequence
• The description may include the source organism name, the gene or protein name, or a designation as untranscribed or untranslated sequence (e.g., a promoter region).
• For sequences containing a coding region (CDS), the definition field may also contain a "completeness" qualifier such as "complete CDS" or "exon 1."
3 ACCESSION - Unique identifier assigned to a complete sequence record
• This number never changes, even if the record is modified.
• An accession number is a combination of letters and numbers, usually in the format of one letter followed by five digits (e.g., M12345) or two letters followed by six digits (e.g., AC123456).
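The two usual accession layouts described above can be checked with a regular expression. This is a sketch only: `ACCESSION_RE` and `looks_like_accession` are hypothetical names, and the pattern covers just the two formats named on the slide, not every accession format in use.

```python
import re

# One letter + five digits (e.g. M12345) or two letters + six digits (e.g. AC123456).
ACCESSION_RE = re.compile(r"^(?:[A-Z]\d{5}|[A-Z]{2}\d{6})$")


def looks_like_accession(text: str) -> bool:
    """Return True if text matches one of the two usual accession layouts."""
    return ACCESSION_RE.match(text) is not None


for candidate in ("M12345", "AC123456", "Z92910", "M1234"):
    print(candidate, looks_like_accession(candidate))
```

The word "usually" on the slide matters here: a failed match means only that the string does not follow these two patterns, not that it is an invalid accession.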

4 VERSION - Identification number assigned to a single, specific sequence in the database
• This number is in the format "accession.version."
• If any changes are made to the sequence data, the version part of the number increases by one.
• For example, U12345.1 becomes U12345.2.
• A version number of Z92910.1 for this HFE sequence indicates that the sequence data have not been altered since the original submission.
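The "accession.version" rule can be illustrated with a small helper that splits the identifier and increments the version part. The function names `split_version` and `next_version` are hypothetical; the sketch assumes the version is the integer after the last period.

```python
from typing import Tuple


def split_version(identifier: str) -> Tuple[str, int]:
    """Split 'accession.version' into its accession and integer version."""
    accession, _, version = identifier.rpartition(".")
    if not accession or not version.isdigit():
        raise ValueError("expected the form accession.version")
    return accession, int(version)


def next_version(identifier: str) -> str:
    """Return the identifier the sequence would get after one change."""
    accession, version = split_version(identifier)
    return f"{accession}.{version + 1}"


print(next_version("U12345.1"))   # U12345.2
print(split_version("Z92910.1"))  # ('Z92910', 1)
```

A version of 1, as in Z92910.1, corresponds to a sequence that has not been changed since submission.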

5 GI - Also a sequence identification number
• Whenever a sequence is changed, the version number is increased and a new GI is assigned.
• If a nucleotide sequence record contains a protein translation of the sequence, the translation will have its own GI number.
The RefSeq accession number format and molecule types

Accession    Molecule type
NC_xxxxxx    Complete genomic molecule
NG_xxxxxx    Genomic region
NM_xxxxxx    mRNA
NP_xxxxxx    Protein
NR_xxxxxx    RNA
NT_xxxxxx    Genomic contig (computed)
XM_xxxxxx    mRNA (computed)
XP_xxxxxx    Protein (computed)
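The prefix table above maps directly onto a dictionary lookup. In this sketch the names `REFSEQ_PREFIXES` and `refseq_molecule_type` are hypothetical, the accession used in the example is a made-up placeholder, and only the eight prefixes listed on the slide are covered.

```python
REFSEQ_PREFIXES = {
    "NC": "Complete genomic molecule",
    "NG": "Genomic region",
    "NM": "mRNA",
    "NP": "Protein",
    "NR": "RNA",
    "NT": "Genomic contig (computed)",
    "XM": "mRNA (computed)",
    "XP": "Protein (computed)",
}


def refseq_molecule_type(accession: str) -> str:
    """Look up the molecule type from the two-letter RefSeq prefix."""
    prefix, separator, _ = accession.partition("_")
    if not separator or prefix not in REFSEQ_PREFIXES:
        return "unknown prefix"
    return REFSEQ_PREFIXES[prefix]


# Placeholder accession, not a reference to a specific record.
print(refseq_molecule_type("NM_123456"))  # mRNA
```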
6 KEYWORDS - A keyword can be any word or phrase used to describe the sequence
• Keywords are not taken from a controlled vocabulary.
• Notice that in this record the keyword "haemochromatosis" employs British spelling, rather than the American "hemochromatosis."
• Many records have no keywords.
• A period is placed in this field for records without keywords.
7 SOURCE
• Usually contains an abbreviated or common name of the source organism.

8 ORGANISM - The scientific name (usually genus and species) and phylogenetic lineage
• See the NCBI Taxonomy Homepage for more information about the classification scheme used to construct taxonomic lineages.

9 REFERENCE - Citations of publications by the sequence authors that support information presented in the sequence record
• Several references may be included in one record.
• References are automatically sorted from the oldest to the newest.
• Cited publications are searchable by author, article or publication title, journal title, or MEDLINE unique identifier (UID).
• The UID links the sequence record to the MEDLINE record.
9 REFERENCE
• If the REFERENCE TITLE contains the words "Direct Submission," contact information for the submitter(s) is provided.

The FEATURES table
• A feature is simply an annotation that describes a portion of the sequence.
• Each feature includes a location (sequence location or interval) and one or several qualifiers.
• Clicking on the feature name will open a record for the sequence interval identified in the feature location.
• A list of features can be found at http://www.ncbi.nlm.nih.gov/collab/FT/

SOURCE - An obligatory feature
• The source gives the length of the entire sequence, the scientific name of the source organism, and the Taxon ID number.
• Other types of information that the submitter may include in this field are chromosome number, map location, clone, and strain identification.

GENE
• Sequence portion that delineates the beginning and end of a gene.

EXON
• Sequence segment that contains an exon.
• Exons may contain portions of the 5' and 3' UTRs (untranslated regions).
• The name of the gene to which the exon belongs and the exon number are provided.

CDS - Sequence of nucleotides that code for amino acids of the protein product (coding sequence)
• The CDS begins with the first nucleotide of the start codon and ends with the third nucleotide of the stop codon.
• This feature includes the translation into amino acids and may also contain the gene name, gene product function, a link to the protein sequence record, and cross-references to other database entries.
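The CDS boundary rule (first nucleotide of the start codon to third nucleotide of the stop codon) can be checked on a toy sequence. This sketch makes several assumptions that the slides do not state: the CDS is a single uninterrupted interval given as 1-based inclusive coordinates, the start codon is ATG, and the stop codons are those of the standard genetic code. `extract_cds` and `check_cds` are hypothetical helpers.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def extract_cds(sequence: str, start: int, end: int) -> str:
    """Return the interval start..end (1-based, inclusive) in upper case."""
    return sequence[start - 1:end].upper()


def check_cds(cds: str) -> List[str]:
    """List the ways in which cds fails to look like a complete CDS."""
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length is not a multiple of three")
    if not cds.startswith("ATG"):
        problems.append("does not begin with the start codon ATG")
    if cds[-3:] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    return problems


toy = "ggatgaaatagcc"            # made-up sequence for illustration
cds = extract_cds(toy, 3, 11)    # ATGAAATAG
print(cds, check_cds(cds))
```

An empty list means the interval is consistent with the rule. Partial coding regions, or a CDS joined from several exons, would need more handling than this sketch provides.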

INTRON
• Transcribed but spliced-out parts.
• The intron number is shown.

polyA_signal - Identifies the sequence portion required for endonuclease cleavage of an mRNA transcript
• The consensus sequence for the polyA signal is AATAAA.
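Locating the AATAAA consensus in a sequence is a plain substring search. The helper name `find_polya_signals` and the test sequence are made up for illustration; positions are reported as 1-based, as in feature locations.

```python
from typing import List


def find_polya_signals(sequence: str, motif: str = "AATAAA") -> List[int]:
    """Return the 1-based start positions of every occurrence of motif."""
    seq = sequence.upper()
    positions = []
    index = seq.find(motif)
    while index != -1:
        positions.append(index + 1)        # convert to 1-based coordinates
        index = seq.find(motif, index + 1)  # allow overlapping matches
    return positions


print(find_polya_signals("ccaataaaggaataaa"))  # [3, 11]
```

An exact match to the consensus is only a candidate. Whether a given site is annotated as a polyA_signal is a decision made in the record itself.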

BASE COUNT & ORIGIN
• Base Count gives the total number of adenine (A), cytosine (C), guanine (G), and thymine (T) bases in the sequence.
• Origin contains the sequence data, which begins on the line immediately below the field title.
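The BASE COUNT values can be recomputed from the ORIGIN data. The sketch below assumes that each sequence line starts with a position number followed by space-separated blocks of bases, which is how GenBank flat files lay out the ORIGIN section. The helper names and the sample lines are made up.

```python
from collections import Counter
from typing import Dict, Iterable


def read_origin(lines: Iterable[str]) -> str:
    """Join ORIGIN sequence lines, dropping the leading position number."""
    return "".join("".join(line.split()[1:]) for line in lines)


def base_count(sequence: str) -> Dict[str, int]:
    """Count a, c, g and t in the sequence (case-insensitive)."""
    counts = Counter(sequence.lower())
    return {base: counts.get(base, 0) for base in "acgt"}


# Made-up ORIGIN lines for illustration.
sequence = read_origin(["        1 gatcctccat atacaacggt", "       21 acgt"])
print(len(sequence), base_count(sequence))
```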

Sequence formats: FASTA format
SwissProt records
⚫ ID identification line
− ID ENTRY_NAME DATA_CLASS; MOLECULE_TYPE; SEQUENCE_LENGTH.
ID CRAM_CRAAB STANDARD; PRT; 46 AA.
− Format for the ENTRY_NAME: NAME_SPECIES (10 characters)
here: Crambin (Crambe abyssinica)
− For a number of organisms (16), SPECIES has a recognizable name:
HUMAN, MOUSE, CHICK, BOVIN, YEAST, ECOLI, ...
− N.B. The ID can change, e.g. the serotonin receptors have been given a new nomenclature.
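The ID line layout shown above (ENTRY_NAME DATA_CLASS; MOLECULE_TYPE; SEQUENCE_LENGTH.) can be split on its semicolons. This is a sketch for the older layout used in these slides only; `parse_id_line` and the dictionary keys are hypothetical names.

```python
from typing import Dict, Union


def parse_id_line(line: str) -> Dict[str, Union[str, int]]:
    """Parse an ID line of the form shown on the slide."""
    if not line.startswith("ID"):
        raise ValueError("not an ID line")
    body = line[2:].strip().rstrip(".")
    head, molecule_type, length = [part.strip() for part in body.split(";")]
    entry_name, data_class = head.split()
    name, species = entry_name.split("_", 1)  # NAME_SPECIES
    return {
        "entry_name": entry_name,
        "name": name,
        "species": species,
        "data_class": data_class,
        "molecule_type": molecule_type,
        "length": int(length.split()[0]),     # "46 AA" -> 46
    }


print(parse_id_line("ID CRAM_CRAAB STANDARD; PRT; 46 AA."))
```

Because the ID can change, code that needs a stable key should store the AC number rather than the entry name.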
SwissProt records
⚫ AC accession number
AC P01542;
The AC is unique: the name, the sequence, everything can change, but the AC stays the same.

⚫ DT deposition date
DT 21-JUL-1986 (Rel. 01, Created)
DT 30-MAY-2000 (Rel. 39, Last sequence update)
DT 30-MAY-2000 (Rel. 39, Last annotation update)
1) You cannot see what the last annotation update was
2) No depositor record (implicit: author of the first reference)
SwissProt records
⚫ DE description
DE CRAMBIN.
DE 6-phosphofructo-2-kinase 1 (EC 2.7.1.105) (Phosphofructokinase 2 I)
1) General descriptive information
2) Free-format

⚫ GN gene name
GN THI2.
⚫ OS & OC & OG: Organism Species; Organism Classification; OrGanelle
OS Crambe abyssinica (Abyssinian crambe).
OC Eukaryota; Viridiplantae; Embryophyta; Tracheophyta; Spermatophyta;
OC Magnoliophyta; eudicotyledons; Rosidae; eurosids II; Brassicales;
OC Brassicaceae; Crambe.
SwissProt records
⚫ RN References
RN [1]
RP SEQUENCE.
RX MEDLINE; 82046542.
RA Teeter M.M., Mazer J.A., L'Italien J.J.;
RT "Primary structure of the hydrophobic plant protein crambin.";
RL Biochemistry 20:5437-5443(1981).

⚫ CC Comments or notes
CC -!- FUNCTION: THE FUNCTION OF THIS HYDROPHOBIC PLANT SEED PROTEIN
CC IS NOT KNOWN.
CC -!- MISCELLANEOUS: TWO ISOFORMS EXISTS, A MAJOR FORM PL (SHOWN HERE)
CC AND A MINOR FORM SI.
CC -!- SIMILARITY: BELONGS TO THE PLANT THIONIN FAMILY.
SwissProt records
⚫ DR Database Cross Reference
DR PIR; A01805; KECX.
DR PDB; 1CRN; 16-APR-87.
DR PDB; 1CBN; 31-JAN-94.
DR PDB; 1CCM; 31-OCT-93.
DR PDB; 1CCN; 31-JAN-94.
DR PDB; 1CNR; 31-AUG-94.
DR PDB; 1AB1; 12-AUG-97.
DR INTERPRO; IPR001010; -.
DR PFAM; PF00321; plant_thionins; 1.
DR PRINTS; PR00287; THIONIN.
DR PROSITE; PS00271; THIONIN; 1.

⚫ KW Keyword
Not standardized (under control of depositor)
KW Thionin; 3D-structure.
SwissProt records
⚫ FT Feature table data
FT DISULFID 3 40
FT DISULFID 4 32
FT DISULFID 16 26
FT VARIANT 22 22 P -> S (IN ISOFORM SI).
FT VARIANT 25 25 L -> I (IN ISOFORM SI).
FT STRAND 2 3
FT HELIX 7 16
FT TURN 17 19
FT HELIX 23 30
FT TURN 31 31
FT STRAND 33 34
FT TURN 42 43
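Each FT line carries a feature key, a start position, an end position, and an optional description, so the table can be read into simple tuples. The sketch assumes whitespace-separated fields as they appear in this extract. `parse_ft_line` and `disulfide_pairs` are hypothetical helpers.

```python
from typing import List, Tuple

Feature = Tuple[str, int, int, str]  # key, start, end, description


def parse_ft_line(line: str) -> Feature:
    """Parse one FT line into (key, start, end, description)."""
    parts = line.split(None, 4)
    if len(parts) < 4 or parts[0] != "FT":
        raise ValueError("not an FT line")
    description = parts[4].strip() if len(parts) > 4 else ""
    return parts[1], int(parts[2]), int(parts[3]), description


def disulfide_pairs(lines: List[str]) -> List[Tuple[int, int]]:
    """Collect the residue pairs joined by DISULFID features."""
    features = [parse_ft_line(line) for line in lines]
    return [(start, end) for key, start, end, _ in features if key == "DISULFID"]


table = [
    "FT DISULFID 3 40",
    "FT DISULFID 4 32",
    "FT DISULFID 16 26",
    "FT VARIANT 22 22 P -> S (IN ISOFORM SI).",
]
print(disulfide_pairs(table))  # [(3, 40), (4, 32), (16, 26)]
```

For DISULFID the two numbers are the bonded cysteines rather than a range, while for keys such as HELIX or STRAND they delimit a segment.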
Feature table
⚫ Other features: post-translational modifications, binding sites, enzyme active sites, local secondary structure, or other characteristics reported in the cited references. Sequence conflicts between references are also included.
FT CONFLICT 33 33 MISSING (IN REF. 2).
FT MUTAGEN 123 123 G->R,L,M: DNA BINDING LOST.
FT MOD_RES 11 11 PHOSPHORYLATION (BY PKC).
FT LIPID 1 1 MYRISTATE.
FT CARBOHYD 103 103 GLUCOSYLGALACTOSE.
FT METAL 87 87 COPPER (POTENTIAL).
FT BINDING 14 14 HEME (COVALENT).
FT PROPEP 27 28 ACTIVATION PEPTIDE.
FT DOMAIN 22 788 EXTRACELLULAR (POTENTIAL).
FT ACT_SITE 193 193 ACCEPTS A PROTON DURING CATALYSIS.
SwissProt records
⚫ SQ sequence header
SQ SEQUENCE 46 AA; 4736 MW; 919E68AF159EF722 CRC64;

⚫ Sequence data
TTCCPSIVAR SNFNVCRLPG TPEALCATYT GCIIIPGATC PGDYAN

⚫ //
Termination line
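The SQ header, the sequence lines, and the // terminator fit together as follows: the header announces the length, the sequence lines hold the residues in space-separated blocks, and // ends the entry. The sketch below reads the crambin example and checks the length against the header. The helper names are hypothetical.

```python
from typing import Dict, Iterable, Union


def parse_sq_header(line: str) -> Dict[str, Union[int, str]]:
    """Parse 'SQ SEQUENCE 46 AA; 4736 MW; <checksum> CRC64;'."""
    fields = [field.strip() for field in line.split(";") if field.strip()]
    return {
        "length": int(fields[0].split()[2]),   # SQ SEQUENCE 46 AA
        "mol_weight": int(fields[1].split()[0]),
        "crc64": fields[2].split()[0],
    }


def read_sequence(lines: Iterable[str]) -> str:
    """Join sequence lines up to the '//' termination line."""
    blocks = []
    for line in lines:
        if line.strip() == "//":
            break
        blocks.append("".join(line.split()))
    return "".join(blocks)


header = parse_sq_header("SQ SEQUENCE 46 AA; 4736 MW; 919E68AF159EF722 CRC64;")
sequence = read_sequence(["TTCCPSIVAR SNFNVCRLPG TPEALCATYT GCIIIPGATC", "PGDYAN", "//"])
print(len(sequence) == header["length"])  # True
```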
PDB records
⚫ Filename = accession number = PDB code
1) The filename has 4 positions (often 1 digit & 3 letters, e.g. 1CRN)
2) Be aware: 0HYK means that entry HYK does not contain coordinates

⚫ HEADER
describes the molecule & gives the deposition date
HEADER PLANT SEED PROTEIN 30-APR-81 1CRN 1CRND 1

⚫ COMPND
name of the molecule
COMPND CRAMBIN 1CRN 4

⚫ SOURCE
organism
SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED 1CRN 5
PDB records
⚫ AUTHOR
The depositor
AUTHOR W.A.HENDRICKSON,M.M.TEETER 1CRN 6

⚫ JRNL
JRNL AUTH M.BLABER,X.-J.ZHANG,B.W.MATTHEWS 111L 10
JRNL TITL STRUCTURAL BASIS OF ALPHA-HELIX PROPENSITY AT TWO 111L 11
JRNL TITL 2 SITES IN T4 LYSOZYME 111L 12
JRNL REF SCIENCE V. 260 1637 1993 111L 13
JRNL REFN ASTM SCIEAS US ISSN 0036-8075 038 111L 14

⚫ REMARK
Not standardized: many different REMARK records & subrecords!
REMARK 1 REFERENCE 3 1CRNC 10
REMARK 1 AUTH M.M.TEETER,W.A.HENDRICKSON 1CRN 16
REMARK 1 TITL HIGHLY ORDERED CRYSTALS OF THE PLANT SEED PROTEIN 1CRN 17
REMARK 1 TITL 2 CRAMBIN 1CRN 18
REMARK 1 REF J.MOL.BIOL. V. 127 219 1979 1CRN 19
REMARK 1 REFN ASTM JMOBAK UK ISSN 0022-2836 070 1CRN 20
REMARK 2 1CRN 21
REMARK 2 RESOLUTION. 1.5 ANGSTROMS. 1CRN 22
PDB records
⚫ SEQRES
Sequence of the protein.
Be aware: 3D coordinates are not always present for all the amino acids listed in SEQRES!
SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51
SEQRES 2 46 ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS 1CRN 52
SEQRES 3 46 ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR 1CRN 53
SEQRES 4 46 CYS PRO GLY ASP TYR ALA ASN 1CRN 54
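SEQRES lists residues by three-letter code, so comparing a PDB entry with a SwissProt sequence requires a conversion to one-letter codes. The sketch below assumes the old record layout shown in this extract (no chain identifier, with the entry code and line number at the end of each line) and covers only the 20 standard amino acids. `THREE_TO_ONE` and `seqres_to_one_letter` are hypothetical names.

```python
from typing import Iterable

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_to_one_letter(lines: Iterable[str]) -> str:
    """Convert SEQRES lines to a one-letter amino acid string."""
    letters = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "SEQRES":
            continue
        # tokens[1] is the line serial number, tokens[2] the residue count.
        for token in tokens[3:]:
            if token not in THREE_TO_ONE:
                break  # trailing entry code and line number
            letters.append(THREE_TO_ONE[token])
    return "".join(letters)


print(seqres_to_one_letter([
    "SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51",
]))  # TTCCPSIVARSNF
```

Comparing this string with the sequence rebuilt from the ATOM records is one way to see which residues lack coordinates.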

⚫ HET & FORMUL
metals, cofactors, ions, etc.
HET NAD A 1 44 NAD CO-ENZYME 4MDH 219
HET SUL A 2 5 SULFATE 4MDH 220
HET NAD B 1 44 NAD CO-ENZYME 4MDH 221
HET SUL B 2 5 SULFATE 4MDH 222
FORMUL 3 NAD 2(C21 H28 N7 O14 P2) 4MDH 223
FORMUL 4 SUL 2(O4 S1) 4MDH 224
FORMUL 5 HOH *471(H2 O1) 4MDH 225
PDB records
⚫ HELIX/SHEET/TURN
Secondary structure elements as provided by the crystallographer (subjective)
HELIX 1 H1 ILE 7 PRO 19 1 3/10 CONFORMATION RES 17,19 1CRN 55
SHEET 2 S1 2 CYS 32 ILE 35 -1 1CRN 58
TURN 1 T1 PRO 41 TYR 44 1CRN 59

⚫ SSBOND
disulfide bridges
SSBOND 1 CYS 3 CYS 40 1CRN 60
SSBOND 2 CYS 4 CYS 32 1CRN 61

⚫ CRYST1, ORIGX1, ORIGX2, ORIGX3, SCALE1, SCALE2, SCALE3
crystallographic parameters
CRYST1 40.960 18.650 22.520 90.00 90.77 90.00 P 21 2 1CRN 63
ORIGX1 1.000000 0.000000 0.000000 0.00000 1CRN 64
ORIGX2 0.000000 1.000000 0.000000 0.00000 1CRN 65
ORIGX3 0.000000 0.000000 1.000000 0.00000 1CRN 66
SCALE1 .024414 0.000000 -.000328 0.00000 1CRN 67
SCALE2 0.000000 .053619 0.000000 0.00000 1CRN 68
SCALE3 0.000000 0.000000 .044409 0.00000 1CRN 69
PDB records
⚫ ATOM
one line for each atom with its unique name and its x,y,z coordinates
ATOM 1 N THR 1 17.047 14.099 3.625 1.00 13.79 1CRN 70
ATOM 2 CA THR 1 16.967 12.784 4.338 1.00 10.80 1CRN 71
ATOM 3 C THR 1 15.685 12.755 5.133 1.00 9.19 1CRN 72
ATOM 4 O THR 1 15.268 13.825 5.594 1.00 9.85 1CRN 73
ATOM 5 CB THR 1 18.170 12.703 5.337 1.00 13.02 1CRN 74
ATOM 6 OG1 THR 1 19.334 12.829 4.463 1.00 15.06 1CRN 75
ATOM 7 CG2 THR 1 18.150 11.546 6.304 1.00 14.23 1CRN 76
ATOM 8 N THR 2 15.115 11.555 5.265 1.00 7.81 1CRN 77
ATOM 9 CA THR 2 13.856 11.469 6.066 1.00 8.31 1CRN 78
ATOM 10 C THR 2 14.164 10.785 7.379 1.00 5.80 1CRN 79
ATOM 11 O THR 2 14.993 9.862 7.443 1.00 6.94 1CRN 80

⚫ TER record terminates the amino acid chain

ATOM 325 OD1 ASN 46 11.982 4.849 15.886 1.00 11.00 1CRN 394
ATOM 326 ND2 ASN 46 13.407 3.298 15.015 1.00 10.32 1CRN 395
ATOM 327 OXT ASN 46 12.703 4.973 10.746 1.00 7.86 1CRN 396
TER 328 ASN 46 1CRN 397
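Since every ATOM line carries x, y, z coordinates, geometric quantities such as interatomic distances follow directly. The sketch below parses the ATOM lines as they appear in this extract, using whitespace splitting and assuming no chain identifier. Real PDB files are fixed-column, so this approach is an assumption that holds only for lines shaped like the ones shown. `Atom`, `parse_atom_line`, and `distance` are hypothetical names.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse an ATOM line laid out like the 1CRN lines above."""
    tokens = line.split()
    if len(tokens) < 8 or tokens[0] != "ATOM":
        raise ValueError("not an ATOM line in the expected layout")
    return Atom(int(tokens[1]), tokens[2], tokens[3], int(tokens[4]),
                float(tokens[5]), float(tokens[6]), float(tokens[7]))


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the file (angstroms)."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)


n = parse_atom_line("ATOM 1 N THR 1 17.047 14.099 3.625 1.00 13.79 1CRN 70")
ca = parse_atom_line("ATOM 2 CA THR 1 16.967 12.784 4.338 1.00 10.80 1CRN 71")
print(round(distance(n, ca), 3))  # 1.498
```

The two numeric columns after the coordinates (for example 1.00 and 13.79) are ignored here, as are the trailing entry code and line number.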
PDB records
⚫ HETATM
atomic coordinate records for atoms within "HET & FORMUL" lines (metals, cofactors, ions, ...) and for water molecules
HETATM 5158 AP NAD B 1 42.641 30.361 41.284 1.00 26.73 4MDH5495
HETATM 5159 AO1 NAD B 1 43.440 31.570 40.868 1.00 20.69 4MDH5496
HETATM 5160 AO2 NAD B 1 41.161 30.484 41.376 1.00 33.73 4MDH5497

HETATM 5207 O HOH 0 15.379 1.907 3.295 1.00 58.12 4MDH5544
HETATM 5208 O HOH 1 58.861 0.984 17.024 1.00 37.58 4MDH5545
HETATM 5209 O HOH 2 24.384 1.184 74.398 1.00 35.92 4MDH5546
End of presentation
