Ensembl Genome Database Project

The Ensembl genome database project provides a centralized genomic resource for researchers. It uses automated processes to annotate genes and integrate genomic and biological data across many species. Ensembl makes this genomic data freely available online through interactive browsers and APIs, and supports over 50,000 genomes.

Ensembl genome database project

The Ensembl genome database project is a scientific project at the European Bioinformatics Institute,
which provides a centralized resource for geneticists, molecular biologists and other researchers
studying the genomes of our own species and other vertebrates and model organisms.[2][3][4] Ensembl is
one of several well-known genome browsers for the retrieval of genomic information.

Similar databases and browsers are found at NCBI and the University of California, Santa Cruz (UCSC).

Content
Description: Ensembl
Contact
Research center: European Bioinformatics Institute
Primary citation: Yates, et al. (2020)[1]
Access
Website: www.ensembl.org (http://www.ensembl.org/)

History
The human genome consists of three billion base pairs, which code for approximately 20,000–25,000
genes. However, the genome alone is of little use unless the locations and relationships of individual
genes can be identified. One option is manual annotation, whereby a team of scientists tries to locate
genes using experimental data from scientific journals and public databases. However, this is a slow,
painstaking task. The alternative, known as automated annotation, is to use the power of computers to
do the complex pattern-matching of protein to DNA.[5][6] The Ensembl project was launched in 1999 in
response to the imminent completion of the Human Genome Project, with the initial goals of
automatically annotating the human genome, integrating this annotation with available biological data
and making all this knowledge publicly available.[2]

In the Ensembl project, sequence data are fed into the gene annotation system (a collection of software
"pipelines" written in Perl) which creates a set of predicted gene locations and saves them in a MySQL
database for subsequent analysis and display. Ensembl makes these data freely accessible to the world
research community. All the data and code produced by the Ensembl project are available to download,[7]
and there is also a publicly accessible database server allowing remote access. In addition, the Ensembl
website provides computer-generated visual displays of much of the data.

Over time the project has expanded to include additional species (including key model organisms such as
mouse, fruit fly and zebrafish) as well as a wider range of genomic data, including genetic variations
and regulatory features. Since April 2009, a sister project, Ensembl Genomes, has extended the scope of
Ensembl into invertebrate metazoa, plants, fungi, bacteria, and protists, focusing on providing
taxonomic and evolutionary context to genes, whilst the original project continues to focus on
vertebrates.[8][9] As of 2020, Ensembl supported over 50,000 genomes across both the Ensembl and Ensembl
Genomes databases, and had added new features such as Rapid Release
(https://rapid.ensembl.org/index.html), a new website designed to make genome annotation data
available more quickly to users, and COVID-19 (https://covid-19.ensembl.org/index.html), a new website
providing access to the SARS-CoV-2 reference genome.

Displaying genomic data


Central to the Ensembl concept is the ability to automatically generate graphical views of the
alignment of genes and other genomic data against a reference genome. These are shown as data tracks,
and individual tracks can be turned on and off, allowing the user to customise the display to suit
their research interests. The interface also enables the user to zoom in to a region or move along the
genome in either direction.

[Figure: Gene SGCB aligned to the human genome]

Other displays show data at varying levels of resolution, from whole karyotypes down to text-based
representations of DNA and amino acid sequences, or present other types of display such as trees of
similar genes (homologues) across a range of species. The graphics are complemented by tabular
displays, and in many cases data can be exported directly from the page in a variety of standard file
formats such as FASTA.

Externally produced data can also be added to the display by uploading a suitable file in one of the
supported formats, such as BAM, BED, or PSL.
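
As an illustration of what such an upload file contains, the following is a minimal sketch of a reader for BED, one of the formats named above. It assumes tab-separated lines and handles only the three mandatory columns plus the optional name column; the names BedFeature, parse_bed_line and to_one_based_region are hypothetical helpers introduced here, not part of any Ensembl software. BED intervals are 0-based and half-open, so the sketch also shows the conversion to the 1-based, inclusive region strings that genome browsers typically display.

```python
from typing import NamedTuple, Optional


class BedFeature(NamedTuple):
    """The three mandatory BED columns plus the optional name column."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive
    name: Optional[str] = None


def parse_bed_line(line: str) -> Optional[BedFeature]:
    """Parse one BED line; return None for blank, comment, track and browser lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith(("#", "track", "browser")):
        return None
    # Assumption: columns are tab-separated.
    fields = stripped.split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated columns: {line!r}")
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end < start:
        raise ValueError(f"invalid interval in line: {line!r}")
    name = fields[3] if len(fields) > 3 else None
    return BedFeature(fields[0], start, end, name)


def to_one_based_region(feature: BedFeature) -> str:
    """Format a feature as a 1-based, inclusive 'chrom:start-end' region string."""
    return f"{feature.chrom}:{feature.start + 1}-{feature.end}"
```

For example, the made-up line "chr4", "1000", "2000", "example_feature" (tab-separated) parses to a feature whose region string is chr4:1001-2000.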

Graphics are generated using a suite of custom Perl modules based on GD, the standard Perl graphics
display library.

Alternative access methods


In addition to its website, Ensembl provides a REST API and a Perl API[10] (Application Programming
Interface) that models biological objects such as genes and proteins, allowing simple scripts to be
written to retrieve data of interest. The same API is used internally by the web interface to display
the data. It is divided into sections such as the core API, the compara API (for comparative genomics
data), the variation API (for accessing SNPs, SNVs, CNVs, etc.), and the functional genomics API (to
access regulatory data). The Ensembl website provides extensive information on how to install and use
the API (http://www.ensembl.org/info/docs/api/index.html).
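
A short script against the REST API might look like the sketch below. It is written in Python rather than Perl and uses only the standard library. The server address https://rest.ensembl.org and the lookup/symbol endpoint are not given in this article and are assumptions that should be checked against the current REST documentation; lookup_symbol_url and lookup_symbol are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the public Ensembl REST server.
REST_SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(species: str, symbol: str) -> str:
    """Build the URL that looks up a gene by species name and gene symbol."""
    species_part = urllib.parse.quote(species, safe="")
    symbol_part = urllib.parse.quote(symbol, safe="")
    return f"{REST_SERVER}/lookup/symbol/{species_part}/{symbol_part}"


def lookup_symbol(species: str, symbol: str, timeout: float = 30.0) -> dict:
    """Fetch the lookup record as a dictionary (performs a network request)."""
    request = urllib.request.Request(
        lookup_symbol_url(species, symbol),
        headers={"Content-Type": "application/json"},  # ask for a JSON response
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling lookup_symbol("homo_sapiens", "SGCB") would be expected to return a record describing the gene, including fields such as its stable identifier and genomic coordinates; the exact field names depend on the service and are best read from the response itself.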

This software can be used to access the public MySQL database, avoiding the need to download enormous
datasets. Users can also retrieve data from the MySQL database with direct SQL queries, but this
requires extensive knowledge of the current database schema.
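
To give a sense of what a direct query involves, here is a minimal sketch in Python. The table and column names (gene, seq_region, stable_id, seq_region_start, seq_region_end) are assumptions modelled on the Ensembl core schema and may not match a given release. Because a MySQL driver is not part of the Python standard library, the query is run against an in-memory SQLite stand-in filled with invented rows; genes_on_region and build_toy_database are hypothetical helpers.

```python
import sqlite3

# Table and column names are assumptions modelled on the Ensembl core schema;
# check them against the schema documentation for the release being queried.
GENES_ON_REGION_SQL = """
SELECT g.stable_id, g.seq_region_start, g.seq_region_end
FROM gene AS g
JOIN seq_region AS sr ON sr.seq_region_id = g.seq_region_id
WHERE sr.name = {placeholder}
ORDER BY g.seq_region_start
"""


def genes_on_region(connection, region_name, placeholder="?"):
    """Return (stable_id, start, end) rows for genes on one sequence region.

    `connection` is any DB-API 2.0 connection. `placeholder` is the driver's
    parameter marker: "?" for sqlite3, "%s" for common MySQL drivers.
    """
    cursor = connection.cursor()
    cursor.execute(GENES_ON_REGION_SQL.format(placeholder=placeholder), (region_name,))
    return cursor.fetchall()


def build_toy_database():
    """Create an in-memory stand-in with invented rows (not real Ensembl data)."""
    connection = sqlite3.connect(":memory:")
    connection.executescript(
        """
        CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE gene (
            gene_id INTEGER PRIMARY KEY,
            stable_id TEXT,
            seq_region_id INTEGER,
            seq_region_start INTEGER,
            seq_region_end INTEGER
        );
        INSERT INTO seq_region VALUES (1, '4'), (2, 'X');
        INSERT INTO gene VALUES
            (1, 'TOYG0001', 1, 500, 900),
            (2, 'TOYG0002', 1, 100, 200),
            (3, 'TOYG0003', 2, 50, 75);
        """
    )
    return connection
```

With a real MySQL driver the same function could be given a connection to the public server and placeholder="%s"; the host, credentials and database name would have to be taken from the Ensembl documentation. In a real core database a region name may also occur in more than one coordinate system, so a production query would likely need a further join to disambiguate.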

Large datasets can be retrieved using the BioMart data-mining tool. It provides a web interface for
downloading datasets using complex queries.

Last, there is an FTP (http://ftp.ensembl.org/) server which can be used to download entire MySQL
databases as well as some selected data sets in other formats.

Current species
The annotated genomes include most fully sequenced vertebrates and selected model organisms. All of
them are eukaryotes; there are no prokaryotes. As of 2022, 271 species are registered, including:[11]
Species

Chordata
  Mammalia
    Euarchontoglires
      Primates: Angola colobus, black-capped squirrel monkey, black snub-nosed monkey, bonobo, bushbaby, capuchin, chimpanzee, common marmoset, Coquerel's sifaka, crab-eating macaque, drill, human, macaque, mouse lemur, gelada, gibbon, golden snub-nosed monkey, gorilla, greater bamboo lemur, green monkey, Ma's night monkey, olive baboon, orangutan, pig-tailed macaque, sooty mangabey, tarsier, Ugandan red colobus
      Scandentia: tree shrew
      Glires (Rodents + Lagomorphs): Algerian mouse, alpine marmot, american beaver, arctic ground squirrel, Brazilian guineapig, chinese hamster, damaraland mole rat, daurian ground squirrel, degu, eurasian red squirrel, golden hamster, ground squirrel, guineapig, kangaroo rat, lesser Egyptian jerboa, long-tailed chinchilla, mongolian gerbil, mouse, naked mole-rat, North American deermouse, rat, pika, prairie vole, rabbit, Ryukyu mouse, shrew mouse, steppe mouse, thirteen-lined ground squirrel, Upper Galilee mountains blind mole rat
    Laurasiatheria: Alpaca, american bison, american black bear, american mink, Arabian camel, asian black bear, beluga whale, blue whale, chacoan peccary, California sea lion, Canada lynx, cat, cow, dingo, dog, dolphin, domestic yak, donkey, goat, ferret, giant panda, greater horseshoe bat, hedgehog, horse, leopard, lesser hedgehog tenrec, lion, meerkat, megabat, microbat, narwhal, polar bear, pig, red fox, sheep, shrew, Siberian musk deer, sperm whale, Siberian tiger, vaquita, wild yak, yarkand deer
    Afrotheria: Elephant, hyrax, tenrec
    Xenarthra: Armadillo, sloth
    Marsupialia: Common wombat, koala, opossum, Tasmanian devil, wallaby
    Monotremes: Platypus
  Reptilia: Argentine black and white tegu, blue-ringed sea krait, central bearded dragon, chinese softshell turtle, common snapping turtle, common wall lizard, desert tortoise, eastern brown snake, saltwater crocodile, Goode's thornscrub tortoise, green anole, indian cobra, komodo dragon, mainland tiger snake, painted turtle, Pinta Island tortoise, three-toed box turtle, tuatara, West African mud turtle
  Birds: African ostrich, bengalese finch, blue-crowned manakin, blue tit, budgerigar, burrowing owl, chicken, chicken (Red junglefowl), chicken (maternal Broiler), chicken (paternal White leghorn layer), chilean tinamou, collared flycatcher, common canary, common kestrel, dark-eyed junco, duck, eastern buzzard, eastern spot-billed duck, emu, eurasian eagle-owl, eurasian sparrowhawk, golden eagle, golden pheasant, golden-collared manakin, gouldian finch, great tit, great spotted kiwi, helmeted guineafowl, indian peafowl, japanese quail, kakapo, little spotted kiwi, mallard, medium ground finch, muscovy duck, New Caledonian crow, northern spotted owl, okarito brown kiwi, oriental scops owl, pink-footed goose, ring-necked pheasant, ruff, rufous-capped babbler, silver-eye, small tree finch, spoon-billed sandpiper, superb fairywren, Swainson's thrush, swan goose, turkey, white-throated sparrow, yellow-billed amazon, zebu, zebra finch
  Lissamphibia: Leisan spiny toad, Xenopus tropicalis
  Teleosts: Amazon molly, asian arowana, atlantic cod, atlantic herring, atlantic salmon, ballan wrasse, barramundi perch, bicolor damselfish, blind barbel, blue tilapia, blunt-snouted clingfish, brown trout, Burton's mouthbrooder, channel bull blenny, channel catfish, chinese medaka, chinook salmon, climbing perch, clown anemonefish, coelacanth, coho salmon, common carp, denticle herring, eastern happy, electric eel, elephant shark, european bass, gilthead bream, golden-line barbel, goldfish, greater amberjack, guppy, hagfish, horned golden-line barbel, huchen, indian glassy fish, indian medaka, japanese medaka, javanese ricefish, jewelled blenny, large yellow croaker, live sharksucker, lumpfish, lyretail cichlid, Makobe island cichlid, mangrove rivulus, mexican tetra, Midas cichlid, Monterrey platyfish, mummichog, Nile tilapia, northern pike, ocean sunfish, orange clownfish, orbiculate cardinalfish, Paramormyrops kingsleyae, Periophthalmus magnuspinnatus, pike-perch, pinecone soldierfish, platyfish, rainbow trout, red-bellied piranha, reedfish, round goby, sailfin molly, sheepshead minnow, shortfin molly, Siamese fighting fish, spiny chromis, spotted gar, swamp eel, tetraodon, three-spined stickleback, tiger tail seahorse, tongue sole, turbot, turquoise killifish, western mosquitofish, yellowtail amberjack, Takifugu rubripes (fugu), zebrafish, zebra mbuna, zigzag eel
  Cyclostomata: Petromyzon marinus (sea lamprey)
  Tunicates: Ciona intestinalis, Ciona savignyi

Invertebrates
  Insects: Drosophila melanogaster (fruitfly), Anopheles gambiae (mosquito), Aedes aegypti (mosquito)
  Worms: Caenorhabditis elegans

Yeast: Saccharomyces cerevisiae (baker's yeast)

Open source/mirrors
All data that are part of the Ensembl project are open access, and all software is open source and
freely available to the scientific community under a CC BY 4.0 license. The Ensembl website is
currently mirrored at four different locations worldwide to improve the service.

Official mirror sites

UK (Sanger Institute) (https://www.ensembl.org/index.html): main website
US West (Amazon AWS) (https://uswest.ensembl.org/index.html): cloud-based mirror on the West Coast of the United States
US East (Amazon AWS) (https://useast.ensembl.org/index.html): cloud-based mirror on the East Coast of the United States
Asia (Amazon AWS) (https://asia.ensembl.org/index.html): cloud-based mirror in Singapore

See also
List of sequenced eukaryotic genomes
List of biological databases
Sequence analysis
Sequence profiling tool
Sequence motif
UCSC Genome Browser
ENCODE

References
1. Yates A. D.; et al. (January 2020). "Ensembl 2020" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145704). Nucleic Acids Research. 48 (D1): D682–D688. doi:10.1093/nar/gkz966 (https://doi.org/10.1093%2Fnar%2Fgkz966). PMC 7145704 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7145704). PMID 31691826 (https://pubmed.ncbi.nlm.nih.gov/31691826).
2. Hubbard, T. (1 January 2002). "The Ensembl genome database project" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC99161). Nucleic Acids Research. 30 (1): 38–41. doi:10.1093/nar/30.1.38 (https://doi.org/10.1093%2Fnar%2F30.1.38). PMC 99161 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC99161). PMID 11752248 (https://pubmed.ncbi.nlm.nih.gov/11752248).
3. Flicek P, Amode MR, Barrell D, et al. (November 2010). "Ensembl 2011" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013672). Nucleic Acids Research. 39 (Database issue): D800–D806. doi:10.1093/nar/gkq1064 (https://doi.org/10.1093%2Fnar%2Fgkq1064). PMC 3013672 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3013672). PMID 21045057 (https://pubmed.ncbi.nlm.nih.gov/21045057).
4. Flicek P, Aken BL, Ballester B, et al. (January 2010). "Ensembl's 10th year" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808936). Nucleic Acids Research. 38 (Database issue): D557–62. doi:10.1093/nar/gkp972 (https://doi.org/10.1093%2Fnar%2Fgkp972). PMC 2808936 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2808936). PMID 19906699 (https://pubmed.ncbi.nlm.nih.gov/19906699).
5. Davis, Charles Patrick (29 March 2021). "Medical definition of Genome Annotation" (https://web.archive.org/web/20210614173351/https://www.medicinenet.com/genome_annotation/definition.htm). Archived from the original (https://www.medicinenet.com/genome_annotation/definition.htm) on 14 June 2021. Retrieved 7 August 2022.
6. Curwen, Val; Eyras, Eduardo; Andrews, T. Daniel; Clarke, Laura; Mongin, Emmanuel; Searle, Steven M. J.; Clamp, Michele (May 2004). "The Ensembl automatic gene annotation system" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC479124). Genome Research. 14 (5): 942–950. doi:10.1101/gr.1858004 (https://doi.org/10.1101%2Fgr.1858004). ISSN 1088-9051 (https://www.worldcat.org/issn/1088-9051). PMC 479124 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC479124). PMID 15123590 (https://pubmed.ncbi.nlm.nih.gov/15123590).
7. Ruffier, Magali; Kähäri, Andreas; Komorowska, Monika; Keenan, Stephen; Laird, Matthew; Longden, Ian; Proctor, Glenn; Searle, Steve; Staines, Daniel; Taylor, Kieron; Vullo, Alessandro; Yates, Andrew; Zerbino, Daniel; Flicek, Paul (January 2017). "Ensembl core software resources: storage and programmatic access for DNA sequence and genome annotation" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467575). Database. 2017 (1): bax020. doi:10.1093/database/bax020 (https://doi.org/10.1093%2Fdatabase%2Fbax020). PMC 5467575 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5467575). PMID 28365736 (https://pubmed.ncbi.nlm.nih.gov/28365736).
8. Hubbard, T. J. P.; Aken, B. L.; Ayling, S.; Ballester, B.; Beal, K.; Bragin, E.; Brent, S.; Chen, Y.; Clapham, P.; Clarke, L.; Coates, G. (January 2009). "Ensembl 2009" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686571). Nucleic Acids Research. 37 (Database issue): D690–697. doi:10.1093/nar/gkn828 (https://doi.org/10.1093%2Fnar%2Fgkn828). ISSN 1362-4962 (https://www.worldcat.org/issn/1362-4962). PMC 2686571 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686571). PMID 19033362 (https://pubmed.ncbi.nlm.nih.gov/19033362).
9. Howe, Kevin L.; Contreras-Moreira, Bruno; De Silva, Nishadi; Maslen, Gareth; Akanni, Wasiu; Allen, James; Alvarez-Jarreta, Jorge; Barba, Matthieu; Bolser, Dan M.; Cambell, Lahcen; Carbajo, Manuel (8 January 2020). "Ensembl Genomes 2020-enabling non-vertebrate genomic research" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6943047). Nucleic Acids Research. 48 (D1): D689–D695. doi:10.1093/nar/gkz890 (https://doi.org/10.1093%2Fnar%2Fgkz890). ISSN 1362-4962 (https://www.worldcat.org/issn/1362-4962). PMC 6943047 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6943047). PMID 31598706 (https://pubmed.ncbi.nlm.nih.gov/31598706).
10. Stabenau A, McVicker G, Melsopp C, Proctor G, Clamp M, Birney E (February 2004). "The Ensembl Core Software Libraries" (http://genome.cshlp.org/content/14/5/929.full). Genome Research. 14 (5): 929–933. doi:10.1101/gr.1857204 (https://doi.org/10.1101%2Fgr.1857204). PMC 479122 (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC479122). PMID 15123588 (https://pubmed.ncbi.nlm.nih.gov/15123588).
11. "Species List" (https://uswest.ensembl.org/info/about/species.html). uswest.ensembl.org. Retrieved 5 August 2022.

External links
Official website (http://www.ensembl.org/)
Vega (http://vega.sanger.ac.uk)
Pre-Ensembl (http://pre.ensembl.org)
Ensembl genomes (http://www.ensemblgenomes.org)
UCSC Genome Browser (http://genome.ucsc.edu)
NCBI (https://www.ncbi.nlm.nih.gov/)
Ensembl: Browsing chordate genomes on EBI Train OnLine (http://www.ebi.ac.uk/training/online/course/ensembl-browsing-chordate-genomes)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Ensembl_genome_database_project&oldid=1149604964"
