Retrieve GenBank Sequences with R

Uploaded by

Pedro Augusto Freire

Getting Sequences from GenBank using R-packages

• Open R and select the working directory where you want to output sequence files

Misc>Change Working Directory>select a folder (e.g., R_class_winter_2015)

Alternatively:

setwd("/Users/jcsantos/Desktop/R_class_winter_2015/1_getting_sequences_from_GenBank")

# Tip: open a terminal and drag the folder onto it to get the path, then copy and paste it.

• We need to install and load the following packages:

install.packages("ape")
install.packages("seqinr")

library(ape)    # a general R package for phylogenetics and comparative methods
library(seqinr) # a specialized package for nucleotide sequence management

• Let’s check that our packages have been loaded correctly

sessionInfo()

Getting Sequences from GenBank using R-packages

• Let's use 'ape' to read a sequence from GenBank with the function read.GenBank (see ?read.GenBank)

• This function connects to the GenBank database, and reads nucleotide sequences using
accession numbers given as arguments.

• Usage (do not run)

read.GenBank(access.nb, seq.names = access.nb, as.character = FALSE)

# access.nb: a vector of mode character giving the accession numbers.
# seq.names: the names to give to each sequence; by default the accession numbers.
# as.character: a logical; if TRUE the sequences are returned as character vectors,
# otherwise (the default) as an object of class "DNAbin".

• Let's read the casque-headed lizard (Basiliscus basiliscus) RAG1 sequence JF806202

seq_1_DNAbin <- read.GenBank("JF806202") # save as a DNAbin object
attr(seq_1_DNAbin, "species") # get the species name of the sequence
seq_1_DNAbin$JF806202
str(seq_1_DNAbin) # show the structure of the object

# save as a character object:

seq_1_character <- read.GenBank("JF806202", as.character = TRUE)
seq_1_character # this is not a very nice format

Read sequences using accession numbers
• Create a vector of GenBank accession numbers that we want

# create a vector of GenBank accession numbers
lizards_accession_numbers <- c("JF806202", "HM161150", "FJ356743", "JF806205",
                               "JQ073190", "GU457971", "FJ356741", "JF806207",
                               "JF806210", "AY662592", "AY662591", "FJ356748",
                               "JN112660", "AY662594", "JN112661", "HQ876437",
                               "HQ876434", "AY662590", "FJ356740", "JF806214",
                               "JQ073188", "FJ356749", "JQ073189", "JF806216",
                               "AY662598", "JN112653", "JF806204", "FJ356747",
                               "FJ356744", "HQ876440", "JN112651", "JF806215",
                               "JF806209")

• Get those sequences and save them in a single DNAbin object:

lizards_sequences <- read.GenBank(lizards_accession_numbers) # read sequences and place them in a DNAbin object

lizards_sequences # a brief summary of what is in the object, including base composition

str(lizards_sequences) # a list of the DNAbin elements with the length of each sequence
# notice that one of the attributes is the species names

Read sequences and create a fasta file format

• Let's explore the DNAbin object further:

attributes(lizards_sequences) # see the list of attributes and their contents

names(lizards_sequences) # the accession numbers

attr(lizards_sequences, "species") # the species list. Note that attr() returns one
# named attribute, whereas attributes() returns all of them

• However, it is hard to remember which accession number corresponds to which species, so we can use the previous information to build a vector of more informative names

lizards_sequences_GenBank_IDs <- paste(attr(lizards_sequences, "species"), names(lizards_sequences), sep = "_RAG1_")

## build a character vector with the species, the gene name, and the GenBank accession number
## "_RAG1_" is the common abbreviation of recombination activating gene 1
## notice the use of the paste function: paste(textA, textB, sep = textC)
## results in: textAtextCtextB

lizards_sequences_GenBank_IDs # a more informative vector of names for our sequences
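
The behaviour of paste() described above can be checked offline with a minimal sketch in base R. The two input vectors are hypothetical stand-ins for attr(lizards_sequences, "species") and names(lizards_sequences); the second species and accession are made up for illustration.

```r
# Hypothetical stand-ins for the species attribute and the accession numbers
species    <- c("Basiliscus_basiliscus", "Example_species")
accessions <- c("JF806202", "XX000000")

# paste() is vectorised: element i of each vector is joined, with sep in between
ids <- paste(species, accessions, sep = "_RAG1_")
print(ids)
```

Each resulting name has the form species, then "_RAG1_", then the accession number, which is why the two vectors must be in the same order.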

Write a fasta file format
• Let's write the sequences to a text file in fasta format using write.dna(). Note that only the accession numbers are written as sequence names.

?write.dna # This function writes a list of DNA sequences to a file in sequential, interleaved, or FASTA format.

### we are going to write in fasta format

write.dna(lizards_sequences, file = "lizard_fasta_1.fasta", format = "fasta", append = FALSE, nbcol = 6, colsep = " ", colw = 10)

########### Some relevant arguments for write.dna()

# x: a list or a matrix of DNA sequences.

# file: the name of the file that will contain our sequences

# format: three choices are possible: "interleaved", "sequential", or "fasta", or any
# unambiguous abbreviation of these.

# append: a logical; if TRUE the data are appended to the file without erasing the data
# possibly existing in the file, otherwise the file is overwritten (FALSE is the default).

# nbcol: a numeric specifying the number of columns per row (6 by default)

# colsep: a character used to separate the columns (a single space by default).

# colw: a numeric specifying the number of nucleotides per column (10 by default).
###########
Write a fasta file format
• Let's explore our newly created file ‘lizard_fasta_1.fasta’. Drag and drop this file into a text editor

• This file has our sequences, but we only have the accession numbers
Rewrite a fasta file format with more information
• Read our fasta file using the seqinr package

lizard_seq_seqinr_format <- read.fasta(file = "lizard_fasta_1.fasta", seqtype = "DNA", as.string = TRUE, forceDNAtolower = FALSE)

lizard_seq_seqinr_format # this shows a different way to display the same sequence information

• Rewrite our fasta file using the name vector that we created previously

write.fasta(sequences = lizard_seq_seqinr_format, names = lizards_sequences_GenBank_IDs, nbchar = 10, file.out = "lizard_seq_seqinr_format.fasta")

# Suggestion: do not rearrange, delete, or add sequences in the fasta file, because the
# function assigns names by position: the i-th name in the vector goes to the i-th sequence

• Let’s check our new fasta file ‘lizard_seq_seqinr_format.fasta’
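
Because write.fasta() pairs names and sequences purely by position, a quick check before writing can catch a file that was reordered or edited. The sketch below uses base R only; names_match_order() is a hypothetical helper written for this illustration, not a function from seqinr or ape, and the example names are made up.

```r
# Hypothetical helper: TRUE only if the i-th new name contains the i-th original name
names_match_order <- function(original_names, new_names) {
  if (length(original_names) != length(new_names)) {
    return(FALSE)
  }
  hits <- vapply(
    seq_along(new_names),
    function(i) grepl(original_names[i], new_names[i], fixed = TRUE),
    logical(1)
  )
  all(hits)
}

# Made-up accession numbers and the longer names built from them
old_names <- c("AA000001", "BB000002")
new_names <- c("Species_one_RAG1_AA000001", "Species_two_RAG1_BB000002")

same_order    <- names_match_order(old_names, new_names)
swapped_order <- names_match_order(old_names, rev(new_names))
```

In the workflow above, a call such as names_match_order(names(lizard_seq_seqinr_format), lizards_sequences_GenBank_IDs) should return TRUE before write.fasta() is run, assuming the names read from the file are the accession numbers.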
Get sequences without using accession numbers

• We can use a package that uses an API (application programming interface) to interact
with the NCBI website.

More info in: [Link]

install.packages("rentrez")
library(rentrez)

• Let’s get some lizard sequences

lizard <- "Basiliscus basiliscus[Organism]" # the search term is a character string

# db = "nuccore" selects the nucleotide database; retmax sets the maximum number of IDs returned
lizard_search <- entrez_search(db = "nuccore", term = lizard, retmax = 40)
lizard_search
lizard_search$ids # gives you the NCBI ids

# gets your sequences as a character vector
lizard_seqs <- entrez_fetch(db = "nuccore", id = lizard_search$ids, rettype = "fasta")
lizard_seqs

Get sequences without using accession numbers

• Let's get our Basiliscus basiliscus RAG1 sequence

Bbasiliscus_RAG1 <- "Basiliscus basiliscus[Organism] AND RAG1[Gene]"

Bbasiliscus_RAG1_search <- entrez_search(db = "nuccore", term = Bbasiliscus_RAG1, retmax = 10)
# db = "nuccore" selects the nucleotide database; retmax = 10 returns no more than 10 IDs

Bbasiliscus_RAG1_search$ids # gives you the NCBI ids

Bbasiliscus_RAG1_seqs <- entrez_fetch(db = "nuccore", id = Bbasiliscus_RAG1_search$ids, rettype = "fasta")

Bbasiliscus_RAG1_seqs # notice the \n (new line) delimiter. Other common delimiters are \r
# (carriage return) and \t (tab).

write(Bbasiliscus_RAG1_seqs, "Bbasiliscus_RAG1.fasta", sep = "\n") # writes the sequences to a file

• We can read our fasta file using the seqinr package

Bbasiliscus_RAG1_seqinr_format <- read.fasta(file = "Bbasiliscus_RAG1.fasta", seqtype = "DNA", as.string = TRUE, forceDNAtolower = FALSE)

Bbasiliscus_RAG1_seqinr_format # you can also check the .fasta file in the working folder
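
To see what the \n delimiter means in practice, the sketch below splits a FASTA-formatted string into headers and sequences using base R only. The text in fasta_text is a made-up example in the general shape of what entrez_fetch(..., rettype = "fasta") returns; the accessions and sequences are not real records.

```r
# Made-up FASTA text: records separated by new lines, headers start with ">"
fasta_text <- paste0(
  ">ACC0001.1 Example species gene one\nACGTACGT\nTTGA\n\n",
  ">ACC0002.1 Example species gene two\nGGCC\n"
)

lines   <- strsplit(fasta_text, "\n", fixed = TRUE)[[1]]
lines   <- lines[nzchar(lines)]       # drop the empty lines between records
is_head <- startsWith(lines, ">")     # TRUE for header lines
record  <- cumsum(is_head)            # record index for every line

headers <- sub("^>", "", lines[is_head])

# Join the sequence lines that belong to the same record
seqs <- unname(vapply(
  split(lines[!is_head], record[!is_head]),
  paste, character(1), collapse = ""
))
```

This sketch assumes every record has at least one sequence line. For real work, read.fasta() from seqinr (used above) is the more robust choice.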
Example: Accessing Cytochrome B Sequences

• We can use the ‘rentrez’ package to retrieve many sequences of a specific marker by
searching on a taxonomic group

Liolaemus_CYTB <- "Liolaemus[Organism] AND CYTB[Gene]"

# This is a well-studied gene from this genus of South American lizards

Liolaemus_CYTB_search <- entrez_search(db = "nuccore", term = Liolaemus_CYTB, retmax = 100)

Liolaemus_CYTB_search # 2539 sequences matched this query when these slides were prepared

• Let’s adjust the search to return every matching ID and try to fetch all of the sequences at once

Liolaemus_CYTB_search_2 <- entrez_search(db = "nuccore", term = Liolaemus_CYTB, retmax = 2539)

Liolaemus_CYTB_search_2$ids # gives you the NCBI ids

Liolaemus_CYTB_seqs <- entrez_fetch(db = "nuccore", id = Liolaemus_CYTB_search_2$ids, rettype = "fasta")

# we get the error "client error: (414) Request-URI Too Long": we are asking for too many
# sequences in a single request

Example: Accessing Cytochrome B Sequences

• Let's fetch in smaller chunks so we can get the first 1500 sequences

Liolaemus_CYTB_seqs_part_1 <- entrez_fetch(db = "nuccore", id = Liolaemus_CYTB_search_2$ids[1:500], rettype = "fasta")

Liolaemus_CYTB_seqs_part_2 <- entrez_fetch(db = "nuccore", id = Liolaemus_CYTB_search_2$ids[501:1000], rettype = "fasta")

Liolaemus_CYTB_seqs_part_3 <- entrez_fetch(db = "nuccore", id = Liolaemus_CYTB_search_2$ids[1001:1500], rettype = "fasta")

• Let's write a single file by appending all 3 chunks of sequences

write(Liolaemus_CYTB_seqs_part_1, "Liolaemus_CYTB_seqs.fasta", sep = "\n")

write(Liolaemus_CYTB_seqs_part_2, "Liolaemus_CYTB_seqs.fasta", sep = "\n", append = TRUE)
# this adds the sequences to the same file by changing the logical argument append from
# its default FALSE to TRUE (T also works as a built-in alias for TRUE, but it can be
# reassigned, so spelling out TRUE is safer)

write(Liolaemus_CYTB_seqs_part_3, "Liolaemus_CYTB_seqs.fasta", sep = "\n", append = TRUE)
# you will get a file of about 1.3 Mb with all 1500 sequences
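
The three manual chunks above can be generalised with a loop. The following is a minimal sketch in base R, not part of the original slides: chunk_ids() and fetch_in_chunks() are hypothetical helpers, and fetch_fun is an assumed hook for whatever function downloads FASTA text for a vector of IDs. The demonstration uses a fake fetcher so it runs without a network connection.

```r
# Split a vector of IDs into consecutive chunks of at most `size` elements
chunk_ids <- function(ids, size = 500) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Fetch every chunk and write all of them to a single file.
# `fetch_fun` (an assumption of this sketch) takes a vector of IDs and returns FASTA text.
fetch_in_chunks <- function(ids, file, fetch_fun, size = 500, pause = 0.5) {
  chunks <- chunk_ids(ids, size)
  for (i in seq_along(chunks)) {
    fasta_text <- fetch_fun(chunks[[i]])
    # the first chunk overwrites the file, later chunks are appended
    write(fasta_text, file, sep = "\n", append = i > 1)
    Sys.sleep(pause) # short pause between requests
  }
  invisible(length(chunks))
}

# Offline demonstration with a fake fetcher that fabricates one record per ID
fake_fetch <- function(ids) paste0(">", ids, "\nACGT", collapse = "\n")
demo_file  <- tempfile(fileext = ".fasta")
n_chunks   <- fetch_in_chunks(as.character(1:5), demo_file, fake_fetch, size = 2, pause = 0)
```

With rentrez loaded, fetch_fun could be function(ids) entrez_fetch(db = "nuccore", id = ids, rettype = "fasta"), applied to Liolaemus_CYTB_search_2$ids. The pause is there because NCBI limits how many requests per second a client may send; the exact value used here is a guess. The rentrez documentation also describes a web-history mechanism for large downloads, which may be a better fit for thousands of records.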

Example: Accessing Cytochrome B Sequences

• We can read our fasta file using the seqinr package and reduce the sequence names to accession numbers

Liolaemus_CYTB_seqs_seqinr_format <- read.fasta(file = "Liolaemus_CYTB_seqs.fasta", seqtype = "DNA", as.string = TRUE, forceDNAtolower = FALSE)

Liolaemus_CYTB_seqs_seqinr_format

Liolaemnus_CYTB_names <- names(Liolaemus_CYTB_seqs_seqinr_format) # the sequence names from the fasta headers

Liolaemnus_CYTB_names <- gsub("\\..*", "", Liolaemnus_CYTB_names)

# eliminate the first "." and all characters after it using ?gsub (Pattern Matching and Replacement)

Liolaemnus_CYTB_names <- gsub("^.*\\|", "", Liolaemnus_CYTB_names)

# eliminate all characters up to and including the last "|" using ?gsub

Liolaemnus_CYTB_names
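
The two gsub() calls can be tried on a small example first. This is a minimal sketch in base R; the two names below are hypothetical headers in two different styles (one with "gi|...|gb|...|" fields and one with only a versioned accession), chosen to show that the same pair of patterns handles both.

```r
# Hypothetical sequence names in two header styles
example_names <- c("gi|000000000|gb|JF806202.1|", "FJ356743.1")

# Step 1: drop the first "." and everything after it (removes the version suffix)
step_1 <- gsub("\\..*", "", example_names)

# Step 2: drop everything up to and including the last "|"
# (names without a "|" are left unchanged)
step_2 <- gsub("^.*\\|", "", step_1)

print(step_2)
```

The order of the two steps matters for names with a trailing "|": removing the version suffix first also removes that final "|", so the second pattern stops at the one before the accession.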
Example: Accessing Cytochrome B Sequences

• We can use the ape package to download the same records by accession number and get the
species names

Liolaemus_CYTB_seqs_ape_format <- read.GenBank(Liolaemnus_CYTB_names)

attr(Liolaemus_CYTB_seqs_ape_format, "species")
# to get the species names of the sequences

names(Liolaemus_CYTB_seqs_ape_format)

Liolaemus_CYTB_seqs_GenBank_IDs <- paste(attr(Liolaemus_CYTB_seqs_ape_format, "species"), names(Liolaemus_CYTB_seqs_ape_format), sep = "_CYTB_")
## build a character vector with the species, the gene name, and the GenBank accession number

Liolaemus_CYTB_seqs_GenBank_IDs # vector of names to add to the sequences

# Read our fasta file 'Liolaemus_CYTB_seqs.fasta' using the seqinr package

Liolaemus_CYTB_seqs_seqinr_format <- read.fasta(file = "Liolaemus_CYTB_seqs.fasta", seqtype = "DNA", as.string = TRUE, forceDNAtolower = FALSE)

# Rewrite our fasta file using the name vector that we created previously

write.fasta(sequences = Liolaemus_CYTB_seqs_seqinr_format, names = Liolaemus_CYTB_seqs_GenBank_IDs, nbchar = 10, file.out = "Liolaemus_CYTB_seqs_seqinr_format.fasta")

Alignment and Simultaneous Tree Estimation

• We are going to use SATe-2 (SATé - Simultaneous Alignment and Tree Estimation)

URL: [Link]

Alignment and Simultaneous Tree Estimation

• From the Developers’ webpage (University of Kansas: Jiaye Yu, Mark Holder, Jeet
Sukumaran, Siavash Mirarab, and Jamie Oaks):

SATé is a software package for inferring a sequence alignment and phylogenetic tree.
The iterative algorithm involves repeated alignment and tree searching operations. The
original data set is divided into smaller subproblems by a tree-based decomposition.
These sub-problems are aligned and further merged for phylogenetic tree inference.

Currently, the following tools are supported, and are bundled with the SATé distribution:

ClustalW 2.0.12 (sequence alignment program)
MAFFT 6.717 (sequence alignment program)
MUSCLE 3.7 (sequence alignment program)
OPAL 1.0.3 (sequence alignment program)
PRANK 100311 (phylogeny-aware alignment program)
RAxML 7.2.6 (phylogeny estimator program)
FastTree 2.1.4 (phylogeny estimator program)

SATe-2 needs Python 2.7 (Upgrade Python Instructions)
• MAC OS: Open Terminal (go to HD>Applications>Utilities>Terminal)

• MAC OS: Check your version of Python

python --version

• MAC OS: If necessary, upgrade Python to 2.7 as required by SATe-II

[Link]

Install SATe-2
• Download SATe-II precompiled from UT-Austin website:

[Link]

***For the more adventurous, you can download the command-line version of 'SATe-II' from:

[Link]

download: [Link]

Follow the instructions in the main webpage

Preparing FASTA files for SATe-2
• Download the FASTA files from the course website

Liolaemus_CYTB.fasta
Lizard_RAG1.fasta

• Create two output folders for the alignment results on your desktop and place each fasta
file in the corresponding folder

folder: Liolaemus_CYTB
folder: Lizards_RAG1

Running SATe-2
• Open the SATe-II GUI by clicking on the executable in the program folder

• Explore the console and the options in the SATe-II GUI version

Running SATe-2
• Explore the options in the SATe-II GUI version:

External Tools:
Aligner: [ClustalW2, MAFFT, PRANK, OPAL]
Merger: [MUSCLE, OPAL]
Tree Estimator: [RAXML, FASTTREE]
Model: [RAxML-options: GTRCAT, GTRGAMMA, GTRGAMMAI;
FASTTREE-options: GTR+G20, GTR+CAT, JC+G20, JC+CAT ]

Sequences and Tree:

Sequence file ...: [This is the folder where our fasta file resides]
Multi-locus Data [option]
Data Type: [DNA, RNA, Protein]
Initial Alignment [option]
Tree file (optional): [Provide if you have an initial phylogeny associated with the sequences]

Workflow Settings:
Algorithm [option] Two-Phase (not SATe-II)
Post-Processing [option] Extra RAxML Search

Running SATe-2
• Explore the options in the SATe-II GUI version:

Job Settings:
Job Name: [give a name for the job]
Output Dir.: [Select the corresponding directory for the output alignment]
CPU(s) Available: [It will depend on your computer]
Max. Memory (MB): [It will depend on your computer]

SATe-II Settings
Quick Set: [Presets: SATe-II_fast, SATe-II_ML, SATe-II_simple, custom]
Max. Subproblem:
Percentage [default 50]
Size [default 10]
Decomposition:
Centroid (fast) or Longest (slow)
Apply Stop Rule: [options]
Stopping Rule: Blind Mode Enabled
Time Limit (hr) [default 24 hours]
Iteration limit [default 1 iteration]
Return: [Default are Final or Best alignment]

Running SATe-2: Select the Following Options
External Tools:
Aligner: [MAFFT]
Merger: [MUSCLE]
Tree Estimator: [RAXML]
Model: [GTRGAMMAI]

Sequences and Tree:

Sequence file ...: [Liolaemus_CYTB.fasta]
Data Type: [DNA]
Tree file (optional): [None]

Workflow Settings:
Algorithm: [None] Two-Phase (not SATe-II)
Post-Processing: [None] Extra RAxML Search

Job Settings:
Job Name: [Liolaemus_CYTB_alignment]
Output Dir.: [Liolaemus_CYTB] Select the corresponding directory for the output alignment
CPU(s) Available: [2] It will depend on your computer
Max. Memory (MB): [1000] It will depend on your computer

SATe-II Settings
Quick Set: [SATe-II_fast]
Iteration Limit: [3]
Leave other options unchanged
Running SATe-2

• Explore the output in a text editor. The alignment is located in these .aln files in fasta
format:

satejob.marker001.Liolaemus_CYTB.aln

Repeat the same process with the Lizard_RAG1.fasta file

Mesquite: Visually explore the alignments

• Download mesquite:

[Link]
