Cipres in Kepler: An Integrative Workflow Package For Streamlining Phylogenetic Data Analyses

CIPRes in Kepler is a workflow package that streamlines phylogenetic data analyses. It combines data integration, analysis, and visualization steps into larger automated scientific processes using Kepler, an open-source scientific workflow system. The package provides actors for common phylogenetic tasks and demo workflows for aligning sequences, building trees, and visualizing results. Future work includes adding more actors and database support, and processing larger datasets.


CIPRes in Kepler: An integrative workflow package for streamlining phylogenetic data analyses

Zhijie Guan¹, Alex Borchers¹, Timothy McPhillips², Shirley Cohen³, Mark A. Miller¹, Ilkay Altintas¹

¹San Diego Supercomputer Center, UCSD
²University of California, Davis
³University of Pennsylvania

biology.sdsc.edu
What is a Scientific Workflow?
• Combination of
  - data integration, analysis, and visualization steps
  - forming a larger, automated "scientific process"
• Mission of scientific workflow systems
  - Promote “scientific discovery” by providing tools and methods to generate scientific workflows
  - Create an extensible and customizable graphical user interface for scientists from different scientific domains
  - Support computational experiment creation, execution, sharing, reuse, and provenance
  - Design frameworks which define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources
  - Make the technology useful through the user’s monitor!

Promoter Identification Workflow

Source: Matt Coleman (LLNL)

A Workflow for Phylogeny Analysis

Kepler is a Scientific Workflow System
www.kepler-project.org

• … and a cross-project collaboration
• June 2, 2006: Beta release
• Builds upon the Ptolemy II framework, an open-source system

Ptolemy II: A software system used for prototyping engineering
KEPLER: A platform to design and execute Scientific Workflows
KEPLER = “Ptolemy II + X” for Scientific Workflows

Some Kepler Contributors

Ptolemy II, Griddles, SKIDL, Resurgence, SRB, NLADR, LOOKING

Other contributors:
- Cheshire (UK Text Mining Center)
- DART (Great Barrier Reef, Australia)
- National Digital Archives + UCSD-TV (US)
- …

Contributor names and funding info are at the Kepler website!

A co-development in KEPLER: GEON
Dataset Generation & Registration
% Makefile
$> ant run

SQL database access (JDBC)

Phylogeny Analysis Workflows

Diagram labels: Local Disk; Multiple Sequence Alignment; Phylogeny Analysis; Tree Visualization

Kepler Workflow: Actors
Actor-Oriented Design
• Actor
  - Encapsulation of parameterized actions
  - Interface defined by ports and parameters
• Port
  - Communication between input and output data
  - The place where data get in/out
• Model of computation
  - Flow of control
  - Sequential / parallel execution
  - Implementation is a framework
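
The actor, port, and model-of-computation ideas above can be illustrated with a small sketch. This is not Kepler or Ptolemy II code: the `Port`, `Actor`, and `run_sequential` names are hypothetical, and the "director" here only fires actors once in a fixed order, which is one very simple model of computation.

```python
from collections import deque


class Port:
    """A named place where data enters or leaves an actor."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()   # tokens waiting on an input port
        self.receivers = []    # input ports fed by this output port

    def connect(self, other):
        """Create a channel from this output port to another input port."""
        self.receivers.append(other)

    def send(self, token):
        """Deliver a token to every connected input port."""
        for receiver in self.receivers:
            receiver.queue.append(token)


class Actor:
    """Encapsulates a parameterized action behind input and output ports."""

    def __init__(self, name, action, **parameters):
        self.name = name
        self.action = action
        self.parameters = parameters
        self.input = Port(name + ".in")
        self.output = Port(name + ".out")

    def fire(self):
        """Consume one input token, apply the action, emit the result."""
        token = self.input.queue.popleft()
        self.output.send(self.action(token, **self.parameters))


def run_sequential(actors, first_token):
    """Toy director: fire each actor once, in the order given."""
    actors[0].input.queue.append(first_token)
    for actor in actors:
        actor.fire()


# Hypothetical two-step workflow: upper-case a sequence, then truncate it.
upper = Actor("upper", lambda s: s.upper())
trim = Actor("trim", lambda s, length: s[:length], length=4)
sink = Port("sink")

upper.output.connect(trim.input)
trim.output.connect(sink)

run_sequential([upper, trim], "acgtacgt")
result = sink.queue.popleft()
print(result)
```

The actors know nothing about execution order; that decision lives in `run_sequential`, so the same actors could in principle be reused under a different scheduling scheme.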

CIPRes Workflow: Actors

Input Port: Nexus File Content
Output Ports: Data Matrix, Tree, Taxa Info
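
As a rough illustration of an actor with one input port (Nexus file content) and three output ports (data matrix, tree, taxa info), the sketch below splits a tiny NEXUS text into those three parts. It is not the CIPRes implementation; `split_nexus` and the sample file are hypothetical. It also assumes a non-interleaved matrix with no bracketed comments, quoted labels, or translate tables, all of which real NEXUS files may contain.

```python
import re

SAMPLE_NEXUS = """#NEXUS
BEGIN TAXA;
  DIMENSIONS NTAX=3;
  TAXLABELS A B C;
END;
BEGIN CHARACTERS;
  DIMENSIONS NCHAR=4;
  FORMAT DATATYPE=DNA;
  MATRIX
    A ACGT
    B ACGA
    C TCGA
  ;
END;
BEGIN TREES;
  TREE t1 = (A,(B,C));
END;
"""


def split_nexus(text):
    """Return a dict with one entry per output port: taxa, matrix, trees."""
    flags = re.IGNORECASE | re.DOTALL

    # Taxa info: the labels listed after TAXLABELS, up to the semicolon.
    taxa_match = re.search(r"TAXLABELS\s+(.*?);", text, flags)
    taxa = taxa_match.group(1).split() if taxa_match else []

    # Data matrix: one "name sequence" pair per line after MATRIX.
    matrix = {}
    matrix_match = re.search(r"MATRIX\s+(.*?);", text, flags)
    if matrix_match:
        for line in matrix_match.group(1).strip().splitlines():
            parts = line.split()
            if len(parts) == 2:
                matrix[parts[0]] = parts[1]

    # Trees: the text after "TREE name =" up to and including the semicolon.
    trees = re.findall(r"\bTREE\s+\S+\s*=\s*(.*?;)", text, flags)

    return {"taxa": taxa, "matrix": matrix, "trees": trees}


ports = split_nexus(SAMPLE_NEXUS)
print(ports["taxa"])
print(ports["matrix"])
print(ports["trees"])
```

Returning one dictionary entry per output port keeps the three results separate, so a downstream step can consume only the part it needs.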

Some actors in place for…
• Generic Web Service Client and Web Service Harvester
• Customizable RDBMS query and update
• Command Line wrapper tools (local, ssh, scp, ftp, etc.)
• Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
• SRB support
• Native R and Matlab support
• Interaction with Nimrod and APST
• Communication with ORBs through actors and services
• Imaging, gridding, and visualization support
• Textual and Graphical Output
• …more generic and domain-oriented actors…

CIPRes Workflow
Callouts: Actor; Channel: Convey the data; Results
Steps:
• GUIGen: Parameter Setting
• Choose the input file
• Run ClustalW
• Get the subset of the aligned sequences
• Run PAUP for Tree Inference
• Read the tree
• Parse the tree
• Display the tree
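
The last steps (read, parse, and display the tree) can be sketched as follows, assuming the inferred tree arrives as a Newick string. The slides do not say which format the actors exchange, so that is an assumption, and `parse_newick` and `render` are hypothetical names. The parser is deliberately minimal: it ignores branch lengths, internal node labels, and quoted names.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested lists; leaves are strings."""
    text = "".join(text.split())          # drop all whitespace
    if not text.endswith(";"):
        raise ValueError("expected a Newick string ending in ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1                      # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1                  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("unbalanced parentheses at position %d" % pos)
            pos += 1                      # skip ")"
            return children
        start = pos
        while text[pos] not in ",();":    # read a leaf name
            pos += 1
        return text[start:pos]

    return parse_node()


def render(node, depth=0):
    """Return indented text lines; internal nodes are shown as '+'."""
    if isinstance(node, str):
        return ["  " * depth + node]
    lines = ["  " * depth + "+"]
    for child in node:
        lines.extend(render(child, depth + 1))
    return lines


tree = parse_newick("(A,(B,C));")
lines = render(tree)
print("\n".join(lines))
```

Keeping parsing and rendering in separate functions mirrors the separate "Parse the tree" and "Display the tree" steps, so either one could be replaced independently.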

CIPRes Workflows: Demo
• Read Sequences → Multiple Sequence Alignment → Display the Alignment
• Matrix Alignment → Tree Inference → Consensus Tree → Tree Visualization

Summary
• Kepler is good at:
  - Integrating data, programs, and computing resources
  - Capturing your ideas and realizing them
  - Supporting computational experiment creation, execution, sharing, and reuse
  - Quickly prototyping scientific workflows
  - Building streamlining applications
• Visual programming language
  - Don’t write your application, “draw”/compose it
• The Cipres-Kepler package can be used to build scientific workflows for phylogenetic data analyses

Future Work
• Cipres-Kepler can help you
• There is (always) a lot more to work on:
  - More actors for phylogeny analyses
  - Automatically generating actors based on CORBA services
  - Database (TreeBase) support to store large amounts of data
  - More computing power for large dataset processing
• Need your collaboration:
  - Sharing experiences
  - Teaching each other the domain knowledge
  - Locating a specific problem and solving it

Questions?

Zhijie Guan
[email protected]
1-858-822-3620
www.sdsc.edu

Cipres-Kepler Release:
ftp://ftp.sdsc.edu/outgoing/borchers/cipresReleases/20060621/cipresKepler_Dist.tgz
