Cipres in Kepler: An Integrative Workflow Package For Streamlining Phylogenetic Data Analyses
University of Pennsylvania
biology.sdsc.edu
What is a Scientific Workflow?
Combination of data integration, analysis, and visualization steps into a larger, automated “scientific process”
Mission of scientific workflow systems
Promote “scientific discovery” by providing tools and methods to generate scientific workflows
Create an extensible and customizable graphical user interface for scientists from different scientific domains
Support computational experiment creation, execution, sharing, reuse, and provenance
Design frameworks that define efficient ways to connect to existing data and to integrate heterogeneous data from multiple resources
Make technology useful through the user’s monitor!!!
Promoter Identification Workflow
A Workflow for Phylogeny Analysis
Kepler is a Scientific Workflow System
www.kepler-project.org
Some Kepler Contributors
Ptolemy II
Griddles
SKIDL
Resurgence
SRB
NLADR
Other contributors:
- Chesire (UK Text Mining Center)
- DART (Great Barrier Reef, Australia)
- LOOKING
Contributor names and funding info are on the Kepler website!!
A co-development in KEPLER: GEON
Dataset Generation & Registration
% Makefile
$> ant run
Phylogeny Analysis Workflows
[Figure: Local Disk, Multiple Sequence Alignment, Phylogeny Analysis, and Tree Visualization steps]
Kepler Workflow: Actors
Actor
Encapsulation of parameterized actions
Interface defined by ports and parameters
Port
Communication point for input and output data
The place where data get in/out
Model of computation
Flow of control
Sequential / parallel execution
Implementation is a framework
Actor-Oriented Design
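The actor, port, and model-of-computation ideas above can be sketched in a few lines of Java. This is a minimal, hypothetical illustration, not Kepler's or Ptolemy II's real API: the class names Port, SimpleActor, and SequentialDirector are made up for this example, tokens are plain strings, and the "director" implements only one simple sequential model of computation (fire the actors in order until none has input left).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.UnaryOperator;

// Hypothetical, simplified classes; these are NOT Kepler's or Ptolemy II's real API.

/** A port: the place where data tokens enter or leave an actor (here, a FIFO queue). */
class Port {
    private final Queue<String> tokens = new ArrayDeque<>();
    void put(String token) { tokens.add(token); }
    boolean hasToken() { return !tokens.isEmpty(); }
    String take() { return tokens.remove(); }
}

/** An actor: a parameterized action whose interface is an input port and an output port. */
class SimpleActor {
    final String name;
    final Port input = new Port();
    Port output = new Port();               // rewired by connect()
    private final UnaryOperator<String> action;

    SimpleActor(String name, UnaryOperator<String> action) {
        this.name = name;
        this.action = action;
    }

    /** Route this actor's output tokens into the downstream actor's input port. */
    void connect(SimpleActor downstream) { this.output = downstream.input; }

    /** Fire once if a token is available; return whether the actor did any work. */
    boolean fire() {
        if (!input.hasToken()) return false;
        output.put(action.apply(input.take()));
        return true;
    }
}

/** A director supplies the model of computation; this one fires actors sequentially. */
class SequentialDirector {
    private final List<SimpleActor> actors = new ArrayList<>();

    void add(SimpleActor actor) { actors.add(actor); }

    /** Keep sweeping over the actors in order until none of them can fire. */
    void run() {
        boolean progress = true;
        while (progress) {
            progress = false;
            for (SimpleActor actor : actors) {
                progress |= actor.fire();
            }
        }
    }
}

class ActorDemo {
    public static void main(String[] args) {
        SimpleActor read = new SimpleActor("ReadSequences", s -> s.trim());
        SimpleActor align = new SimpleActor("Align", s -> "aligned(" + s + ")");
        SimpleActor infer = new SimpleActor("InferTree", s -> "tree(" + s + ")");
        read.connect(align);
        align.connect(infer);

        SequentialDirector director = new SequentialDirector();
        director.add(read);
        director.add(align);
        director.add(infer);

        read.input.put("  seqs.fasta ");
        director.run();
        System.out.println(infer.output.take()); // prints tree(aligned(seqs.fasta))
    }
}
```

Keeping the flow of control in the director rather than inside the actors is what lets the same actors be reused under a different model of computation, for example a parallel one.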
CIPRes Workflow: Actors
Some actors in place for…
• Generic Web Service Client and Web Service Harvester
• Customizable RDBMS query and update
• Command Line wrapper tools (local, ssh, scp, ftp, etc.)
• Some Grid actors: Globus Job Runner, GridFTP-based file access, Proxy Certificate Generator
• SRB support
• Native R and Matlab support
• Interaction with Nimrod and APST
• Communication with ORBs through actors and services
• Imaging, Gridding, Vis Support
• Textual and Graphical Output
• …more generic and domain-oriented actors…
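A command-line wrapper actor essentially fills parameter values into a command template and runs the result. The Java sketch below assumes a hypothetical ${name} placeholder syntax and a made-up tool name (my_aligner); it is not the interface of Kepler's own command-line actors, and it covers only local execution (the ssh/scp/ftp variants are not shown).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical command-line wrapper actor; NOT Kepler's actual actor class.
 * The command template uses ${name} placeholders that are filled from parameters.
 */
class CommandLineWrapper {
    private final String template;

    CommandLineWrapper(String template) { this.template = template; }

    /** Split the template on whitespace first, then substitute, so a value with spaces stays one argument. */
    List<String> buildCommand(Map<String, String> params) {
        List<String> argv = new ArrayList<>();
        for (String word : template.trim().split("\\s+")) {
            String arg = word;
            for (Map.Entry<String, String> p : params.entrySet()) {
                arg = arg.replace("${" + p.getKey() + "}", p.getValue());
            }
            argv.add(arg);
        }
        return argv;
    }

    /** Run the command locally (no shell) and return its output, with stderr merged into stdout. */
    String run(Map<String, String> params) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(params));
        pb.redirectErrorStream(true);
        Process process = pb.start();
        // Drain the output before waiting, so a chatty process cannot block on a full pipe.
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("command exited with status " + exitCode + ": " + output);
        }
        return output;
    }
}

class CommandLineDemo {
    public static void main(String[] args) {
        // "my_aligner" is a made-up tool name used only for illustration.
        CommandLineWrapper aligner = new CommandLineWrapper("my_aligner --in ${input} --out ${output}");
        System.out.println(aligner.buildCommand(Map.of("input", "my seqs.fasta", "output", "aln.nex")));
        // prints [my_aligner, --in, my seqs.fasta, --out, aln.nex]
    }
}
```

Splitting the template before substitution means a file name containing spaces is passed as a single argument, and because ProcessBuilder starts the program directly, no shell quoting is involved.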
CIPRes Workflow
CIPRes Workflows: Demo
Read Sequences
Multiple Sequence Alignment
Display the Alignment Matrix
Tree Inference
Consensus Tree
Tree Visualization
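The slides do not say which consensus method the Consensus Tree step uses. As an illustration only, the Java sketch below computes majority-rule consensus clades. It assumes each input tree is given as a set of clades, where a clade is the set of taxon names below one internal node of a rooted tree; the class name MajorityRuleConsensus is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of majority-rule consensus: keep every clade found in
 * more than half of the input trees, together with its support (fraction of trees).
 */
class MajorityRuleConsensus {
    static Map<Set<String>, Double> consensus(List<Set<Set<String>>> trees) {
        // Count in how many trees each clade appears.
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<Set<String>> tree : trees) {
            for (Set<String> clade : tree) {
                counts.merge(clade, 1, Integer::sum);
            }
        }
        // Keep clades present in strictly more than 50% of the trees.
        return counts.entrySet().stream()
                .filter(e -> 2 * e.getValue() > trees.size())
                .collect(Collectors.toMap(Map.Entry::getKey,
                        e -> e.getValue() / (double) trees.size()));
    }
}

class ConsensusDemo {
    public static void main(String[] args) {
        List<Set<Set<String>>> trees = List.of(
                Set.of(Set.of("A", "B"), Set.of("A", "B", "C")),  // (((A,B),C),D)
                Set.of(Set.of("A", "B"), Set.of("C", "D")),       // ((A,B),(C,D))
                Set.of(Set.of("A", "C"), Set.of("A", "B", "C"))); // (((A,C),B),D)
        System.out.println(MajorityRuleConsensus.consensus(trees));
    }
}
```

For the three toy trees in main, only {A, B} and {A, B, C} pass the threshold, each with support 2/3. Because any two clades present in more than half of the trees must occur together in at least one tree, the surviving clades are mutually compatible and can always be assembled into a single consensus tree.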
Summary
Kepler is good at:
Integrating data, programs, and computing resources
Capturing your ideas and realizing them
Supporting computational experiment creation, execution, sharing, and reuse
Quickly prototyping scientific workflows
Building streamlined applications
Visual programming language
Don’t write your application, “draw”/compose it
The Cipres-Kepler package can be used to build scientific workflows for phylogenetic data analyses
Future Work
Cipres-Kepler can help you
There is (always) a lot more to work on:
More actors for phylogeny analyses
Automatically generating actors based on CORBA services
Database (TreeBase) support to store large amounts of data
More computing power for large dataset processing
Need your collaboration:
Sharing experiences
Teaching each other the domain knowledge
Locating a specific problem and solving it
Questions?
Zhijie Guan
1-858-822-3620
www.sdsc.edu
Cipres-Kepler Release:
ftp://ftp.sdsc.edu/outgoing/borchers/cipresReleases/20060621/cipresKepler_Dist.tgz