axchen7/h-vcsmc

Introduction

This repository contains code for the paper "Variational Combinatorial Sequential Monte Carlo for Bayesian Phylogenetics in Hyperbolic Space" by Alex Chen, Philippe Chlenski, Kenneth Munyuza, Antonio Khalil Moretti, Christian A. Naesseth, and Itsik Pe'er.

The paper has been accepted to AISTATS 2025.

Setup

First, install the required packages:

pip install -e .

Next, create a Weights & Biases (wandb) account to log run metrics, then link the account to the CLI by running:

wandb login

Example Runs

Notes:

  • --q-matrix: specifies the Q matrix to use.
    • jc69: fixed JC69 Q matrix.
    • stationary: one global Q matrix with each entry free.
    • mlp_factorized: an MLP maps embeddings to holding times and stationary probabilities, which together form the Q matrix. More memory-efficient than mlp_dense.
    • mlp_dense: An MLP maps embeddings to all entries of the Q matrix.
  • --lookahead-merge: performs H-VNCSMC if --hyperbolic is set, or VNCSMC otherwise.
  • --hash-trick: memoizes computation over tree topologies to speed up training. Only applies when --hyperbolic is set, and is essentially required if --lookahead-merge is set.
  • --checkpoint-grads: uses gradient checkpointing to reduce memory usage.
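
For orientation, the jc69 option refers to the standard Jukes-Cantor (JC69) rate matrix, in which every substitution between distinct nucleotides occurs at the same rate. The sketch below is not taken from this repository. It builds that matrix in plain Python and evaluates the well-known closed-form transition probabilities. The function names are hypothetical, and the scaling to one expected substitution per unit time is an assumption about normalization that may differ from what the code here uses.

```python
import math

NUCLEOTIDES = "ACGT"


def jc69_rate_matrix():
    """Return the 4x4 JC69 rate matrix Q as a list of rows.

    Assumption: Q is scaled so that one substitution is expected
    per unit time (diagonal entries are -1).
    """
    n = len(NUCLEOTIDES)
    off_diagonal = 1.0 / (n - 1)  # same rate for every change of state
    return [
        [-1.0 if i == j else off_diagonal for j in range(n)]
        for i in range(n)
    ]


def jc69_transition_probs(t):
    """Return P(t) = exp(Q t) for the matrix above, using the closed form."""
    decay = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * decay  # probability of ending in the starting state
    diff = 0.25 - 0.25 * decay  # probability of ending in any one other state
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


if __name__ == "__main__":
    for row in jc69_rate_matrix():
        print(row)
    for row in jc69_transition_probs(0.1):
        print(row)
```

Each row of Q sums to zero and each row of P(t) sums to one. The learned options (stationary, mlp_factorized, mlp_dense) replace these fixed entries with trainable ones.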

Run H-VCSMC on primates using K=512 and a learned Q matrix:

python -m scripts.train.hyp_train --lr 0.01 --epochs 200 --k 512 --q-matrix mlp_dense --hash-trick data/primates.phy

Run H-VNCSMC (nested proposal) on primates using K=16 and a learned Q matrix:

python -m scripts.train.hyp_train --lr 0.01 --epochs 200 --k 16 --q-matrix mlp_dense --lookahead-merge --hash-trick data/primates.phy
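
The --hash-trick flag used above memoizes computation over tree topologies. This README does not describe how the repository implements it, so the following is only a generic illustration of the idea: results are cached under a canonical key for each subtree topology, so a topology that recurs (for example across particles or candidate merges) is computed once. All names here (topology_key, MemoizedSubtreeEval, count_leaves) are hypothetical, and count_leaves merely stands in for an expensive per-subtree computation.

```python
def topology_key(tree):
    """Return a hashable key that ignores the left/right order of children.

    A tree is either an int (leaf id) or a pair (left, right) of subtrees.
    """
    if isinstance(tree, int):
        return tree
    left, right = (topology_key(child) for child in tree)
    return frozenset((left, right))


def count_leaves(tree):
    """Stand-in for an expensive per-subtree computation (hypothetical)."""
    if isinstance(tree, int):
        return 1
    return count_leaves(tree[0]) + count_leaves(tree[1])


class MemoizedSubtreeEval:
    """Cache the result of compute_fn under each tree's topology key."""

    def __init__(self, compute_fn):
        self.compute_fn = compute_fn
        self.cache = {}
        self.calls = 0  # number of times compute_fn actually ran

    def __call__(self, tree):
        key = topology_key(tree)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.compute_fn(tree)
        return self.cache[key]


if __name__ == "__main__":
    evaluate = MemoizedSubtreeEval(count_leaves)
    evaluate(((0, 1), 2))
    evaluate((2, (1, 0)))  # same topology, children listed in another order
    print(evaluate.calls)
```

In this sketch the two calls share one cache entry because their keys are equal. A real implementation would also need the cached value to depend only on the topology, or would have to extend the key with whatever else the value depends on.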

Run H-VNCSMC on a larger benchmark dataset (DS1) using K=16 and a factorized Q matrix:

python -m scripts.train.hyp_train --lr 0.05 --epochs 200 --k 16 --q-matrix mlp_factorized --lookahead-merge --hash-trick data/hohna/DS1.phy

Run H-VNCSMC on benchmark datasets DS1-DS7, with deferred branch sampling to better learn the embeddings:

python -m scripts.benchmarks.hyp_smc_benchmark --q-matrix mlp_factorized
