This repository contains code for the paper "Variational Combinatorial Sequential Monte Carlo for Bayesian Phylogenetics in Hyperbolic Space" by Alex Chen, Philippe Chlenski, Kenneth Munyuza, Antonio Khalil Moretti, Christian A. Naesseth, and Itsik Pe'er.
The paper has been accepted to AISTATS 2025.
First, install the required packages:
    pip install -e .

Next, create a Weights & Biases (wandb) account to save run metrics. Link your account to the CLI by running:
    wandb login

Notes:
- --q-matrix: specifies the Q matrix to use.
  - jc69: fixed JC69 Q matrix.
  - stationary: one global Q matrix with each entry free.
  - mlp_factorized: an MLP maps embeddings to holding times and stationary probabilities, forming the Q matrix. More memory efficient than mlp_dense.
  - mlp_dense: an MLP maps embeddings to all entries of the Q matrix.
- --lookahead-merge: performs H-VNCSMC if --hyperbolic is set, or VNCSMC otherwise.
- --hash-trick: memoizes computation over tree topologies to speed it up. Only applies when --hyperbolic is set, and is essentially required if --lookahead-merge is set.
- --checkpoint-grads: uses gradient checkpointing to reduce memory usage.
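
To illustrate the idea behind --hash-trick, here is a minimal sketch of memoizing a per-topology computation. It is a generic illustration, not the repository's implementation: the helper names `canonical` and `leaf_count`, the tuple-based tree encoding, and the leaf-counting stand-in for the expensive computation are all assumptions made for this example.

```python
from functools import lru_cache


def canonical(tree):
    """Return an order-independent, hashable key for a rooted topology.

    Assumed encoding: a leaf is a string, an internal node is a tuple
    of child subtrees. Child order is ignored by using frozensets.
    """
    if isinstance(tree, str):
        return tree
    return frozenset(canonical(child) for child in tree)


@lru_cache(maxsize=None)
def leaf_count(key):
    """Stand-in for an expensive per-topology computation.

    Results are cached by topology key, so each distinct subtree
    is evaluated only once.
    """
    if isinstance(key, str):
        return 1
    return sum(leaf_count(child) for child in key)


# Two encodings of the same topology, with children listed in different order.
tree_a = (("human", "chimp"), "gorilla")
tree_b = ("gorilla", ("chimp", "human"))

size_a = leaf_count(canonical(tree_a))  # computed from scratch
size_b = leaf_count(canonical(tree_b))  # served from the cache
```

Because both trees map to the same key, the second call does no new work. This kind of reuse matters most when many particles share subtrees, which is plausibly why the flag is described as essentially required with --lookahead-merge.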
Run H-VCSMC on primates using K=512 and a learned Q matrix:
    python -m scripts.train.hyp_train --lr 0.01 --epochs 200 --k 512 --q-matrix mlp_dense --hash-trick data/primates.phy

Run H-VNCSMC (nested proposal) on primates using K=16 and a learned Q matrix:

    python -m scripts.train.hyp_train --lr 0.01 --epochs 200 --k 16 --q-matrix mlp_dense --lookahead-merge --hash-trick data/primates.phy

Run H-VNCSMC on a larger benchmark dataset (DS1) using K=16 and a factorized Q matrix:

    python -m scripts.train.hyp_train --lr 0.05 --epochs 200 --k 16 --q-matrix mlp_factorized --lookahead-merge --hash-trick data/hohna/DS1.phy

Run H-VNCSMC on benchmark datasets DS1-DS7, with deferred branch sampling to better learn the embeddings:

    python -m scripts.benchmarks.hyp_smc_benchmark --q-matrix mlp_factorized
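
For comparison with the learned Q matrices used in the commands above, the fixed jc69 option corresponds to the standard Jukes-Cantor model, in which every substitution has the same rate. The sketch below builds that rate matrix and its closed-form transition probabilities in plain Python. The function names and the rate scaling (total leaving rate `mu`, default 1) are assumptions for illustration; the repository may normalize differently.

```python
import math

STATES = "ACGT"


def jc69_rate_matrix(mu=1.0):
    """JC69 rate matrix: each off-diagonal entry is mu / 3, each diagonal is -mu.

    mu is the total rate of leaving a state (an assumed scaling).
    """
    n = len(STATES)
    off_diagonal = mu / (n - 1)
    return [[-mu if i == j else off_diagonal for j in range(n)] for i in range(n)]


def jc69_transition_prob(i, j, t, mu=1.0):
    """Closed-form P(state j at time t | state i at time 0) under JC69."""
    decay = math.exp(-4.0 * mu * t / 3.0)
    if i == j:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


Q = jc69_rate_matrix()

# Probabilities of ending in each state after branch length 0.1, starting from A.
probs_from_a = [jc69_transition_prob(0, j, 0.1) for j in range(len(STATES))]
```

Each row of `Q` sums to zero and `probs_from_a` sums to one, which are quick sanity checks for any rate matrix, learned or fixed.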