Re: [R-sig-phylo] Model-averaging corHMM models?

O'Meara, Brian C via R-sig-phylo Fri, 15 Nov 2024 08:02:38 -0800

2024 has been weird – but in one of the happier examples of that, I find myself 
agreeing with Liam AND James. 😉


I’d been implicitly thinking about model averaging the rates first, then using 
this for ancestral state recon: what’s the average rate for 0A->0B, given (most 
generally) some models have no hidden state (so no 0A->0B possible), some have 
just A and B, some have A, B, C, and D (with no guarantee that 0B in one model 
corresponds to 0B in another). But doing the recon in each model first, then 
collapsing the hidden states so we just look at likelihood of 0 or 1 at a node 
and weight these across the model weights, does indeed make a lot of sense and 
is a much better way (which I should have remembered, as this is how it works 
in hisse plotting already). I think it still leaves difficulties if one wants 
to estimate probability of hidden state “B” at a node across models since that 
state isn’t the same “thing” across models of different complexity unlike the 
observed state, but fortunately few should have this use case.

Best,
Brian

_________________________________________

Brian O’Meara
He/Him
Professor, Dept. of Ecology & Evolutionary Biology
University of Tennessee, Knoxville


From: James Boyko <[email protected]>
Date: Friday, November 15, 2024 at 7:37 AM
To: [email protected] <[email protected]>
Cc: O'Meara, Brian C <[email protected]>, Rafael S Marcondes 
<[email protected]>, r-sig-phylo <[email protected]>, Jeremy 
Michael Beaulieu <[email protected]>
Subject: Re: [R-sig-phylo] Model-averaging corHMM models?
Ya I agree with Liam’s assessment. This is what we’ve done in our past 
empirical applications with hidden states: combine hidden state probabilities 
so that we only have the observed states, and then average across models across 
those. As Brian says, it’s not really possible to average hidden state 
information because of the label switching problem. Imagine fitting an 
identical HMM twice to the same data. There is no difference in the likelihood 
if you were to switch rate class labels. Eg if in fit 1: rate class A is fast 
transition rates and rate class B was slow rates, now swap. There is no 
difference in the likelihood if now rate class A is slow and rate class B is 
fast. The labels are arbitrary and it means that we can’t identify which rate 
classes should be grouped together during model averaging. Hence why we 
summarize to observed states before averaging.

I recently added a “model_list” class to corhmm so it’ll be easy to make this 
an exported function. (Can also add Brian’s redundancy detection). I’ll try to 
get to it this weekend, but shoot me an email if you need help coding this 
sooner.

Best,
James

On Fri, Nov 15, 2024 at 5:50 AM Liam J. Revell 
<[email protected]<mailto:[email protected]>> wrote:

Dear list.

I'm going to make a general pitch for model averaging ancestral state 
reconstruction -- without solving Rafa's problem that this is not yet 
implemented in corHMM.

Setting aside, for a moment, hidden-rate models -- imagine a set of standard Mk 
model for a binary trait. There are (reasonably) four such models: an ER model, 
an ARD model, a 0 → 1 model, and a 1 → 0 model.

Now imagine further an unlikely, but possible, scenario in which all of our 
weight of evidence fell on the 0 → 1 and a 1 → 0 models, but 51% landed on the 
former and 49% on the latter. In the former model our marginal reconstruction 
at the root will be unambiguously 0 (assuming both 0 & 1 are observed at the 
tips). Clearly, however, the chances that the root node is 0 or 1 are closer to 
50:50 than 100:0 (since under our nearly equally well-supported 1 → 0 model the 
same node is unambiguously 1). The same dilemma holds for more realistic 
scenarios -- indeed, I've seen it in empirical data!


Lastly, here's a suggestion on how to do model averaging with hidden states: 
first "merge" hidden levels, and then model-average using Akaike weights (or 
whatever your preferred approach is to model averaging). That is, if a node has 
a marginal scaled likelihood of "0A" of 0.4 and "0B" of 0.3 (for observed 
states 0/1 and hidden levels A/B), then the total probability that that node 
was in observed state "0" is just 0.4 + 0.3 = 0.7. This is implemented natively 
in phytools::ancr; however, I don't recall if I built it to support hidden 
states. (For ancestral state reconstruction of a "fitHRM" object in phytools, 
however, it is possible to "hide" hidden states after reconstruction, then 
model-averaging, including across hidden & "non-hidden" state models becomes 
quite easy.)

Hope this is helpful.

Best wishes. Liam
Liam J. Revell
Professor of Biology, University of Massachusetts Boston
Web: http://faculty.umb.edu/liam.revell/
Book: Phylogenetic Comparative Methods in 
R<https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r> 
(Princeton University Press, 2022)

On 11/14/2024 1:19 PM, O'Meara, Brian C via R-sig-phylo wrote:

CAUTION: EXTERNAL SENDER



Hi, Rafa.



One thing to look at is whether the models are effectively identical; if the 3 
state is 2 AICc worse than the 2 state it could be that it’s just adding a 
single extra parameter with no improvement (i.e, the likelihoods might be the 
same). In some of our other software we now have code to detect functionally 
redundant models and remove this redundancy, but I don’t remember whether it’s 
in corHMM yet.



Assuming the models are somewhat different, much as I like model averaging I’d 
use the one with lower AICc for ancestral state reconstruction (and yep, glad 
you brought up all the uncertainties with anc recon). With hisse, with model 
averaging one can basically model the turnover, net diversification, and other 
diversification-flavored rates: this edge is 0A in this model, and 0C in that 
model, but one can still just take the turnover rate from each. It’s partly a 
benefit from hisse’s relative inattention to the variation in transition rates. 
But corHMM is all about the transition rates: in one model, we have rates 
0A->1A, 0B->1B, while in the other we have 0A->1A, 0B->1B, and 0C->1C: it’s not 
that the overall rate of 0->1 is the unweighted mean of the 0*->1* rates in 
each model, as state B might be really rare in one model and common in another. 
One can think about what happens at equilibrium with the estimated rate matrix, 
or use the simmap functionality in corHMM to look at empirical frequencies of 
going from 0->1, but it’s an area where I think we need to think a bit more 
about how to summarize and present results (for one thing, I think summarizing 
by wait times (reciprocal of rates) rather than rates can be a better way to 
summarize, so an ephemeral state with a high outgoing rate that a species is 
expected to occupy for only a short time does not have a huge impact on the 
average parameter).



I hope this helps; I’m CCing Jeremy Beaulieu and James Boyko in case they’re 
not seeing R-sig-phylo posts.



Best,

Brian



_________________________________________



Brian O’Meara

He/Him

Professor, Dept. of Ecology & Evolutionary Biology

University of Tennessee, Knoxville





From: R-sig-phylo 
<[email protected]><mailto:[email protected]> 
on behalf of Rafael S Marcondes 
<[email protected]><mailto:[email protected]>

Date: Thursday, November 14, 2024 at 12:11 PM

To: r-sig-phylo <[email protected]><mailto:[email protected]>

Subject: [R-sig-phylo] Model-averaging corHMM models?

Hi all,



I have two corHMM models within 2 AICc units of each other. They differ in

the number of hidden states (2 versus 3). I'm trying to decide how to

proceed from here and wondering if model-averaging would be the way to go.

I'm interested in drawing biological conclusions based on

the parameter estimates, and in doing ancestral character estimation

(strictly for visualization purposes only--I'm well aware of the caveats of

over-interpreting ACE states).



corHMM doesn't seem to have an in-built model averaging utility. But by

analogy with the one in hisse (modelAveRates), it seems as if it's just a

matter of averaging the parameter estimates based on model AIC weights. Is

it really that simple? How would I deal with the param present in one model

and not the other (related to the additional hidden state)? And then would

it be appropriate to run ACE using the averaged rate parameters? Or would

it be more appropriate to perform ACE separately for each model and then

average the node likelihoods instead?



Thanks,



-Rafa



*--*

*Rafael S. Marcondes, Ph.D. (he/him)*

(The R in Rafael is pronounced like the h in "hat")

*https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttps%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364613426%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=sVqB7KbRcrZl1BVDOD7SV%2BhfGZunxII4uaKNqK%2FTE68%3D&reserved=0<https://www.rafaelmarcondes.com/>
 
<https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttps%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364653476%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vHeIemksgxArMdqW9je6Mzd%2FaF7m4yd9m8BQMkR45Aw%3D&reserved=0><https://www.rafaelmarcondes.com/>*

Faculty Fellow in EEB

Department of BioSciences

Rice University

Houston TX 77005





*"Eu quase que nada não sei. Mas desconfio de muita coisa"*

*"I almost don't know nothing. But I suspect many things"*

  -João Guimarães Rosa, Brazilian novelist

(Portuguese original and free English translation by me)



        [[alternative HTML version deleted]]



_______________________________________________

R-sig-phylo mailing list - 
[email protected]<mailto:[email protected]>

https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttps%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364665693%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=OvGBiOS0%2FRhAazqnbVvhfurbhAOOdcod7HjuqbdE7i4%3D&reserved=0<https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>

Searchable archive at 
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttp%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364676685%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=%2B3dCyE08gfQq%2BqHhKlP05hG3zjiByokiHSc2KxQco9w%3D&reserved=0<http://www.mail-archive.com/[email protected]/>



        [[alternative HTML version deleted]]



_______________________________________________

R-sig-phylo mailing list - 
[email protected]<mailto:[email protected]>

https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttps%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364689964%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=8Yk9KwToyBxnfHhzFl2k6MHNvjWY1%2FC3Ppo8GDaENXU%3D&reserved=0<https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>

Searchable archive at 
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttp%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364700712%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vkk4mRlMFPgDOZw7yPD%2FiAVxpEtUoB83DtRzh1l4t6Q%3D&reserved=0<http://www.mail-archive.com/[email protected]/>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - [email protected]
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/[email protected]/

Re: [R-sig-phylo] Model-averaging corHMM models?

Reply via email to