2024 has been weird – but in one of the happier examples of that, I find myself agreeing with Liam AND James. 😉
I’d been implicitly thinking about model averaging the rates first, then using this for ancestral state recon: what’s the average rate for 0A->0B, given (most generally) some models have no hidden state (so no 0A->0B possible), some have just A and B, some have A, B, C, and D (with no guarantee that 0B in one model corresponds to 0B in another). But doing the recon in each model first, then collapsing the hidden states so we just look at likelihood of 0 or 1 at a node and weight these across the model weights, does indeed make a lot of sense and is a much better way (which I should have remembered, as this is how it works in hisse plotting already). I think it still leaves difficulties if one wants to estimate probability of hidden state “B” at a node across models since that state isn’t the same “thing” across models of different complexity unlike the observed state, but fortunately few should have this use case. Best, Brian _________________________________________ Brian O’Meara He/Him Professor, Dept. of Ecology & Evolutionary Biology University of Tennessee, Knoxville From: James Boyko <[email protected]> Date: Friday, November 15, 2024 at 7:37 AM To: [email protected] <[email protected]> Cc: O'Meara, Brian C <[email protected]>, Rafael S Marcondes <[email protected]>, r-sig-phylo <[email protected]>, Jeremy Michael Beaulieu <[email protected]> Subject: Re: [R-sig-phylo] Model-averaging corHMM models? Ya I agree with Liam’s assessment. This is what we’ve done in our past empirical applications with hidden states: combine hidden state probabilities so that we only have the observed states, and then average across models across those. As Brian says, it’s not really possible to average hidden state information because of the label switching problem. Imagine fitting an identical HMM twice to the same data. There is no difference in the likelihood if you were to switch rate class labels. Eg if in fit 1: rate class A is fast transition rates and rate class B was slow rates, now swap. There is no difference in the likelihood if now rate class A is slow and rate class B is fast. The labels are arbitrary and it means that we can’t identify which rate classes should be grouped together during model averaging. Hence why we summarize to observed states before averaging. I recently added a “model_list” class to corhmm so it’ll be easy to make this an exported function. (Can also add Brian’s redundancy detection). I’ll try to get to it this weekend, but shoot me an email if you need help coding this sooner. Best, James On Fri, Nov 15, 2024 at 5:50 AM Liam J. Revell <[email protected]<mailto:[email protected]>> wrote: Dear list. I'm going to make a general pitch for model averaging ancestral state reconstruction -- without solving Rafa's problem that this is not yet implemented in corHMM. Setting aside, for a moment, hidden-rate models -- imagine a set of standard Mk model for a binary trait. There are (reasonably) four such models: an ER model, an ARD model, a 0 → 1 model, and a 1 → 0 model. Now imagine further an unlikely, but possible, scenario in which all of our weight of evidence fell on the 0 → 1 and a 1 → 0 models, but 51% landed on the former and 49% on the latter. In the former model our marginal reconstruction at the root will be unambiguously 0 (assuming both 0 & 1 are observed at the tips). Clearly, however, the chances that the root node is 0 or 1 are closer to 50:50 than 100:0 (since under our nearly equally well-supported 1 → 0 model the same node is unambiguously 1). The same dilemma holds for more realistic scenarios -- indeed, I've seen it in empirical data! Lastly, here's a suggestion on how to do model averaging with hidden states: first "merge" hidden levels, and then model-average using Akaike weights (or whatever your preferred approach is to model averaging). That is, if a node has a marginal scaled likelihood of "0A" of 0.4 and "0B" of 0.3 (for observed states 0/1 and hidden levels A/B), then the total probability that that node was in observed state "0" is just 0.4 + 0.3 = 0.7. This is implemented natively in phytools::ancr; however, I don't recall if I built it to support hidden states. (For ancestral state reconstruction of a "fitHRM" object in phytools, however, it is possible to "hide" hidden states after reconstruction, then model-averaging, including across hidden & "non-hidden" state models becomes quite easy.) Hope this is helpful. Best wishes. Liam Liam J. Revell Professor of Biology, University of Massachusetts Boston Web: http://faculty.umb.edu/liam.revell/ Book: Phylogenetic Comparative Methods in R<https://press.princeton.edu/books/phylogenetic-comparative-methods-in-r> (Princeton University Press, 2022) On 11/14/2024 1:19 PM, O'Meara, Brian C via R-sig-phylo wrote: CAUTION: EXTERNAL SENDER Hi, Rafa. One thing to look at is whether the models are effectively identical; if the 3 state is 2 AICc worse than the 2 state it could be that it’s just adding a single extra parameter with no improvement (i.e, the likelihoods might be the same). In some of our other software we now have code to detect functionally redundant models and remove this redundancy, but I don’t remember whether it’s in corHMM yet. Assuming the models are somewhat different, much as I like model averaging I’d use the one with lower AICc for ancestral state reconstruction (and yep, glad you brought up all the uncertainties with anc recon). With hisse, with model averaging one can basically model the turnover, net diversification, and other diversification-flavored rates: this edge is 0A in this model, and 0C in that model, but one can still just take the turnover rate from each. It’s partly a benefit from hisse’s relative inattention to the variation in transition rates. But corHMM is all about the transition rates: in one model, we have rates 0A->1A, 0B->1B, while in the other we have 0A->1A, 0B->1B, and 0C->1C: it’s not that the overall rate of 0->1 is the unweighted mean of the 0*->1* rates in each model, as state B might be really rare in one model and common in another. One can think about what happens at equilibrium with the estimated rate matrix, or use the simmap functionality in corHMM to look at empirical frequencies of going from 0->1, but it’s an area where I think we need to think a bit more about how to summarize and present results (for one thing, I think summarizing by wait times (reciprocal of rates) rather than rates can be a better way to summarize, so an ephemeral state with a high outgoing rate that a species is expected to occupy for only a short time does not have a huge impact on the average parameter). I hope this helps; I’m CCing Jeremy Beaulieu and James Boyko in case they’re not seeing R-sig-phylo posts. Best, Brian _________________________________________ Brian O’Meara He/Him Professor, Dept. of Ecology & Evolutionary Biology University of Tennessee, Knoxville From: R-sig-phylo <[email protected]><mailto:[email protected]> on behalf of Rafael S Marcondes <[email protected]><mailto:[email protected]> Date: Thursday, November 14, 2024 at 12:11 PM To: r-sig-phylo <[email protected]><mailto:[email protected]> Subject: [R-sig-phylo] Model-averaging corHMM models? Hi all, I have two corHMM models within 2 AICc units of each other. They differ in the number of hidden states (2 versus 3). I'm trying to decide how to proceed from here and wondering if model-averaging would be the way to go. I'm interested in drawing biological conclusions based on the parameter estimates, and in doing ancestral character estimation (strictly for visualization purposes only--I'm well aware of the caveats of over-interpreting ACE states). corHMM doesn't seem to have an in-built model averaging utility. But by analogy with the one in hisse (modelAveRates), it seems as if it's just a matter of averaging the parameter estimates based on model AIC weights. Is it really that simple? How would I deal with the param present in one model and not the other (related to the additional hidden state)? And then would it be appropriate to run ACE using the averaged rate parameters? Or would it be more appropriate to perform ACE separately for each model and then average the node likelihoods instead? Thanks, -Rafa *--* *Rafael S. Marcondes, Ph.D. (he/him)* (The R in Rafael is pronounced like the h in "hat") *https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttps%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364613426%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=sVqB7KbRcrZl1BVDOD7SV%2BhfGZunxII4uaKNqK%2FTE68%3D&reserved=0<https://www.rafaelmarcondes.com/> <https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttps%2Fwww.rafaelmarcondes.com%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364653476%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vHeIemksgxArMdqW9je6Mzd%2FaF7m4yd9m8BQMkR45Aw%3D&reserved=0><https://www.rafaelmarcondes.com/>* Faculty Fellow in EEB Department of BioSciences Rice University Houston TX 77005 *"Eu quase que nada não sei. Mas desconfio de muita coisa"* *"I almost don't know nothing. But I suspect many things"* -João Guimarães Rosa, Brazilian novelist (Portuguese original and free English translation by me) [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - [email protected]<mailto:[email protected]> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttps%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364665693%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=OvGBiOS0%2FRhAazqnbVvhfurbhAOOdcod7HjuqbdE7i4%3D&reserved=0<https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> Searchable archive at https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttp%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364676685%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=%2B3dCyE08gfQq%2BqHhKlP05hG3zjiByokiHSc2KxQco9w%3D&reserved=0<http://www.mail-archive.com/[email protected]/> [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - [email protected]<mailto:[email protected]> https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttps%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364689964%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=8Yk9KwToyBxnfHhzFl2k6MHNvjWY1%2FC3Ppo8GDaENXU%3D&reserved=0<https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> Searchable archive at https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2F2.zoppoz.workers.dev%3A443%2Fhttp%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F&data=05%7C02%7Cliam.revell%40umb.edu%7C8d4d0ab8549e42ee097508dd04d9055f%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C638672052364700712%7CUnknown%7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C60000%7C%7C%7C&sdata=vkk4mRlMFPgDOZw7yPD%2FiAVxpEtUoB83DtRzh1l4t6Q%3D&reserved=0<http://www.mail-archive.com/[email protected]/> [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - [email protected] https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/[email protected]/
