
Conversation

@kauterry
Contributor

Almost done with implementation, saving it as a draft since I'm yet to test it.

@facebook-github-bot added the CLA Signed label on Dec 20, 2023
@yilinyang7

yilinyang7 commented Dec 20, 2023

I'm not super sure of the intention of this PR; can I ask a few questions:

  • Why have both inference/ and predict/ subdirectories? They seem to serve the same purpose, which could be confusing.
  • This new ExpressiveTranslator only supports single-audio translation; isn't that use case too narrow? Single-audio translation is already well served by our two public demos (internal & HF) plus a gradio app.py script. I'm not convinced we need this.

@kauterry
Contributor Author

I'm not super sure of the intention of this PR; can I ask a few questions:

  • Why have both inference/ and predict/ subdirectories? They seem to serve the same purpose, which could be confusing.
  • This new ExpressiveTranslator only supports single-audio translation; isn't that use case too narrow? Single-audio translation is already well served by our two public demos (internal & HF) plus a gradio app.py script. I'm not convinced we need this.

There is absolutely no harm in having this extra way of doing expressivity inference. Developers aren't going to use the demo; they need the code. For M4T we have m4t_predict, which is a thin layer around the Translator class (sketched below), and we want to stick to the same format for expressivity_predict. Btw, the more entry points we have, the easier we make our models to adopt; that's the whole point of OSS. In this case, writing an ExpressiveTranslator strips all the unnecessary boilerplate away from the user, so they can call expressive inference easily, in the same format as M4T inference.

I don't think it's confusing: the CLI directory holds what the user can invoke directly, while inference acts as a bridge between the CLI and the actual models. If we have a Translator, we definitely need an ExpressiveTranslator, so I strongly disagree with you on this. In fact, this discrepancy between m4t_predict's design and expressivity_predict's design was pointed out by @elbayadm. I feel we should have paid more attention to the design of these interfaces that expose our models to users.
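For reference, the M4T pattern being invoked looks roughly like this. This is a hedged sketch based on the public seamless_communication README; the exact card names ("seamlessM4T_v2_large", "vocoder_v2") and import path are assumptions and may differ from the code in this PR.

```python
# Rough sketch of the M4T pattern: m4t_predict is a thin CLI wrapper
# around the Translator class. Card names and import path are assumptions
# based on the public README, not taken from this PR.
import torch

from seamless_communication.inference import Translator

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float16 if device.type == "cuda" else torch.float32

translator = Translator(
    "seamlessM4T_v2_large",  # model card name (assumed)
    "vocoder_v2",            # vocoder card name (assumed)
    device,
    dtype=dtype,
)

# One call handles audio loading, feature extraction, and generation.
text_output, speech_output = translator.predict(
    input="input.wav",
    task_str="S2ST",
    tgt_lang="spa",
)
```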

@kauterry
Contributor Author

predict falls under the CLI directory; inference is a directory of its own. Feel free to suggest a better name to make that clearer.

@yilinyang7

yilinyang7 commented Dec 21, 2023

Thanks for the reply.
I agree there's no harm in implementing this extra single-audio inference script.
My point is that when I looked at the ExpressiveTranslator class, it only supports single-audio inference. For such a general class name, isn't that too limited, and couldn't it confuse future users? I believe it should support batching, as Translator and PretsselGenerator do.
As for the directory, I don't understand why you want both cli/expressivity/predict and cli/expressivity/inference. Could we instead put the script files under one directory?

@yilinyang7

For expressivity inference, batched inference plus the automatic evaluation metrics that run after it should be the default point of entry.
I don't object to an additional single-audio inference script; let's just align on what the main point of entry is.
If this ExpressiveTranslator class doesn't support batched inference, it will be confusing to users.

@kauterry
Contributor Author

This is going to support batched inference, and I will refactor expressivity_evaluate to use it, exactly like Translator is used in m4t_evaluate and m4t_predict. How does that sound?
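A hypothetical sketch of what that batched usage could look like once the refactor lands; the class name comes from this thread, but the import path, constructor arguments, and batched predict() signature shown here are assumptions, not the merged API.

```python
# Hypothetical usage once ExpressiveTranslator supports batching, mirroring
# how Translator serves both m4t_predict and m4t_evaluate. Everything below
# the class name is an assumption made for illustration.
import torch

from seamless_communication.inference import ExpressiveTranslator  # assumed path

translator = ExpressiveTranslator(
    "seamless_expressivity",  # model card name (assumed)
    device=torch.device("cuda"),
    dtype=torch.float16,
)

audio_paths = ["utt_001.wav", "utt_002.wav", "utt_003.wav"]

# expressivity_evaluate could iterate over a dataset in fixed-size batches
# and call the same predict() that expressivity_predict uses for one file.
for i in range(0, len(audio_paths), 2):
    batch = audio_paths[i : i + 2]
    speech_outputs, text_outputs = translator.predict(batch, tgt_lang="fra")
```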

@yilinyang7

This is going to support batched inference, and I will refactor expressivity_evaluate to use it, exactly like Translator is used in m4t_evaluate and m4t_predict. How does that sound?

yep, that sounds good, thanks!

@elbayadm
Contributor

Thank you @kauterry for drafting this PR.
@yilinyang7, the main thing I thought was missing in SC is this:
speech_output, text_output = expressive_translator.predict(input), where a user wouldn't need to compute fbank and src_gcmvn for prosody outside the call to a predict function. The point is not to add new functionality (e.g. batching) but to provide a better entry point in Python, à la HF's transformers. Not gradio, not a demo, and not a command line.
Functionality can be added later, and refactoring expressivity_evaluate is definitely necessary.
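To make that concrete, here is a minimal illustrative skeleton of such an entry point, with feature extraction hidden inside predict(). Everything here is a sketch: torchaudio's kaldi fbank stands in for the project's actual feature pipeline, and ExpressiveTranslatorSketch, gcmvn_mean, gcmvn_std, and _generate are hypothetical names, not the real seamless_communication API.

```python
# Illustrative skeleton only: it shows fbank and gcmvn normalization moving
# inside predict(), as suggested above. The model forward pass is omitted.
import torch
import torchaudio


class ExpressiveTranslatorSketch:
    def __init__(self, gcmvn_mean: torch.Tensor, gcmvn_std: torch.Tensor):
        # Global CMVN statistics shipped with the prosody model (placeholder).
        self.gcmvn_mean = gcmvn_mean
        self.gcmvn_std = gcmvn_std

    def predict(self, audio_path: str):
        # The user passes a path; feature extraction is no longer their job.
        waveform, sample_rate = torchaudio.load(audio_path)
        fbank = torchaudio.compliance.kaldi.fbank(
            waveform, num_mel_bins=80, sample_frequency=sample_rate
        )
        # Global cepstral mean/variance normalization for the prosody encoder.
        src_gcmvn = (fbank - self.gcmvn_mean) / self.gcmvn_std
        # ... feed fbank / src_gcmvn to the expressive model and vocoder ...
        speech_output, text_output = self._generate(fbank, src_gcmvn)
        return speech_output, text_output

    def _generate(self, fbank, src_gcmvn):
        raise NotImplementedError("model forward pass omitted in this sketch")
```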

| Model Name   | #params | checkpoint |
| ------------ | ------- | ---------- |
| W2v-BERT 2.0 | 600M    | [🤗 Model card](https://huggingface.co/facebook/conformer-shaw) - [checkpoint](https://huggingface.co/facebook/conformer-shaw/resolve/main/conformer_shaw.pt) |
| W2v-BERT 2.0 | 600M    | [🤗 Model card](https://huggingface.co/facebook/w2v-bert-2.0) - [checkpoint](https://huggingface.co/facebook/w2v-bert-2.0/resolve/main/conformer_shaw.pt) |
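As an aside, a checkpoint row like the ones above can typically be fetched with huggingface_hub. The following is a minimal sketch assuming the repo and filename in the first row are correct; the second row's filename is exactly what the review comment below flags.

```python
# Minimal sketch: download a checkpoint listed above via huggingface_hub
# and load its state dict on CPU. Repo id and filename are taken from the
# table and assumed correct.
import torch
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="facebook/conformer-shaw", filename="conformer_shaw.pt"
)
state = torch.load(ckpt_path, map_location="cpu")
```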
Contributor

can we fix this in another PR?

Contributor Author

Sure

@yilinyang7

predict falls under the CLI directory; inference is a directory of its own. Feel free to suggest a better name to make that clearer.

Sorry, I missed this; I just realized I misread the directory layout.

@yilinyang7

Thank you @kauterry for drafting this PR. @yilinyang7, the main thing I thought was missing in SC is this: speech_output, text_output = expressive_translator.predict(input), where a user wouldn't need to compute fbank and src_gcmvn for prosody outside the call to a predict function. The point is not to add new functionality (e.g. batching) but to provide a better entry point in Python, à la HF's transformers. Not gradio, not a demo, and not a command line. Functionality can be added later, and refactoring expressivity_evaluate is definitely necessary.

Thanks for the clarification, that makes sense!
I think the current discrepancy is that expressivity_evaluate does batched inference, while the current ExpressiveTranslator class only supports single-audio inference.
If we want a cleaner API call, we might need to refactor every script to use this new ExpressiveTranslator class, which should support batched inference for efficiency.
If I misunderstood anything, let me know!

@elbayadm
Contributor

Thanks for the clarification, that makes sense! I think the current discrepancy is that expressivity_evaluate does batched inference, while the current ExpressiveTranslator class only supports single-audio inference. If we want a cleaner API call, we might need to refactor every script to use this new ExpressiveTranslator class, which should support batched inference for efficiency. If I misunderstood anything, let me know!

That's reasonable; if batched inference is already implemented there, then the new ExpressiveTranslator should use it as well.

@yilinyang7 left a comment

After the bug fix, batched ExpressiveTranslator inference is done and replicates the previous results across all metrics.

@facebook-github-bot

Hi @kauterry!

Thank you for your pull request.

We require contributors to sign our Contributor License Agreement, and yours needs attention.

You currently have a record in our system, but the CLA is no longer valid, and will need to be resubmitted.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

