
Adjust mul_mat_f16 work memory #1226


Merged: 3 commits merged into master from adjust-mul-mat-f16-work-memory on Apr 29, 2023

Conversation

ggerganov (Member) commented on Apr 29, 2023

Haven't tested this yet. The goal is to allocate just the needed amount of work memory when cuBLAS is not being used.
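
To make the intent concrete, here is a hypothetical sketch of per-path work-buffer sizing. The names (`tensor`, `nelements`, `mul_mat_f16_work_size`) are made up and the exact branch sizes in the PR may differ; the idea is to size the buffer for the conversion the chosen path actually performs instead of always reserving the worst case:

```c
// Hypothetical sketch, not the PR's actual diff.
#include <stddef.h>
#include <stdint.h>

struct tensor { int64_t ne[4]; };   // assumed stand-in for ggml_tensor shape info

static int64_t nelements(const struct tensor * t) {
    return t->ne[0]*t->ne[1]*t->ne[2]*t->ne[3];
}

// Work-buffer size in bytes for an f16 x f32 matrix product.
static size_t mul_mat_f16_work_size(const struct tensor * src0,
                                    const struct tensor * src1,
                                    int use_cublas) {
    if (use_cublas) {
        // cuBLAS path: src1 (f32) is staged as f16, so the buffer must
        // hold the full 3D/4D data of src1 in half precision
        return sizeof(uint16_t)*(size_t)nelements(src1);
    }
    // otherwise only a single 2D slice of src0 is converted to f32 at a time
    return sizeof(float)*(size_t)(src0->ne[0]*src0->ne[1]);
}
```

The per-path split matters because the staging buffer for one case can be much larger than for the other, so reserving a single worst-case size wastes memory on the path that doesn't need it.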

@ggerganov ggerganov requested review from slaren and 0cc4m on April 29, 2023 09:45
@ggerganov ggerganov force-pushed the adjust-mul-mat-f16-work-memory branch from 638651a to 0ffcd89 on April 29, 2023 10:54
slaren (Member) commented on Apr 29, 2023

Looks good. I didn't realize that this could increase the maximum size of the work memory, so I had set it to the worst-case maximum to make testing easier.
In the future we shouldn't need any work memory at all for this with cuBLAS: I have been testing converting between f16 and f32 on the GPU, and it is faster that way.
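
The GPU-side conversion slaren describes can be illustrated with a minimal kernel. This is an illustrative sketch, not the code slaren tested; the kernel name and launch configuration are made up:

```cuda
#include <cuda_fp16.h>

// Convert n f16 values to f32 on the device, so the host no longer
// needs a staging work buffer for the conversion.
__global__ void convert_fp16_to_fp32(const __half * x, float * y, int n) {
    const int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i < n) {
        y[i] = __half2float(x[i]);
    }
}

// hypothetical launch, one thread per element:
//   convert_fp16_to_fp32<<<(n + 255)/256, 256>>>(d_src_f16, d_dst_f32, n);
```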

@ggerganov ggerganov merged commit 214b6a3 into master on Apr 29, 2023
@ggerganov ggerganov deleted the adjust-mul-mat-f16-work-memory branch on April 29, 2023 15:43