
Commit cc0e504

[Hackability Refactor] Move the Quant JSONs into quant_config (#1056)
1 parent b273a82 commit cc0e504

8 files changed: 13 additions & 10 deletions

.github/workflows/pull.yml

Lines changed: 4 additions & 4 deletions
@@ -332,7 +332,7 @@ jobs:
 
       echo "::group::Run inference with quantize file"
       if [ $(uname -s) != Darwin ]; then
-        python3 generate.py --quantize config/data/cuda.json --checkpoint "./checkpoints/${REPO_NAME}/model.pth"
+        python3 generate.py --quantize torchchat/quant_config/cuda.json --checkpoint "./checkpoints/${REPO_NAME}/model.pth"
       fi
       echo "::endgroup::"
 
@@ -378,7 +378,7 @@ jobs:
 
       echo "::group::Run inference with quantize file"
       if [ $(uname -s) == Darwin ]; then
-        python3 export.py --output-dso-path /tmp/model.so --quantize config/data/cuda.json --checkpoint "./checkpoints/${REPO_NAME}/model.pth"
+        python3 export.py --output-dso-path /tmp/model.so --quantize torchchat/quant_config/cuda.json --checkpoint "./checkpoints/${REPO_NAME}/model.pth"
         python3 generate.py --dso-path /tmp/model.so --checkpoint "./checkpoints/${REPO_NAME}/model.pth"
       fi
       echo "::endgroup::"
@@ -501,9 +501,9 @@ jobs:
       python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --temperature 0 --pte-path ${MODEL_DIR}/${MODEL_NAME}.pte
 
       echo "******************************************"
-      echo "*** --quantize config/data/mobile.json ***"
+      echo "*** --quantize torchchat/quant_config/mobile.json ***"
       echo "******************************************"
-      # python export.py --quantize config/data/mobile.json --checkpoint-path ${MODEL_PATH} --output-pte-path ${MODEL_DIR}/${MODEL_NAME}.pte
+      # python export.py --quantize torchchat/quant_config/mobile.json --checkpoint-path ${MODEL_PATH} --output-pte-path ${MODEL_DIR}/${MODEL_NAME}.pte
       # python3 torchchat.py generate --checkpoint-path ${MODEL_PATH} --temperature 0 --pte-path ${MODEL_DIR}/${MODEL_NAME}.pte
 

README.md

Lines changed: 3 additions & 3 deletions
@@ -270,7 +270,7 @@ python3 torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.s
 
 > [!NOTE]
 > If your machine has cuda add this flag for performance
-`--quantize config/data/cuda.json` when exporting.
+`--quantize torchchat/quant_config/cuda.json` when exporting.
 
 For more details on quantization and what settings to use for your use
 case visit our [customization guide](docs/model_customization.md).
@@ -327,11 +327,11 @@ Similar to AOTI, to deploy onto device, we first export the PTE artifact, then w
 The following example uses the Llama3.1 8B Instruct model.
 ```
 # Export
-python3 torchchat.py export llama3.1 --quantize config/data/mobile.json --output-pte-path llama3.1.pte
+python3 torchchat.py export llama3.1 --quantize torchchat/quant_config/mobile.json --output-pte-path llama3.1.pte
 ```
 
 > [!NOTE]
-> We use `--quantize config/data/mobile.json` to quantize the
+> We use `--quantize torchchat/quant_config/mobile.json` to quantize the
 llama3.1 model to reduce model size and improve performance for
 on-device use cases.
 

docs/quantization.md

Lines changed: 3 additions & 3 deletions
@@ -47,8 +47,8 @@ on-device usecases.
 ## Quantization API
 
 Quantization options are passed in json format either as a config file
-(see [cuda.json](../config/data/cuda.json) and
-[mobile.json](../config/data/mobile.json)) or a JSON string.
+(see [cuda.json](../torchchat/quant_config/cuda.json) and
+[mobile.json](../torchchat/quant_config/mobile.json)) or a JSON string.
 
 The expected JSON format is described below. Refer to the tables above
 for valid `bitwidth` and `groupsize` values.
@@ -120,7 +120,7 @@ python3 generate.py llama3 --pte-path llama3.pte --prompt "Hello my name is"
 
 ## Quantization Profiles
 
-Four [sample profiles](https://github.com/pytorch/torchchat/tree/main/config/data) are included with the torchchat distribution: `cuda.json`, `desktop.json`, `mobile.json`, `pi5.json`
+Four [sample profiles](https://github.com/pytorch/torchchat/tree/main/torchchat/quant_config/) are included with the torchchat distribution: `cuda.json`, `desktop.json`, `mobile.json`, `pi5.json`
 with profiles optimizing for execution on cuda, desktop, mobile and
 raspberry Pi devices.
 
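
As the hunk above notes, `--quantize` accepts either a path to one of these JSON profiles or an inline JSON string mapping scheme names to their options. Below is a minimal sketch of the inline form; the `linear:int4` scheme name and `groupsize` value are illustrative assumptions, not values taken from this commit:

```
# Hedged sketch: "linear:int4" and groupsize 256 are assumed values;
# consult the tables in docs/quantization.md for valid schemes,
# bitwidth, and groupsize options.
python3 torchchat.py export llama3.1 --quantize '{"linear:int4": {"groupsize": 256}}' --output-pte-path llama3.1.pte
```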

torchchat/quant_config/README.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Chat with LLMs Everywhere: Configs
+
+This directory contains sample quantization configurations.
config/data/cuda.json → torchchat/quant_config/cuda.json: file renamed without changes.
config/data/desktop.json → torchchat/quant_config/desktop.json: file renamed without changes.
config/data/mobile.json → torchchat/quant_config/mobile.json: file renamed without changes.
config/data/pi5.json → torchchat/quant_config/pi5.json: file renamed without changes.
