update README
parent 448a53e6a7
commit df8672ce75

3 changed files with 7 additions and 6 deletions
@@ -16,9 +16,9 @@
 </h4>

 ## News or Update

+- 2023-05-30 - (Update) - Support downloading/uploading quantized models from/to 🤗 Hub.
 - 2023-05-27 - (Update) - Support quantization and inference for `gpt_bigcode`, `codegen` and `RefineWeb/RefineWebModel`(falcon) model types.
 - 2023-05-04 - (Update) - Support using faster cuda kernel when `not desc_act or group_size == -1`.
-- 2023-04-29 - (Update) - Support loading quantized model from arbitrary quantize_config and model_basename.

 *For more histories please turn to [here](docs/NEWS_OR_UPDATE.md)*
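The new 2023-05-30 entry covers Hub round-trips. As a minimal sketch of the download direction (the repo id is a placeholder, and the keyword arguments are assumed from the AutoGPTQ API of this period, not taken from this diff):

```python
# Hedged sketch: load a GPTQ-quantized model directly from 🤗 Hub.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "YourUserName/opt-125m-4bit128g",  # hypothetical Hub repo id; substitute your own
    device="cuda:0",                   # place weights on the first GPU
    use_safetensors=True,              # matches save_quantized(..., use_safetensors=True)
)
```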
@@ -141,13 +141,13 @@ model.save_quantized(quantized_model_dir, use_safetensors=True)
 # or pass explicit token with: use_auth_token="hf_xxxxxxx"
 # (uncomment the following three lines to enable this feature)
 # repo_id = f"YourUserName/{quantized_model_dir}"
-# commit_message = f"AutoGPTQ model for {pretrained_model}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
+# commit_message = f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
 # model.push_to_hub(repo_id, commit_message=commit_message, use_auth_token=True)

 # alternatively you can save and push at the same time
 # (uncomment the following three lines to enable this feature)
 # repo_id = f"YourUserName/{quantized_model_dir}"
-# commit_message = f"AutoGPTQ model for {pretrained_model}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
+# commit_message = f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
 # model.push_to_hub(repo_id, save_dir=quantized_model_dir, use_safetensors=True, commit_message=commit_message, use_auth_token=True)

 # load quantized model to the first GPU
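The fix in this hunk replaces an undefined `pretrained_model` with the `pretrained_model_dir` variable used earlier in the README. A hedged sketch of the same push flow with the commented lines enabled, assuming the surrounding README example has already defined `model`, `quantized_model_dir`, `pretrained_model_dir` and `quantize_config`:

```python
# Continuation of the README snippet above; names come from that example, and a
# Hub write token is assumed to be configured (or passed via use_auth_token).
repo_id = f"YourUserName/{quantized_model_dir}"
commit_message = (
    f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, "
    f"gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
)
# save locally and push to the Hub in one call
model.push_to_hub(
    repo_id,
    save_dir=quantized_model_dir,
    use_safetensors=True,
    commit_message=commit_message,
    use_auth_token=True,
)
```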
@@ -16,9 +16,9 @@
 </h4>

 ## News or Update

+- 2023-05-30 - (Update) - Support downloading quantized models from 🤗 Hub and uploading quantized models to 🤗 Hub.
 - 2023-05-27 - (Update) - Support quantization and inference for the following model types: `gpt_bigcode`, `codegen` and `RefineWeb/RefineWebModel`(falcon).
 - 2023-05-04 - (Update) - Support using a faster cuda kernel when `not desc_act or group_size == -1`.
-- 2023-04-29 - (Update) - Support loading quantized models from a specified model weights file name or quantize_config.

 *For more histories please turn to [here](docs/NEWS_OR_UPDATE.md)*
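The 2023-05-04 entry's condition is a plain boolean over the quantize config; a small sketch of configs that should and should not be eligible for the faster kernel (parameter values are illustrative only):

```python
# Hedged sketch of configs against the condition `not desc_act or group_size == -1`.
from auto_gptq import BaseQuantizeConfig

fast_by_desc_act = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)  # not desc_act holds
fast_by_group_size = BaseQuantizeConfig(bits=4, group_size=-1, desc_act=True)  # group_size == -1 holds
slow_path = BaseQuantizeConfig(bits=4, group_size=128, desc_act=True)          # neither holds
```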
@@ -138,13 +138,13 @@ model.save_quantized(quantized_model_dir, use_safetensors=True)
 # or pass an explicit account auth token with: use_auth_token="hf_xxxxxxx"
 # (uncomment the following three lines to enable this feature)
 # repo_id = f"YourUserName/{quantized_model_dir}"
-# commit_message = f"AutoGPTQ model for {pretrained_model}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
+# commit_message = f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
 # model.push_to_hub(repo_id, commit_message=commit_message, use_auth_token=True)

 # alternatively you can save the quantized model locally and push it to the Hugging Face Hub at the same time
 # (uncomment the following three lines to enable this feature)
 # repo_id = f"YourUserName/{quantized_model_dir}"
-# commit_message = f"AutoGPTQ model for {pretrained_model}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
+# commit_message = f"AutoGPTQ model for {pretrained_model_dir}: {quantize_config.bits}bits, gr{quantize_config.group_size}, desc_act={quantize_config.desc_act}"
 # model.push_to_hub(repo_id, save_dir=quantized_model_dir, use_safetensors=True, commit_message=commit_message, use_auth_token=True)

 # load the quantized model onto the first visible GPU
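The closing context line, "load the quantized model onto the first visible GPU", is where the README continues; a minimal round trip under assumed names (the base model and output dir are placeholders):

```python
# Hedged sketch: load the quantized model onto cuda:0 and run one generation.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")  # hypothetical base model
model = AutoGPTQForCausalLM.from_quantized("opt-125m-4bit128g", device="cuda:0")  # hypothetical local dir
inputs = tokenizer("auto_gptq is", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```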
@@ -1,4 +1,5 @@
 ## <center>News or Update</center>
+- 2023-05-30 - (Update) - Support downloading/uploading quantized models from/to 🤗 Hub.
 - 2023-05-27 - (Update) - Support quantization and inference for `gpt_bigcode`, `codegen` and `RefineWeb/RefineWebModel`(falcon) model types.
 - 2023-05-04 - (Update) - Support using faster cuda kernel when `not desc_act or group_size == -1`.
 - 2023-04-29 - (Update) - Support loading quantized model from arbitrary quantize_config and model_basename.
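The retained 2023-04-29 entry pairs naturally with the Hub feature above: checkpoints with non-default weight file names need `model_basename` plus an explicit quantize_config. A hedged sketch (the path and file stem are hypothetical; keyword names are assumed from the AutoGPTQ API of this period):

```python
# Hedged sketch: load a quantized checkpoint whose weights file does not follow
# the default naming, by passing model_basename (file stem, no extension).
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)
model = AutoGPTQForCausalLM.from_quantized(
    "path/to/quantized-model",              # hypothetical local dir or Hub repo id
    model_basename="gptq_model-4bit-128g",  # hypothetical weights file stem
    quantize_config=quantize_config,
    use_safetensors=True,
    device="cuda:0",
)
```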