AutoGPTQ/auto_gptq/modeling

Latest commit 1e353a8dc5 by 潘其威(William), 2023-04-28 18:50:12 +08:00:
Merge pull request #24 from PanQiWei/speedup_quantization
(Offloading and Multiple devices quantization/inference)
File          Last commit message                                                                                    Date
__init__.py   fix device mismatch when directly using model to inference after quantization                         2023-04-28 16:41:46 +08:00
_base.py      Merge pull request #24 from PanQiWei/speedup_quantization                                              2023-04-28 18:50:12 +08:00
_const.py     add gpt2                                                                                               2023-04-28 09:13:22 +09:00
_utils.py     support conv1d,conv2d                                                                                  2023-04-28 09:13:00 +09:00
auto.py       add gpt2                                                                                               2023-04-28 09:14:05 +09:00
bloom.py      support dispatch layers to different devices when loading pretrained model before quantization        2023-04-27 02:24:08 +08:00
gpt2.py       fix device mismatch when directly using model to inference after quantization                         2023-04-28 16:41:46 +08:00
gpt_neox.py   support dispatch layers to different devices when loading pretrained model before quantization        2023-04-27 02:24:08 +08:00
gptj.py       support dispatch layers to different devices when loading pretrained model before quantization        2023-04-27 02:24:08 +08:00
llama.py      support dispatch layers to different devices when loading pretrained model before quantization        2023-04-27 02:24:08 +08:00
moss.py       support dispatch layers to different devices when loading pretrained model before quantization        2023-04-27 02:24:08 +08:00
opt.py        support dispatch layers to different devices when loading pretrained model before quantization        2023-04-27 02:24:08 +08:00