AutoGPTQ/auto_gptq/nn_modules
2023-08-07 18:59:04 +08:00
Name | Last commit message | Last commit date
fused_modules | support inherit one of the three fused attention class and customize attn_bias building logic | 2023-08-07 18:59:04 +08:00
qlinear | remove unnecessary lines | 2023-08-06 16:24:44 +08:00
triton_utils | support 32dim triton kernel | 2023-06-02 19:04:12 +09:00
__init__.py | refactor file structure | 2023-04-25 18:58:20 +08:00
_fused_base.py | add trainable mode | 2023-05-26 13:11:30 +08:00
fused_gptj_attn.py | add trainable mode | 2023-05-26 13:11:30 +08:00
fused_llama_attn.py | add trainable mode | 2023-05-26 13:11:30 +08:00
fused_llama_mlp.py | update FusedLlamaMLPForQuantizedModel for general usage purpose | 2023-05-27 07:47:20 +08:00