Commit graph

282 commits

Author SHA1 Message Date
qwopqwop200
8c7c806d36
if exllama, auto disable fused attention 2023-08-07 19:24:16 +09:00
qwopqwop200
11afc47f7f
support gqa 2023-08-07 19:00:05 +09:00
qwopqwop200
2f48780165
fix bug when disabling exllama 2023-08-07 16:28:30 +09:00
qwopqwop200
25972d65bf
support static_groups and fix bug 2023-08-07 16:27:48 +09:00
qwopqwop200
6233afce3b
support static_groups 2023-08-07 16:25:44 +09:00
fxmarty
71f23268eb
Merge pull request #1 from qwopqwop200/exllama-q4-kernel
Exllama q4 kernel
2023-08-05 00:15:22 +09:00
Felix Marty
d0608b09db rocm support 2023-08-04 13:38:02 +00:00
Félix Marty
4fb3e20c5e Merge branch 'main' into exllama-q4-kernel 2023-08-04 15:13:34 +02:00
潘其威(William)
5d8fa85029
Merge pull request #226 from LeiWang1999/fix/general_attr
Register quant params in GeneralQuantLinear for friendly post process.
2023-08-04 18:42:54 +08:00
潘其威(William)
45152b7add
Merge pull request #220 from fxmarty/fix-revison-loading
Fix revision used to load the quantization config
2023-08-04 18:25:22 +08:00
qwopqwop200
79ab5076c7
revert fused_llama_attn.py 2023-08-04 18:19:54 +09:00
qwopqwop200
068210d0b7
exllama support flash attention 2023-08-03 16:30:16 +09:00
qwopqwop200
7a7df5655a
support group query attention 2023-08-03 16:08:49 +09:00
leiwang1999
a0de5c2c51 register buffer of general quant linear 2023-08-03 05:15:09 +00:00
qwopqwop200
3fc097dcd8
change pack func to support only 4-bit 2023-08-01 20:01:45 +09:00
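Commit 3fc097dcd8 restricts the pack function to 4-bit. Packing in GPTQ-style kernels typically stores eight 4-bit quantized values in each 32-bit word; the sketch below illustrates that layout only (the function name and nibble order are illustrative assumptions, not AutoGPTQ's actual implementation):

```python
import numpy as np

def pack_4bit(vals):
    """Pack eight 4-bit integers (0..15) into each uint32, lowest nibble first.

    Input length must be a multiple of 8. Illustrative sketch only.
    """
    vals = np.asarray(vals, dtype=np.uint32).reshape(-1, 8)
    packed = np.zeros(len(vals), dtype=np.uint32)
    for i in range(8):
        # Place value i of each group into nibble i of the output word.
        packed |= vals[:, i] << np.uint32(4 * i)
    return packed
```

Restricting the format to one bit-width keeps the packing loop and the matching CUDA unpack kernel simple, at the cost of dropping 2/3/8-bit support from this code path.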
qwopqwop200
a1fd81c72d
if training, disable exllama 2023-08-01 12:29:58 +09:00
qwopqwop200
a60c9a8552
add pack func 2023-08-01 12:22:41 +09:00
Felix Marty
339c57a902 fix 2023-07-31 15:57:44 +00:00
Felix Marty
129fa4b67e act-order now works fine 2023-07-31 15:36:58 +00:00
Felix Marty
1f99b94ae2 fix revision 2023-07-31 15:03:33 +00:00
Felix Marty
5660b22f28 fix bug in quantization config loading 2023-07-31 14:28:37 +00:00
Felix Marty
38447262c0 fix fused attn 2023-07-31 13:46:32 +00:00
Felix Marty
760667dccc cleaning 2023-07-31 11:58:10 +00:00
Felix Marty
179776bd1d exllama kernel 2023-07-31 11:50:45 +00:00
Felix Marty
23eb519e68 typo 2023-07-28 17:45:34 +00:00
Felix Marty
caf6625b68 warning about triton 2023-07-28 17:42:37 +00:00
PanQiWei
1138240385 update version to 0.3.2 2023-07-26 18:40:44 +08:00
PanQiWei
ff1f100ded remove argument 'save_dir' in method from_quantized 2023-07-26 17:58:04 +08:00
PanQiWei
722a621aaa simplified code 2023-07-26 17:53:47 +08:00
潘其威(William)
22748dd2b7
Merge pull request #209 from PanQiWei/fix_no_cuda_kernel
Fix error raised when CUDA kernels are not installed
2023-07-26 14:07:30 +08:00
潘其威(William)
fd24e84eb2
Merge pull request #166 from casperbh96/main
[FEATURE] Implement perplexity metric to compare against llama.cpp
2023-07-26 14:04:51 +08:00
PanQiWei
5883b45d73 fix error raised when cuda kernels are not installed 2023-07-26 13:59:28 +08:00
潘其威(William)
bbc4a7c455
Merge pull request #208 from TheBloke/TB_Add_SafeTensors_Metadata
Add Safetensors metadata saving, with some values saved to each .safetensor file
2023-07-26 11:54:47 +08:00
潘其威(William)
228867a753
Merge pull request #207 from TheBloke/TB_version
Add a central version number
2023-07-26 11:27:23 +08:00
潘其威(William)
2456f71125
Merge pull request #205 from TheBloke/TB_fix_revision
Fix `revision` and other huggingface_hub kwargs in .from_quantized()
2023-07-26 10:34:43 +08:00
TheBloke
2647c92743 safetensors_metadata: add conversion to str() for input metadata to avoid errors from save_safe. Warn if this results in keys being overwritten. 2023-07-25 21:14:21 +00:00
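Commit 2647c92743 is needed because the safetensors format only accepts string-to-string metadata in its header. A minimal sketch of that coerce-and-warn logic (the function name and warning text are illustrative, not TheBloke's actual code):

```python
import warnings

def coerce_metadata(metadata):
    """Coerce metadata to Dict[str, str] as safetensors requires.

    Warns when str() conversion makes two keys collide, in which
    case the later value overwrites the earlier one.
    """
    out = {}
    for k, v in metadata.items():
        key = str(k)
        if key in out:
            warnings.warn(f"metadata key {key!r} duplicated after str() conversion; overwriting")
        out[key] = str(v)
    return out
```

The coerced dict can then be passed as the `metadata` argument when saving the `.safetensors` file.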
TheBloke
ee7d80945b Add version to metadata using new value 2023-07-25 14:25:24 +00:00
TheBloke
3817d154af Merge branch 'TB_version' into TB_Add_SafeTensors_Metadata 2023-07-25 14:09:29 +00:00
TheBloke
7575eae6ab Added to __init__.py to show a central version number. Also slightly adjust way version is stored in setup.py to make it easier to edit on version update. Bump version to 0.3.1 in both 2023-07-25 14:06:51 +00:00
TheBloke
eeaf5ebc53 Extend huggingface_hub features to AutoGPTQForCausalLM.from_pretrained() so models can be quantised from the hub including using a private token and revision/branch etc 2023-07-25 13:26:37 +00:00
TheBloke
c9124e3fc7 Fix revision and other huggingface_hub args for .from_quantized(), which were not being passed through 2023-07-25 12:48:33 +00:00
TheBloke
3f359fc778 Add support for Safetensors metadata 2023-07-25 11:30:39 +00:00
qwopqwop200
9578c59d31
fix cuda bug 2023-07-25 16:50:05 +09:00
tc
e28e8ee809 Add support for InternLM 2023-07-07 09:25:40 -07:00
Casper
992a0ab102 Reference Perplexity class 2023-06-19 20:03:32 +02:00
Casper
b351c8c547 Add perplexity calculation class 2023-06-19 20:03:22 +02:00
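Commit b351c8c547 adds a perplexity class for comparison against llama.cpp. As a reminder of the metric itself (not AutoGPTQ's implementation): perplexity is the exponential of the mean negative log-likelihood per token, so a lower value means the model assigns higher probability to the evaluation text.

```python
import math

def perplexity(token_nlls):
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))
```

For example, a model that assigns probability 1.0 to every token (NLL 0) has perplexity 1.0, the best possible score.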
潘其威(William)
046c031139
Merge pull request #141 from AngainorDev/patch-1
Fix error message
2023-06-19 10:11:10 +08:00
LaaZa
03577a7698 Rename the class to match reference capitalisation 2023-06-18 21:01:07 +03:00
LaaZa
9fd558f2ba Add support for Baichuan 2023-06-18 20:13:29 +03:00
Angainor Development
e75611e1b7
Fix error message 2023-06-05 22:19:09 +02:00