Lex Song
f2ab4fab46
Fix CUDA out of memory error in qlinear_old.py
Add a missing line from qlinear.py to qlinear_old.py to convert the output tensor.
This resolves a CUDA out of memory error that occurred without this line.
2023-05-20 21:10:11 +08:00
TheBloke
898f1ef62d
Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations.
2023-05-20 09:33:51 +01:00
PanQiWei
4bb10fda49
groupsize -> group_size
2023-05-12 13:37:52 +08:00
qwopqwop200
3ff6ab18cb
Merge branch 'main' into faster-llama
2023-05-06 00:20:29 +09:00
TheBloke
1b3329b399
Fix 'groupsize' -> 'group_size' in all other .py files. I haven't touched any CUDA kernels in case there's any complexity there I don't understand
2023-05-05 14:44:16 +01:00
PanQiWei
6cba6e7123
reformat code
2023-05-04 22:16:08 +08:00
qwopqwop200
c359f672a8
Support faster kernels and strict model loading
2023-05-04 09:04:07 +09:00
qwopqwop200
f0f37c1fe7
fix bug
2023-05-01 18:09:39 +09:00
qwopqwop200
9dfcac8e26
add qlinear_old
2023-05-01 13:03:57 +09:00