Commit graph

333 commits

Author SHA1 Message Date
Automation Pipeline
9fb99f61e7 Merge remote-tracking branches 'laaza/Mistral' and 'laaza/MPT' 2023-10-22 07:53:59 -04:00
Vivek Khandelwal
e4b2493733
Modify qlinear_cuda for tracing the GPTQ model (#367)
Changes:
-- The change to torch.bitwise_and is made because, during
   tracing, the current usage of torch.bitwise_and results in
   an in-place variant of this op, causing an issue in the
   downstream lowering pipeline of the traced model via
   Torch-MLIR and IREE-SHARK. The op usage is therefore changed
   so that it does not result in an in-place variant.

-- The change to the torch.matmul call in the forward function
   is made because it currently assumes the weights will always
   be of fp16 type, so executing the model with float32 weights
   results in an error. The change therefore casts the LHS of
   the matmul to the same dtype as the RHS.

Neither of the above changes affects the model in any way.

Signed-off-by: Vivek Khandelwal <vivek@nod-labs.com>
2023-10-21 01:06:01 +09:00
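The two fixes described in the commit above can be sketched in plain PyTorch. This is an illustrative sketch, not the exact AutoGPTQ diff: `unpack_mask` and `safe_matmul` are hypothetical helper names, but the techniques (the functional, out-of-place form of `bitwise_and`, and casting the matmul LHS to the RHS dtype) are the ones the commit message describes.

```python
import torch

def unpack_mask(packed: torch.Tensor, mask: int) -> torch.Tensor:
    # Functional form returns a new tensor instead of mutating `packed`,
    # keeping the op out-of-place in the traced graph so it lowers
    # cleanly through Torch-MLIR / IREE-SHARK.
    return torch.bitwise_and(packed, mask)

def safe_matmul(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # Cast the LHS to the RHS dtype instead of assuming fp16 weights,
    # so float32 weights no longer raise a dtype-mismatch error.
    return torch.matmul(x.to(weight.dtype), weight)
```

With float16 activations and float32 weights, `safe_matmul` promotes the activations and runs the matmul in float32 rather than erroring out.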
LaaZa
4b7389ddb7 Merge branch 'main' into MPT
# Conflicts:
#	auto_gptq/modeling/__init__.py
#	auto_gptq/modeling/_const.py
#	auto_gptq/modeling/auto.py
2023-10-04 20:21:49 +03:00
LaaZa
99acbead42 Add support for Mistral models. 2023-10-04 01:07:55 +03:00
student686
c1a3013c45 import exllama QuantLinear instead of exllamav2's 2023-09-27 11:05:13 +08:00
潘其威(William)
3de7fbb0d5
Revert "fix bug(breaking change) remove (zeors -= 1)" 2023-09-27 10:37:31 +08:00
潘其威(William)
62fd0371ac
Merge branch 'main' into main 2023-09-26 14:09:04 +08:00
潘其威(William)
b461b6fa13
Merge pull request #335 from z80maniac/ignore-extra-args
Ignore unknown parameters in quantize_config.json
2023-09-26 14:00:38 +08:00
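The PR merged above makes loading tolerant of unrecognized keys in quantize_config.json. A minimal sketch of that idea, assuming a hypothetical `QuantizeConfig` dataclass (the real `BaseQuantizeConfig` in AutoGPTQ has more fields):

```python
from dataclasses import dataclass, fields

@dataclass
class QuantizeConfig:
    bits: int = 4
    group_size: int = 128

def config_from_dict(raw: dict) -> QuantizeConfig:
    # Keep only the keys the dataclass declares and silently drop the
    # rest, so a config written by a newer version still loads.
    known = {f.name for f in fields(QuantizeConfig)}
    return QuantizeConfig(**{k: v for k, v in raw.items() if k in known})
```

Without the filter, an extra key such as `"new_param"` would make the constructor raise `TypeError: unexpected keyword argument`.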
潘其威(William)
04db761eed
Merge pull request #347 from alex4321/peft-model-use-adapter-name
Use `adapter_name` for `get_gptq_peft_model` with `train_mode=True`
2023-09-26 13:55:06 +08:00
Marc Sun
c912bf361a exllamav2 integration 2023-09-25 16:51:18 +00:00
Alexander Pozharskii
0185095402 Use adapter_name for get_gptq_peft_model with train_mode=True 2023-09-24 17:11:19 +04:00
ZXED
121dbd15a5
Ignore unknown parameters in quantize_config.json 2023-09-10 18:39:40 +03:00
qwopqwop200
94de4ef185
GPTQ backward compatibility support 2023-09-08 10:16:29 +09:00
TheBloke
02a87dce76 Add support for Falcon as part of Transformers 4.33.0, including new Falcon 180B 2023-09-06 18:03:33 +01:00
qwopqwop200
6b1ceb1897
if exllama, auto disable fused attention 2023-09-06 18:14:04 +09:00
qwopqwop200
ad5b0d72ee
fix bug 2023-09-06 16:41:41 +09:00
潘其威(William)
1793227283
Merge pull request #311 from SunMarc/fix_max_input_length
fix typo in max_input_length
2023-09-01 10:21:54 +08:00
潘其威(William)
782bb603d9
Merge pull request #303 from JustinLin610/patch-1
Update qwen.py for Qwen-VL
2023-09-01 10:20:24 +08:00
Marc Sun
04b321da89
fix typo 2023-08-31 14:07:16 -04:00
潘其威(William)
1e938e6bad
Merge pull request #310 from PanQiWei/fix_to()_metod_bug
fix model type changed after calling .to() method
2023-08-31 19:04:02 +08:00
PanQiWei
c7021f0f44 fix model type changed after calling .to() method 2023-08-31 18:39:03 +08:00
qwopqwop200
45a1ee4d84
install check qigen 2023-08-31 14:37:39 +09:00
Junyang Lin
7c39a3a315
Update qwen.py for Qwen-VL
add transformer.visual as outside layer for the adaptation to Qwen-VL
2023-08-30 16:29:55 +08:00
PanQiWei
604c96144f temporarily set the version of main branch to 0.5.0.dev0 2023-08-25 17:36:23 +08:00
潘其威(William)
e5050a5650
Revert "V0.4.2 release" 2023-08-25 17:26:55 +08:00
潘其威(William)
1049fd014a
Merge pull request #287 from PanQiWei/v0.4.2-release
V0.4.2 release
2023-08-25 17:26:41 +08:00
qwopqwop200
6a9d80eddc Merge remote-tracking branch 'qwopqwop200/main' into main 2023-08-25 18:06:03 +09:00
qwopqwop200
dafdd6189a
remove duplicate code 2023-08-25 14:59:13 +09:00
Félix Marty
8254da4f15 update version 2023-08-24 17:47:14 +02:00
Felix Marty
04730ac66c expose api to set exllama max length 2023-08-24 11:22:15 +00:00
qwopqwop200
f23a06f911
Merge branch 'PanQiWei:main' into main 2023-08-17 15:22:43 +09:00
qwopqwop200
b8a42911a6
qigen refactoring 2023-08-17 15:22:16 +09:00
qwopqwop200
5d5b687ca8
qigen formatting qlinear 2023-08-17 15:19:01 +09:00
qwopqwop200
084c9d8860
name change 2023-08-17 15:17:09 +09:00
PanQiWei
893fc5d7a3 release 0.4.1 2023-08-13 16:35:59 +08:00
PanQiWei
34b4ba451c fix typo 2023-08-13 16:26:02 +08:00
qwopqwop200
051f3facc7
change argument names 2023-08-11 16:10:32 +09:00
qwopqwop200
a807e038bb
remove many contiguous calls and change argument names 2023-08-11 16:09:42 +09:00
qwopqwop200
c591d6a1e1
change name make_quant_cpu to make_quant_qigen 2023-08-11 15:12:33 +09:00
qwopqwop200
2c1afc2ad9
change name make_quant_cpu to make_quant_qigen 2023-08-11 15:04:58 +09:00
qwopqwop200
aa5528cb10
use_cpu name change and default dtype change 2023-08-11 09:51:36 +09:00
qwopqwop200
870be83bea
Merge branch 'PanQiWei:main' into main 2023-08-10 22:48:30 +09:00
qwopqwop200
7ba78af3ae support cpu 2023-08-10 22:48:04 +09:00
Felix Marty
4af7ea619d patch for transformers compatiblity 2023-08-09 14:23:59 +00:00
PanQiWei
44c7a1a184 make exllama_kernels compilation as optional 2023-08-09 17:42:22 +08:00
PanQiWei
172deae049 expose disable_exllama argument 2023-08-09 12:03:31 +08:00
PanQiWei
86a3d4a094 release 0.4.0 2023-08-09 11:54:31 +08:00
qwopqwop200
fe244503e0
add "," 2023-08-08 19:57:23 +09:00
qwopqwop200
d22f89c524
support qwen 2023-08-08 19:27:43 +09:00
qwopqwop200
dc5541e78a
static groups default value change 2023-08-08 14:11:39 +09:00