Some questions about Auto Mixed Precision Training / expected scalar type Float but found Half #241
xxw11 started this conversation in Community | General (2 comments, 11 replies)
---
Hi, we are trying to reproduce your problem. Could you tell us the exact version of your …
---
1. When I use the following configuration, an error occurs: `fp16 = dict(mode=AMP_TYPE.NAIVE)`
But if I switch the fp16 mode to AMP_TYPE.TORCH, it runs normally: `fp16 = dict(mode=AMP_TYPE.TORCH)`. I am not sure what could be causing this.
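To illustrate what I think might be happening, here is a minimal plain-PyTorch sketch, not Colossal-AI code, of the kind of dtype mismatch behind a message like "expected scalar type Float but found Half": naive fp16 casts the whole model to half up front, so any tensor or module still in float32 trips a dtype error, whereas torch.cuda.amp (which, as far as I understand, is what AMP_TYPE.TORCH uses) casts per op inside autocast. Everything in the snippet is illustrative only.

```python
# Plain-PyTorch sketch (not Colossal-AI code); all names are illustrative.
import torch
import torch.nn as nn

x = torch.randn(4, 16, device="cuda")  # activations still in float32

# Naive-style fp16: the whole model is cast to half up front.
half_model = nn.Linear(16, 16).cuda().half()
try:
    half_model(x)  # float32 input meets half weights
except RuntimeError as e:
    print("dtype mismatch:", e)

# Native AMP keeps the weights in float32 and casts per op under autocast,
# so the same float32 input works without any manual casting.
fp32_model = nn.Linear(16, 16).cuda()
with torch.cuda.amp.autocast():
    out = fp32_model(x)
print(out.dtype)  # torch.float16 inside autocast
```

If that is roughly what AMP_TYPE.NAIVE does, my guess is that some input or buffer in my model is still float32 when the half-cast weights see it, but I have not been able to confirm this.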
2. It seems that fp16 and zero cannot be used together. I get the following error:

> It is not allowed to set fp16 and zero configuration in your config file at the same time

But when I comment out the fp16 settings, another error appears. Below is part of my configuration:
```python
BATCH_SIZE = 8
SEQ_LEN = 2048
NUM_EPOCHS = 50
TENSOR_PARALLEL = 4

# ZeRO settings (level 2)
zero = dict(
    level=2,
    dynamic_loss_scale=True,
    overlap_comm=True,
    clip_grad=1.0,
    cpu_offload=False,
)

gradient_accumulation = 5

optimizer = dict(
    type=SGD,
    lr=0.00015,
    weight_decay=1e-2,
)

loss = dict(
    type=GPTLMLoss,
)

model = dict(
    type=gpt2_Y,
    checkpoint=True,
)

# no pipeline parallelism, 2D tensor parallelism across 4 GPUs
parallel = dict(
    pipeline=1,
    tensor=dict(size=TENSOR_PARALLEL, mode='2d'),
)
```
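For completeness, these are the two mutually exclusive variants that I understand the check to allow; the AMP_TYPE import path and the idea that ZeRO level 2 keeps its own fp16 copies (so a separate fp16 section would be redundant) are my assumptions, not something I have verified.

```python
# Two variants of the same config file, not meant to be combined.
# The AMP_TYPE import path below is an assumption on my part.
from colossalai.amp import AMP_TYPE

# Variant A: mixed precision via the fp16 section, no zero section
fp16 = dict(mode=AMP_TYPE.TORCH)

# Variant B: ZeRO only, fp16 section removed (what I run when I comment fp16 out)
zero = dict(
    level=2,
    dynamic_loss_scale=True,
    overlap_comm=True,
    clip_grad=1.0,
    cpu_offload=False,
)
```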