Example code from the documentation fails at runtime #70483

Open
chengyi192 opened this issue Dec 26, 2024 · 2 comments

chengyi192 commented Dec 26, 2024

https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/paddle_v3_features/auto_parallel_cn.html#sanzidongbingxinghefenbushicelve

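Paddle version: paddlepaddle_gpu-3.0.0.dev20241223-cp310-cp310-linux_x86_64.whl
The example code in Section 3.1 (Data Parallelism) of the documentation runs fine.
The example code in Section 3.4 (3D Hybrid Parallel Strategy) fails when run as-is. Error message: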

    for step, inputs in enumerate(dataloader):
  File "/usr/local/lib/python3.10/dist-packages/paddle/distributed/auto_parallel/api.py", line 3309, in __next__
    batch_data = next(self.iter)
AttributeError: 'ShardDataloader' object has no attribute 'iter'
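
For readers hitting the same traceback: the message means `__next__` ran before anything had assigned `self.iter`. The sketch below uses a hypothetical stand-in class (`LazyLoader`, not Paddle's actual `ShardDataloader` code) and assumes the underlying iterator is created only in `__call__`. Under that assumption, iterating the object directly fails in the same way, while iterating the result of calling it (as in the `enumerate(dataloader())` loops later in this thread) works.

```python
class LazyLoader:
    """Hypothetical stand-in; NOT Paddle's ShardDataloader implementation."""

    def __init__(self, batches):
        self._batches = list(batches)

    def __call__(self):
        # Assumption: the underlying iterator is only created here.
        self.iter = iter(self._batches)
        return self

    def __iter__(self):
        # Returns self without creating self.iter.
        return self

    def __next__(self):
        return next(self.iter)


def try_iterate(iterable):
    """Return the (step, batch) pairs, or the error message on failure."""
    try:
        return list(enumerate(iterable))
    except AttributeError as err:
        return str(err)


loader = LazyLoader(["batch0", "batch1"])
direct = try_iterate(loader)    # iterate the object itself
called = try_iterate(loader())  # iterate the result of calling it
print(direct)
print(called)
```

Whether the real class behaves this way in a given build is something to confirm against the installed `paddle/distributed/auto_parallel/api.py`.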

@will-jl944 (Contributor)

paddlepaddle_gpu-3.0.0.dev20241223-cp310-cp310-linux_x86_64.whl

This is a develop build of Paddle. The matching documentation is: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/paddle_v3_features/auto_parallel_cn.html#d
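
The version-to-documentation matching described here can be sketched as a small check. The helper names (`docs_url_for`, `installed_version`) are hypothetical, and the rule that a `.dev` segment in the version string indicates a develop build is an assumption based on the wheel name in this thread. The two URLs are the ones linked in the comments above.

```python
from importlib import metadata

# URL from the first comment (no "develop" path segment).
DEFAULT_DOCS = "https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/paddle_v3_features/auto_parallel_cn.html"
# URL from the contributor's reply.
DEVELOP_DOCS = "https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/paddle_v3_features/auto_parallel_cn.html"


def docs_url_for(version: str) -> str:
    """Hypothetical helper: pick the docs page matching a version string.

    Assumption: develop wheels carry a ".dev" segment, as in
    "3.0.0.dev20241223" from this thread.
    """
    return DEVELOP_DOCS if ".dev" in version else DEFAULT_DOCS


def installed_version(package: str = "paddlepaddle-gpu"):
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Fall back to the version reported in this thread if Paddle is not installed.
version = installed_version() or "3.0.0.dev20241223"
print(version, "->", docs_url_for(version))
```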


chengyi192 commented Dec 26, 2024

paddlepaddle_gpu-3.0.0.dev20241223-cp310-cp310-linux_x86_64.whl

This is a develop build of Paddle. The matching documentation is: https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/paddle_v3_features/auto_parallel_cn.html#d

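Thanks for your help, that example runs now. However, the example in Section 4.5 (Dynamic-to-Static Training) of the documentation still fails.
That section says to add a block of code, but after I added it to the code from Section 4.4 (3D Hybrid Parallel Strategy), it crashes: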

Error log:
INFO 2024-12-26 10:00:20,273 helper.py:274] start to build program for mode = train.

C++ Traceback (most recent call last):
0   paddle::pybind::static_api_mean(_object*, _object*, _object*)
1   CallStackRecorder::AttachToOps()
2   CallStackRecorder::GetOpCallstackInfo()

Error Message Summary:
FatalError: `Segmentation fault` is detected by the operating system.
  [TimeInfo: *** Aborted at 1735207220 (unix time) try "date -d @1735207220" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x0) received by PID 37717 (TID 0x7fa67a0a1740) from PID 0 ***]

# The code is as follows
opt = dist.shard_optimizer(opt)
# Code added as described in the documentation
dist_model = dist.to_static(
    model, dataloader, paddle.mean, opt
)

dist_model.train()
for step, inputs in enumerate(dataloader()):
    data = inputs
    loss = dist_model(data)
    print(step, loss)
exit()

for step, inputs in enumerate(dataloader()):
    data = inputs[0]
    logits = model(data)
    loss = paddle.mean(logits)
    loss.backward()
    opt.step()
    opt.clear_grad()

Am I doing something wrong? Does this run correctly on your side?
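
When narrowing down a crash like the segmentation fault above, one general Python technique (not specific to Paddle, and not a fix) is to enable the standard-library `faulthandler` module at the top of the script so that the Python-level stack is written to stderr when a fatal signal arrives. A minimal sketch follows; where it sits in the training script is an assumption. Paddle appears to install its own signal handling (it printed the `FatalError` summary), so whether the `faulthandler` output shows up alongside it is not guaranteed.

```python
import faulthandler

# Ask the interpreter to dump the Python traceback of all threads
# on fatal signals such as SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL.
faulthandler.enable(all_threads=True)

enabled = faulthandler.is_enabled()
print("faulthandler enabled:", enabled)

# The rest of the training script (model, dataloader, the
# dist.to_static(...) call from the snippet above) would follow here.
```

The same effect is available without editing the script by running it with `python -X faulthandler`.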
