How much GPU memory does this work usually require? #26
Comments
@WangLanxiao Thanks for your question. We used an Nvidia Titan Xp to run all of our experiments, and the model did use 12 GB. Maybe the error is related to newer PyTorch releases. What happens if you try to reduce the image size?
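(Not from the original reply, but as a generic illustration of what "reduce the image size" can look like in a PyTorch pipeline. The transform below uses standard torchvision; the dataset it would be passed to is a placeholder, not the repository's actual loader.)

```python
import torch
from torchvision import transforms

# Hypothetical sketch: shrink input images before they reach the model to
# lower GPU memory use. 256 here is an arbitrary example size.
resize_transform = transforms.Compose([
    transforms.Resize(256),   # resize the shorter side to 256 px, keeping aspect ratio
    transforms.ToTensor(),
])
# This transform would then be handed to whatever Dataset the training
# script constructs, e.g. MyReferDataset(transform=resize_transform)  (hypothetical name).
```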
Which version of PyTorch should I use? Right now I use PyTorch 1.0.1 to train it. Thanks for your reply!
@WangLanxiao Well, we developed this model on a very old version of PyTorch (0.2.0), but it worked up to 0.4.0/0.4.1. It may be related to the convolution mode cuDNN selects during training. Please follow this PyTorch forum discussion for more insight: https://discuss.pytorch.org/t/what-does-torch-backends-cudnn-benchmark-do/5936 I forgot to tell you that, unfortunately, DMN is not parallelizable, due to non-convergence during training.
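(Side note, not part of the original reply: the flag discussed in that thread is a standard PyTorch setting. A minimal sketch of toggling it; whether it helps depends on the PyTorch/cuDNN versions in use.)

```python
import torch

# cuDNN benchmark mode autotunes the convolution algorithm per input size,
# which can change how much workspace memory cuDNN requests.
torch.backends.cudnn.benchmark = False      # disable per-size algorithm autotuning
torch.backends.cudnn.deterministic = True   # prefer deterministic convolution algorithms
```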
What causes DMN to not be parallelizable?

parser.add_argument('--visdom', type=str, default=None,
I think we have an error related to the learning-rate setup that disables correct gradient updates after the all-reduce on all GPUs.
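(For context only; this is not the DMN training loop. In a generic PyTorch distributed setup, gradients are all-reduced, i.e. averaged, across replicas during the backward pass, so the learning rate has to be consistent with that averaging. A minimal DistributedDataParallel sketch with placeholder model and data, assuming a launch via torchrun:)

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Hypothetical illustration, not the repository's code. Assumes torchrun has
# set the LOCAL_RANK / RANK / WORLD_SIZE environment variables.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = nn.Linear(128, 1).cuda()          # placeholder model
ddp_model = DDP(model, device_ids=[local_rank])

# During backward(), DDP all-reduces (averages) gradients across replicas, so
# every process applies the same update; the learning rate must be chosen
# with that averaging in mind.
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)

x = torch.randn(8, 128).cuda()
loss = ddp_model(x).pow(2).mean()
loss.backward()      # gradient all-reduce happens inside this call
optimizer.step()     # identical parameter update on every GPU
```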
How can I use test_dmn.py? I cannot find a main function for testing DMN.
Are you referring to using DMN in evaluation mode?
Yes, I have used the evaluation function in train.py and I get the mIoU, but it does not use test_dmn.py.
Actually, that's the expected behaviour; to evaluate a model, a combination of
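(Generic aside, not the repository's own evaluation code: running a PyTorch model in evaluation mode usually boils down to the pattern below, shown here with a toy model and toy data standing in for whatever train.py builds.)

```python
import torch
import torch.nn as nn

# Hypothetical sketch; the model and "loader" are placeholders, not code
# taken from train.py or test_dmn.py.
net = nn.Sequential(nn.Conv2d(3, 1, 3, padding=1))            # placeholder segmentation head
val_batches = [torch.randn(1, 3, 64, 64) for _ in range(4)]   # placeholder validation data

net.eval()                      # switch off dropout / use running BN statistics
with torch.no_grad():           # no gradient tracking -> much lower memory use
    for images in val_batches:
        masks = net(images)     # forward pass only
        # ... accumulate IoU / mIoU against ground-truth masks here ...
net.train()                     # restore training mode afterwards
```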
When I change the batch size to 2 and give a suitable img_w and img_h to the dataloader, but
It is not possible to increase the batch size on DMN, for the same reason that prevents the model from being parallelized.
@WangLanxiao I suppose this issue was fixed, as there hasn't been any activity during the past 7 days. Feel free to reopen it if your issue has not been solved, or open a new one if you have a new question.
Hi @andfoy, I was trying this on a custom dataset but am facing an issue. Due to the model's large GPU requirement, I can't train above a 128 image size on a single 16 GB Nvidia GPU. As the model is non-parallelizable, I can't use multiple GPUs. Can you suggest any way to train on high-resolution data, as a 128 image size is very difficult to interpret?
@Shivanshmundra Which version of PyTorch are you using? AFAIK the model was trained on COCO images at 512 px on the largest side.
I am using the latest version of PyTorch; will downgrading the PyTorch version result in less GPU usage?
It is possible; this model was developed using a very old version of PyTorch.
I just tried with PyTorch 0.4 and CUDA 9.0, as you suggested in the README, but it still gives a not-enough-memory error above 128 resolution.
When I train the project on GPU0 (8 GB), I get:
RuntimeError: CUDA out of memory. Tried to allocate 9.50 MiB (GPU 0; 7.93 GiB total capacity; 6.49 GiB already allocated; 14.81 MiB free; 30.49 MiB cached)
When I train the project on GPU0 (12 GB), I get:
RuntimeError: CUDA out of memory. Tried to allocate 16.88 MiB (GPU 0; 11.90 GiB total capacity; 10.58 GiB already allocated; 18.44 MiB free; 62.29 MiB cached)
When I train the project on GPU0 (12 GB) and GPU1 (12 GB), I add:

if args.cuda:
    net = nn.DataParallel(net, device_ids=[0, 1])  # wrap the model for multi-GPU data parallelism
    net.cuda()
and I get:
RuntimeError: CUDA out of memory. Tried to allocate 16.88 MiB (GPU 0; 11.90 GiB total capacity; 10.58 GiB already allocated; 18.44 MiB free; 62.29 MiB cached)
What can I do to solve this problem? Thanks
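(Not part of the original report, but for anyone debugging similar out-of-memory errors: PyTorch exposes the same allocator statistics that appear in the "CUDA out of memory" message, which makes it easier to see where the memory goes. A minimal sketch using standard torch.cuda APIs; the usage comments are placeholders, not the project's training loop.)

```python
import torch

def log_gpu_memory(tag, device=0):
    # MiB currently held by live tensors
    allocated = torch.cuda.memory_allocated(device) / 1024 ** 2
    # MiB held by the caching allocator; on very old PyTorch (0.4.x) this
    # function is torch.cuda.memory_cached(device) instead.
    reserved = torch.cuda.memory_reserved(device) / 1024 ** 2
    print(f"[{tag}] allocated: {allocated:.1f} MiB, reserved: {reserved:.1f} MiB")

# Example usage around a training step (names are placeholders):
# log_gpu_memory("before forward")
# loss = net(images)
# log_gpu_memory("after forward")
```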