Add an installation guide for cloud server users. #1261
Comments
I don't use cloud servers myself, and I don't really have the time to set one up and keep the instructions updated. I welcome PRs in this area though. You may find some useful information here. Equally, it may be out of date:
Well, I guess I have to be the one to add a cloud tutorial here. @torzdf, if you see this, please consider adding it to your tutorials; it can be really helpful for beginners without local GPUs. If this is your first time setting up such an environment, I strongly recommend following the steps below, or you might waste a lot of time just getting a proper environment set up.
My VM: Ubuntu 2004 GRID 11.1, 4 vCPUs, 20 vRAM, with 1/4 of an NVIDIA T4 GPU.
I don't really recommend Google Colab unless you have Colab Pro. If you don't have Pro, better not to use Colab.
When buying a VM, be aware that choosing an image with NVIDIA in its name does not necessarily mean the VM has a GPU! For the image to work, you have to choose a VM with a GPU! The funny thing is that you can buy a VM without a GPU and install an NVIDIA image on it, which can be highly misleading to beginners. So be careful about this. If you want to do exactly what I did to set up my cloud ML environment, I suggest you buy a VM with the properties I mentioned above.
I bought mine at Tencent Cloud in the HK zone, which is only about 0.5$ per hour. Yes, Chinese services might be unreliable, but at the end of the day some of them are reliable, and my VM is one of them. Tencent Cloud has a GPU trial for new users that costs only around 0.2$ for 15 days (only 15 days). You can buy such VMs at Google, AWS, Azure, etc., but they are usually more expensive.
The first thing you should do is:
This is to make sure your VM has an NVIDIA driver, or even a GPU at all. If the output says "Ensure the driver is installed and up running", there is a very good chance the VM doesn't even have a GPU, so please save yourself some time and go find another one.
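A check like this is commonly done with `nvidia-smi`, the utility that ships with the NVIDIA driver. The following is only a minimal sketch under that assumption; `check_gpu` is a hypothetical helper name, not something provided by the VM image or by faceswap.

```shell
# Sketch: report whether an NVIDIA driver (and therefore a GPU) is usable.
# check_gpu is a hypothetical helper name introduced for illustration.
check_gpu() {
  # nvidia-smi is installed together with the NVIDIA driver.
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: no NVIDIA driver on this VM"
    return 1
  fi
  # nvidia-smi exits non-zero when it cannot talk to the driver.
  if nvidia-smi >/dev/null 2>&1; then
    echo "NVIDIA driver is responding"
  else
    echo "nvidia-smi failed: driver not running or no GPU attached"
    return 1
  fi
}
```

Running `check_gpu` right after the first login gives a quick yes/no answer before any time is spent on the rest of the setup.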
Now go to the NGC website and register; it doesn't really matter what you fill in. Then, on the NGC main page, hover over your avatar in the right corner and click Settings. Click Set API Key, generate an API key, and save it any way you like. Go back to the previous page and click Install NGC CLI. Choose Linux AMD64 and do exactly what it tells you to do. If it asks for a key, it means the API key you just generated. Go back to the page where you set your API key and run the commands it gives you. Then install Docker on your VM; some VMs may have it installed already.
These two commands should work for most VMs; if they don't, google the error messages to find a solution for your VM. Open a tmux session so that everything you do later keeps running:
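As a rough sketch of this step, assuming a stock tmux install: the helper below creates a detached session only if one with that name does not exist yet. `ensure_session` is a hypothetical name, and the default session name `0` is chosen to match what a bare `tmux` would create.

```shell
# Sketch: create a detached tmux session unless it already exists.
# ensure_session is a hypothetical helper; "0" mirrors tmux's default name.
ensure_session() {
  name="${1:-0}"
  if ! command -v tmux >/dev/null 2>&1; then
    echo "tmux is not installed"
    return 1
  fi
  if tmux has-session -t "$name" 2>/dev/null; then
    echo "session $name already exists"
  else
    # -d keeps the new session in the background; attach to it afterwards.
    tmux new-session -d -s "$name" && echo "session $name created"
  fi
}
```

After `ensure_session`, `tmux attach -t 0` enters the session and `tmux ls` lists every session that is still alive.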
Then set up the NGC container using this command:
This should be a single command; I don't know why it shows up as two, so just join them together manually. It can also be quite slow the first time. Then do this to install the Google Drive commands. If you figure out an easier way, do that and comment below; this is the only part I feel weird about. Make sure you are in /workspace at this point; if not:
At this point you will be asked to verify your Google account; just do whatever it says. To upload a file or directory:
OR
I believe you can also download files from Google Drive in a similar way. Then do this to install some dependencies that you have to install manually:
Even though tkinter (or whatever it is called) is for the GUI, for whatever reason CLI users still have to install it. Now clone the faceswap repo into /workspace
and
Then install the requirements by:
Then configure the deepfake:
Ignore "CUDA / Cudnn not found" and leave that TensorFlow prompt blank. For your training images, I suggest you extract them locally, zip them up, upload them to a URL, then curl them to /workspace/faceswap by:
OR upload your training data to Google Drive first, then download it to your VM using the Google Drive commands installed earlier. This sounds a bit complex, but I did it because it makes it much easier for me to test different VMs and find the right one, and in the long term it does make things more convenient. Then you can start training your model:
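A minimal sketch of the curl route, assuming `curl` and `unzip` are available in the container. The helper name `fetch_training_data`, the archive name `training_data.zip`, and the example URL are all hypothetical; substitute wherever the zip was actually uploaded.

```shell
# Sketch: download a zip of training images and unpack it.
# fetch_training_data and training_data.zip are hypothetical names.
fetch_training_data() {
  url="$1"
  dest="${2:-/workspace/faceswap}"
  archive="$dest/training_data.zip"
  mkdir -p "$dest" || return 1
  # -f fails on HTTP errors, -L follows redirects, -o sets the output file.
  curl -fL -o "$archive" "$url" || return 1
  # -q quiet, -o overwrite existing files, -d target directory.
  unzip -q -o "$archive" -d "$dest"
}
```

For example, `fetch_training_data "https://example.com/faces.zip"` would unpack into /workspace/faceswap, where the URL is a placeholder.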
At this point the script should be running just fine, and you will see output like: [#130414] Saved model: A loss 0.0224 B loss 0.02614 (or something like that). The [#130414] means the process has finished 130414 iterations. For a deepfake, you should aim for at least 80,000 iterations for a decent result, assuming your training data is of good quality. If you decide to test out your model: press Enter, wait for the training to stop, then cd to /workspace
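The iteration count in a log line like the one quoted above can be compared with the 80,000 target before stopping training. This is only a sketch: it assumes the line format is exactly as quoted, and `iterations_from_line` and `enough_iterations` are hypothetical helper names.

```shell
# Sketch: pull the iteration count out of a "[#N] Saved model: ..." line.
# Both function names are hypothetical helpers introduced for illustration.
iterations_from_line() {
  # Prints the digits between "[#" and "]", or nothing if the line differs.
  printf '%s\n' "$1" | sed -n 's/.*\[#\([0-9][0-9]*\)].*/\1/p'
}

# Succeeds when the line reports at least the given number of iterations
# (default 80000, the minimum suggested in this guide).
enough_iterations() {
  count=$(iterations_from_line "$1")
  [ -n "$count" ] && [ "$count" -ge "${2:-80000}" ]
}
```

With the quoted line, `iterations_from_line '[#130414] Saved model: A loss 0.0224 B loss 0.02614'` prints 130414, and `enough_iterations` succeeds for it.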
Upload your model folder to your google drive.
(If you followed my commands exactly, the model folder should be ...) Yes, you can start the training again now to save yourself some time and money. Go to Google Drive in your browser and download the model folder to your local machine. You can then delete it from Google Drive to make the next upload easier. Convert your video using the downloaded model; for this, see USAGE.md in this repo. If you didn't restart the training earlier, remember to turn it back on. You have to stop the training for the upload. I suggest converting locally because the model is much smaller than your training data files, so converting on the VM would mean spending more time transferring files. Please don't delete your tmux session; it should be named 0 by default. To get back to the NGC container each time you log on to your VM, you should:
if it failed:
to see if you named it something else. If it says "No tmux running": congratulations, you have to redo all the steps above, and you have also lost your trained model. So don't delete that tmux session, and watch out for hotkeys like Ctrl-C and Ctrl-Z. There should be no reason to jump back to your default user directory; you really don't need to. If you want to train two models at the same time on one VM, I don't recommend it, because the speed is basically the same as training them one after the other. Hope this helps. If you find any improvements, please comment below to help more people.
Many developers, especially Mac users, don't have computers with an NVIDIA or AMD GPU, so we have to use cloud servers with GPUs.
But we (specifically people like me) struggle to set up the dependencies on these servers, such as installing CUDA and cuDNN.
Yes, I did read tons of installation guides, including the NVIDIA developer installation guide, but I am still trying to get cuDNN installed.