Here is a simple piece of code:

```julia
using JLD2, Lux, LuxCUDA

@load "./NN/luxmodel.jld2" model ps st
ps = ps |> gpu_device()
input = CUDA.ones(Float64, 9, 1024*256)
@time for i = 1:10
    Lux.apply(model, input, ps, st)
end
```

I just found that running the model only once takes very little time, but running it 10 times inside a for loop takes far longer than expected. I think this problem might be related to GC. How can I avoid it?
Answered by FR13ndSDP, Dec 17, 2023:
Just found out that I should use `CUDA.@time` instead of `@time`. Converting the input to `Float32` and running the model with `Lux.apply(model, cu(input), ps, st)` is also much faster.
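For reference, here is a minimal sketch of the adjusted benchmark, assuming the same `./NN/luxmodel.jld2` checkpoint and variable names as in the question. Hoisting `cu` out of the loop is a small tweak of my own, not part of the original reply:

```julia
using JLD2, Lux, LuxCUDA

@load "./NN/luxmodel.jld2" model ps st
ps = ps |> gpu_device()

# Float32 instead of Float64: it matches the default element type of the
# Lux parameters and is much faster on most GPUs.
input = ones(Float32, 9, 1024 * 256)

# `cu` copies the array to the GPU; hoisting it out of the loop avoids
# paying the host-to-device transfer on every iteration.
gpu_input = cu(input)

# `CUDA.@time` synchronizes the GPU before and after the timed block, so
# it reports real execution time and GPU allocations. Plain `@time` can
# return while kernels are still running, since launches are asynchronous.
CUDA.@time for i in 1:10
    Lux.apply(model, gpu_input, ps, st)
end
```

That asynchrony is also a plausible explanation for the confusing numbers in the question: without synchronization, `@time` on a single call mostly measures launch overhead, so comparing it against a timed loop is not an apples-to-apples comparison.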
Answer selected by avik-pal.