Hello,

I am currently re-implementing SparK and stumbled over the mask token that gets re-introduced during densification: https://github.com/keyu-tian/SparK/blob/a63e386f8e5186bc07ad7fce86e06b08f48a61ea/pretrain/spark.py#L99C1-L110C9
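If it helps pin down what I mean, here is a minimal sketch of how I understand the linked densify step: a per-channel learnable mask token is filled into the inactive positions before a projection. The class and argument names (`DensifyWithMaskToken`, `feat`, `active`) are mine for illustration, not SparK's actual API.

```python
import torch
import torch.nn as nn

class DensifyWithMaskToken(nn.Module):
    """Sketch: fill masked (inactive) positions of a sparse CNN feature
    map with a learnable mask token before it enters the decoder.
    Shapes and names are illustrative, not the repo's exact code."""

    def __init__(self, channels: int):
        super().__init__()
        # One learnable token per channel, broadcast over H and W.
        self.mask_token = nn.Parameter(torch.zeros(1, channels, 1, 1))
        nn.init.trunc_normal_(self.mask_token, std=0.02)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)  # the "+ projection"

    def forward(self, feat: torch.Tensor, active: torch.Tensor) -> torch.Tensor:
        # feat:   (B, C, H, W) encoder features, zeros at masked positions
        # active: (B, 1, H, W) bool, True where the patch was visible
        filled = torch.where(active, feat, self.mask_token.expand_as(feat))
        return self.proj(filled)

# The zero baseline I am asking about below would simply skip the fill:
#   decoder_input = proj(feat)  # masked positions stay zero
```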
I was wondering whether you tested how important the mask tokens are, or whether you have any intuition for their utility. For Transformers I understand that one needs non-zero mask tokens so that attention can address them and update their values, but is the same still true/necessary for CNNs?
Did you by any chance ablate the benefit of the mask token (+ projection) against simply passing the post-masking zeros into the decoder?
Thanks for the great work.
Cheers,
Tassilo