I am trying to reproduce StreetCLIP.
- Two-stage linear probe
  - "A Street View photo in {country}."
  - "A Street View photo from {city}."
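The two prompts above suggest a hierarchical lookup: score the country prompts first, then score only the city prompts that belong to the winning country. Below is a minimal sketch of that selection logic. It is not taken from this repository's code: `two_stage_predict`, `score`, and `cities_by_country` are hypothetical names, and `score` stands in for whatever image-text similarity the CLIP model returns for a single image.

```python
from typing import Callable, Dict, List, Tuple

COUNTRY_PROMPT = "A Street View photo in {country}."
CITY_PROMPT = "A Street View photo from {city}."


def two_stage_predict(
    score: Callable[[str], float],
    cities_by_country: Dict[str, List[str]],
) -> Tuple[str, str]:
    """Pick a country first, then a city within that country.

    `score` is a hypothetical stand-in for the image-text similarity
    of one fixed image against a text prompt (higher is better).
    """
    # Stage 1: choose the country whose prompt scores highest.
    country = max(
        cities_by_country,
        key=lambda c: score(COUNTRY_PROMPT.format(country=c)),
    )
    # Stage 2: only cities of the chosen country are candidates.
    city = max(
        cities_by_country[country],
        key=lambda c: score(CITY_PROMPT.format(city=c)),
    )
    return country, city


# Toy similarity table in place of a real model (made-up numbers).
toy_scores = {
    "A Street View photo in France.": 0.1,
    "A Street View photo in Japan.": 0.9,
    "A Street View photo from Paris.": 0.95,
    "A Street View photo from Tokyo.": 0.3,
    "A Street View photo from Osaka.": 0.7,
}


def toy_score(prompt: str) -> float:
    return toy_scores.get(prompt, 0.0)


cities_by_country = {"France": ["Paris", "Lyon"], "Japan": ["Tokyo", "Osaka"]}
print(two_stage_predict(toy_score, cities_by_country))  # -> ('Japan', 'Osaka')
```

In the toy data, Paris has the highest city score overall but is never considered, because stage 1 already committed to Japan. That is the trade-off of a two-stage scheme: fewer city prompts to score, at the cost of propagating any stage-1 mistake.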
- unzip gps_query_imgs.zip && mv gps_query_imgs ./data/img2gps_dataset/image
- unzip im2gps3ktest.zip && mv im2gps3ktest ./data/img2gps_dataset/image
git clone https://huggingface.co/geolocal/StreetCLIP
python eval_img2gps.py --model-name ViT-B-32 --ckpt-path path/to/StreetCLIP/ckpt
- Im2GPS3K (n=2997): accuracy (%) within each distance threshold

Model | Source | 1 km | 25 km | 200 km | 750 km | 2,500 km |
---|---|---|---|---|---|---|
CLIP@ViT-L-14-336 | Paper | - | 19.5 | 34.0 | 60.0 | 78.1 |
CLIP@ViT-L-14-336 | OpenAI's CLIP-reproduce | 4.07 | 20.09 | 31.90 | 54.72 | 72.07 |
StreetCLIP@ViT-L-14-336 | Paper | - | 22.4 | 37.4 | 61.3 | 80.4 |
StreetCLIP@ViT-L-14-336 | StreetCLIP-reproduce | 4.24 | 21.79 | 34.73 | 55.52 | 74.84 |
StreetCLIP@ViT-L-14-336 | StreetCLIP-reproduce-USStatesPrompt | 4.24 | 22.69 | 36.17 | 57.72 | 77.28 |
CLIP@ViT-B-32 | OpenAI's CLIP | 1.67 | 8.88 | 14.65 | 32.87 | 53.72 |
CLIP@ViT-B-16 | OpenAI's CLIP | 2.47 | 12.41 | 20.39 | 39.71 | 61.86 |
CLIP@ViT-L-14 | OpenAI's CLIP | 3.34 | 17.68 | 28.86 | 51.55 | 68.90 |
CLIP@ViT-H-14 | OpenCLIP | 3.94 | 18.69 | 30.60 | 51.95 | 71.10 |
- Im2GPS (n=237): accuracy (%) within each distance threshold

Model | Source | 1 km | 25 km | 200 km | 750 km | 2,500 km |
---|---|---|---|---|---|---|
CLIP@ViT-L-14-336 | Paper | - | 27.0 | 42.2 | 71.7 | 86.9 |
CLIP@ViT-L-14-336 | OpenAI's CLIP-reproduce | 4.64 | 26.58 | 40.08 | 63.71 | 80.17 |
StreetCLIP@ViT-L-14-336 | Paper | - | 28.3 | 45.1 | 74.7 | 88.2 |
StreetCLIP@ViT-L-14-336 | StreetCLIP-reproduce | 5.49 | 28.27 | 42.62 | 67.51 | 80.17 |
StreetCLIP@ViT-L-14-336 | StreetCLIP-reproduce-USStatesPrompt | 5.49 | 29.96 | 45.57 | 70.46 | 83.54 |
CLIP@ViT-B-32 | OpenAI's CLIP | 2.11 | 16.46 | 26.58 | 46.41 | 66.24 |
CLIP@ViT-B-16 | OpenAI's CLIP | 2.53 | 19.83 | 31.65 | 52.74 | 71.31 |
CLIP@ViT-L-14 | OpenAI's CLIP | 4.22 | 24.05 | 35.44 | 58.65 | 77.63 |
CLIP@ViT-H-14 | OpenCLIP | 5.49 | 29.54 | 44.30 | 65.82 | 79.75 |
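
The columns in the tables above report the share of test images whose predicted location falls within a given distance of the ground truth. The sketch below shows one way such numbers can be computed. It is an illustration under assumptions, not the code in `eval_img2gps.py`: it assumes great-circle (haversine) distance with a mean Earth radius of 6371 km and an inclusive `<=` comparison, and the names `haversine_km` and `accuracy_at_thresholds` are hypothetical.

```python
import math

THRESHOLDS_KM = (1, 25, 200, 750, 2500)
EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def accuracy_at_thresholds(preds, targets, thresholds=THRESHOLDS_KM):
    """Percentage of predictions within each distance threshold.

    `preds` and `targets` are equal-length sequences of (lat, lon) pairs.
    """
    if not preds or len(preds) != len(targets):
        raise ValueError("preds and targets must be non-empty and equal length")
    errors = [haversine_km(*p, *t) for p, t in zip(preds, targets)]
    return {
        t: 100.0 * sum(e <= t for e in errors) / len(errors)
        for t in thresholds
    }


# Four toy predictions at (0, 0) against targets 0, ~11, ~111 and ~1112 km away.
demo_preds = [(0.0, 0.0)] * 4
demo_targets = [(0.0, 0.0), (0.0, 0.1), (0.0, 1.0), (0.0, 10.0)]
print(accuracy_at_thresholds(demo_preds, demo_targets))
# -> {1: 25.0, 25: 50.0, 200: 75.0, 750: 75.0, 2500: 100.0}
```

With only 237 images in the smaller test set, a single image changes each accuracy by about 0.42 points, so small gaps between rows in that table should be read with care.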