
The finetuning hyperparameters of resnet50 #27

Open
Vickeyhw opened this issue Apr 17, 2023 · 7 comments

@Vickeyhw

The hyperparameter settings (batch size and learning rate) in the paper seem inconsistent with those in the code. Which ones reproduce the 80.6 accuracy reported in the paper?

@keyu-tian
Owner

keyu-tian commented Apr 19, 2023

You can see https://github.com/keyu-tian/SparK/blob/main/downstream_imagenet/arg.py#L20; those settings result in 80.6 acc.

Training losses of each epoch:

[0.1574, 0.0063, 0.0056, 0.0054, 0.0053, 0.0053, 0.0052, 0.0052, 0.0051, 0.0051, 0.0050, 0.0050, 0.0050, 0.0050, 0.0050, 0.0049, 0.0049, 0.0049, 0.0049, 0.0050, 0.0048, 0.0048, 0.0049, 0.0048, 0.0048, 0.0048, 0.0048, 0.0048, 0.0048, 0.0048, 0.0048, 0.0047, 0.0048, 0.0048, 0.0047, 0.0047, 0.0048, 0.0048, 0.0047, 0.0048, 0.0047, 0.0047, 0.0047, 0.0047, 0.0047, 0.0048, 0.0047, 0.0048, 0.0047, 0.0047, 0.0047, 0.0046, 0.0046, 0.0046, 0.0047, 0.0046, 0.0046, 0.0045, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0046, 0.0045, 0.0045, 0.0045, 0.0045, 0.0045, 0.0046, 0.0045, 0.0044, 0.0045, 0.0045, 0.0045, 0.0044, 0.0045, 0.0045, 0.0045, 0.0044, 0.0045, 0.0045, 0.0045, 0.0045, 0.0044, 0.0045, 0.0044, 0.0044, 0.0044, 0.0044, 0.0044, 0.0044, 0.0045, 0.0044, 0.0044, 0.0044, 0.0044, 0.0043, 0.0044, 0.0043, 0.0044, 0.0044, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0043, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0042, 0.0041, 0.0042, 0.0042, 0.0042, 0.0041, 0.0041, 0.0041, 0.0041, 0.0041, 0.0041, 0.0042, 0.0041, 0.0041, 0.0041, 0.0041, 0.0040, 0.0041, 0.0040, 0.0040, 0.0041, 0.0041, 0.0040, 0.0040, 0.0040, 0.0040, 0.0040, 0.0039, 0.0040, 0.0040, 0.0039, 0.0039, 0.0040, 0.0040, 0.0040, 0.0039, 0.0040, 0.0039, 0.0039, 0.0040, 0.0039, 0.0038, 0.0039, 0.0039, 0.0039, 0.0039, 0.0038, 0.0039, 0.0038, 0.0038, 0.0038, 0.0039, 0.0038, 0.0038, 0.0038, 0.0038, 0.0038, 0.0038, 0.0037, 0.0038, 0.0038, 0.0038, 0.0037, 0.0037, 0.0038, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0037, 0.0036, 0.0036, 0.0036, 0.0037, 0.0037, 0.0037, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0036, 0.0035, 0.0035, 0.0036, 0.0035, 0.0035, 0.0035, 0.0035, 0.0035, 0.0036, 0.0036, 0.0034, 0.0035, 0.0035, 0.0035, 0.0034, 0.0035, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0034, 0.0035, 0.0034, 0.0034, 0.0033, 0.0034, 0.0034, 0.0034, 0.0034, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0032, 0.0032, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0033, 0.0032, 0.0033, 0.0033, 0.0032, 0.0033, 0.0033, 0.0033, 0.0032, 0.0032, 0.0033, 0.0033]

Best validation accs (EMA model) of each epoch:

[0.12, 0.72, 22.60, 45.40, 52.51, 55.70, 58.43, 60.69, 62.26, 63.42, 64.43, 65.04, 65.57, 66.13, 66.57, 66.89, 67.34, 67.53, 67.94, 68.10, 68.11, 68.37, 68.62, 68.76, 68.87, 68.95, 69.18, 69.34, 69.35, 69.50, 69.67, 69.67, 69.90, 69.98, 69.98, 70.06, 70.10, 70.24, 70.28, 70.35, 70.35, 70.35, 70.44, 70.60, 70.64, 70.79, 71.00, 71.00, 71.00, 71.01, 71.04, 71.20, 71.20, 71.23, 71.23, 71.31, 71.31, 71.43, 71.46, 71.50, 71.60, 71.71, 71.71, 71.73, 71.87, 71.89, 72.13, 72.13, 72.13, 72.13, 72.19, 72.19, 72.25, 72.28, 72.43, 72.50, 72.57, 72.60, 72.69, 72.69, 72.69, 72.69, 72.77, 72.79, 72.93, 72.98, 72.98, 73.05, 73.21, 73.30, 73.30, 73.30, 73.38, 73.51, 73.53, 73.60, 73.61, 73.67, 73.67, 73.67, 73.67, 73.70, 73.79, 73.82, 74.02, 74.02, 74.09, 74.09, 74.09, 74.23, 74.27, 74.31, 74.42, 74.43, 74.50, 74.53, 74.58, 74.64, 74.72, 74.93, 74.93, 74.93, 74.93, 75.04, 75.04, 75.07, 75.14, 75.21, 75.27, 75.33, 75.36, 75.36, 75.45, 75.49, 75.57, 75.62, 75.71, 75.83, 75.83, 75.85, 75.96, 76.01, 76.06, 76.14, 76.16, 76.17, 76.29, 76.29, 76.29, 76.38, 76.45, 76.56, 76.60, 76.64, 76.71, 76.76, 76.80, 76.95, 77.03, 77.10, 77.10, 77.16, 77.18, 77.28, 77.28, 77.37, 77.38, 77.53, 77.59, 77.61, 77.61, 77.74, 77.75, 77.88, 77.96, 77.99, 78.01, 78.09, 78.15, 78.18, 78.24, 78.25, 78.26, 78.37, 78.44, 78.48, 78.63, 78.63, 78.69, 78.69, 78.69, 78.72, 78.73, 78.76, 78.80, 78.88, 78.91, 79.03, 79.07, 79.07, 79.07, 79.09, 79.25, 79.25, 79.25, 79.25, 79.30, 79.30, 79.33, 79.39, 79.44, 79.54, 79.54, 79.59, 79.61, 79.66, 79.66, 79.72, 79.77, 79.82, 79.83, 79.83, 79.89, 79.89, 79.96, 79.97, 80.04, 80.04, 80.06, 80.11, 80.18, 80.18, 80.19, 80.19, 80.19, 80.19, 80.19, 80.22, 80.25, 80.25, 80.25, 80.25, 80.28, 80.30, 80.30, 80.30, 80.30, 80.30, 80.37, 80.39, 80.39, 80.44, 80.44, 80.44, 80.44, 80.44, 80.44, 80.44, 80.44, 80.48, 80.49, 80.49, 80.51, 80.52, 80.53, 80.53, 80.53, 80.53, 80.53, 80.53, 80.53, 80.55, 80.56, 80.58, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59, 80.59]
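
(For context: the "EMA model" above is an exponential moving average of the weights. Below is a minimal sketch of the usual scheme, assuming the decay 0.9999 that appears in the args later in this thread; SparK's actual EMA helper may differ, especially in how it treats buffers.)

```python
import copy
import torch

@torch.no_grad()
def update_ema(ema_model, model, decay=0.9999):
    # ema_w <- decay * ema_w + (1 - decay) * w, applied parameter-wise.
    for ema_p, p in zip(ema_model.parameters(), model.parameters()):
        ema_p.mul_(decay).add_(p, alpha=1 - decay)
    # Buffers (e.g. BatchNorm running stats) are commonly copied directly;
    # this is an assumption about SparK's behavior, not a confirmed detail.
    for ema_b, b in zip(ema_model.buffers(), model.buffers()):
        ema_b.copy_(b)

# Usage: ema_model = copy.deepcopy(model), then call update_ema() after each
# optimizer step and validate with ema_model to get the accuracies above.
```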

@Vickeyhw
Author

@keyu-tian I have tried this setting, except with a batch size of 2048, because 8 GPUs cannot accommodate 4096 images. Unfortunately, my training loss became NaN. Does the batch size matter that much?
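
(A minimal sketch, not SparK's code, of a common way to localize this kind of divergence: confirm the loss is finite before stepping, and optionally clip gradient norms — note the args below show 'clip': -1, i.e. clipping disabled.)

```python
import torch

def safe_backward_step(loss, model, optimizer, clip_norm=5.0):
    """Skip non-finite batches and optionally clip gradients.
    clip_norm=5.0 is an illustrative value, not a SparK default."""
    if not torch.isfinite(loss):
        optimizer.zero_grad(set_to_none=True)  # drop the bad batch
        return False
    loss.backward()
    if clip_norm > 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    return True
```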

@keyu-tian
Owner

@Vickeyhw can you provide the command and logs?

@Vickeyhw
Author

Vickeyhw commented Apr 20, 2023

@keyu-tian The args are:

(downstream_imagenet/main.py, line 29) => initial args:
{'base_lr': 0.002,
 'batch_size_per_gpu': 256,
 'best_val_acc': 0.0,
 'bs': 2048,
 'clip': -1,
 'cmd': '--local_rank=0 --exp_name res50 --exp_dir '
        './output//Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e '
        '--data_path=imagenet2012/ImageNet_ILSVRC2012 '
        '--model=resnet50 --ep=300 --bs=2048 '
        '--resume_from=./resnet50_1kpretrained_timm_style.pth',
 'cur_ep': '',
 'data_path': '/home/bingxing2/public/imagenet2012/ImageNet_ILSVRC2012',
 'dataloader_workers': 8,
 'device': device(type='cuda', index=0),
 'dist_on_itp': False,
 'dist_url': 'env://',
 'drop_path': 0.05,
 'ema': 0.9999,
 'ep': 300,
 'eval_data_path': '',
 'exp_dir': 'output/Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e',
 'exp_name': 'res50',
 'finish_time': '',
 'first_logging': True,
 'glb_batch_size': 2048,
 'global_rank': 0,
 'img_size': 224,
 'is_local_master': True,
 'is_master': True,
 'local_rank': 0,
 'log_epoch': <bound method FineTuneArgs.log_epoch of FineTuneArgs(prog='main.py', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)>,
 'log_txt_name': 'output/Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e/finetune_log.txt',
 'lr': 0.016,
 'lr_scale': 0.7,
 'mixup': 0.1,
 'model': 'resnet50',
 'opt': 'lamb',
 'remain_time': '',
 'rep_aug': 0,
 'resume_from': './resnet50_1kpretrained_timm_style.pth',
 'sbn': False,
 'tb_lg_dir': '/home/bingxing2/home/scx6008/hw/SparK/output/Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e/tensorboard_log',
 'train_acc': 0.0,
 'train_loss': 0.0,
 'wd': 0.02,
 'world_size': 8,
 'wp_ep': 5}
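
(Side note: these args are self-consistent under the standard linear LR scaling rule, lr = base_lr × glb_batch_size / 256. A quick check — scaled_lr is a hypothetical name; the repo computes this inside downstream_imagenet/arg.py:)

```python
def scaled_lr(base_lr: float, glb_batch_size: int) -> float:
    # Linear scaling rule: effective LR grows with the global batch size.
    return base_lr * glb_batch_size / 256

assert scaled_lr(0.002, 2048) == 0.016  # matches 'lr': 0.016 above
# The 4096 batch size that did not fit on 8 GPUs would give 0.032.
```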

The logs are:

{"name": "res50", "cmd": "--local_rank=0 --exp_name res50  --exp_dir ./output//Spark_res50_official_finetune_default_lr2e-3_ld0_7_300e --data_path=imagenet2012/ImageNet_ILSVRC2012 --model=resnet50 --ep=300 --bs=2048 --resume_from=./resnet50_1kpretrained_timm_style.pth",  "model": "resnet50"}

{"cur_ep": "", "train_L": 0.0, "train_acc": 0.0, "best_val_acc": 0.0, "rema": "", "fini": ""}
{"cur_ep": "1/300", "train_L": 0.15892334485948087, "train_acc": 0.62875, "best_val_acc": 4.741999879479408, "rema": "1 day, 3:49:28", "fini": "04-18 14:07"}
{"cur_ep": "1/300", "train_L": 0.1589116028137505, "train_acc": 0.67875, "best_val_acc": 4.741999879479408, "rema": "1 day, 3:49:55", "fini": "04-18 14:07"}
{"cur_ep": "2/300", "train_L": 0.006166074915975333, "train_acc": 17.9, "best_val_acc": 4.741999879479408, "rema": "23:23:38", "fini": "04-18 09:46"}
{"cur_ep": "2/300", "train_L": 0.006192157221585512, "train_acc": 17.850625, "best_val_acc": 4.741999879479408, "rema": "23:23:11", "fini": "04-18 09:45"}
{"cur_ep": "3/300", "train_L": 0.005531815142929554, "train_acc": 31.03125, "best_val_acc": 4.741999879479408, "rema": "23:10:48", "fini": "04-18 09:37"}
{"cur_ep": "3/300", "train_L": 0.005463680490851402, "train_acc": 31.53375, "best_val_acc": 4.741999879479408, "rema": "23:10:48", "fini": "04-18 09:37"}
{"cur_ep": "4/300", "train_L": 0.005408975677192211, "train_acc": 34.206875, "best_val_acc": 4.741999879479408, "rema": "23:16:55", "fini": "04-18 09:48"}
{"cur_ep": "4/300", "train_L": 0.0053789736032485965, "train_acc": 34.44125, "best_val_acc": 4.741999879479408, "rema": "23:16:55", "fini": "04-18 09:48"}
{"cur_ep": "5/300", "train_L": 0.005379272519052029, "train_acc": 34.675, "best_val_acc": 4.741999879479408, "rema": "23:14:28", "fini": "04-18 09:50"}
{"cur_ep": "5/300", "train_L": 0.005380104035139084, "train_acc": 34.819375, "best_val_acc": 4.741999879479408, "rema": "23:14:28", "fini": "04-18 09:50"}
{"cur_ep": "6/300", "train_L": 0.005402451483905315, "train_acc": 34.683125, "best_val_acc": 43.54200065135956, "rema": "1 day, 0:28:58", "fini": "04-18 11:10"}
{"cur_ep": "6/300", "train_L": 0.00541508878916502, "train_acc": 34.975625, "best_val_acc": 43.54200065135956, "rema": "1 day, 0:29:01", "fini": "04-18 11:10"}
{"cur_ep": "7/300", "train_L": 0.005425414913892746, "train_acc": 35.530625, "best_val_acc": 43.54200065135956, "rema": "22:59:30", "fini": "04-18 09:45"}
{"cur_ep": "7/300", "train_L": 0.005408091994374991, "train_acc": 35.624375, "best_val_acc": 43.54200065135956, "rema": "22:59:30", "fini": "04-18 09:45"}
{"cur_ep": "8/300", "train_L": 0.005414580475538969, "train_acc": 35.54125, "best_val_acc": 43.54200065135956, "rema": "22:52:39", "fini": "04-18 09:43"}
{"cur_ep": "8/300", "train_L": 0.005382590828090906, "train_acc": 35.548125, "best_val_acc": 43.54200065135956, "rema": "22:52:39", "fini": "04-18 09:43"}
{"cur_ep": "9/300", "train_L": 0.005471015560626983, "train_acc": 34.43, "best_val_acc": 43.54200065135956, "rema": "22:54:35", "fini": "04-18 09:50"}
{"cur_ep": "9/300", "train_L": 0.005475558865070343, "train_acc": 34.46375, "best_val_acc": 43.54200065135956, "rema": "22:54:35", "fini": "04-18 09:50"}
{"cur_ep": "10/300", "train_L": 0.005533626601845026, "train_acc": 33.153125, "best_val_acc": 43.54200065135956, "rema": "22:50:06", "fini": "04-18 09:50"}
{"cur_ep": "11/300", "train_L": 0.005674835596978664, "train_acc": 30.861875, "best_val_acc": 43.54200065135956, "rema": "1 day, 0:08:31", "fini": "04-18 11:13"}
{"cur_ep": "11/300", "train_L": 0.00564453030526638, "train_acc": 30.945, "best_val_acc": 43.54200065135956, "rema": "1 day, 0:08:34", "fini": "04-18 11:13"}
{"cur_ep": "12/300", "train_L": 0.005810007998347282, "train_acc": 28.496875, "best_val_acc": 43.54200065135956, "rema": "22:46:05", "fini": "04-18 09:56"}
{"cur_ep": "12/300", "train_L": 0.005785862766951323, "train_acc": 28.413125, "best_val_acc": 43.54200065135956, "rema": "22:46:05", "fini": "04-18 09:56"}
{"cur_ep": "13/300", "train_L": 0.005920035427808761, "train_acc": 25.194375, "best_val_acc": 43.54200065135956, "rema": "22:33:55", "fini": "04-18 09:48"}
{"cur_ep": "13/300", "train_L": 0.005946401672065258, "train_acc": 25.08125, "best_val_acc": 43.54200065135956, "rema": "22:33:55", "fini": "04-18 09:48"}
{"cur_ep": "14/300", "train_L": 0.006212903738021851, "train_acc": 19.32875, "best_val_acc": 43.54200065135956, "rema": "22:36:47", "fini": "04-18 09:56"}
{"cur_ep": "14/300", "train_L": 0.006215040449798107, "train_acc": 19.2525, "best_val_acc": 43.54200065135956, "rema": "22:36:47", "fini": "04-18 09:56"}
{"cur_ep": "15/300", "train_L": 0.006856135655939579, "train_acc": 8.3275, "best_val_acc": 43.54200065135956, "rema": "22:27:26", "fini": "04-18 09:51"}
{"cur_ep": "16/300", "train_L": 0.00847176744043827, "train_acc": 0.505, "best_val_acc": 43.54200065135956, "rema": "23:35:44", "fini": "04-18 11:04"}
{"cur_ep": "16/300", "train_L": 0.00849286153614521, "train_acc": 0.4875, "best_val_acc": 43.54200065135956, "rema": "23:36:30", "fini": "04-18 11:05"}
{"cur_ep": "17/300", "train_L": 0.4059472487047315, "train_acc": 0.17875, "best_val_acc": 43.54200065135956, "rema": "21:58:07", "fini": "04-18 09:31"}
{"cur_ep": "17/300", "train_L": 0.4075041047215462, "train_acc": 0.1925, "best_val_acc": 43.54200065135956, "rema": "21:58:55", "fini": "04-18 09:32"}
{"cur_ep": "18/300", "train_L": 2.594160234707594, "train_acc": 0.20125, "best_val_acc": 43.54200065135956, "rema": "21:50:27", "fini": "04-18 09:28"}
{"cur_ep": "18/300", "train_L": 2.599315336358547, "train_acc": 0.19, "best_val_acc": 43.54200065135956, "rema": "21:50:27", "fini": "04-18 09:28"}
{"cur_ep": "19/300", "train_L": 168.27923201017379, "train_acc": 0.213125, "best_val_acc": 43.54200065135956, "rema": "21:47:55", "fini": "04-18 09:30"}
{"cur_ep": "19/300", "train_L": 168.41776486291886, "train_acc": 0.20375, "best_val_acc": 43.54200065135956, "rema": "21:47:55", "fini": "04-18 09:30"}
{"cur_ep": "20/300", "train_L": 9278.141375096131, "train_acc": 0.196875, "best_val_acc": 43.54200065135956, "rema": "21:40:22", "fini": "04-18 09:28"}
{"cur_ep": "20/300", "train_L": 9242.067996807862, "train_acc": 0.225625, "best_val_acc": 43.54200065135956, "rema": "21:40:25", "fini": "04-18 09:28"}
{"cur_ep": "21/300", "train_L": 385708.6738011719, "train_acc": 0.193125, "best_val_acc": 43.54200065135956, "rema": "22:55:53", "fini": "04-18 10:48"}
{"cur_ep": "21/300", "train_L": 384956.72077226563, "train_acc": 0.216875, "best_val_acc": 43.54200065135956, "rema": "22:55:53", "fini": "04-18 10:48"}
{"cur_ep": "22/300", "train_L": 4882009.3119, "train_acc": 0.20625, "best_val_acc": 43.54200065135956, "rema": "21:27:03", "fini": "04-18 09:24"}
{"cur_ep": "22/300", "train_L": 4842194.24495, "train_acc": 0.22, "best_val_acc": 43.54200065135956, "rema": "21:27:06", "fini": "04-18 09:24"}
{"cur_ep": "23/300", "train_L": 51736270.4836, "train_acc": 0.209375, "best_val_acc": 43.54200065135956, "rema": "21:28:39", "fini": "04-18 09:30"}
{"cur_ep": "23/300", "train_L": 52044852.8624, "train_acc": 0.169375, "best_val_acc": 43.54200065135956, "rema": "21:28:42", "fini": "04-18 09:30"}
{"cur_ep": "24/300", "train_L": 122264843.6944, "train_acc": 0.22125, "best_val_acc": 43.54200065135956, "rema": "21:21:00", "fini": "04-18 09:27"}
{"cur_ep": "25/300", "train_L": 462680769.6992, "train_acc": 0.184375, "best_val_acc": 43.54200065135956, "rema": "21:16:28", "fini": "04-18 09:27"}
{"cur_ep": "25/300", "train_L": 464397992.0096, "train_acc": 0.211875, "best_val_acc": 43.54200065135956, "rema": "21:16:30", "fini": "04-18 09:27"}
{"cur_ep": "26/300", "train_L": 1761032724.8896, "train_acc": 0.198125, "best_val_acc": 43.54200065135956, "rema": "22:27:24", "fini": "04-18 10:43"}
{"cur_ep": "26/300", "train_L": 1763697504.5888, "train_acc": 0.188125, "best_val_acc": 43.54200065135956, "rema": "22:27:51", "fini": "04-18 10:43"}
{"cur_ep": "27/300", "train_L": NaN, "train_acc": 0.20375, "best_val_acc": 43.54200065135956, "rema": "20:57:51", "fini": "04-18 09:18"}
{"cur_ep": "27/300", "train_L": NaN, "train_acc": 0.211875, "best_val_acc": 43.54200065135956, "rema": "20:58:21", "fini": "04-18 09:18"}
{"cur_ep": "28/300", "train_L": NaN, "train_acc": 0.19125, "best_val_acc": 43.54200065135956, "rema": "20:49:07", "fini": "04-18 09:14"}

@keyu-tian
Owner

I see, I'll check that. Perhaps I copied the wrong code for the LAMB optimizer.

BTW, have you tried ConvNeXt-Small? Does it fail too?
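
(For reference, a minimal single-tensor sketch of the LAMB update from You et al. 2019, written from the paper rather than from SparK's optimizer; the repo's code may differ in bias correction and norm guards. The trust-ratio guard is a classic place for a copied implementation to go wrong and produce exploding losses like those above.)

```python
import torch

def lamb_step(p, grad, m, v, lr=0.016, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.02, step=1):
    """One LAMB update for parameter tensor p (in place). m and v are
    persistent moment buffers, zero-initialized with p's shape."""
    # Adam-style moment estimates with bias correction.
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)

    # Decoupled weight decay enters the update before the trust ratio.
    update = m_hat / (v_hat.sqrt() + eps) + weight_decay * p

    # Layer-wise trust ratio ||w|| / ||update||, guarded so zero norms
    # fall back to 1.0 instead of producing inf/NaN.
    w_norm, u_norm = p.norm(), update.norm()
    trust_ratio = torch.where((w_norm > 0) & (u_norm > 0),
                              w_norm / u_norm, torch.ones_like(w_norm))

    p.add_(update, alpha=-(lr * trust_ratio.item()))
```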

@Vickeyhw
Author

@keyu-tian ConvNeXt-small seems normal so far.

@Vickeyhw
Author

@keyu-tian Have you found the ResNet-50 fine-tuning problem? ConvNeXt-Small reaches 83.96 validation acc when fine-tuned from your released pretrained weights.
