I reproduced the code with an accuracy of only 54 #19

facias914 opened this issue Jul 13, 2023 · 19 comments

facias914 commented Jul 13, 2023

The original code is based on an old version, so I reproduced it on the newer mmrotate. I loaded the weights you provided and everything went fine, but the result was only an accuracy of 68 on the validation set and 54 on the test set.

I don't know where the problem is. The weight file loads smoothly and I have also checked the configuration file parameters, but I still can't tell what went wrong. If there were a real problem with my reproduction, the final result should be 0; it should not be as high as 54.

dataset_type = 'DOTADataset'
data_root = '/data/facias/DOTA/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

angle_version = 'le90'
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version=angle_version),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(type='RRandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'train_split/labelTxt/',
        img_prefix=data_root + 'train_split/images/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'val_split/labelTxt/',
        img_prefix=data_root + 'val_split/images/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        test_mode=True,   # set to True if the test set has no annotations
        ann_file=data_root + 'test_split/images/',
        img_prefix=data_root + 'test_split/images/',
        pipeline=test_pipeline))

model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='ViT_Win_RVSA_V3_WSZ7',
        img_size=1024,
        embed_dim=768,
        depth=12,
        num_heads=12,
        mlp_ratio=4,
        qkv_bias=True,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.15,
        use_abs_pos_emb=True),
    neck=dict(
        type='FPN',
        in_channels=[768, 768, 768, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version=angle_version,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range=angle_version,
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range=angle_version,
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(.0, .0, .0, .0, .0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                ignore_iof_thr=-1),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(iou_thr=0.1),
            max_per_img=2000)))
# evaluation
evaluation = dict(interval=1, metric='mAP')
# optimizer
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)

# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
mp_start_method = 'fork'
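For context, inference and local evaluation with a config like the one above are normally run through mmrotate's test script. A minimal sketch, assuming standard mmrotate tooling; the config and checkpoint names here are placeholders, not the actual files:

    # evaluate a checkpoint on the clipped val split (hypothetical paths)
    python tools/test.py my_vit_oriented_rcnn_dota.py path/to/checkpoint.pth --eval mAP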

DotWang commented Jul 13, 2023

@facias914 It is suggested to use AdamW for training vision transformer networks, and to reproduce the method with our parameter settings. The detailed settings can be found in the config files.
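For illustration, an AdamW setting of the kind typically used for ViT backbones in mmdet/mmrotate-style configs is sketched below; the exact learning rate, weight decay, and any layer-wise decay settings should be taken from the official config files rather than from this example:

# sketch of an AdamW optimizer block for a ViT backbone (values are illustrative)
optimizer = dict(
    type='AdamW',
    lr=1e-4,
    betas=(0.9, 0.999),
    weight_decay=0.05)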

facias914 commented:

Thank you for your reply. I just used the checkpoint file you provided. I compared the parameter list and it is consistent with yours; the specific content is above.


DotWang commented Jul 13, 2023

facias914 commented:

I know what you mean, but I only use the checkpoint file for inference, not for training.

facias914 commented:

Now I want to reproduce the code with the old mmrotate version, but the old mmrotate needs the DOTA annotations in pkl format. Can you send me a DOTA dataset with pkl annotations? Thank you!


DotWang commented Jul 13, 2023

@facias914 Which checkpoint did you use? Did you use this one: https://1drv.ms/u/s!AimBgYV7JjTlgVJM4Znng50US8KD?e=o4MRMQ ?


DotWang commented Jul 13, 2023

@facias914 The DOTA dataset needs to be clipped with BboxToolkit; the pkl can then be obtained.
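For reference, the clipping is usually driven by a JSON config and the image-splitting script shipped with BboxToolkit. A hedged sketch of the invocation is below; the exact script path and flag name may differ between versions, so check the BboxToolkit README:

    # split DOTA images into 1024x1024 patches according to a JSON config (illustrative paths)
    python BboxToolkit/tools/img_split.py --base_json split_configs/dota1_0/ss_trainval.json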

facias914 commented:

> @facias914 Which checkpoint did you use? Did you use this one: https://1drv.ms/u/s!AimBgYV7JjTlgVJM4Znng50US8KD?e=o4MRMQ ?

Yes, I used this one

[Screenshot 2023-07-13 21 32 47]

facias914 commented:

> @facias914 The DOTA dataset needs to be clipped with BboxToolkit; the pkl can then be obtained.

Thank you!


DotWang commented Jul 13, 2023

> @facias914 Which checkpoint did you use? Did you use this one: https://1drv.ms/u/s!AimBgYV7JjTlgVJM4Znng50US8KD?e=o4MRMQ ?
>
> Yes, I used this one
>
> [Screenshot 2023-07-13 21 32 47]

So the inference is conducted on unclipped images?

facias914 commented:

> @facias914 Which checkpoint did you use? Did you use this one: https://1drv.ms/u/s!AimBgYV7JjTlgVJM4Znng50US8KD?e=o4MRMQ ?
>
> Yes, I used this one
> [Screenshot 2023-07-13 21 32 47]
>
> So the inference is conducted on unclipped images?

No, I use clipped val images and clipped test images with size=1024 and gap = 200, obtaining mAP=68 and 54 respectively.


DotWang commented Jul 13, 2023

@facias914 OK. In fact, we didn't conduct local validation. We directly train the model on the merged train+val set and submit the test-set results to the evaluation website. You can implement the same evaluation. The test set also needs to be clipped with BboxToolkit.
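For illustration, a dataset section consistent with that protocol might look like the sketch below; the directory names are placeholders for the clipped train+val and test splits, not the authors' exact paths:

# sketch of a data block for training on the merged, clipped train+val split (illustrative paths)
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'trainval_split/labelTxt/',
        img_prefix=data_root + 'trainval_split/images/',
        pipeline=train_pipeline),
    test=dict(
        type=dataset_type,
        test_mode=True,  # the test split has no public annotations
        ann_file=data_root + 'test_split/images/',
        img_prefix=data_root + 'test_split/images/',
        pipeline=test_pipeline))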

facias914 commented:

> @facias914 OK. In fact, we didn't conduct local validation. We directly train the model on the merged train+val set and submit the test-set results to the evaluation website. You can implement the same evaluation. The test set also needs to be clipped with BboxToolkit.

Thank you! I get mAP 87 on the val set using the old mmrotate version. By the way, I have obtained the test-set pkl file, but I can't find the code that converts the pkl to txt. I used a pkl2txt script written by myself, but failed to get a result from the DOTA website. Can you tell me where the pkl2txt file is?


DotWang commented Jul 14, 2023

@facias914 Shouldn't OBBDetection automatically convert the pkl? (mmrotate is built on top of OBBDetection.)
https://mmrotate.readthedocs.io/en/latest/get_started.html#test-a-model

facias914 commented:

> @facias914 Shouldn't OBBDetection automatically convert the pkl? (mmrotate is built on top of OBBDetection.) https://mmrotate.readthedocs.io/en/latest/get_started.html#test-a-model

Yes, I used OBBDetection built from https://github.com/jbwang1997/OBBDetection, and I have obtained the pkl file. But the DOTA website requires txt files, so the pkl2txt code is what I need now.


DotWang commented Jul 14, 2023

@facias914 For DOTA-V1.0, use --format-only and OBBDetection will automatically produce the required format; please refer to our README.
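For reference, the command documented on the mmrotate page linked above generates DOTA Task1 submission files directly; a minimal sketch with placeholder paths (OBBDetection exposes the analogous --format-only option, as noted above):

    # write DOTA Task1 submission files instead of evaluating locally (hypothetical paths)
    python tools/test.py my_config.py path/to/checkpoint.pth --format-only \
        --eval-options submission_dir=work_dirs/Task1_results

The generated per-class txt files can then be zipped and uploaded to the DOTA evaluation server.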

facias914 commented:

> @facias914 For DOTA-V1.0, use --format-only and OBBDetection will automatically produce the required format; please refer to our README.

Thank you very much for your reply. I also reached 78.74 on the test set. Next, I will investigate why there is such a big gap between my reproduced code and the official code.

HuangShiqi128 commented:

Hi,

May I ask which config file in BboxToolkit you used to split the DOTA dataset? Is it ss_trainval.json?

{
    "nproc": 20,
    "load_type": "dota",
    "img_dirs": [
        "data/DOTA1_0/train/images/",
        "data/DOTA1_0/val/images/"
    ],
    "ann_dirs": [
        "data/DOTA1_0/train/labelTxt/",
        "data/DOTA1_0/val/labelTxt/"
    ],
    "classes": null,
    "prior_annfile": null,
    "merge_type": "addition",
    "sizes": [1024],
    "gaps": [200],
    "rates": [1.0],
    "img_rate_thr": 0.6,
    "iof_thr": 0.7,
    "no_padding": false,
    "padding_value": [104, 116, 124],
    "filter_empty": true,
    "save_dir": "data/split_ss_dota1_0/trainval/",
    "save_ext": ".png"
}

PP-explore commented:

> @facias914 For DOTA-V1.0, use --format-only and OBBDetection will automatically produce the required format; please refer to our README.
>
> Thank you very much for your reply. I also reached 78.74 on the test set. Next, I will investigate why there is such a big gap between my reproduced code and the official code.

@facias914 Hello, I also encountered the same issue. I used the official ViTAE-B + RVSA model to run inference on the test set, but the mAP I obtained is only 0.394. May I ask how you achieved an mAP close to the authors' result? Could you provide some help? I have shared some of my configuration in this issue: #39 (comment).
