I reproduced the code with an accuracy of only 54 #19

facias914 opened this issue Jul 13, 2023 · 19 comments

facias914 commented Jul 13, 2023

The original code is based on an old version, so I reproduced it on the newer mmrotate. I loaded the weights you provided and everything went fine, but the result was only an accuracy of 68 on the validation set and 54 on the test set.

I don't know where the problem is. The weight file loads smoothly and I have also checked the configuration file parameters, but I still can't tell what went wrong. If there were a real problem with my reproduction, the final result should be 0; it should not be as high as 54.

dataset_type = 'DOTADataset'
data_root = '/data/facias/DOTA/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

angle_version = 'le90'
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='RResize', img_scale=(1024, 1024)),
    dict(
        type='RRandomFlip',
        flip_ratio=[0.25, 0.25, 0.25],
        direction=['horizontal', 'vertical', 'diagonal'],
        version=angle_version),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1024, 1024),
        flip=False,
        transforms=[
            dict(type='RResize', img_scale=(1024, 1024)),
            dict(type='RRandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='DefaultFormatBundle'),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'train_split/labelTxt/',
        img_prefix=data_root + 'train_split/images/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'val_split/labelTxt/',
        img_prefix=data_root + 'val_split/images/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        test_mode=True,   # set to True if the test set has no annotations
        ann_file=data_root + 'test_split/images/',
        img_prefix=data_root + 'test_split/images/',
        pipeline=test_pipeline))

model = dict(
    type='OrientedRCNN',
    backbone=dict(
        type='ViT_Win_RVSA_V3_WSZ7',
        img_size=1024,
        embed_dim=768,
        depth=12,
        num_heads=12,
        mlp_ratio=4,
        qkv_bias=True,
        drop_rate=0.,
        attn_drop_rate=0.,
        drop_path_rate=0.15,
        use_abs_pos_emb=True),
    neck=dict(
        type='FPN',
        in_channels=[768, 768, 768, 768],
        out_channels=256,
        num_outs=5),
    rpn_head=dict(
        type='OrientedRPNHead',
        in_channels=256,
        feat_channels=256,
        version=angle_version,
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],
            ratios=[0.5, 1.0, 2.0],
            strides=[4, 8, 16, 32, 64]),
        bbox_coder=dict(
            type='MidpointOffsetCoder',
            angle_range=angle_version,
            target_means=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
            target_stds=[1.0, 1.0, 1.0, 1.0, 0.5, 0.5]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(
            type='SmoothL1Loss', beta=0.1111111111111111, loss_weight=1.0)),
    roi_head=dict(
        type='OrientedStandardRoIHead',
        bbox_roi_extractor=dict(
            type='RotatedSingleRoIExtractor',
            roi_layer=dict(
                type='RoIAlignRotated',
                out_size=7,
                sample_num=2,
                clockwise=True),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=dict(
            type='RotatedShared2FCBBoxHead',
            in_channels=256,
            fc_out_channels=1024,
            roi_feat_size=7,
            num_classes=15,
            bbox_coder=dict(
                type='DeltaXYWHAOBBoxCoder',
                angle_range=angle_version,
                norm_factor=None,
                edge_swap=True,
                proj_xy=True,
                target_means=(.0, .0, .0, .0, .0),
                target_stds=(0.1, 0.1, 0.2, 0.2, 0.1)),
            reg_class_agnostic=True,
            loss_cls=dict(
                type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0),
            loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))),
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.7,
                neg_iou_thr=0.3,
                min_pos_iou=0.3,
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,
                pos_fraction=0.5,
                neg_pos_ub=-1,
                add_gt_as_proposals=False),
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
                pos_iou_thr=0.5,
                neg_iou_thr=0.5,
                min_pos_iou=0.5,
                match_low_quality=False,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                ignore_iof_thr=-1),
            sampler=dict(
                type='RRandomSampler',
                num=512,
                pos_fraction=0.25,
                neg_pos_ub=-1,
                add_gt_as_proposals=True),
            pos_weight=-1,
            debug=False)),
    test_cfg=dict(
        rpn=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.8),
            min_bbox_size=0),
        rcnn=dict(
            nms_pre=2000,
            min_bbox_size=0,
            score_thr=0.05,
            nms=dict(iou_thr=0.1),
            max_per_img=2000)))
# evaluation
evaluation = dict(interval=1, metric='mAP')
# optimizer
optimizer = dict(type='SGD', lr=0.0025, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
runner = dict(type='EpochBasedRunner', max_epochs=12)
checkpoint_config = dict(interval=1)

# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]

# disable opencv multithreading to avoid system being overloaded
opencv_num_threads = 0
# set multi-process start method as `fork` to speed up the training
mp_start_method = 'fork'
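For context, inference and local evaluation with a config like the one above are normally run through mmrotate's test script. A minimal sketch, assuming standard mmrotate tooling; the config and checkpoint names here are placeholders, not the actual files:

    # evaluate a checkpoint on the clipped val split (hypothetical paths)
    python tools/test.py my_vit_oriented_rcnn_dota.py path/to/checkpoint.pth --eval mAP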

DotWang commented Jul 13, 2023

@facias914 It is suggested to use AdamW for training vision transformer networks, and to reproduce the method with our parameter settings. The detailed settings can be found in the config files.
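For illustration, an AdamW setting of the kind typically used for ViT backbones in mmdet/mmrotate-style configs is sketched below; the exact learning rate, weight decay, and any layer-wise decay settings should be taken from the official config files rather than from this example:

# sketch of an AdamW optimizer block for a ViT backbone (values are illustrative)
optimizer = dict(
    type='AdamW',
    lr=1e-4,
    betas=(0.9, 0.999),
    weight_decay=0.05)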

facias914 commented:

Thank you for your reply. I just used the checkpoint file you provided. I compared the parameter list and it is consistent with yours; the specific content is above.


DotWang commented Jul 13, 2023

facias914 commented:

I know what you mean, but I only use the checkpoint file for inference, not for training.

facias914 commented:

Now I want to reproduce the code with the old mmrotate version, but the old mmrotate needs the DOTA annotations in pkl format. Can you send me a DOTA dataset with pkl annotations? Thank you!


DotWang commented Jul 13, 2023

@facias914 Which checkpoint did you use? Did you use this one: https://1drv.ms/u/s!AimBgYV7JjTlgVJM4Znng50US8KD?e=o4MRMQ ?


DotWang commented Jul 13, 2023

@facias914 The DOTA dataset needs to be clipped with BboxToolkit; the pkl can then be obtained.
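For reference, the clipping is usually driven by a JSON config and the image-splitting script shipped with BboxToolkit. A hedged sketch of the invocation is below; the exact script path and flag name may differ between versions, so check the BboxToolkit README:

    # split DOTA images into 1024x1024 patches according to a JSON config (illustrative paths)
    python BboxToolkit/tools/img_split.py --base_json split_configs/dota1_0/ss_trainval.json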

facias914 commented:

> @facias914 Which checkpoint did you use? Did you use this one: https://1drv.ms/u/s!AimBgYV7JjTlgVJM4Znng50US8KD?e=o4MRMQ ?

Yes, I used this one

[Screenshot 2023-07-13 21 32 47]

facias914 commented:

> @facias914 The DOTA dataset needs to be clipped with BboxToolkit; the pkl can then be obtained.

Thank you!


DotWang commented Jul 13, 2023

> @facias914 Which checkpoint did you use? Did you use this one: https://1drv.ms/u/s!AimBgYV7JjTlgVJM4Znng50US8KD?e=o4MRMQ ?
>
> Yes, I used this one
>
> [Screenshot 2023-07-13 21 32 47]

So the inference is conducted on unclipped images?

facias914 commented:

> @facias914 Which checkpoint did you use? Did you use this one: https://1drv.ms/u/s!AimBgYV7JjTlgVJM4Znng50US8KD?e=o4MRMQ ?
>
> Yes, I used this one
> [Screenshot 2023-07-13 21 32 47]
>
> So the inference is conducted on unclipped images?

No, I use clipped val images and clipped test images with size=1024 and gap = 200, obtaining mAP=68 and 54 respectively.


DotWang commented Jul 13, 2023

@facias914 OK. In fact, we didn't conduct local validation. We directly train the model on the merged train+val set and submit the test-set results to the evaluation website. You can implement the same evaluation. The test set also needs to be clipped with BboxToolkit.
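For illustration, a dataset section consistent with that protocol might look like the sketch below; the directory names are placeholders for the clipped train+val and test splits, not the authors' exact paths:

# sketch of a data block for training on the merged, clipped train+val split (illustrative paths)
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'trainval_split/labelTxt/',
        img_prefix=data_root + 'trainval_split/images/',
        pipeline=train_pipeline),
    test=dict(
        type=dataset_type,
        test_mode=True,  # the test split has no public annotations
        ann_file=data_root + 'test_split/images/',
        img_prefix=data_root + 'test_split/images/',
        pipeline=test_pipeline))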

facias914 commented:

> @facias914 OK. In fact, we didn't conduct local validation. We directly train the model on the merged train+val set and submit the test-set results to the evaluation website. You can implement the same evaluation. The test set also needs to be clipped with BboxToolkit.

Thank you! I get mAP 87 on the val set using the old mmrotate version. By the way, I have obtained the test-set pkl file, but I can't find the code that converts the pkl to txt. I used a pkl2txt script written by myself, but failed to get a result from the DOTA website. Can you tell me where the pkl2txt file is?


DotWang commented Jul 14, 2023

@facias914 Shouldn't OBBDetection automatically convert the pkl? (mmrotate is built on top of OBBDetection.)
https://mmrotate.readthedocs.io/en/latest/get_started.html#test-a-model

facias914 commented:

> @facias914 Shouldn't OBBDetection automatically convert the pkl? (mmrotate is built on top of OBBDetection.) https://mmrotate.readthedocs.io/en/latest/get_started.html#test-a-model

Yes, I used OBBDetection built from https://github.com/jbwang1997/OBBDetection, and I have obtained the pkl file. But the DOTA website requires txt files, so the pkl2txt code is what I need now.


DotWang commented Jul 14, 2023

@facias914 For DOTA-V1.0, use --format-only and OBBDetection will automatically produce the required format; please refer to our README.
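For reference, the command documented on the mmrotate page linked above generates DOTA Task1 submission files directly; a minimal sketch with placeholder paths (OBBDetection exposes the analogous --format-only option, as noted above):

    # write DOTA Task1 submission files instead of evaluating locally (hypothetical paths)
    python tools/test.py my_config.py path/to/checkpoint.pth --format-only \
        --eval-options submission_dir=work_dirs/Task1_results

The generated per-class txt files can then be zipped and uploaded to the DOTA evaluation server.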

facias914 commented:

> @facias914 For DOTA-V1.0, use --format-only and OBBDetection will automatically produce the required format; please refer to our README.

Thank you very much for your reply. I also reached 78.74 on the test set. Next, I will investigate why there is such a big gap between my reproduced code and the official code.

HuangShiqi128 commented:

Hi,

May I ask which config file in BboxToolkit you used to split the DOTA dataset? Is it ss_trainval.json?

{
    "nproc": 20,
    "load_type": "dota",
    "img_dirs": [
        "data/DOTA1_0/train/images/",
        "data/DOTA1_0/val/images/"
    ],
    "ann_dirs": [
        "data/DOTA1_0/train/labelTxt/",
        "data/DOTA1_0/val/labelTxt/"
    ],
    "classes": null,
    "prior_annfile": null,
    "merge_type": "addition",
    "sizes": [1024],
    "gaps": [200],
    "rates": [1.0],
    "img_rate_thr": 0.6,
    "iof_thr": 0.7,
    "no_padding": false,
    "padding_value": [104, 116, 124],
    "filter_empty": true,
    "save_dir": "data/split_ss_dota1_0/trainval/",
    "save_ext": ".png"
}

PP-explore commented:

> @facias914 For DOTA-V1.0, use --format-only and OBBDetection will automatically produce the required format; please refer to our README.
>
> Thank you very much for your reply. I also reached 78.74 on the test set. Next, I will investigate why there is such a big gap between my reproduced code and the official code.

@facias914 Hello, I also encountered the same issue. I used the official ViTAE-B + RVSA model to run inference on the test set, but the mAP I obtained is only 0.394. May I ask how you achieved an mAP close to the authors' result? Could you provide some help? I have shared some of my configuration in this issue: #39 (comment).
