
Rotated Bounding Box output #2148

Open
BGA3 opened this issue Jan 5, 2019 · 26 comments
Labels
want enhancement Want to improve accuracy, speed or functionality

Comments

@BGA3

BGA3 commented Jan 5, 2019

Hi
This is not really an issue/bug, but more a request. Would it be possible to output rotated bounding boxes (non-axis-aligned)? Would this be a huge change?
I have started looking into the code to implement this myself; however, as I don't know the code, the learning curve is quite steep.

Thanks in advance

@AlexeyAB
Owner

AlexeyAB commented Jan 5, 2019

@BGA3 Hi,

I think there will be many changes, at least these:

  1. You should add one more coordinate (the angle), so filters=(classes + 6)x3 instead of filters=(classes + 5)x3 in the cfg-file.

  2. darknet/src/box.h

    Lines 18 to 20 in 48d461f

    typedef struct{
        float x, y, w, h;
    } box;
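
    For illustration, a minimal sketch of the extended struct (hypothetical, not code from the repo; the field name and angle convention are assumptions):

    typedef struct{
        float x, y, w, h;
        float a;   // new: rotation angle of the box around its center, e.g. in radians
    } box;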

  3. In about ~27 lines of src/yolo_layer.c, you should change the number 4 to 5 (each box now has one more coordinate): https://github.com/AlexeyAB/darknet/blob/48d461f9bda2d01dc2a83c7c2a520f61b3ea5b79/src/yolo_layer.c

3.1

darknet/src/yolo_layer.c

Lines 84 to 109 in 48d461f

box get_yolo_box(float *x, float *biases, int n, int index, int i, int j, int lw, int lh, int w, int h, int stride)
{
    box b;
    b.x = (i + x[index + 0*stride]) / lw;
    b.y = (j + x[index + 1*stride]) / lh;
    b.w = exp(x[index + 2*stride]) * biases[2*n] / w;
    b.h = exp(x[index + 3*stride]) * biases[2*n+1] / h;
    return b;
}

float delta_yolo_box(box truth, float *x, float *biases, int n, int index, int i, int j, int lw, int lh, int w, int h, float *delta, float scale, int stride)
{
    box pred = get_yolo_box(x, biases, n, index, i, j, lw, lh, w, h, stride);
    float iou = box_iou(pred, truth);
    float tx = (truth.x*lw - i);
    float ty = (truth.y*lh - j);
    float tw = log(truth.w*w / biases[2*n]);
    float th = log(truth.h*h / biases[2*n + 1]);
    delta[index + 0*stride] = scale * (tx - x[index + 0*stride]);
    delta[index + 1*stride] = scale * (ty - x[index + 1*stride]);
    delta[index + 2*stride] = scale * (tw - x[index + 2*stride]);
    delta[index + 3*stride] = scale * (th - x[index + 3*stride]);
    return iou;
}
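
For illustration, here is a minimal sketch of how these two functions might look with a 5th (angle) coordinate. The _rotated names are hypothetical; it assumes box has gained a float a; field and that, like x and y, the angle slot gets an activation applied in the layer's forward pass. Note that box_iou() would still compare axis-aligned boxes unless it is rewritten as well.

box get_yolo_box_rotated(float *x, float *biases, int n, int index, int i, int j, int lw, int lh, int w, int h, int stride)
{
    box b;
    b.x = (i + x[index + 0*stride]) / lw;
    b.y = (j + x[index + 1*stride]) / lh;
    b.w = exp(x[index + 2*stride]) * biases[2*n] / w;
    b.h = exp(x[index + 3*stride]) * biases[2*n+1] / h;
    b.a = x[index + 4*stride];   // new: 5th coordinate, assumed already activated
    return b;
}

float delta_yolo_box_rotated(box truth, float *x, float *biases, int n, int index, int i, int j, int lw, int lh, int w, int h, float *delta, float scale, int stride)
{
    box pred = get_yolo_box_rotated(x, biases, n, index, i, j, lw, lh, w, h, stride);
    float iou = box_iou(pred, truth);   // still an axis-aligned IoU in this sketch
    float tx = (truth.x*lw - i);
    float ty = (truth.y*lh - j);
    float tw = log(truth.w*w / biases[2*n]);
    float th = log(truth.h*h / biases[2*n + 1]);
    delta[index + 0*stride] = scale * (tx - x[index + 0*stride]);
    delta[index + 1*stride] = scale * (ty - x[index + 1*stride]);
    delta[index + 2*stride] = scale * (tw - x[index + 2*stride]);
    delta[index + 3*stride] = scale * (th - x[index + 3*stride]);
    delta[index + 4*stride] = scale * (truth.a - x[index + 4*stride]);   // new: angle term, same pattern as x and y
    return iou;
}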

3.2

darknet/src/yolo_layer.c

Lines 156 to 163 in 48d461f

static box float_to_box_stride(float *f, int stride)
{
    box b = { 0 };
    b.x = f[0];
    b.y = f[1 * stride];
    b.w = f[2 * stride];
    b.h = f[3 * stride];
    return b;
}
  4. darknet/src/data.c

    Lines 328 to 332 in 48d461f

    x = boxes[i].x;
    y = boxes[i].y;
    w = boxes[i].w;
    h = boxes[i].h;
    id = boxes[i].id;
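
    Hypothetically, with an angle field added to the label struct, this would gain one more read (a sketch, assuming the label loader has been extended accordingly):

    x = boxes[i].x;
    y = boxes[i].y;
    w = boxes[i].w;
    h = boxes[i].h;
    a = boxes[i].a;   // new: angle from the extended label format
    id = boxes[i].id;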

4.1

darknet/src/data.c

Lines 385 to 389 in 48d461f

truth[(i-sub)*5+0] = x;
truth[(i-sub)*5+1] = y;
truth[(i-sub)*5+2] = w;
truth[(i-sub)*5+3] = h;
truth[(i-sub)*5+4] = id;
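
A hypothetical sketch of the same fill with a 6-float truth layout (a is the angle; every *5 stride over the truth buffer would need the same change):

truth[(i-sub)*6+0] = x;
truth[(i-sub)*6+1] = y;
truth[(i-sub)*6+2] = w;
truth[(i-sub)*6+3] = h;
truth[(i-sub)*6+4] = a;    // new: angle
truth[(i-sub)*6+5] = id;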

  5. darknet/src/image.c

    Lines 315 to 417 in 48d461f

    void draw_detections_v3(image im, detection *dets, int num, float thresh, char **names, image **alphabet, int classes, int ext_output)
    {
        int selected_detections_num;
        detection_with_class* selected_detections = get_actual_detections(dets, num, thresh, &selected_detections_num, names);
        // text output
        qsort(selected_detections, selected_detections_num, sizeof(*selected_detections), compare_by_lefts);
        int i;
        for (i = 0; i < selected_detections_num; ++i) {
            const int best_class = selected_detections[i].best_class;
            printf("%s: %.0f%%", names[best_class], selected_detections[i].det.prob[best_class] * 100);
            if (ext_output)
                printf("\t(left_x: %4.0f top_y: %4.0f width: %4.0f height: %4.0f)\n",
                    round((selected_detections[i].det.bbox.x - selected_detections[i].det.bbox.w / 2)*im.w),
                    round((selected_detections[i].det.bbox.y - selected_detections[i].det.bbox.h / 2)*im.h),
                    round(selected_detections[i].det.bbox.w*im.w), round(selected_detections[i].det.bbox.h*im.h));
            else
                printf("\n");
            int j;
            for (j = 0; j < classes; ++j) {
                if (selected_detections[i].det.prob[j] > thresh && j != best_class) {
                    printf("%s: %.0f%%\n", names[j], selected_detections[i].det.prob[j] * 100);
                }
            }
        }
        // image output
        qsort(selected_detections, selected_detections_num, sizeof(*selected_detections), compare_by_probs);
        for (i = 0; i < selected_detections_num; ++i) {
            int width = im.h * .006;
            if (width < 1)
                width = 1;
            /*
            if(0){
                width = pow(prob, 1./2.)*10+1;
                alphabet = 0;
            }
            */
            //printf("%d %s: %.0f%%\n", i, names[selected_detections[i].best_class], prob*100);
            int offset = selected_detections[i].best_class * 123457 % classes;
            float red = get_color(2, offset, classes);
            float green = get_color(1, offset, classes);
            float blue = get_color(0, offset, classes);
            float rgb[3];
            //width = prob*20+2;
            rgb[0] = red;
            rgb[1] = green;
            rgb[2] = blue;
            box b = selected_detections[i].det.bbox;
            //printf("%f %f %f %f\n", b.x, b.y, b.w, b.h);
            int left = (b.x - b.w / 2.)*im.w;
            int right = (b.x + b.w / 2.)*im.w;
            int top = (b.y - b.h / 2.)*im.h;
            int bot = (b.y + b.h / 2.)*im.h;
            if (left < 0) left = 0;
            if (right > im.w - 1) right = im.w - 1;
            if (top < 0) top = 0;
            if (bot > im.h - 1) bot = im.h - 1;
            //int b_x_center = (left + right) / 2;
            //int b_y_center = (top + bot) / 2;
            //int b_width = right - left;
            //int b_height = bot - top;
            //sprintf(labelstr, "%d x %d - w: %d, h: %d", b_x_center, b_y_center, b_width, b_height);
            if (im.c == 1) {
                draw_box_width_bw(im, left, top, right, bot, width, 0.8); // 1 channel Black-White
            }
            else {
                draw_box_width(im, left, top, right, bot, width, red, green, blue); // 3 channels RGB
            }
            if (alphabet) {
                char labelstr[4096] = { 0 };
                strcat(labelstr, names[selected_detections[i].best_class]);
                int j;
                for (j = 0; j < classes; ++j) {
                    if (selected_detections[i].det.prob[j] > thresh && j != selected_detections[i].best_class) {
                        strcat(labelstr, ", ");
                        strcat(labelstr, names[j]);
                    }
                }
                image label = get_label_v3(alphabet, labelstr, (im.h*.03));
                draw_label(im, top + width, left, label, rgb);
                free_image(label);
            }
            if (selected_detections[i].det.mask) {
                image mask = float_to_image(14, 14, 1, selected_detections[i].det.mask);
                image resized_mask = resize_image(mask, b.w*im.w, b.h*im.h);
                image tmask = threshold_image(resized_mask, .5);
                embed_image(tmask, im, left, top);
                free_image(mask);
                free_image(resized_mask);
                free_image(tmask);
            }
        }
        free(selected_detections);
    }
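
    To draw a rotated box here, one option is to compute the four rotated corners and draw line segments between them instead of calling draw_box_width(). A minimal sketch (hypothetical helper; assumes the angle is in radians and that the center and size have already been scaled to pixels):

    #include <math.h>

    // Corners of a w x h box centered at (cx, cy), rotated by angle a (radians).
    void rotated_box_corners(float cx, float cy, float w, float h, float a,
                             float xs[4], float ys[4])
    {
        float c = cosf(a), s = sinf(a);
        float dx[4] = { -w/2,  w/2,  w/2, -w/2 };
        float dy[4] = { -h/2, -h/2,  h/2,  h/2 };
        int k;
        for (k = 0; k < 4; ++k) {
            xs[k] = cx + dx[k]*c - dy[k]*s;   // rotate each corner around the center
            ys[k] = cy + dx[k]*s + dy[k]*c;
        }
    }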

  6. Use the C function cvFillConvexPoly(CvArr* img, const CvPoint* pts, int npts, CvScalar color, int line_type=8, int shift=0) in draw_detections_cv_v3() to draw the rotated rectangle, as described here in C++:
    https://stackoverflow.com/questions/43342199/draw-rotated-rectangle-in-opencv-c

darknet/src/image.c

Lines 482 to 602 in 48d461f

void draw_detections_cv_v3(IplImage* show_img, detection *dets, int num, float thresh, char **names, image **alphabet, int classes, int ext_output)
{
    int i, j;
    if (!show_img) return;
    static int frame_id = 0;
    frame_id++;
    for (i = 0; i < num; ++i) {
        char labelstr[4096] = { 0 };
        int class_id = -1;
        for (j = 0; j < classes; ++j) {
            int show = strncmp(names[j], "dont_show", 9);
            if (dets[i].prob[j] > thresh && show) {
                if (class_id < 0) {
                    strcat(labelstr, names[j]);
                    class_id = j;
                }
                else {
                    strcat(labelstr, ", ");
                    strcat(labelstr, names[j]);
                }
                printf("%s: %.0f%% ", names[j], dets[i].prob[j] * 100);
            }
        }
        if (class_id >= 0) {
            int width = show_img->height * .006;
            //if(0){
            //width = pow(prob, 1./2.)*10+1;
            //alphabet = 0;
            //}
            //printf("%d %s: %.0f%%\n", i, names[class_id], prob*100);
            int offset = class_id * 123457 % classes;
            float red = get_color(2, offset, classes);
            float green = get_color(1, offset, classes);
            float blue = get_color(0, offset, classes);
            float rgb[3];
            //width = prob*20+2;
            rgb[0] = red;
            rgb[1] = green;
            rgb[2] = blue;
            box b = dets[i].bbox;
            b.w = (b.w < 1) ? b.w : 1;
            b.h = (b.h < 1) ? b.h : 1;
            b.x = (b.x < 1) ? b.x : 1;
            b.y = (b.y < 1) ? b.y : 1;
            //printf("%f %f %f %f\n", b.x, b.y, b.w, b.h);
            int left = (b.x - b.w / 2.)*show_img->width;
            int right = (b.x + b.w / 2.)*show_img->width;
            int top = (b.y - b.h / 2.)*show_img->height;
            int bot = (b.y + b.h / 2.)*show_img->height;
            if (left < 0) left = 0;
            if (right > show_img->width - 1) right = show_img->width - 1;
            if (top < 0) top = 0;
            if (bot > show_img->height - 1) bot = show_img->height - 1;
            //int b_x_center = (left + right) / 2;
            //int b_y_center = (top + bot) / 2;
            //int b_width = right - left;
            //int b_height = bot - top;
            //sprintf(labelstr, "%d x %d - w: %d, h: %d", b_x_center, b_y_center, b_width, b_height);
            float const font_size = show_img->height / 1000.F;
            CvPoint pt1, pt2, pt_text, pt_text_bg1, pt_text_bg2;
            pt1.x = left;
            pt1.y = top;
            pt2.x = right;
            pt2.y = bot;
            pt_text.x = left;
            pt_text.y = top - 12;
            pt_text_bg1.x = left;
            pt_text_bg1.y = top - (10 + 25 * font_size);
            pt_text_bg2.x = right;
            pt_text_bg2.y = top;
            CvScalar color;
            color.val[0] = red * 256;
            color.val[1] = green * 256;
            color.val[2] = blue * 256;
            // you should create directory: result_img
            //static int copied_frame_id = -1;
            //static IplImage* copy_img = NULL;
            //if (copied_frame_id != frame_id) {
            //    copied_frame_id = frame_id;
            //    if(copy_img == NULL) copy_img = cvCreateImage(cvSize(show_img->width, show_img->height), show_img->depth, show_img->nChannels);
            //    cvCopy(show_img, copy_img, 0);
            //}
            //static int img_id = 0;
            //img_id++;
            //char image_name[1024];
            //sprintf(image_name, "result_img/img_%d_%d_%d.jpg", frame_id, img_id, class_id);
            //CvRect rect = cvRect(pt1.x, pt1.y, pt2.x - pt1.x, pt2.y - pt1.y);
            //cvSetImageROI(copy_img, rect);
            //cvSaveImage(image_name, copy_img, 0);
            //cvResetImageROI(copy_img);
            cvRectangle(show_img, pt1, pt2, color, width, 8, 0);
            if (ext_output)
                printf("\t(left_x: %4.0f top_y: %4.0f width: %4.0f height: %4.0f)\n",
                    (float)left, (float)top, b.w*show_img->width, b.h*show_img->height);
            else
                printf("\n");
            cvRectangle(show_img, pt_text_bg1, pt_text_bg2, color, width, 8, 0);
            cvRectangle(show_img, pt_text_bg1, pt_text_bg2, color, CV_FILLED, 8, 0); // filled
            CvScalar black_color;
            black_color.val[0] = 0;
            CvFont font;
            cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, font_size, font_size, 0, font_size * 3, 8);
            cvPutText(show_img, labelstr, pt_text, &font, black_color);
        }
    }
    if (ext_output) {
        fflush(stdout);
    }
}
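
Following the StackOverflow approach above, a minimal sketch of drawing a filled rotated rectangle with the legacy OpenCV C API (hypothetical helper, not code from the repo; angle in radians, center and size in pixels):

#include <math.h>
#include <opencv/cv.h>

void draw_rotated_box_cv(IplImage* img, float cx, float cy, float w, float h,
                         float a, CvScalar color)
{
    float c = cosf(a), s = sinf(a);
    float dx[4] = { -w/2,  w/2,  w/2, -w/2 };
    float dy[4] = { -h/2, -h/2,  h/2,  h/2 };
    CvPoint pts[4];
    int k;
    for (k = 0; k < 4; ++k) {
        pts[k] = cvPoint((int)(cx + dx[k]*c - dy[k]*s),     // rotate corner around center
                         (int)(cy + dx[k]*s + dy[k]*c));
    }
    cvFillConvexPoly(img, pts, 4, color, 8, 0);   // 8 = line_type, 0 = shift
}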

  7. filters=255 in the cfg (for the default 80 classes, (80 + 5)x3 = 255; with the extra angle coordinate this would become (80 + 6)x3 = 258).

@BGA3
Author

BGA3 commented Jan 12, 2019

Yeah okay, I see that it is a lot of work for someone who is not (yet) familiar with the code. Damn.

Is this something that would be of interest for the repo, or do you think it is only interesting for my project?

@AlexeyAB
Owner

@BGA3 It would be interesting, but so far not a priority.

AlexeyAB added the want enhancement label (Want to improve accuracy, speed or functionality) Jan 12, 2019
@BGA3
Author

BGA3 commented Feb 6, 2019

So, I have made some good progress in adding the alpha/rotation parameter. I can now train and test the network, and x,y,w,h are found perfectly (as before), but the angle is not. I feel that I am close to resolving this, but I have been stuck for days now. @AlexeyAB would you be interested in looking at my modifications (perhaps you can quickly spot the remaining issue)?

@AlexeyAB
Owner

AlexeyAB commented Feb 6, 2019

@BGA3 Hi,
Yes, I can spend a little time and look at your solution.

@BGA3
Author

BGA3 commented Feb 7, 2019

Ah that sounds great, @AlexeyAB. Thank you very much. I will give you all the likes and stars I can :)

Give me a couple of days to make my modifications readable to others and to create a new branch. I will get back to you.

@BGA3
Author

BGA3 commented Feb 8, 2019

@AlexeyAB I'm ready with a commit now that I think is readable to others :) Do you need to give me special access to the repo? I am getting an "error 403" when trying to push to a new branch.

@AlexeyAB
Owner

AlexeyAB commented Feb 9, 2019

@BGA3 You can fork this repo, commit your changes there, and open a Pull request.

@BGA3
Author

BGA3 commented Feb 10, 2019

Ah, of course, like that. I tried branching and pushing using Sourcetree directly from your repo, but of course I don't have rights to that.
I believe I just sent you a pull request (Y)

@BGA3
Author

BGA3 commented Feb 15, 2019

@AlexeyAB a small update: I got the rotation output working, both when I run darknet.exe directly and when using Darknet.py from Python with the CPU build.

However, I have one issue: when I use the GPU DLL from Python (Spyder/Anaconda) I get the following error:
compute_capability = 750, cudnn_half = 1
layer filters size input output
0 conv 16 3 x 3 / 1 640 x 352 x 1 -> 640 x 352 x 16 0.065 BF
1 max 2 x 2 / 2 640 x 352 x 16 -> 320 x 176 x 16 0.004 BF
2 conv 32 3 x 3 / 1 320 x 176 x 16 -> 320 x 176 x 32 0.519 BF
3 max 2 x 2 / 2 320 x 176 x 32 -> 160 x 88 x 32 0.002 BF
4 conv 64 3 x 3 / 1 160 x 88 x 32 -> 160 x 88 x 64 0.519 BF
5 max 2 x 2 / 2 160 x 88 x 64 -> 80 x 44 x 64 0.001 BF
6 conv 128 3 x 3 / 1 80 x 44 x 64 -> 80 x 44 x 128 0.519 BF
7 max 2 x 2 / 2 80 x 44 x 128 -> 40 x 22 x 128 0.000 BF
8 conv 256 3 x 3 / 1 40 x 22 x 128 -> 40 x 22 x 256 0.519 BF
9 max 2 x 2 / 2 40 x 22 x 256 -> 20 x 11 x 256 0.000 BF
10 conv 512 3 x 3 / 1 20 x 11 x 256 -> 20 x 11 x 512 0.519 BF
11 max 2 x 2 / 1 20 x 11 x 512 -> 20 x 11 x 512 0.000 BF
12 conv 1024 3 x 3 / 1 20 x 11 x 512 -> 20 x 11 x1024 2.076 BF
13 conv 256 1 x 1 / 1 20 x 11 x1024 -> 20 x 11 x 256 0.115 BF
14 conv 512 3 x 3 / 1 20 x 11 x 256 -> 20 x 11 x 512 0.519 BF
15 conv 21 1 x 1 / 1 20 x 11 x 512 -> 20 x 11 x 21 0.005 BF
16 yolo
17 route 13
18 conv 128 1 x 1 / 1 20 x 11 x 256 -> 20 x 11 x 128 0.014 BF
19 upsample 2x 20 x 11 x 128 -> 40 x 22 x 128
20 route 19 8
21 conv 256 3 x 3 / 1 40 x 22 x 384 -> 40 x 22 x 256 1.557 BF
22 conv 21 1 x 1 / 1 40 x 22 x 256 -> 40 x 22 x 21 0.009 BF
Total BFLOPS 6.964
Allocate additional workspace_size = 52.43 MB
Loading weights from C:/GIT/darknet/build/darknet/backup/yolov3-tiny_rotated_last.weights...Done!
cuDNN Error: CUDNN_STATUS_EXECUTION_FAILED: No error

What is even weirder is that I can use the GPU without problems (on the same PC) when I run darknet.exe. Would it be possible for you to have a quick look? I have just made a new commit to my pull request.

@dselivanov

@BGA3 I've checked your fork and it seems you've removed the angle parameter in the last commit. I'm interested in this functionality as well, so if you can push the latest working version, I can take a look and then we can draft a PR here.

Thanks.

@BGA3
Author

BGA3 commented Mar 19, 2019

Hi @dselivanov The thing is that I have unfortunately somewhat abandoned this topic, as my project went in another direction. So at the moment I don't have time to develop this further.

@dselivanov

I've tried to "hack" darknet codebase - it is not easy, parameters are hardcoded in many files and it will take quite a lot of time to make it work. For now I've switched to pytorch, it seems it will be faster to make it work there.

@dexception

@AlexeyAB
Can you please reply if you have any plans to add this functionality?

@BGA3
Author

BGA3 commented Aug 20, 2019

For information, I ended up implementing rotated bounding box detection in Python/Keras using the YOLOv2 framework (easier due to having only one yolo-layer). And I realized that the learning curve for the darknet solution was steep, at least for my skill set.

@AlexeyAB
Owner

@dexception not yet

@zcrnudt

zcrnudt commented Nov 24, 2019

I revised the code for rotated ROI detection, but the angle is not correct. Can someone help me correct it? @AlexeyAB

@dselivanov

dselivanov commented Nov 24, 2019

@zcrnudt Which code did you revise? What is incorrect? It is completely unclear what you are asking.

@zcrnudt

zcrnudt commented Nov 24, 2019

I added an angle parameter to the model, extending (x,y,w,h) to (x,y,w,h,a), but the angle does not converge during training. @dselivanov

@jamessmith90

This has been pending for the last year. Any progress?

@dselivanov

@jamessmith90 A task for whom? Feel free to contribute code; this is an open-source project.

@jamessmith90

The fact is that @BGA3 has not updated the code for rotation.

@robosina

robosina commented Oct 18, 2020

@AlexeyAB, @dselivanov I have added rotation support in detection mode and made a pull request in this link, but I would also like to implement this requested enhancement. As far as I can tell from the comments, there is no fork that has made progress on it, so if there isn't one in this issue, I will implement rotated bounding boxes from scratch. (I think in some tasks it makes convergence faster and maybe improves accuracy.)

It seems that we have to work on the labeling procedure too.

@dselivanov

@robosina there is some relevant discussion here: #4360 (comment). In our case we have had great success with rotated boxes. We even submitted a paper to ECCV this year but got a "border reject" due to "not enough scientific novelty" (while at the same time I've seen so many bullshit papers accepted).

However, I haven't seen any serious attempts to implement it within the darknet framework. Initially (~1.5 years ago) I tried to do so but gave up, as the calculations related to the b-boxes are hardcoded in quite a lot of places, and I'm not that comfortable with the darknet codebase.

@HFVladimir

@dselivanov Hi! I've read all the discussions here about rotated bboxes and still haven't found a working solution, so I'm trying to implement it on my own. As I understand it, you have a working solution in PyTorch, as shown in #4360. Could you please share some code showing how you did it, or at least point to a public implementation I could use as a starting point? It would be very helpful for me; if I had it, I would match it with the current C code and make a PR.

@dselivanov

dselivanov commented Feb 8, 2021

@HFVladimir unfortunately I can't share the code as it belongs to the company. But the overall logic is described in #4360 - you just need to add one extra parameter for the angle.

predicted_angle = sigmoid(raw_yolo_layer_output)
angle_loss = sin(predicted_angle - true_angle) ^ 2

We've tried both logistic and tanh for the sigmoid activation and haven't noticed any significant difference between them.
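
For concreteness, a minimal C sketch of that loss and its gradient w.r.t. the raw network output, assuming the logistic variant (my own translation of the formulas above, not code from any repo):

#include <math.h>

// angle_loss = sin(pred - true)^2, with pred = logistic(raw)
float angle_loss(float raw, float true_angle)
{
    float pred = 1.f / (1.f + expf(-raw));
    float e = sinf(pred - true_angle);
    return e * e;
}

// dL/draw = 2*sin(e)*cos(e) * pred*(1 - pred), where e = pred - true_angle
float angle_loss_grad(float raw, float true_angle)
{
    float pred = 1.f / (1.f + expf(-raw));
    float e = pred - true_angle;
    return 2.f * sinf(e) * cosf(e) * pred * (1.f - pred);
}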
