
I have pictures like that one below.

Here is a car from the back and some background.

How can I adjust the rect to fit the car better, without free space around it? A very approximate solution is fine; the main point is that the car can vary a lot, as can the background. I've tried many methods from OpenCV, like active contours (nothing good at all), thresholding, Sobel and Canny operators, etc. Any ideas?

[example image: a car photographed from behind, with background]

Here are 2670 examples, if someone is interested in a deeper understanding of the problem. The archive also contains front-view cars, but it is not necessary to adjust those; the main goal is cars from behind.

UndeadDragon
  • is your picture really that small and blurry, or is this just a "bad" excerpt? – Marcus Müller Jun 09 '16 at 08:50
  • Because, frankly, I have great eyes, and I know the human mind is pretty great at segmentation, but if the bright spot in the background wasn't there, it'd be nearly impossible for me to tell where the car ends. – Marcus Müller Jun 09 '16 at 08:51
  • @MarcusMüller, yes, most of my pictures aren't good quality. But there is structure, therefore it must be possible to adjust or segment. I had a little success with Sobel, but it doesn't work in the general case. – UndeadDragon Jun 09 '16 at 08:52
  • Could you please provide more examples? Is the background always like in this image, or is it varying? Is it important to have the full car in the rectangle (i.e. is it important that the tires are in the image or is it enough when the trunk is in the rectangle)? – M529 Jun 09 '16 at 08:53
  • @MarcusMüller my goal isn't to perfectly find the car in the image. I just want to adjust the rect as much as possible - an approximate solution will be OK. – UndeadDragon Jun 09 '16 at 08:53
  • "But there is structure, therefore it must be possible to adjust or segment" - well, that is a bold claim! Frankly, I don't think that's true without very much ado. – Marcus Müller Jun 09 '16 at 08:54
  • The point is "there is structure" goes for the background, for the car, for the edges of your car, for the street, for the shadow of your car, but not all cars will have a nice shadow like yours during all weather/sun positions. Not all cars will be as "quadratic" as yours, not all cars will have a shiny reflection on their back window... so frankly, telling this car from a larger sign, a garage, a door, is kind of hard if you don't assume it's a car. However, even with that assumption, I really don't see that clear difference between car and background that you see there. – Marcus Müller Jun 09 '16 at 08:57
  • @MarcusMüller I just added an example of what I achieved with Sobel. As you can see, it is possible. But that method generalizes badly. – UndeadDragon Jun 09 '16 at 09:01
  • Can you please add some more information about other constraints of your problem? For example, is it absolutely guaranteed that all images are images of cars from behind like this one? What is the end goal? What are you trying to achieve? – A_A Jun 09 '16 at 09:04
  • @M529 some more examples. The backgrounds vary a lot, yes. – UndeadDragon Jun 09 '16 at 09:06
  • @A_A The car can be from the front or from behind, but it is OK if I find a method only for the rear view. It is guaranteed that the car is in the center (more or less) and takes up most of the picture. I'm trying to improve my classification model; it works badly on pictures with a lot of background, and I have no resources to improve its performance in other ways. – UndeadDragon Jun 09 '16 at 09:11
  • 1
    @UndeadDragon can you please not link to a russian site with much more dodgy adverts than picture content, but add these examples to your question by editing your question? – Marcus Müller Jun 09 '16 at 09:11
  • @MarcusMüller Stack only lets me upload 2 pictures, not my fault. – UndeadDragon Jun 09 '16 at 09:13
  • ah, ok. My adblocker doesn't let me see much of your pictures, but if you could upload them at a more reputable image hosting service such as imgur.com directly, that would be great. Then link to them, I'll add the images to your question then. – Marcus Müller Jun 09 '16 at 09:38
  • @MarcusMüller I think it will be easier to upload part of my test set. I just added a link to Google Drive in the opening post. – UndeadDragon Jun 09 '16 at 10:02

1 Answer


Assumptions

  • The car is basically always in the center
  • The background on each side (top, bottom, left, and right) is not too "noisy", in the sense that the colors do not change much within a stripe of a few pixels.

Suggested Solution

Yes, you could train a neural network for this task, since nowadays apparently everything is done with some kind of machine learning. But maybe this is overkill. Try a simpler solution like the following (a rough code sketch follows after the list). Please note that this process has to be done independently for all four image sides; I only explain it for one side here for the sake of simplicity:

  • Initialize a histogram variable $h$ with the histogram of the first column of pixels in your image
  • In a loop, select the $i = 1, 2, \dots, n$-th column of pixels in your image.
    • Calculate the histogram $h_i$ of the selected column
    • If $h_i$ is similar to $h$, add $h_i$ to $h$
    • If $h_i$ is dissimilar to $h$, stop the loop and take the loop index as the position of one side of your rectangle. Maybe you want to subtract a few pixel columns for padding. The threshold may also differ depending on the side you are inspecting, e.g. I am not sure whether the wheels will trigger the loop cancellation early enough, but finding the appropriate threshold is up to you ;)
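
A minimal sketch of this column scan for one side (the left edge), assuming a grayscale uint8 image and OpenCV's calcHist; the `hist_similarity` function, the `threshold`, and the bin count are placeholders you would have to supply and tune:

```python
import cv2

def scan_left_edge(img, hist_similarity, threshold, n_bins=32):
    """Scan pixel columns from the left and return the index of the first
    column whose histogram stops matching the accumulated background
    histogram. `img` is a single-channel (grayscale) uint8 image."""
    height, width = img.shape[:2]

    def column_hist(i):
        col = img[:, i].reshape(-1, 1)                 # one pixel column
        return cv2.calcHist([col], [0], None, [n_bins], [0, 256])

    acc = column_hist(0)                               # background model from the first column
    for i in range(1, width):
        hi = column_hist(i)
        if hist_similarity(acc, hi) >= threshold:      # still looks like background
            acc += hi                                  # absorb the column into the model
        else:
            return i                                   # dissimilar: assume the car starts here
    return width                                       # fell through: no edge found
```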

As a similarity measure for the histograms, a few ideas come to mind, such as the sum of squared residuals after subtracting one histogram from the other, or a joint-entropy/mutual-information measure.
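
One possible `hist_similarity`, sketched here purely as an illustration, uses OpenCV's compareHist with the correlation method (which the asker reports in the comments below as working well together with a blur); the file name, blur kernel, and the 0.5 threshold are arbitrary placeholders:

```python
def correlation_similarity(h1, h2):
    # Histogram correlation in [-1, 1]; 1.0 means identical shape.
    return cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)

# Hypothetical usage on a single image.
img = cv2.imread("car.jpg", cv2.IMREAD_GRAYSCALE)
img = cv2.GaussianBlur(img, (5, 5), 0)                 # blur to suppress noise and texture
left = scan_left_edge(img, correlation_similarity, threshold=0.5)
```

The other three sides can be handled the same way by scanning the columns from the right, or by transposing the image and scanning its rows.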

M529
  • Thank you for your response. It is a very good way to think about it. I will try it and write about the results. Also, I already tried a conv network with regression (where the output was the ratio of empty space in the image), but that didn't work well. The problem is that it is hard to configure the network to understand what I want; I would need to define my own objective function and maybe the back-propagation. And yes, it seems to be overkill, very slow for such a goal. – UndeadDragon Jun 09 '16 at 10:01
  • Well, that method works pretty well, thank you. I achieved the best results with a blur and OpenCV compareHist with the correlation method. Much remains to be tuned, but the idea seems to be a good solution. – UndeadDragon Jun 10 '16 at 11:09
  • That's great to hear! :) Since you already know that the vehicle is approximately in the middle of the image, you could also impose a condition on the number of removed pixel columns/rows, such as: The left and right number of removed columns must approximately be equal (or you take the lower one). The same may also work for the upper and lower edges. – M529 Jun 10 '16 at 11:44
  • Yep, you are right. The only problem now is speed. For a reason unknown to me, OpenCV calcHist takes 0.3 ms per column (1 x 108 image) on a PC, at least in Python, which is unbelievably slow (and I don't think the problem is Python, it usually uses C under the hood). I'm thinking about optimization now, maybe I will write my own custom functions. – UndeadDragon Jun 10 '16 at 14:29
  • It seems that sometimes the method is not robust enough :( What do you think, how could I train a CNN to fit my problem? What should the output be? Maybe the four coordinates of the car? – UndeadDragon Jun 17 '16 at 16:58
  • I have no idea how this could be done with a neural network. Do the images where the approach fails have something in common? So a special case that could be treated specially in the algorithm? And how high is your error rate? What makes you sure that a neural network would have a lower error rate? The images are quite hard to process in terms of contrast. – M529 Jun 18 '16 at 09:36