How can I recognize animals in a video stream or static images with OpenCV or another library/software?

Question

I'm a software developer not experienced in AI or machine learning, but I'm now interested in developing this kind of software. I want to develop software that recognizes some specific objects, specifically, animals from a video stream (or a sequence of static images).

I see that a library called OpenCV is often mentioned on this forum, but from what I have seen so far it is a helper for working with images; I didn't find the object recognition or self-learning part.

Is OpenCV a good starting point? Would it be better to go for some theory first? Or are there other existing libraries or frameworks aimed at object recognition?

EDIT To give some context: I will have one camera watching a landscape that is mostly static, although some leaves may move with the wind or a person may step in, and I want to get an alert when an animal comes into view. I can reduce the "animals" to only birds (I will not always have a nice bird/sky contrast).

I did some work with supervised neural networks some 15 years ago and studied some AI and machine learning theory, but I guess things have advanced a great deal since then, which is why I was asking for some more practical first steps.

Thank you

What you're asking to do is a very, very difficult problem. OpenCV features a machine learning library, but this is not something that you "just do" - it's going to take a lot of time to come up to speed on image processing and machine learning, and even then you're working on a problem that is very difficult. — Chuck, Aug 20 '15 at 19:10
Also, this problem would be better suited for asking at the computer science SE site, but I would imagine you'll get a similar response. This question doesn't have anything to do specifically with robotics. — Chuck, Aug 20 '15 at 19:12
I'm voting to close this question as off-topic because it is not a question about robotics. — Chuck, Aug 20 '15 at 19:13
Not necessarily that hard. If you have a static background all you really need to do is detect movement with BackgroundSubtractorMOG2 and those moving parts are likely animals. If your background is panning, then it is only slightly more complicated. — Octopus, Aug 20 '15 at 21:42
@Chuck, the XKCD hover-over text is exactly what I thought of when I read this question. — Ryan Loggerythm, Aug 20 '15 at 21:56
@K. Weber, I agree with Chuck that in the aggregate sense, it really is an advanced topic. However, if you are looking to do a "smaller" set of it, there are ways to "cheat". Can you give more specifics as to exactly your use case? For example, if you really have video/images of only animals (no humans), you might be able to cheat using some OpenCV libraries that detect faces... but once again it depends on what you really are trying to accomplish. — Aerophilic, Aug 21 '15 at 05:16
I agree with Octopus and Aerophilic in that there are some scenarios such as the scene is known in advance, or if the animals are all of a known type, etc., where you might be able to get by. However, if different types of animals are in the same scene, or you don't know when the scene has no animals (to initialize background subtraction), or things other than animals are in the scene (vehicles), then it will be very tough. Even if you do modify your question to clarify what exactly it is you are looking for, this question is still not directly related to robotics. — Chuck, Aug 21 '15 at 13:46
@Aerophilic, current face detection algorithms work great (for human faces) when the subject is looking at the camera and is oriented vertically, forehead at the top. Other orientations, or faces in profile, invariably defeat the algorithms. Similarly, I think it is still very difficult to classify an animal. My previous comment was only about identifying pixels in the image as animal because it moves through the scene. Identifying which animal is still a complex problem. The OP's question can be interpreted at different levels. — Octopus, Aug 21 '15 at 20:29
I do think that this is off topic for robotics, but I find it to be a fascinating question. — Octopus, Aug 21 '15 at 20:35
@Chuck I don't know if I agree that this is off topic for robotics, though perhaps we should move this to "meta". But in short, based on my direct experience, but also via the definition of "robot" by the former director of the Robotics Institute @ Carnegie Mellon, a Robot is anything that "Sense, Thinks, or Acts". Since this is part of the sensing, I don't see why we shouldn't talk about computer vision on this Stack Overflow. — Aerophilic, Aug 23 '15 at 19:05
@Aerophilic I'm going off of the guidelines for on-topic questions, which say basically Arduino-only questions belong on the Arduino SE, Electronics-only on the EE SE, etc., and programming-only questions on Stack Overflow. Artificial Intelligence could be here by your reasoning, but it is better suited elsewhere. — Chuck, Aug 23 '15 at 23:00
@Chuck Perhaps we can kick off a Meta Discussion on this topic, but it may already be addressed in one of these three questions: http://meta.robotics.stackexchange.com/questions/1/what-do-we-do-about-platform-specific-questions, http://meta.robotics.stackexchange.com/questions/5/how-do-we-address-questions-about-related-subject-areas, or http://meta.robotics.stackexchange.com/questions/4/where-does-robotics-end-and-electrical-engineering-begin. In all three questions, the community seems to fall on the side of being "more" inclusive than not, given the nature of robotics. I agree IMHO — Aerophilic, Aug 24 '15 at 01:48
Remember folks, comments are intended for improving questions and answers. If you want to chat, please do it in [chat]. If you have issues of policy, feel free to ask a question on [meta]. I will offer this question to both [cs.se] and [dsp.se] but I suspect that they will look at the fact that we have a good accepted answer here already and reject the migration. See also my answer to this meta question. — Mark Booth, Aug 24 '15 at 11:10
@MarkBooth: Yes, I'm loath to move a question once it has an upvoted accepted answer... even if the topic might be more appropriate elsewhere in the *.SE network. — Peter K., Aug 24 '15 at 12:48
I was looking for a starting point so I think now I have enough to start researching, once I get more specific doubts I will go to that forum. Thanks to all. — K. Weber, Aug 24 '15 at 13:21
Moderators from both [cs.se] and [dsp.se] have said that it wouldn't be appropriate to migrate this question. @K.Weber you may want to join those sites though, as future questions might be more appropriate to ask there. We would, however, welcome future questions here too. — Mark Booth, Aug 24 '15 at 14:14

score 5 · Accepted Answer · edited May 23 '17 at 12:37

Given that you have a more "constrained" goal, with a "mostly" static background, I would recommend simply using a background subtraction method. The hard part, which has come a long way over the last decade, is dealing with shadows, lighting changes, and moving foliage.

There are tons of resources on this topic, but here is a good one I found after a quick search: http://www.pyimagesearch.com/2015/05/25/basic-motion-detection-and-tracking-with-python-and-opencv/

This should get you to an 80% solution for what you want.

If you want to go deeper and try to identify specific animals, there are two main approaches you could follow. The easier one is template matching; the harder one is building a Bayes classifier.

In either approach, you would:

- Gather a sample set of data (most likely by using the output of the motion detection above)
- Then either:
  - Create templates you would match against, or
  - Train your classifier to identify the animals you want

A couple of notes:

- Template matching out of the box is highly dependent on scale and orientation. While you can start with basic template matching, you'll probably soon want to build a Gaussian pyramid so that you can match at several scales. Here is a good reference: https://stackoverflow.com/questions/22480485/image-matching-in-opencv-python
- Doing Bayesian classifiers well is hard, and if you search Google Scholar you'll see tons of papers on the subject. However, it seems to be the "way to go" for high accuracy. Generally you would combine the base classifier with some other machine learning technique (such as a Markov model). If you do go this route, I would recommend trying something "simpler" than finding a whole bird: identify a simple feature that would "mean" bird/animal, such as an "eye" or a "beak".
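To give a feel for the classifier route, here is a tiny from-scratch Gaussian naive Bayes sketch in plain Python. It is not OpenCV's implementation and is far from a production classifier. The two features (blob area in pixels and width/height ratio of a detected region) and every training number are invented toy values, chosen only to show the mechanics.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: one independent normal per feature and class."""

    def fit(self, samples, labels):
        grouped = defaultdict(list)
        for features, label in zip(samples, labels):
            grouped[label].append(features)
        total = len(samples)
        self.stats = {}
        for label, rows in grouped.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            # A small floor keeps the variance away from zero.
            variances = [
                max(sum((v - m) ** 2 for v in col) / len(col), 1e-6)
                for col, m in zip(columns, means)
            ]
            log_prior = math.log(len(rows) / total)
            self.stats[label] = (log_prior, means, variances)
        return self

    def log_posterior(self, features, label):
        """Unnormalized log P(label | features)."""
        log_prior, means, variances = self.stats[label]
        score = log_prior
        for value, mean, var in zip(features, means, variances):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        return score

    def predict(self, features):
        return max(self.stats, key=lambda label: self.log_posterior(features, label))


# Invented toy training set: (blob area in pixels, width / height ratio).
samples = [
    (320, 1.8), (410, 1.6), (380, 2.0),       # hypothetical "bird" blobs
    (3500, 0.40), (4100, 0.35), (3800, 0.45),  # hypothetical "person" blobs
]
labels = ["bird", "bird", "bird", "person", "person", "person"]

model = GaussianNaiveBayes().fit(samples, labels)
print(model.predict((400, 1.7)))
print(model.predict((3600, 0.4)))
```

In practice the feature vectors would be measured from the regions found by motion detection, and simple shape features like these would probably need to be replaced or extended with something more descriptive before the accuracy is useful.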

Hope this helps.
