
I tried to build a classifier for some images that I have with the following code:

class1 = Import /@ FileNames["*", "location1"];
class2 = Import /@ FileNames["*", "location2"];

classifier = Classify[
   Join[# -> "H" & /@ class1, # -> "C" & /@ class2]];

I believed this would work, but it crashes with an out-of-memory error after the system tries to use more than 64 GB of RAM.

The total size of all the images in the collection is 42 MB.

How can I build a classifier in such a way that it doesn't require more than 64 GB of RAM?

Update: I converted all the files to .jpg, and the issue remained.

JungHwan Min
soandos
  • What is the file format of the images? – C. E. Mar 09 '16 at 00:44
  • @C.E. .tiff. Does it matter? – soandos Mar 09 '16 at 00:44
  • Not always, but it can. There is a discussion specifically on .tiff images and memory consumption here. – C. E. Mar 09 '16 at 00:53
  • @C.E. Hmm. What format should I convert them to as a test? – soandos Mar 09 '16 at 00:54
  • I don't know, let's wait and see what others have to say. – C. E. Mar 09 '16 at 01:02
  • What are you trying to classify? What is special about the two classes of images? How separable are the two classes of images? This is a problem with functions like Classify: they promise the world in a black box that can be used unsupervised without knowing anything, but it works only when you have infinite memory for it to overfit to every possible combination of inputs /rant – rm -rf Mar 09 '16 at 04:23
  • I have two types of images (they happen to be two types of documents). A human can easily tell them apart; I have no idea about a machine. Accuracy does not need to be great, but I'm aiming for a low false negative rate. All images are in one of two classes (really, an image is either in the class I care about or it's not). Does this help? – soandos Mar 09 '16 at 04:26
  • I'd begin by turning each image into a feature vector. – Searke Mar 09 '16 at 04:31
  • How do I do that? – soandos Mar 09 '16 at 04:32
  • There's no simple answer to that question. You begin by guessing "what values are relevant in these images for an algorithm that would need to distinguish between them?" and then you'd extract values for each of those. So for example, you might extract the brightness of each image. Or maybe how green the image is. You then have a vector of each of these features and use these vectors instead of the image. – Searke Mar 09 '16 at 04:34
  • In machine learning, getting the feature vector right is often the real battle. – Searke Mar 09 '16 at 04:38
  • Can you try picking a Method manually? Doesn't really matter which one. I get this kind of error too mainly when it is trying to figure out the best method automatically. – Philip Maymin Apr 21 '16 at 14:58
  • Alternatively, you can use ImageResize and use 150x150 or a bit more... you don't need the full-size images. – s.s.o Nov 13 '16 at 18:30
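
The suggestions in the comments above (shrink the images, turn each one into a feature vector, pick a Method by hand) could be combined roughly as follows. This is only a sketch under assumptions: loadSmall and meanRGB are hypothetical helper names, every file is assumed to import as a single Image, "LogisticRegression" is an arbitrary choice of method, and the mean red/green/blue level is just a stand-in for whatever features actually separate the two kinds of documents.

```wolfram
(* Hypothetical helper: import one file and shrink it to 150x150 pixels,
   so that only the small version of each image is kept. *)
loadSmall[file_String] := ImageResize[Import[file], {150, 150}]

small1 = loadSmall /@ FileNames["*", "location1"];
small2 = loadSmall /@ FileNames["*", "location2"];

(* Hypothetical hand-made feature vector: mean red, green and blue level
   of an image (three numbers between 0 and 1). *)
meanRGB[img_Image] :=
  Mean[Flatten[ImageData[ColorConvert[RemoveAlphaChannel[img], "RGB"]], 1]]

(* Train on the feature vectors instead of the images, and fix the method
   rather than letting Classify search for one automatically. *)
training = Join[
   Thread[(meanRGB /@ small1) -> "H"],
   Thread[(meanRGB /@ small2) -> "C"]];
classifier = Classify[training, Method -> "LogisticRegression"];
```

A new image would then be classified with classifier[meanRGB[loadSmall["somefile"]]], where "somefile" is a placeholder path. Whether this keeps memory use low enough for the images in question has not been tested here.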

1 Answer


This isn't a complete answer (indeed, that would be impossible since we don't have access to the images), but is too long for a comment.

You might try ImageIdentify[image,category,n] to generate "feature vectors" consisting of n words. If you are lucky, the best n matches might contain enough information to distinguish the two kinds of images. The advantage of this approach is that you would leverage all the sophisticated image processing inherent in ImageIdentify, and the classifier only needs to operate on the words. So in outline, for each image you would get a list like

feat = ImageIdentify[image, All, 5]

which returns a list of 5 identifications, where feat[[All, 2]] contains the best-match words. These sets of words could then be used in the Classify step.
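
Put together with the question's class1 and class2, the outline might look something like the sketch below. It assumes that each ImageIdentify call returns a list of exactly five Entity objects (so that part 2 of each is its name string); topWords, wordData and wordClassifier are hypothetical names introduced only for illustration.

```wolfram
(* Hypothetical helper: names of the five best ImageIdentify matches.
   Assumes the result is a list of five Entity objects, so that
   part 2 of each one is its name string. *)
topWords[img_Image] := ImageIdentify[img, All, 5][[All, 2]]

(* class1 and class2 are the imported image lists from the question. *)
wordData = Join[
   Thread[(topWords /@ class1) -> "H"],
   Thread[(topWords /@ class2) -> "C"]];

(* The classifier now sees five words per image instead of the pixels. *)
wordClassifier = Classify[wordData];
```

A new image img would then be classified with wordClassifier[topWords[img]].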

bill s