I am trying to extract data from images (using some of the methods presented in [1524]) and would like a way to selectively cover or remove text in axes and legends.

TextRecognize has been reasonably good at finding the text, but I am curious: is there a way to determine where this text is located?

For example, how could we determine the approximate image location of the "Test plot" text in the following case?

testImage = Image[Plot[x, {x, 0, 6 π}, 
 PlotLabel -> "Test plot", BaseStyle -> {FontSize -> 24}]]
TextRecognize[testImage]

"Test plot  
5 10 15"
Rashid

With the enhanced TextRecognize introduced in version 11.1, finding the positions of recognized text is straightforward:

testImage = 
  Image[Plot[x, {x, 0, 6 π}, PlotLabel -> "Test plot", BaseStyle -> {FontSize -> 24}]];
(* Bounding boxes of the recognized text blocks *)
res = TextRecognize[testImage, "Block", "BoundingBox", RecognitionPrior -> "SparseText"];
HighlightImage[testImage, {"Boundary", res}]

This answers the original question, but as the output shows, not every glyph is recognized.

Another route goes through ComponentMeasurements:

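(* Bounding boxes of all components; [[;; , 2]] keeps the values of the index -> box rules *)
comp = ComponentMeasurements[testImage, "BoundingBox"][[;; , 2]];
(* A workaround for a bug: unpack the array first *)
comp = Developer`FromPackedArray@comp;
(* Turn each {{xmin, ymin}, {xmax, ymax}} pair into a Rectangle *)
comp = Rectangle @@@ comp;
(* Keep only small, glyph-sized boxes; the area threshold is image-dependent *)
comp = Select[comp, Area[#] < 10000 &];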
HighlightImage[testImage, {"Boundary", comp}]

Now every glyph is found. Voilà!
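Since the question's goal was to cover or remove the text, a minimal sketch of that last step follows. It assumes the rectangles in comp are in the image's pixel coordinates (the same coordinates HighlightImage uses above) and that the plot background is white. The names covered and coveredImage are illustrative only.

```mathematica
(* Sketch: paint opaque white rectangles over the detected boxes.
   Assumes comp holds Rectangle objects in the image's pixel
   coordinates and that the plot background is white. *)
covered = Show[testImage, Graphics[{White, comp}]];

(* Convert the overlay back to an Image for further processing *)
coveredImage = Rasterize[covered, ImageSize -> ImageDimensions[testImage]]
```

Substituting res for comp covers only the blocks that TextRecognize found. Because comp may also contain small non-text components, the area threshold may need adjusting for other images.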

Alexey Popkov