
Version 11.1.1 comes with improved support for the HDFv5 (HDF is an acronym for Hierarchical Data Format) file format. How can I export (and import) the following data into h5 format?

The data I want to export contains both images and lists of lists. Here is a minimal example:

imgs = ExampleData /@ RandomChoice[ExampleData["TestImage"], 10];
faces = FindFaces /@ imgs;
dummy = <|"images" -> imgs, "faces" -> faces|>
Export["micro-face-dataset.h5", dummy]


Furthermore, I need the data to be written in a standard way so that I can access it not only from Mathematica:

Import["micro-face-dataset.h5", {"Datasets", {"/images"}}]

but also from other languages, e.g. Python's h5py:

f = h5py.File('micro-face-dataset.h5', 'r') 
dataset = f['/images']

Here's a list of subproblems that make this process tricky:

  • How can we specify attributes and formats for mixed data types (and encode the images properly, etc.) for export? The docs show examples for import only.

  • How can we export/import ragged lists?
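
For the ragged-list subproblem, one workaround that does not depend on the exporter is to store a flat values array plus an offsets array, since a plain HDF5 dataset is a rectangular array. A minimal Python sketch, assuming that layout is acceptable to every reader of the file (`flatten_ragged` and `unflatten_ragged` are hypothetical helper names, not part of h5py or Mathematica):

```python
def flatten_ragged(rows):
    """Pack a ragged list of lists into (flat values, offsets).

    Row i is recovered as flat[offsets[i]:offsets[i + 1]].
    """
    flat = []
    offsets = [0]
    for row in rows:
        flat.extend(row)
        offsets.append(len(flat))  # end of this row == start of the next
    return flat, offsets


def unflatten_ragged(flat, offsets):
    """Rebuild the ragged list from the flat values and the offsets."""
    return [list(flat[start:stop]) for start, stop in zip(offsets, offsets[1:])]


ragged = [[1, 2, 3], [], [4, 5]]
flat, offsets = flatten_ragged(ragged)
print(flat)     # [1, 2, 3, 4, 5]
print(offsets)  # [0, 3, 3, 5]
print(unflatten_ragged(flat, offsets) == ragged)  # True
```

Both `flat` and `offsets` are rectangular, so each could be written as an ordinary dataset; the cost is that every consumer has to know the convention.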

Update

@yode This is definitely possible; see ExampleData/image.h5, which can be loaded into Python:

>>> import h5py, numpy as np
>>> f = h5py.File('image.h5', 'r')
>>> img = np.array(f['image24bitpixel'])
>>> print(img.shape)
(149, 227, 3)

It's just a question of how it was created, perhaps with ExportStructuredHDF5[] in GeneralUtilities.

M.R.

1 Answer


The data can be exported this way:

imgs = ExampleData /@ RandomChoice[ExampleData["TestImage"], 10];
faces = FindFaces /@ imgs;
dummy = <|"images" -> imgs, "faces" -> faces|>;

Get["GeneralUtilities`"];
ExportStructuredHDF5["micro-face-dataset.h5", 
    <|"images" -> ImageData /@ imgs, "faces" -> faces|>]

Import the data back from the file:

Import["micro-face-dataset.h5", {"Data", "faces"}]

Image /@ Import["micro-face-dataset.h5", {"Data", "images"}]
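
To check that the exported file is also readable from h5py, as the question asks, it may help to list what the file actually contains before relying on a particular path. A minimal sketch, assuming h5py is installed and the file sits in the working directory; `describe_entry` and `list_datasets` are hypothetical helper names, and the layout written by ExportStructuredHDF5 is not assumed here:

```python
def describe_entry(name, obj):
    """Return a one-line summary for a dataset-like object, or None otherwise.

    h5py groups have no `shape` attribute, so they are skipped.
    """
    shape = getattr(obj, "shape", None)
    if shape is None:
        return None
    return f"/{name}: shape={tuple(shape)}, dtype={obj.dtype}"


def list_datasets(path):
    """Walk an HDF5 file and summarize every dataset found in it."""
    import h5py  # imported here so describe_entry can be used without h5py

    lines = []

    def visit(name, obj):
        line = describe_entry(name, obj)
        if line is not None:
            lines.append(line)
        # Returning None tells visititems to keep traversing.

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return lines


# Usage (assumes the file from the export step above exists):
# for line in list_datasets("micro-face-dataset.h5"):
#     print(line)
```

If the images show up as a single numeric array, `np.array(f[...])` on that path should behave like the `image.h5` example in the question.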


partida
  • But ImportStructuredHDF5 doesn't work, so how do you import the file? – yode Jun 19 '17 at 08:11
  • @yode See my answer; I have modified it. – partida Jun 19 '17 at 10:20
  • I'm sorry, I find that ImportStructuredHDF5 works well, but it fails if the path contains non-ASCII characters. – yode Jun 19 '17 at 12:16
  • @yode I don't know what the right way is, but this works: importhdf5[file_] := Reap[RenameFile[file, "test_NonASCII.h5", OverwriteTarget -> True]; Sow[Image /@ Import["test_NonASCII.h5", {"Data", "images"}]]; RenameFile["test_NonASCII.h5", file, OverwriteTarget -> True]][[2, 1]]; importhdf5["测试.h5"] – partida Jun 19 '17 at 12:34
  • You can solve the problem that way when only the .h5 file name has those characters. But I tend to think this is a bug: if your path (not only the file name, but also an intermediate directory) has non-ASCII characters (such as Chinese characters), the import will fail. – yode Jun 19 '17 at 12:52
  • @yode Hmm, yes, and RunProcess doesn't support non-ASCII well either. I have run into this many times before (;′⌒`) – partida Jun 19 '17 at 13:05