HDF5 is, to some extent, a filesystem of its own. With its B-trees and its block management, it duplicates functionality that a filesystem already provides. When you run your code, you are almost certainly running it on an operating system with a proven, scalable filesystem. Hence, I would suggest writing your raw numerical data into a single file using raw file access or MPI-IO, and writing the metadata (endianness, size, attributes, etc.) into a separate JSON or XML file. If you have multiple datasets, you can organize them into a directory or a hierarchy of directories. When you want to distribute a dataset, you just pack the tree into a ZIP file.
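For concreteness, here is a minimal sketch of what this could look like in Python with NumPy. The file names and the JSON schema are made up for illustration; they are not part of any standard:

```python
import json
import sys

import numpy as np

# Hypothetical dataset (names and schema are illustrative assumptions).
data = np.random.rand(1000, 3)

# Write the raw numbers as a flat binary file (C order, native byte order).
data.tofile("temperature.dat")

# Everything needed to read the file back goes into a JSON side-car.
meta = {
    "file": "temperature.dat",
    "dtype": str(data.dtype),       # e.g. "float64"
    "byteorder": sys.byteorder,     # "little" or "big"
    "shape": data.shape,
    "order": "C",
    "attributes": {"unit": "K"},    # arbitrary user attributes
}
with open("temperature.json", "w") as f:
    json.dump(meta, f, indent=2)

# Reading it back is just as direct:
with open("temperature.json") as f:
    meta = json.load(f)
restored = np.fromfile(meta["file"], dtype=meta["dtype"]).reshape(meta["shape"])
```

Any tool that can read JSON and a flat binary file can consume such a dataset, which is the whole point: no special library is required.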
The only downside is that you have to deal with endianness yourself, which is not hard.
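One simple way to handle it is to fix the byte order at write time rather than recording the native order; a sketch, again with NumPy and the hypothetical file name from above:

```python
import numpy as np

data = np.random.rand(1000, 3)

# Fix the byte order at write time: "<f8" is little-endian float64,
# independent of the machine the data was produced on.
data.astype("<f8").tofile("temperature.dat")

# On any machine, read with the same explicit dtype; NumPy swaps bytes
# transparently if the host happens to be big-endian.
restored = np.fromfile("temperature.dat", dtype="<f8").reshape(1000, 3)
```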
For inspiration on how this can be done, see Dragly et al., "Experimental Directory Structure (Exdir): An Alternative to HDF5 Without Introducing a New File Format," Front. Neuroinform., 2018, 12.