
In many engineering tasks one ends up with massive amounts of .txt or .csv files from various measurement systems (oscilloscopes, temperature probes, etc.) that usually land on a local or central disk with a cryptic filename and no further explanation. This makes both working with the data and sharing it with colleagues tedious. My question is thus:

Is it a good idea to use EntityStore and friends (ResourceObject, ContentObject, ...) in a NoSQL fashion to store and access data (between sessions, with multiple users, ...), and what are the implications of doing so? (Where are the files stored? Are they human-readable? Are they compatible across different versions of Mathematica? Can they be used by other software, e.g. MATLAB?)
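For concreteness, here is a minimal sketch of what I have in mind (the store name "measurement", the entity "run001", and all field names are made up for illustration):

```wolfram
(* Define a store whose entities carry metadata about measurement files *)
store = EntityStore[
  "measurement" -> <|
    "Entities" -> <|
      "run001" -> <|
        "File" -> "scope_run001.csv",
        "Instrument" -> "oscilloscope",
        "Date" -> DateObject[{2016, 12, 1}],
        "Notes" -> "step response, channel 2"
      |>
    |>
  |>
];

(* Register the store for this session, then query it like built-in entities *)
PrependTo[$EntityStores, store];
EntityValue[Entity["measurement", "run001"], "Notes"]
```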

Sascha
  • I've just tried this feature and entities saved with EntityStore don't seem to persist. (After I restart the kernel, $EntityStores resets to {}.) So you might need to save the store manually to a local file or to the Wolfram Cloud. – Gyebro Dec 02 '16 at 10:46
  • @Gyebro In theory you should be able to use the closely related ResourceObject functionality for persistence; we talked about that here, but as was noted there are problems. I discovered it as I tried to use ResourceObject with EntityStore for this answer. – C. E. Dec 02 '16 at 11:12
  • Have you thought about using cloud storage like Google Drive to manage data across the team, and exporting metadata (data dictionaries) as JSON structures? Even without JSON, GD and using gsheets for metadata (which in addition enables live co-editing) has worked well for our science workflow, ~10GB scale. – alancalvitti Dec 02 '16 at 20:13
  • I think the EntityFramework is conceptually not what you want for this sort of resource management. ResourceObject would make much more sense, but as far as I can tell there's no good way to create a custom repository of data from, say, a set of experiments, though that'll probably come reasonably soon. On the other hand, the thing to do may just be to write a resource manager. I knocked up a quick example here which lets one sync to the cloud or to a directory, like say one's Google Drive directory. – b3m2a1 Dec 05 '16 at 18:45
  • @MB1965, your ResourceFramework sounds interesting, as we also use GD. Do you have any examples or tutorials for use? One of our challenges is metadata management, which we hack at analysis time by merging various spreadsheets - but it does enable whole-team visibility and real-time co-editing, which is much better than hiring a DBA to update schemas. – alancalvitti Dec 07 '16 at 18:30
  • @alancalvitti It's not a terribly well-developed framework, as I just knocked it up to see if that type of thing would be a sufficient answer to the poster's problem. What I did was simply hash my files by filepath and store them in a directory--essentially fake a database--then I included a .m file of metadata with the name <path_hash>_info.m. I first did resource management in the cloud, where there is MetaInformation, but for the desktop/Google Drive version I had to use a different file type. If you're using Mathematica, .m files like this are the way to go for metadata. – b3m2a1 Dec 07 '16 at 21:15
  • @MB1965, well the problem w/ .m files is in team settings where other analysts are using different technologies (eg R, Python) - need solutions amenable to all. Also explicit hashing from path may not be necessary. We leverage GD's filesystem and import the entire data tree (including metadata files) as a single nested association, eg data["myDir","dataFolder1","data11"] etc. Is that what you use hashing for? – alancalvitti Dec 07 '16 at 21:51
  • @alancalvitti we should move this discussion out of the comments I think, but my reason for hashing was that I intended for the framework to be used as an as-you-go push/pull mechanism for data to a more general cache and I wanted ~/Desktop/example.txt to be distinct from ~/Documents/example.txt while keeping my directory structure flat. My thought was as you used Mathematica you could simply call something like $manager["Submit",file,metadata] and it would push and $manager[key] and it would pull. This solves @Sascha's problem without requiring ResourceObject. – b3m2a1 Dec 07 '16 at 22:04
  • @MB1965, sounds good, will be happy to chat or email. Re ~/Desktop/example.txt vs ~/Documents/example.txt , these are distinct paths and importing via the method I outlined preserves their difference. They would be accessed as, eg data["Documents","example.txt"] and data["Desktop","example.txt"] . As an independent step, I wrote another 1-liner function to serialize paths to a flat list - discussed elsewhere on M.SE. Starting with v10 functions, these are easy and time-efficient methods. – alancalvitti Dec 08 '16 at 01:15
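Following up on the persistence issue raised in the first comment: since $EntityStores is not saved between kernel sessions, one workaround is to write the store definition itself to a .m file and re-register it at startup. This is only a sketch; the file name and location are illustrative, and CloudPut/CloudGet could be substituted for Put/Get to use the Wolfram Cloud instead:

```wolfram
(* Save the store definition to a local .m file *)
Put[store, FileNameJoin[{$UserBaseDirectory, "measurementStore.m"}]];

(* In a later session: read the definition back and re-register it *)
store = Get[FileNameJoin[{$UserBaseDirectory, "measurementStore.m"}]];
PrependTo[$EntityStores, store];
```

Note that this only persists the metadata store; the underlying .csv/.txt files stay wherever they are on disk, which is part of what the question is asking about.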

0 Answers