IoT data lake. Can you help me? Thanks

Question

I collect data from IoT sensors. The data arrives as bytes whose individual bits carry meaning. I will eventually use some of those bits for analytics.

So my questions are:

1. Should I convert the bits used for analytics into new fields and save them to another S3 bucket using an ETL job?
2. If I should do step 1, which is better to save: the raw bits or their meaning for analytics?
3. If I should do step 1, I guess that saving the bits may be better for compression and throttling, but it is not friendly for data scientists. Is there a good way to solve this?

You can [edit] your question either by clicking the link in this comment or by clicking the edit button underneath your question. For now I've copied the edits into your question and removed the answer. — anonymous2, Apr 18 '22 at 15:24
Without understanding the context of the data or the analysis you intend to perform answering this will be very hard, but I will say, converting the bitmask data to individual fields is likely to make querying the data a LOT easier. — hardillb, Apr 18 '22 at 18:13

score 0 · Answer 1 · answered Apr 21 '22 at 21:04


You should try a time series database; it is well suited to building IoT applications. InfluxDB is one example of this type of database.


pierreneter


score 0 · Answer 2 · edited Feb 27 '23 at 07:54

It depends on how likely you are to run multiple analyses over the incoming data within a given period of time. Here's why:
Bit packing reduces storage and transmission cost.
If expanding the bits requires manual steps or handwritten code, there is a risk of errors and added complexity.
The expanded data format makes analysis easy but costs more to store.
So, if your incoming data is in the packed format, you pay to convert it to the expanded format and to store the expanded copy for some period of time.
So, if you don't have an immediate need to analyze the data, you can store just the bit-packed format and expand it when you need to. Once it is expanded, how long you keep the expanded copy depends on whether you may need to run more analysis on it soon.
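To make the "store packed, expand on demand" idea concrete, here is a minimal Python sketch. The bit layout (`door_open`, `low_battery`, `mode`, `error_code`) is purely hypothetical, since the question does not say what the bits mean; substitute the layout from your sensor's documentation.

```python
from typing import Dict

# Hypothetical layout of one status byte (an assumption for illustration only):
#   name -> (shift, width in bits)
FIELDS = {
    "door_open": (0, 1),    # bit 0
    "low_battery": (1, 1),  # bit 1
    "mode": (2, 3),         # bits 2-4
    "error_code": (5, 3),   # bits 5-7
}


def expand_status(byte: int) -> Dict[str, int]:
    """Expand one packed status byte into named integer fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte (0-255)")
    return {
        name: (byte >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


def pack_status(fields: Dict[str, int]) -> int:
    """Inverse of expand_status: pack named fields back into one byte."""
    byte = 0
    for name, (shift, width) in FIELDS.items():
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        byte |= value << shift
    return byte


if __name__ == "__main__":
    print(expand_status(0b10110101))
```

Keeping the layout in one table like `FIELDS` means the same definition drives both directions, which reduces the chance of the handwritten-code errors mentioned above. An ETL job could call something like `expand_status` only for the time ranges you actually need to analyze.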

You may consider the following:
If you use something like protobuf with packed fields, you don't get bit fields, but the resulting output may compress better. The integers at least occupy the fewest bytes, if not the fewest bits.
With protobuf and similar tools, you may be able to convert to JSON/XML without handwritten code (though not with bit fields).
For a proof of concept, you may be able to get by with compressed JSON/XML.
There is a project called bitproto that attempts to provide a protobuf-like format with bit fields. I have not tried it; you may want to.
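For the proof-of-concept route, a rough way to judge the storage trade-off is to measure it on your own data. The sketch below uses only the Python standard library (`json`, `zlib`); the bit layout and the synthetic readings are assumptions made up for illustration, so the sizes it prints say nothing about real sensor data.

```python
import json
import zlib


def expand_status(byte: int) -> dict:
    """Expand one status byte using a hypothetical bit layout."""
    return {
        "door_open": byte & 1,
        "low_battery": (byte >> 1) & 1,
        "mode": (byte >> 2) & 0b111,
        "error_code": (byte >> 5) & 0b111,
    }


# 1000 synthetic one-byte readings (made-up data, not real sensor output).
packed = bytes((i * 37) % 256 for i in range(1000))

# Expanded, analyst-friendly form: one JSON object per reading.
expanded_json = json.dumps([expand_status(b) for b in packed]).encode("utf-8")

# The same JSON after general-purpose compression at the highest level.
compressed_json = zlib.compress(expanded_json, 9)

print("packed bytes:   ", len(packed))
print("expanded JSON:  ", len(expanded_json))
print("compressed JSON:", len(compressed_json))
```

Running the same measurement on a sample of real readings should tell you whether compressed JSON is close enough to the packed size to skip a custom binary format.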

