My colleagues and I are having the following discussion, and I would like to get some other opinions. We (a major global company) currently work with several kinds of presence/people-counting sensors. Several vendors offer these; they are PoE devices that plug into a switch and are placed in meeting rooms and similar spaces. The major players in this field offer a total package, so to speak: the sensor connects to their SaaS back end, which consists of some kind of IoT hub, data storage, and a web application for reporting, sensor output, and configuration. It is something of a black box: you just plug it in and get going.
Of course there are several types of sensors, and we want to get the data out into a data lake so we can combine it all, while keeping the individual vendor reporting back ends for fast, out-of-the-box basic reporting.
Here is where the difference of opinion comes in. Some say we need all the raw data, and propose that the vendor break open their black box and divert or duplicate the raw data from their IoT hub equivalent to our company's IoT hub in Azure. Note: the device cannot be properly managed from the Azure IoT hub; it would be there just for the data. The vendor cannot simply deliver the hardware alone, because part of the solution is that some ML-like operations are applied in their data storage to tweak the raw data for optimization. The rationale for this approach is that we need the original data so that we are fully in control, know what is going on, and can replicate their ML operations.
The other school of thought is that our company has a strategic IT policy of buying only off-the-shelf SaaS applications with as little customization as possible: buy before build, etc. Therefore keep it as a black box, do your due diligence during vendor selection (GDPR, security, etc.), get a rough understanding of what happens to the data to ease your mind, and put a solid SLA in place if any worries are left. Then use the vendor's REST API to get the (modified) data and transfer it to our data lake. Rationale: the vendor has spent 10 years developing their product, so do not be so arrogant as to think you can understand it better. Since knowledge is poorly retained in big companies like ours, trying to get the raw data and replicate the vendor's mechanisms really gives a false sense of security. It is a non-business-critical application, and if their API is well documented data-wise, why bother?
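To make the second option a bit more concrete, here is a minimal sketch of what the integration on our side might look like: one small adapter per vendor that maps whatever their REST API returns onto a common record before it lands in the data lake. Everything in it is an assumption for illustration only: the vendor names, the payload field names (`room`, `ts`, `count`, `spaceId`, `epochSeconds`, `occupants`), and the `Occupancy` schema are hypothetical and not taken from any real vendor API. The HTTP call and the data lake write are left out on purpose.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Occupancy:
    """Common record we would store in the data lake (hypothetical schema)."""
    vendor: str
    room_id: str
    observed_at: datetime  # always normalized to UTC
    people_count: int


def from_vendor_a(rec: dict) -> Occupancy:
    # Assumed payload shape: {"room": "A-101", "ts": "<ISO 8601 with offset>", "count": 4}
    observed = datetime.fromisoformat(rec["ts"]).astimezone(timezone.utc)
    return Occupancy("vendor_a", rec["room"], observed, int(rec["count"]))


def from_vendor_b(rec: dict) -> Occupancy:
    # Assumed payload shape: {"spaceId": "B-7", "epochSeconds": 1705309200, "occupants": 2}
    observed = datetime.fromtimestamp(rec["epochSeconds"], tz=timezone.utc)
    return Occupancy("vendor_b", rec["spaceId"], observed, int(rec["occupants"]))


# One adapter per vendor; adding a vendor means adding one function and one entry.
ADAPTERS: Dict[str, Callable[[dict], Occupancy]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def normalize(vendor: str, records: List[dict]) -> List[Occupancy]:
    """Map already-fetched vendor API records onto the common schema."""
    adapter = ADAPTERS[vendor]
    return [adapter(rec) for rec in records]


if __name__ == "__main__":
    sample_a = [{"room": "A-101", "ts": "2024-01-15T10:00:00+01:00", "count": "4"}]
    sample_b = [{"spaceId": "B-7", "epochSeconds": 1705309200, "occupants": 2}]
    for row in normalize("vendor_a", sample_a) + normalize("vendor_b", sample_b):
        print(row)
```

The point of the sketch is that in this option our own code stays limited to thin mapping logic, and the vendor's ML-adjusted values are taken as-is rather than reproduced.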
Your thoughts?