Written by Laurence Hubert, CEO of Hurence
We attended the 2023 edition of the Big Data and AI show. This year we positioned ourselves as observers (and not actors), because we wanted to take the time to talk with our fellow solution providers. We therefore spent the two days of the show in in-depth discussions and demonstrations, and received a very warm welcome from the people we met, even when we had identified ourselves as competitors. We really enjoyed the technical interactions, as well as the very high-quality exchanges in the workshops where big names came to present their visions (Google, IBM Consulting, Business & Decision with Mick Levy, etc.), in the startup village, and in a few remote corners of the show sheltered from the noise of the big players' stands (many thanks to the young demonstrator from Illuin for giving a demonstration despite a nearby sound system). We cannot name them all, but thanks to Airbyte (and the welcome from John and Chris), Actian (for their delicious fruit juices and the welcome from Vincent), Prisme.AI (and the welcome from Bertrand), ThoughtSpot, Kairntech (and the welcome from Vincent and the team), Lettria, SnapLogic, ToucanToco (for its friendly reception…), Data Galaxy, and many more.
This year, the show was really divided into two main areas: the "data platform" providers' area, covering solutions for data meshes, lakehouses, and other data warehouses; and the generative AI area. Note the same trend at both poles… no one codes anymore, because the offers are LOW CODE / NO CODE, some to generate data ingestion or processing flows, others to consume data in various forms. What we love about life is that it is an eternal beginning. We have spent almost a decade structuring data flows and data management, but we will now push myriads of small LOW CODE / NO CODE flows and analyses that bring democratization, but also a little chaos, into our data universe. Chaos that we may well spend the next decade structuring. Moreover, given the average age on the stands of us IT and HPC/Big Data veterans (Philippe from HPDIA will not contradict me), we tell ourselves that a new generation of tech has blossomed, and that we are also witnessing a generational change, including within the Hurence teams.
On the "data platforms" side, almost no actor talks about data lakes (or even data hubs) anymore. These concepts are portrayed as "has-been" in marketing discourse, in favor of notions of Data Mesh. But this discourse is gradually evolving. The death of the Data Lake was announced prematurely, and this threw many decision-makers into doubt. There are, however, technical realities in processing large volumes that no one can ignore, because our use cases confront us with them quite quickly and quite violently. Our Data Mesh solution providers have had to adjust their presentations: from a miracle solution or architecture, the data mesh is becoming more a federated data governance approach over decentralized data sources than a technological solution for managing those same decentralized sources.

Indeed, reality is stubborn: we still need to "consume" voluminous data to compute our models (AI or not) based on crossing business sources. We therefore still need to dump that data somewhere and carry out transformations from one business universe to another. This need for what we call a data lake (whatever name we choose to give it) teaches us once again that we do not process terabytes of data just by calling business data APIs, which are generally incapable (often because of the back-ends behind them) of withstanding the bombardment of our parallelized Big Data processing. But the idea of democratizing data, and of having data consumption APIs and a real contract for the supply of "clean" data, obviously has value. Just a few possible alerts drawn from our experience, and confirmed by customers we met at the show who have tried the paradigm for themselves:
- It turns out that many business units do not want to take on the responsibility of managing the exposure of their data via the APIs that a Data Mesh approach would have them own, quite simply because it is not their job! Data Mesh theory advocates a logical and clean division into data domains, which is, in itself, a good idea and a necessary knowledge representation. On the other hand, a decentralized physical division is not self-evident, either culturally or technically. The management of exposed data will probably remain the work of a "specialized profession", following methods specific to each context… as before. Even if, in practice, tools and APIs for democratizing consumption must undoubtedly be provided where the data is exposed.
- No one yet dares to put the notion of Data Lake back into their "slideware" architecture, even if the discourse is evolving. Among some presenters who are a little less vague, S3 storage is mentioned half-heartedly as a low-level layer, as if it were purely anecdotal. Yet one of the consumers of data, and one of the "business" suppliers of data, is precisely our Data Lake on S3. Because to train an LLM (Large Language Model) and obtain a ChatGPT specialized for our company, we need a place where we deposit all the "business" sources required for the vectorization and then the training of this wonderful LLM. And this place will also be a data provider for all the business units that want to use the LLM in their own context. Reality therefore dies hard. Data Mesh is not an architecture; it is a virtuous approach to exposing each business's data via APIs, period. And the Data Lake, whatever you call it, is neither has-been nor useless. It answers the need to bring parts of this data together in a common base, to guarantee processing performance on large volumes and to serve the "data professions", those who create higher-level data and give a "macro" meaning to the more basic business data. And that too is a job that cannot be delegated to a federation of entities!
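The scaling point above can be sketched with a toy example (all names and numbers are hypothetical, a simulation rather than a real back-end): a parallelized job that fetches records one by one from a quota-limited business API collapses long before it finishes, while the same job reading a bulk dump deposited in the lake never touches the business back-end at all.

```python
# Toy sketch (hypothetical names): why parallelized Big Data jobs
# overwhelm record-by-record business APIs, and why a bulk dump
# in the lake does not have this problem.

class RateLimitedBusinessAPI:
    """Simulated business back-end that tolerates only a limited number of calls."""
    def __init__(self, quota):
        self.quota = quota
        self.calls = 0

    def get_record(self, record_id):
        self.calls += 1
        if self.calls > self.quota:
            raise RuntimeError("back-end overloaded: quota exceeded")
        return {"id": record_id, "value": record_id * 2}

def process_via_api(api, record_ids):
    # One round-trip per record: N calls for N records.
    return [api.get_record(i)["value"] for i in record_ids]

def process_via_lake(dump, record_ids):
    # The same records, exported once into the lake as a bulk dump:
    # the processing job reads the dump, never the business back-end.
    wanted = set(record_ids)
    return [r["value"] for r in dump if r["id"] in wanted]

record_ids = list(range(10_000))
dump = [{"id": i, "value": i * 2} for i in record_ids]  # one nightly export

api = RateLimitedBusinessAPI(quota=1_000)
try:
    process_via_api(api, record_ids)
except RuntimeError as exc:
    print(exc)  # the back-end gives up long before 10 000 calls

print(len(process_via_lake(dump, record_ids)))  # all 10 000 records processed
```

The numbers are arbitrary, but the shape of the problem is the one we keep meeting: the per-record API fails at its quota, while the dump-based path scales with storage throughput instead of back-end capacity.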
On the generative AI side, companies are showing an appetite never seen before, and it is reflected in the offers of all the players. No stand forgot to put "generative AI" in its communication, whatever the offer. We now have generative AI to automatically create dashboards, generative AI to generate data-flow code… it is everywhere, and tools to build it are flourishing everywhere. It is also often the underlying technology of LOW CODE / NO CODE. The whole show feels that this is a turning point in IT, a real revolution such as Google's search engine was a few decades ago: a real revolution in human-machine interaction that every application will have to integrate.
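The vectorization step mentioned above can be sketched in miniature (hypothetical documents, and a toy bag-of-words in place of a real embedding model): business sources dumped side by side in a common store are turned into vectors, and a query is answered by retrieving the closest document, the basic retrieval pattern behind a company-specialized assistant.

```python
import math

# Minimal sketch (hypothetical data, toy vectorizer): business documents
# gathered in a common "lake" are vectorized, then the closest one is
# retrieved for a query -- the pattern a specialized LLM assistant builds on.

# Business sources from different domains, dumped side by side.
lake = {
    "hr/leave_policy.txt": "employees request leave through the hr portal",
    "sales/pricing.txt": "pricing tiers depend on contract volume and region",
    "it/vpn_howto.txt": "connect to the vpn before accessing internal tools",
}

# Deterministic bag-of-words vocabulary built from the corpus.
vocab = sorted({tok for text in lake.values() for tok in text.lower().split()})

def embed(text):
    """Count each vocabulary word in the text (a stand-in for real embeddings)."""
    tokens = text.lower().split()
    return [float(tokens.count(word)) for word in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Vectorize everything once, at the lake level.
index = {path: embed(text) for path, text in lake.items()}

def retrieve(query):
    """Return the document whose vector is closest to the query vector."""
    qv = embed(query)
    return max(index, key=lambda path: cosine(qv, index[path]))

print(retrieve("how do I request leave"))
```

A real pipeline would replace the word counts with embeddings from a model and the dict with a vector store, but the architectural point is the same: the vectorization needs all the business sources in one reachable place, which is exactly the role of the lake.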