What is a modern Big Data and AI platform?
Big Data and AI infrastructures rest on a few fundamental hardware and software building blocks that make it possible to process large volumes of data.
The first building block is distributed storage: the data is spread across several servers and their attached disks, which allows calculations to run in parallel.
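To make the link between distributed storage and parallel calculation concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the partitions are plain in-memory lists standing in for servers or disks, threads stand in for the machines, and the function names (`split_into_partitions`, `partial_sum`, `distributed_mean`) are hypothetical, not part of any real platform.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, n_partitions):
    """Spread records round-robin over n_partitions lists.

    Each list is a stand-in (assumption) for the data held by one server or disk.
    """
    return [records[i::n_partitions] for i in range(n_partitions)]


def partial_sum(partition):
    """Work done locally on one partition: return its (sum, count)."""
    return sum(partition), len(partition)


def distributed_mean(records, n_partitions=4):
    """Compute a mean by combining partial results from every partition."""
    partitions = split_into_partitions(records, n_partitions)
    # Each partition is processed independently; threads stand in for servers here.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(pool.map(partial_sum, partitions))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


records = list(range(1, 1001))
mean = distributed_mean(records)
print(mean)
```

The point of the sketch is the shape of the computation: each partition only needs its own data to produce a small partial result, and only those small results are combined at the end.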
For these potentially long calculations, we sometimes also resort to hardware acceleration. The most obvious form is in-memory storage, made possible by larger amounts of RAM than on standard servers. We also choose a few servers equipped with GPUs (yes, the same graphics cards that are used for video games!).
These GPUs have components dedicated to matrix calculations (matrix multiplications, in the mathematical sense) and therefore save considerable time when training AI models such as neural networks, including the famous LLMs behind ChatGPT-like AIs, whose computations are mostly matrix calculations.
On top of this comes a whole battery of tools for data management (catalogs, etc.), for data ingestion and standardization, for visualization, and so on. These essential tools are always the same in function; only their names differ depending on the provider you choose to deploy your infrastructure. If you choose instead to build a “bare metal” infrastructure in your own data center, you can equip yourself with tools from the open source world. These same tools are often found in the clouds as well, because they are de facto standards.
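As an illustration of why neural network workloads are dominated by matrix calculations, the sketch below writes the forward pass of a single dense layer as one matrix multiplication followed by a simple activation. It is a plain-Python toy under stated assumptions: the helper names (`matmul`, `dense_layer`) and the numbers are hypothetical, there is no bias term, and real frameworks hand this same multiplication to optimized GPU kernels rather than Python loops.

```python
def matmul(a, b):
    """Multiply matrix a (rows x inner) by matrix b (inner x cols)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


def dense_layer(inputs, weights):
    """Forward pass of one layer without bias: matrix product, then ReLU."""
    return [[max(0.0, value) for value in row] for row in matmul(inputs, weights)]


# Illustrative values only: a batch of 2 examples with 2 features each,
# and a weight matrix mapping 2 features to 3 outputs.
inputs = [[1.0, 2.0], [3.0, 4.0]]
weights = [[1.0, -1.0, 0.5], [0.0, 1.0, -2.0]]
outputs = dense_layer(inputs, weights)
print(outputs)
```

Every output value is an independent sum of products, which is exactly the kind of work a GPU can spread over thousands of arithmetic units at once.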
The diagrams of our standard architectures, in the main clouds and outside them, are given at this URL: Build modern Data platforms.
However, don’t forget that you are all unique: Hurence can design tailor-made infrastructures and select from this offering what fits your needs.