What is the EDITO-Infra Data Lake? A Peek into its Marine Data Reservoir

In today’s digital age, with the expanding volume of data generated every second, the challenge isn’t just collecting data, but storing, managing, and analyzing it efficiently. This is where the concept of a “Data Lake” comes into play. 

What is the EDITO-Infra Data Lake?

The EDITO-Infra data lake will be an almost homogeneous collection of cloud-ready data originating from EMODnet and CMEMS that can be used immediately with little to no file or format conversion. The lake relies on nearby compute resources to perform whatever analysis a particular business need requires. It trades the sophistication of a database for greater flexibility in analyzing and processing big data, leveraging cloud technologies that offer high-speed, scalable storage and low-cost, on-demand compute.

Data lakes are designed to provide fast and scalable storage of vast amounts of data, making it accessible for different applications, from big data analytics to machine learning. Their architecture is optimized for the rapid ingestion of different types of data and the speedy retrieval necessary for analytical purposes. 

Diving into EDITO-Infra's Data Lake

The European Commission has mandated the primary operators of Copernicus Marine and EMODnet (MOi and VLIZ, respectively) to collaboratively design the core infrastructure for the forthcoming EU Digital Twin of the Ocean. EDITO-Infra will serve as an EU public infrastructure enabling the creation and operation of a wealth of local digital twins and innovative applications, catering to a diverse range of users. As a core component of the infrastructure, EDITO-Infra is stepping up the marine data game by creating one of the most innovative data lakes in the marine data sector, integrating key service components of Copernicus Marine and EMODnet.

SOME FACTS

  • Size: The EDITO-Infra data lake will handle terabytes of scientific-quality data, making it a unique marine data resource.
  • Storage capacity: When it comes to storage, EDITO-Infra’s data lake will be a robust and scalable cloud solution. It is designed to scale on demand to provide storage capacity when it is needed, ensuring that it can accommodate the ever-growing influx of data without compromising on retrieval speeds or data integrity.
  • Performance: EDITO is a cloud platform and offers the security, scalability, and performance of its underlying public-cloud and open-source technologies.
  • Connection to Earth Observation Data Stores & HPC resources: One of the unique features of EDITO-Infra’s data lake is that it is linked to GÉANT, Europe’s high-speed research network, which enables high-speed connections to other European big data assets and HPC centers. This integration allows EDITO to tap into large Earth Observation data archives, providing invaluable insights for industries ranging from agriculture and forestry to urban planning and environmental conservation. It also aims to be fully interoperable with the EU Destination Earth (DestinE) initiative.

EDITO-Infra is not just storing Ocean data, but empowering users to derive actionable insights from it, paving the way for data-driven decision-making in diverse sectors. 

Among the data streams that EDITO’s lake will accommodate, one notable addition is new biological data, such as the streams from the recently kicked-off DTO-BioFlow project, through which eDNA and tracking data, among others, will flow into the EDITO-Infra data lake. These streams are set to transform the way researchers, policymakers, and other relevant users and stakeholders perceive and engage with the oceanic ecosystem.

The development of EDITO-Infra’s data lake is a testament to Europe’s vision of a digital, data-rich future. As a cornerstone of the EU Digital Twin Ocean initiative, this data lake promises to revolutionize marine research, policymaking, and business strategies. With its state-of-the-art hosting platform and connections to Earth Observation data, the EDITO data lake is not just about sharing Ocean data; it’s about harnessing it for a better future.

EDITO-Infra: The backbone of the European Digital Twin of the Ocean