The Data Lake Engineering team was established.
A first technical concept of the Data Platform architecture was created. It was based on building data lake infrastructure on the Azure cloud platform, using native managed data services and Kylo as the data ingestion mechanism.
The technical concept was redesigned following the decision to move from Azure infrastructure to AWS infrastructure.
The new concept is based on Apache NiFi as a self-service data integration tool and on the AWS reference data lake architecture built on API Gateway, S3, Lambda functions, Spark on EMR, Glue and Athena.
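As an illustration of how a Lambda function might sit between S3 and the query engines in such an architecture, the following is a minimal sketch, not the team's implementation. It reads an S3 event notification and derives a date-partitioned target key using Hive-style partition folders, which Glue and Athena can recognise as partitions. The zone prefixes raw/ and curated/, the key layout raw/<source>/<file>, and the function names are assumptions made for this example only.

```python
from datetime import datetime, timezone
from urllib.parse import unquote_plus

# Hypothetical zone prefixes; the real bucket layout is not defined in this document.
RAW_PREFIX = "raw/"
CURATED_PREFIX = "curated/"


def curated_key(raw_key: str, event_time: str) -> str:
    """Map 'raw/<source>/<file>' to a date-partitioned key in the curated zone."""
    if not raw_key.startswith(RAW_PREFIX):
        raise ValueError(f"not a raw-zone key: {raw_key}")
    source, _, rest = raw_key[len(RAW_PREFIX):].partition("/")
    if not source or not rest:
        raise ValueError(f"expected raw/<source>/<file>, got: {raw_key}")
    # S3 event timestamps look like "2019-07-01T12:34:56.000Z".
    ts = datetime.strptime(event_time, "%Y-%m-%dT%H:%M:%S.%fZ")
    ts = ts.replace(tzinfo=timezone.utc)
    # Hive-style partition folders (year=/month=/day=).
    return (
        f"{CURATED_PREFIX}{source}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/{rest}"
    )


def handler(event, context=None):
    """Lambda-style entry point: return (bucket, raw key, curated key) per record."""
    targets = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        targets.append((bucket, key, curated_key(key, record["eventTime"])))
    return targets
```

The sketch only computes target locations and performs no S3 calls, so it can be run locally with a hand-written event. In a real function, the copy or transformation step would follow where the target key is computed.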
The first delivery is planned for Q3 2019.
The first use case was agreed. It is one of the Machine Learning use cases currently being worked on.
The team works closely with the CISO to establish best practices and pragmatic solutions in the areas of data security and governance, system security and code security.
Adriana Firu-Tanasie (Engineer) Adriana.Firu-Tanasie@haufe-lexware.com, Drilon Konjufca (Engineer) Drilon.Konjufca@haufe-lexware.com, Jerzy Kott (Architect and Product Owner) Jerzy.Kott@haufe-lexware.com, Thomas Kotsch (PM) Thomas.Kotsch@haufe-lexware.com
A data platform is a type of IT solution that combines the features and capabilities of several data applications and sources across a company's entire data landscape within a single solution. It enables an organization to develop, deploy, operate and manage a data infrastructure/environment. A data platform generally consists of storage, servers, databases, big data solutions, BI/Analytics and data management utilities. It also supports integration with other systems.
Remove data silos by creating a common Data Platform that integrates all valuable data sources and data engines across the company.
Enable a scalable, agile and future-proof company-wide data architecture and ecosystem
Enable data democratisation and data-driven decisions by implementing self-service BI/Analytics/Reporting
Enrich products with embedded BI/Analytics/Reporting and analytics-driven functionality
Enable data-driven product development
Provides a discovery and fast-prototyping environment for data-driven experimentation and innovation.
Lowers maintenance costs and workload by consolidating multiple data integration vendors/solutions into one cohesive solution.
Removes direct dependencies between data systems and applications ("decoupling"), allowing more flexible and agile development of new products and solutions.
Removes existing bottlenecks in analysing, cleansing and delivering data.
The concept is currently under discussion, with a Data Lake being the preferred solution.
Data Lake - a centralised repository of enterprise data that serves various application workloads, which gain efficiency through the reusability and consistency of the data.