HelloFresh Journey to the Data Mesh

Published in HelloTech · 11 min read · Oct 20, 2021

by Clemence W. Chee and Christoph Sawade

Background

HelloFresh is one of the most popular meal-kit companies globally, delivering step-by-step recipes and fresh, affordable, pre-portioned ingredients right to customers’ doors. We are a fast-growing company with a mission to change the way people eat, serving 16 markets across three continents. From our start in 2011, data has been critical to HelloFresh’s success. For example, data scientists use analytics and forecasting models to anticipate customer demand and ensure smooth procurement processes for just-in-time deliveries without accumulating costly perishable overstock.

By 2015, HelloFresh had evolved what amounted to a classical centralized data management setup, which had grown organically over the years. There were a small number of internal and external sources and a centralized BI team with a handful of people producing executive reports to steer the company’s performance. Knowledge transfer happened through face-to-face conversations, and the people in the data warehouse team were considered the ETL wizards, indicating a siloed understanding of data management. A centralized data warehouse team responsible for all domains gave HelloFresh the flexibility to grow fast as a company.

When HelloFresh went public in 2017, the company grew immensely. The tech department increased from approximately 40 to almost 300 engineers. Our business units also grew substantially as we continuously launched HelloFresh in new countries, opened new brands, and acquired other brands like Everyplate, GreenChef, and Factor. With that growth, the number and complexity of data requests increased to the point that data engineers required deep domain understanding and suffered from context switching due to reprioritization across the business. In short, the central function became a bottleneck and slowed down innovation. As a result, analytics teams in business domains like marketing and supply chain management started to build their own data solutions. Those data pipelines did not meet engineering standards and increased the central team’s maintenance and support burden. Over time, the knowledge about those pipelines left the company, and as a result, most of the data assets turned into debt, with quietly degrading data quality. That ultimately led to an increasing lack of trust in data.

In 2019, the company identified data as a key strategic asset that gives a unique competitive advantage by supporting and fully automating decision-making across our value chain. HelloFresh’s ultimate goal is to build data products that have a purpose and are provided as trustworthy assets to the rest of the organization with the best customer experience, so that users can quickly discover, understand, and securely access high-quality data. As a result, we kicked off a phased reorganization to unlock analytical data at scale. Around the same time, in May 2019, Zhamak Dehghani published a popular article about the data mesh concept, describing the four pillars of a modern distributed data architecture:

  1. Domain-oriented decentralized data ownership and architecture,
  2. A “data as a product” mindset,
  3. Self-serve data infrastructure as a platform, and
  4. Federated computational governance.

The journey towards a data mesh requires a new approach to data management and data infrastructure. This blog post describes our journey of implementing those pillars and the practical challenges we have faced.

Phase 1: Inverse Conway Maneuver

In the middle of 2019, the continuously growing number and complexity of data requests, combined with the outdated technical estate, ultimately exceeded what the team could handle. We kicked off an “embedded data engineering” program as a first attempt to give people more focus. The idea was to have a point person from the central data team responsible for each major domain. These specialists spent half of their time on domain work prioritized by the business leads. The other half of the time was meant for contributing to the platform itself: reducing technical debt and building tooling that could be reused. After a couple of months, the learning was blatantly obvious: a partly embedded data engineer faces far more work than can be done, between fulfilling feature requests, improving the data pipelines to assure reliable data, and transferring knowledge. Unfortunately, that trapped us in a vicious cycle because of the lack of standardization, uncertainty around ownership, and constant firefighting.

At the beginning of 2020, we assessed the data organization and created an intermediate data strategy driving a systemic and cultural change to realize the total value of our data assets. The key pillars of the intermediate data strategy were motivated by the data mesh principles. Following the Inverse Conway Maneuver, we started to establish decentralized, end-to-end domain teams reflecting the pillars of the HelloFresh business to remove unintended friction. Specifically,

  • Cross-functional product and analytics teams are empowered with tools, capability, and knowledge to own their data assets and develop their data products.
  • A data platform team reduces domain-agnostic complexity by providing infrastructure, tools, and education on best practices, reducing the time to insights, and allowing domain teams to focus on building data products.
  • A governance team creates global standards and makes decisions to maintain interoperability and observability and to leverage the full potential of our data.
  • Data chapters support the needs of all HelloFresh data professionals, including hiring, onboarding, performance reviews, definition and evangelism of standards and best practices, and knowledge sharing.

Representation of the Data Mesh organization at HelloFresh in 2021: distributed teams own and develop domain-specific data products, supported by a self-serve platform and a global governance body.

Phase 2: Tackling the Devil in the Details

This section discusses the challenges we have faced with implementing the cornerstones of a data mesh organization.

Challenge #1: Knowledge Gap

Data products are software and thus require engineers to build and own them. As a prerequisite of handing over data ownership, domain teams had to be equipped with data engineers to develop and change data pipelines following best practices and standards. The distributed model does not decrease the need for people. In the centralized model, the high workload resulted in deprioritization and high cognitive load. In contrast, the decentralized model reduces team cognitive load and thus optimizes flow (see team topologies). Every domain requires dedicated people who can take ownership of the domain-specific data. To bootstrap the individual teams, we needed senior data engineers in each domain, drastically increasing the number of open headcounts. We have been lucky to hire strong senior people who support and evolve the technical vision while realizing data products for the core domains. Additionally, we staff teams with contractors and consultants while we hire for all the open positions. We still have a large number of exciting open roles across several geographies.

As business environments became more complex and interconnected, the solution was to strike a better balance between centralization and decentralization, a balance that had never been articulated before. Our goal was to create a more agile organization in which domain teams had the right mindset for data and the right capabilities to take ownership of their data products. Thus, we moved from central homogeneous teams to heterogeneous teams composed of all the data professionals required to build end-to-end data products. In this setup, the team lead in a business domain focuses on driving “what gets done” as a value-creation manager. To drive “how work gets done”, we created data chapters and split the classical management hierarchy into two separate, parallel lines of equal authority and accountability. The chapters support performance management, talent acquisition, the definition of career paths, and knowledge sharing. Other key responsibilities are providing standard tools, methods, and systems to enable efficiency, practice sharing, and effectiveness on a global scale.

In addition to hiring new talent and creating an environment that fosters knowledge sharing, we saw the opportunity to drive the changes for our journey through a data literacy program and by investing in our people’s overall data capabilities. As a result, we kicked off a Data Literacy Program in 2020, supporting the needs of HelloFresh employees (see blogpost). The effort started as a small experiment to help analysts build better dashboards and turned into a complete data literacy program, recognizing data as an integral part of our transformation journey. Following a non-invasive way of upskilling through onboarding, continuous learning, and gamified learning paths (earning certifications and badges), we could teach and implement new procedures and processes much faster, and simultaneously, across the business. As a result, we achieve better data-driven decisions and business growth while going through a digital transformation. From the initial consumer- and analyst-focused program, we expanded the data-literacy content and distributed it across all communication and knowledge-exchange channels to lay the foundation for a new data culture suiting the needs of a data mesh. We invested in three different layers:

  1. Continuous engagement across the board, e.g., data-literacy workshops and open sessions around how to build data product roadmaps, KPI fundamentals, data-driven storytelling, dashboard polishing, and data visualization and design tricks;
  2. One-off initiatives, e.g., our globally organized hackathon, which challenged cross-market teams to work on a “standard data quality dashboard” (see blogpost);
  3. Continuous engagement for focused user groups, e.g., our global Tableau Community (see video from Tableau LIVE Europe 2021).

The ultimate key to unlocking business value is the data understanding of each employee.

Challenge #2: Technical Debt

A data mesh is an architecture of scale. Since companies do not start big but grow organically, adapting to the different business requirements in each stage, the organization is transitioning to a data mesh and not implementing it in a green field. HelloFresh came to a point where we lost trust in our data, and innovation slowed down, which triggered the need for a more scalable setup. In addition, the transition implied dealing with the technical debt accumulated over the years.

Outdated Data Model: Core assets were generated using a collection of in-house tools that required highly specialized knowledge and significant time investment to fix when any piece of the system failed. Other assets were assembled via scheduled queries, which led to an enormous uptick in contributions from users throughout the organization but increased the maintenance cost, as these queries contend for resources with the core data stack. Other pain points comprise:

  • Re-implementation of existing business logic led to data inconsistency and unnecessary complexity;
  • The non-standard approach required a lot of specialized knowledge and slowed down hand-over to domain owners;
  • Strong coupling and missing tracking of lineage prevented reusability and data consistency;
  • Missing data quality checks and ownership led to untrustworthy data assets.

Outdated Data Infrastructure: The underlying platform was built on top of a single cluster for ingesting, storing, processing, and accessing data, and the inelastic nature of this infrastructure was no longer able to support the increasing demand. Performance deteriorated significantly over 2020 despite several measures that attempted to stabilize the cluster. The technical and conceptual state of our data landscape did not allow a lift-and-shift approach. The transition to any modern platform introduces breaking changes. Each data pipeline has to go through fundamental architectural and code modifications to take full advantage of the platform’s benefits and be cost-effective. As we progressed with modernizing data assets in each of the domains, there was a requirement to invest in the existing environment to ensure continuity of service.

We spent a significant amount of time simplifying and stabilizing the old stack and paying down its technical debt, creating a vision of the future, and proactively driving modernization (instead of lift and shift). Since most of our reports still depend, and will continue to depend, on old data pipelines, we consciously invested in upgrading, optimizing, and supporting our legacy infrastructure.

Challenge #3: Lack of Data Product Thinking

If data is a precious asset, it sounds irrefutable that it should be treated as a product. But what makes data a product? Following the first rule of Product Thinking 101 (Naren Katakam, April 2020), the starting point of a product is the problem, not the solution. The absence of that mindset brought us to data spaghetti: data assets were developed in silos and continuously patched to fulfill ad-hoc requests without understanding the actual needs of the stakeholders. At HelloFresh, we define a data product as a solution to a customer problem that delivers maximum value to the business and leverages the power of data. Examples at HelloFresh range from dashboards that monitor our error rates to recipe recommendations. Treating data as a product means providing it as a trustworthy asset to the rest of the organization with the best customer experience by making it discoverable, interoperable, addressable, and self-describing. In addition, we maximize trust in our data products by ensuring the quality and security of our data assets. To implement that goal, we introduced the role of a Data Product Manager to envision, design, and establish data product thinking as a core part of our business.
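
To make those properties concrete, here is a minimal, hypothetical sketch of the metadata a data product could carry in a catalog. The descriptor fields, names, and example values are illustrative only and do not reflect our actual catalog schema:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a data product descriptor. The fields mirror the
# properties discussed above (discoverable, addressable, self-describing,
# trustworthy); the names are illustrative, not an actual catalog schema.

@dataclass
class DataProductDescriptor:
    name: str               # discoverable: listed under this name in the data catalog
    owner_team: str         # the accountable domain team
    address: str            # addressable: a stable location, e.g. a table or topic
    description: str        # self-describing: the customer problem it solves
    schema: dict            # self-describing: column names and types
    freshness_sla: str      # trustworthy: the promise made to consumers
    quality_checks: list = field(default_factory=list)

# Illustrative usage for a hypothetical supply-chain data product.
orders = DataProductDescriptor(
    name="orders_daily",
    owner_team="supply-chain-analytics",
    address="warehouse.supply_chain.orders_daily",
    description="Daily order volumes per market, used for demand forecasting.",
    schema={"market": "string", "order_date": "date", "orders": "int"},
    freshness_sla="updated by 06:00 UTC every day",
    quality_checks=["no_null_market", "orders_non_negative"],
)
print(orders.name, "owned by", orders.owner_team)
```

The point of such a descriptor is that consumers can find a data product, understand what it promises, and contact its owner without asking a central team.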

The starting point is ownership. In the past, the central team was in charge of all data business requests, i.e., building datasets and pipelines, investigating and fixing bugs, and maintaining data collections (backfills, making sense of the business logic, etc.). This configuration was not scalable and was the primary motivation behind equipping the analytics teams with full-stack capabilities to take ownership of the business data in mid-2020. Ownership means being responsible for hosting and serving data assets to a minimum acceptable standard and minimizing upstream and downstream dependencies. Under that definition, the process was not exclusively about transferring ownership but, in most cases, about establishing ownership. Speaking of a transfer (erroneously) suggests that the datasets were owned before. In reality, the data platform team could make changes to the data pipelines, but the analytics teams had always held the business understanding and the use cases behind them. We had to go through a lengthy process to establish ownership, since the data pipelines had not been appropriately documented, were unstable, and had been built inconsistently over the years by generations of people who had already left the company. We started with a document aiming to assign dataset ownership, held several technical onboarding sessions, and asked the teams to run an initial assessment to develop a roadmap for modernizing the data assets of their domain.

Like any other product or service that we offer to our customers, we have to guarantee to our consumers that the data is provided on time, is complete, and accurately reflects the business. Service Level Agreements (SLAs), Service Level Objectives (SLOs), and Service Level Indicators (SLIs) respectively represent the promises we make to our consumers, the internal objectives that help us keep those promises, and the trackable measurements that tell us how we are doing (see, e.g., SLA vs. SLO vs. SLI: What’s the difference?). We established clearly defined ownership and data quality dimensions, including monitored SLAs, as part of the data asset certification process, and invested in self-serve tooling to observe a minimum set of requirements throughout the data life cycle. To support the move from data as a service to data as a product, we invest in automation and apply the same rigor and best practices well known from engineering to data. The central data platform implements those best practices in the self-serve tooling, e.g., anonymization of personally identifiable information (PII), configurations for data quality monitoring and alerting, and automatic updates of documentation and lineage in the data catalog.
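
As an illustration of how such promises can be made operational, here is a minimal sketch of a freshness check that compares an SLI (how stale a dataset currently is) against an SLO (how stale it is allowed to be). The dataset name, threshold, and alerting behavior are hypothetical and do not describe our internal tooling:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class FreshnessSLO:
    dataset: str
    max_staleness: timedelta  # the internal objective (SLO)

def freshness_sli(last_updated_at: datetime) -> timedelta:
    """The indicator (SLI): how stale the dataset currently is."""
    return datetime.now(timezone.utc) - last_updated_at

def check_freshness(slo: FreshnessSLO, last_updated_at: datetime) -> bool:
    """Return True if the dataset currently meets its freshness objective."""
    staleness = freshness_sli(last_updated_at)
    if staleness > slo.max_staleness:
        # In a real setup this would notify the owning domain team via the
        # platform's alerting integration rather than print to stdout.
        print(f"[ALERT] {slo.dataset} exceeds its SLO by {staleness - slo.max_staleness}")
        return False
    return True

# Illustrative usage: a (hypothetical) orders dataset that must be refreshed
# at least every 6 hours, last updated 8 hours ago.
slo = FreshnessSLO(dataset="orders_daily", max_staleness=timedelta(hours=6))
check_freshness(slo, last_updated_at=datetime.now(timezone.utc) - timedelta(hours=8))
```

In a self-serve setup, checks like this are declared as configuration next to the data asset and executed by the platform, so domain teams only state their objectives instead of building monitoring from scratch.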

Phase 3: Preparing for the Future

While we have so far established the basic principles of a data mesh with only a handful of teams throughout the organization, we are now in the beginning stages of rolling out the model to the entire company. As we continue the roll-out, we apply the learnings so far and focus on

  • Fostering data product ownership,
  • Setting up teams with the right capabilities,
  • Increasing data literacy for our employees,
  • Modernizing data assets across domains,
  • Investing in self-serve tooling.

We recently presented our journey in the Global Data Mesh Learning Meetup (recording) and the closing session of the AWS Startups Showcase series (recording). We talked about the data history of HelloFresh, examples of how self-serve platforms are used as an enabler for domain data teams, exemplary data products in our supply chain domain and experimentation, and our experience with data literacy. If you are interested in more details, please watch the recordings.
