Technology applied to Supply Chain Management

Published in HelloTech · Jun 21, 2022 · 8 min read

By Ignacio Julve

This is the second instalment of a series of articles focused on Supply Chain Management (SCM) Technology at HelloFresh. You can read the first one here.

If you are not familiar with HelloFresh and the Engineering organisation, called HelloTech, you can read about it here.

Overview

As a rule of thumb, we mostly create services written in Go or Kotlin and deploy them to Kubernetes clusters. We leverage Kafka as a first tier for data orchestration. Our code is hosted on GitHub.

Let’s look at some of these in detail.

Services

When we wrote the first services for SCM Tech, we used procedural languages and stored data in PostgreSQL relational databases. Many things have changed since then, and while you can still find a few of these legacy services around, we are deprecating them.

Nowadays, we write most of our services in Golang or Kotlin. Most of them expose a REST API that we specify with OpenAPI. In some cases we are experimenting with gRPC, but it is not yet widespread. We rely on either JSON or protobuf for inter-service communication, and Kafka is where we publish our events.
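To make the shape of such a service concrete, here is a minimal sketch of a JSON-over-REST endpoint in Go, using only the standard library. The `PurchaseOrder` resource, its fields and the `/purchase-orders` path are hypothetical names chosen for illustration; they are not taken from any HelloFresh service, and a real service would derive its contract from an OpenAPI specification.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// PurchaseOrder is a hypothetical resource; the field names are illustrative only.
type PurchaseOrder struct {
	ID       string `json:"id"`
	SKU      string `json:"sku"`
	Quantity int    `json:"quantity"`
}

// newRouter wires a single read-only endpoint: GET /purchase-orders?id=...
func newRouter(orders map[string]PurchaseOrder) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/purchase-orders", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		order, ok := orders[r.URL.Query().Get("id")]
		if !ok {
			http.Error(w, "purchase order not found", http.StatusNotFound)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		if err := json.NewEncoder(w).Encode(order); err != nil {
			http.Error(w, "encoding failed", http.StatusInternalServerError)
		}
	})
	return mux
}

// get calls the handler in-process and returns the status code and body.
func get(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Body.String()
}

func main() {
	router := newRouter(map[string]PurchaseOrder{
		"po-1": {ID: "po-1", SKU: "tomato-500g", Quantity: 120},
	})
	// Exercise the handler without opening a network port.
	code, body := get(router, "/purchase-orders?id=po-1")
	fmt.Println(code, body)
}
```

In a deployed service the same handler would be passed to `http.ListenAndServe`; the in-process call above only keeps the sketch self-contained.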

We follow a publish and subscribe architecture model, also known as pub/sub. We leverage the flexibility of that paradigm so that multiple subscribers receive data in a completely asynchronous manner. That way, our system is flexible, robust and optimised for throughput, resilience and scalability. Broadcasting events becomes trivial, in contrast to the complexity we had when we broadcast messages through synchronous requests to multiple REST APIs.
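The fan-out property of pub/sub can be illustrated with a small in-memory sketch in Go. This is not Kafka and not our production code: `Broker`, `Event` and the topic and subscriber names are hypothetical stand-ins, and Go channels take the place of topics purely to show that the publisher does not need to know who is listening.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical message; real events would be JSON or protobuf payloads.
type Event struct {
	Topic   string
	Payload string
}

// Broker is an in-memory stand-in for a message broker, used only to show fan-out.
type Broker struct {
	mu   sync.RWMutex
	subs map[string][]chan Event
}

func NewBroker() *Broker {
	return &Broker{subs: make(map[string][]chan Event)}
}

// Subscribe registers a new subscriber and returns its private, buffered channel.
func (b *Broker) Subscribe(topic string, buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.mu.Lock()
	b.subs[topic] = append(b.subs[topic], ch)
	b.mu.Unlock()
	return ch
}

// Publish delivers the event to every subscriber of its topic.
// The publisher neither knows nor cares how many subscribers exist.
func (b *Broker) Publish(e Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs[e.Topic] {
		ch <- e // blocks if this subscriber's buffer is full
	}
}

// Close closes all subscriber channels so that consumers can drain and stop.
func (b *Broker) Close() {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, chans := range b.subs {
		for _, ch := range chans {
			close(ch)
		}
	}
	b.subs = make(map[string][]chan Event)
}

func main() {
	broker := NewBroker()
	var wg sync.WaitGroup
	for _, name := range []string{"forecasting", "inventory"} {
		events := broker.Subscribe("purchase-orders", 8)
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for e := range events {
				fmt.Printf("%s received %s\n", name, e.Payload)
			}
		}(name)
	}
	broker.Publish(Event{Topic: "purchase-orders", Payload: "po-1 created"})
	broker.Close()
	wg.Wait()
}
```

Adding a third consumer is one more `Subscribe` call and requires no change to the publisher, which is the contrast with calling several REST APIs synchronously. A real broker additionally provides durability, ordering and replay, none of which this sketch attempts.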

For the Kotlin services, the set of technologies and frameworks varies depending on factors such as the exact use cases we need to solve, the experience of the team and technical limitations that might affect the decision. In any case, we follow a pub/sub architecture using Kafka and we try to follow the KISS principle.

For our HTTP servers, we generally use Ktor and Spring Boot. For our HTTP clients, we use retrofit2 and okhttp3 in many cases. Because we use Kafka, we use either the Kafka libraries directly or Kafka Streams for asynchronous inter-service communication.

In cases where we need to use a Database, we favour using jOOQ and PostgreSQL.

A few teams are also using Redis for caching.

We also have services written in Golang, as well as some specific tools that use Python, Spark and Airflow. We might explore these in a future article.

Releasing Software

We keep our code in GitHub and use GitHub Actions for building, testing and deploying. This is a recent change in our company. In fact, we are still rolling it out and expect to be done only by the end of 2022.

From a technical point of view, we practice Infrastructure as Code and have all of our infrastructure coded in Terraform. We use our CI/CD pipelines to ensure that our software changes work well with our infrastructure changes, and our aim is to build, test and deploy to production automatically after every merge.

Most of our teams follow trunk-based development, where the target is that our main branch is always in a releasable state. After every pull request is merged, we trigger a deployment pipeline that runs all tests, including automated end-to-end integration tests in an isolated environment, and then deploys the code to staging and later to production. The end goal is to follow a Continuous Deployment model.

As we deploy changes to services used across multiple markets, in different time zones and with their own particularities (sometimes legal restrictions), we make heavy use of feature flags. This technique lets us tackle the complexity of multiple markets while still deploying globally, and it gives the Product Owner responsibility for configuring individual features. At the same time, we can run experiments in production without affecting all users. In addition, we keep the communication loop between stakeholders and the service itself as small as possible: fine-tuning some of the features of a service no longer requires the development team.
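As an illustration of per-market feature flags, the following Go sketch resolves a flag by looking for a market-specific override first and falling back to a global default. `FlagStore`, the feature name and the market codes are hypothetical; the article does not describe which flag tooling is actually used, so treat this only as a picture of the idea.

```go
package main

import "fmt"

// FlagStore is a hypothetical in-memory stand-in for a feature flag service.
type FlagStore struct {
	defaults  map[string]bool            // feature -> enabled everywhere
	overrides map[string]map[string]bool // feature -> market -> enabled
}

func NewFlagStore() *FlagStore {
	return &FlagStore{
		defaults:  make(map[string]bool),
		overrides: make(map[string]map[string]bool),
	}
}

// SetDefault sets the value used by markets without an override.
func (s *FlagStore) SetDefault(feature string, enabled bool) {
	s.defaults[feature] = enabled
}

// SetForMarket overrides the flag for a single market.
func (s *FlagStore) SetForMarket(feature, market string, enabled bool) {
	if s.overrides[feature] == nil {
		s.overrides[feature] = make(map[string]bool)
	}
	s.overrides[feature][market] = enabled
}

// Enabled resolves a flag: the market override wins, then the default.
// Unknown features resolve to false, so new code paths stay dark until switched on.
func (s *FlagStore) Enabled(feature, market string) bool {
	if perMarket, ok := s.overrides[feature]; ok {
		if enabled, ok := perMarket[market]; ok {
			return enabled
		}
	}
	return s.defaults[feature]
}

func main() {
	flags := NewFlagStore()
	// The code ships to every market, but the feature is only switched on in one.
	flags.SetDefault("new-forecast-model", false)
	flags.SetForMarket("new-forecast-model", "DE", true)

	for _, market := range []string{"DE", "US"} {
		fmt.Printf("%s: %v\n", market, flags.Enabled("new-forecast-model", market))
	}
}
```

Because the decision is data rather than code, changing a flag for one market does not require a new deployment, which is what lets someone outside the development team own the configuration.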

Data Challenge

If you want to understand how HelloFresh handles data and read more about our journey to the data mesh, please check it here.

Nowadays, most services consume from topics or publish events to Kafka, but often still have databases in which they store data for later consumption, traceability, logging or other internal usage that does not require or benefit from any of the features provided by the use of a pub/sub solution.

In a nutshell, we favour solutions based on Apache Kafka, but we still rely on relational databases when it makes sense.

We try to use “the right tool for the right problem” and while we do not always choose the right one on the first try, we learn from our mistakes, fix and adapt.

User Interface

The vast majority of the services written by the SCM Tech teams do pure machine-to-machine communication. Still, a handful are used by our employees through different sites or applications integrated into our tools ecosystem.

We started building UIs systematically in SCM Tech more than five years ago. During this time we tried different frameworks, libraries and languages, and evolved along with the industry. The backbone is the following:

To deploy many apps (30+ at the moment of writing) in production, we use so-called micro-frontends or fragments (as we call them in HelloFresh). One fragment is based on a web server responsible for initial page rendering, authorization, and delivery of bundles. Usually, one fragment belongs to one domain and contains multiple applications. Currently, we use Next.js to build them.
The main part of each app is on the client side. Traditionally, we use React, Redux, and many other additional libraries.
To facilitate app development and provide a coherent UX, we maintain our own design system based on Material UI.
We use TypeScript as our default language.
We value test automation and have all kinds of tests, starting from unit tests and ending with E2E tests with Cypress.

We are constantly exploring how to improve our UI/UX and have recently reworked our design system. We might explore that in a future article.

Ways of Working

Innovation

Learning and experimenting are at the core of what we do at HelloFresh. We are always trying to find better and faster ways to do what we do, which is why we research and test new things frequently.

For example, quite recently, a team faced the challenge of consuming multiple Kafka topics that require post-processing and aggregation of correlated information. The team decided to split their services into multiple smaller subcomponents for better scalability, resilience and adaptability. This sharply increases the complexity of monitoring and reliability, but in exchange provides better adaptability, as small changes in one area do not impact the overall service mesh. Externally, the public service interfaces are kept as they were before. We continuously learn and adapt from such experiments, and when we see things have not improved, we change and try again.
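To give a feel for what aggregating correlated information from several topics involves, here is a minimal Go sketch that joins events by a shared correlation ID and emits a result once every required topic has contributed. It is not the team's implementation: `Aggregator`, `Event`, `Joined` and the topic names are hypothetical, the state is held in memory, and it assumes a single goroutine feeds it.

```go
package main

import "fmt"

// Event is a hypothetical message already decoded from some topic.
type Event struct {
	Topic         string
	CorrelationID string
	Payload       string
}

// Joined is emitted once every required topic has contributed an event
// for the same correlation ID.
type Joined struct {
	CorrelationID string
	Payloads      map[string]string // topic -> payload
}

// Aggregator buffers partial results in memory; it is not safe for concurrent use.
type Aggregator struct {
	required []string
	pending  map[string]map[string]string // correlation ID -> topic -> payload
}

func NewAggregator(required ...string) *Aggregator {
	return &Aggregator{required: required, pending: make(map[string]map[string]string)}
}

// Add records an event and reports whether its correlation ID is now complete.
func (a *Aggregator) Add(e Event) (Joined, bool) {
	parts, ok := a.pending[e.CorrelationID]
	if !ok {
		parts = make(map[string]string)
		a.pending[e.CorrelationID] = parts
	}
	parts[e.Topic] = e.Payload // a later event on the same topic overwrites the earlier one
	for _, topic := range a.required {
		if _, seen := parts[topic]; !seen {
			return Joined{}, false // still waiting for at least one topic
		}
	}
	delete(a.pending, e.CorrelationID) // complete: release the buffered state
	return Joined{CorrelationID: e.CorrelationID, Payloads: parts}, true
}

func main() {
	agg := NewAggregator("purchase-orders", "deliveries")
	events := []Event{
		{Topic: "purchase-orders", CorrelationID: "po-1", Payload: "120 units ordered"},
		{Topic: "deliveries", CorrelationID: "po-2", Payload: "80 units delivered"},
		{Topic: "deliveries", CorrelationID: "po-1", Payload: "118 units delivered"},
	}
	for _, e := range events {
		if joined, done := agg.Add(e); done {
			fmt.Println("complete:", joined.CorrelationID, joined.Payloads)
		}
	}
}
```

Even this toy version hints at why splitting the work into smaller components is attractive: decoding, correlating and post-processing can each be scaled and changed separately, at the price of more moving parts to monitor. A production version would also need persistent state and a policy for correlation IDs that never complete.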

Open collaboration

We encourage sharing information, ways of working, tips and tricks and code across teams.

There are multiple forums in our organisation in which we exchange ideas, discuss and argue about different topics in an open and transparent manner so that all engineers can contribute.

In our previous article we mentioned that we closely follow new posts from Dave Farley. In particular, one of his videos gave rise to a very interesting discussion in one of our Slack channels, and we’d like to share some of the outcomes of that discussion here so you can get a glimpse of the kind of topics we talk about.

That video talks about pull requests and how their use differs between open source development and a team within an organisation.

How organisations structure their code is a complex topic which, as we all know (and fear), is often a reflection of the communication patterns that prevail between teams (i.e. Conway’s Law). The answers to this dilemma, if you will, lie in two paradigms that we talk about but don’t always implement well:

Domain-driven design: Here it is important to consider the design principle of a “bounded context”, in which you divide large systems into small, manageable “components” or “contexts”. These are clearly delineated, each with its own unified model and with clearly identified and maintained boundaries. The relationships between bounded contexts then become the key elements. These boundaries need clearly defined and maintained interfaces when tightly coupled, and consistency of data when loosely coupled. A bounded context is simply the boundary within a domain where a particular domain model applies.
Developer communication: How developers communicate with each other is something they need to feel personally comfortable with; it cannot and should not be dictated from above. They can, however, consider some basic guidelines: surprises, breaking changes and opaqueness are no-gos. Stakeholders matter, even in highly decoupled systems, and they generally hate surprises. Another challenge is the rabbit hole problem. When there is a problem, asking for help early is a sign of a mature development team, and it is even more critical when the help has to come from teams you depend on. In a highly effective and well-functioning team, asking for help is a sign of strength and should be encouraged.

Ultimately, a Pull Request as a concept in a company like ours is used only as a manual gate for code reviews. What Dave Farley was saying was NOT that code reviews should not take place, but that, if you really want to do CI, the code review needs to happen as close to the writing of the code as possible, i.e. in a pair programming environment, in order to make the feedback loop as short as possible. This, among many other benefits, is why we actively practice mob and pair programming in our squads.

Openings

Our mission in SCM Tech is to build the world’s leading, scalable, fully integrated, food supply chain management platform.

Now that you have read about how we are working to solve that, are you interested in joining our team? Please see the vacancies here.