ProtoMock: Simple Kafka Testing by Generating Mock Data from Protobuf Schemas

Low Khye Ean
Published in HelloTech
3 min read · Sep 11, 2023


Here at HelloFresh, we use Kafka together with protobuf schemas extensively to fulfil our various supply chain use cases. To stay agile, we needed a way to test effectively while remaining decoupled from production and staging clusters. By taking a “local-first” approach to testing our Kafka applications, developers can focus on actual development rather than worrying about integration with infrastructure.

ProtoMock

Enter ProtoMock: a CLI tool built in-house using Node.js that takes protobuf schemas and generates mock data for each field based on its defined data type. It can be configured to produce large volumes of data, making it viable for load testing in addition to acceptance testing. Once the specified number of messages has been generated, ProtoMock produces the protobuf-encoded mock data to a specified Kafka broker, most commonly a local Kafka instance hosted via Docker.
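The core idea of type-driven generation can be sketched in plain Node.js. This is a simplified illustration, not ProtoMock's actual code; the `mockValue` and `mockMessage` helpers and the flattened schema shape are assumptions for the example:

```javascript
// Map a protobuf primitive type name to a random mock value.
function mockValue(type) {
  switch (type) {
    case "string":
      return Math.random().toString(36).slice(2, 10);
    case "int32":
    case "int64":
      return Math.floor(Math.random() * 1e6);
    case "float":
    case "double":
      return Math.random() * 1e3;
    case "bool":
      return Math.random() < 0.5;
    default:
      throw new Error(`Unsupported type: ${type}`);
  }
}

// Generate a mock message for a (flattened) map of field name -> type.
function mockMessage(fields) {
  const message = {};
  for (const [name, type] of Object.entries(fields)) {
    message[name] = mockValue(type);
  }
  return message;
}

const example = mockMessage({ id: "int32", sku: "string", express: "bool" });
console.log(example);
```

Running this in a loop with a configurable count is essentially what turns the approach into a load-testing tool as well.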

ProtoMock Demo

By using ProtoMock, developers can not only test their consumer applications without relying on the quality of data in a production cluster, but also skip deployments entirely when testing a Kafka application, significantly speeding up the development feedback loop.

Defining foreign key relationships

In some cases, an application consumes data from multiple topics that are closely related (e.g. a customer order and its corresponding delivery information). ProtoMock allows users to define these relationships, and it will then produce mock data that preserves the foreign keys.
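One way to preserve such a relationship is to cache generated parent values and reuse them when generating the child topic's data. The sketch below is an assumption about the mechanism, not ProtoMock's actual implementation; the field and function names are hypothetical:

```javascript
// Cache of generated foreign-key values, keyed by "message.field".
const fkCache = new Map();

// Generate a customer order and cache its id for later use.
function mockOrder() {
  const order = { orderId: Math.floor(Math.random() * 1e6) };
  fkCache.set("order.orderId", order.orderId);
  return order;
}

// Generate delivery info that references the cached order id
// instead of generating a fresh random value.
function mockDelivery() {
  return {
    deliveryId: Math.floor(Math.random() * 1e6),
    orderId: fkCache.get("order.orderId"),
  };
}
```

With this, a consumer that joins the two topics sees consistent keys, just as it would with real data.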

Mock Data with Foreign Key Relationships

Producing mocked production-like data

In many cases, the format of a string field matters for testing business rules (e.g. SKU codes, email addresses). Generating a random string for such fields would not be very helpful.

To deal with this, ProtoMock can instead be provided with a file of values to sample from. This file can either be populated manually by the user, or generated by ProtoMock itself by connecting to a production cluster.
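The sampling step itself is simple to sketch. The example values below are illustrative, not real SKU codes, and the helper name is an assumption:

```javascript
// Pick a random entry from a list of sample values.
// In ProtoMock these values would come from a user-provided
// or generated samples file rather than an in-memory array.
function sampleFrom(samples) {
  return samples[Math.floor(Math.random() * samples.length)];
}

const skuSamples = ["HF-BOX-001", "HF-BOX-002", "HF-MEAL-314"]; // illustrative
const sku = sampleFrom(skuSamples);
console.log(sku);
```

Because the sampled values come from production-like data, downstream validation logic (format checks, lookups) behaves the same as it would against real traffic.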

Samples file, generated via ProtoMock (left) used to generate mock data (right)

Implementation

ProtoMock uses the protobufjs and KafkaJS libraries internally for generating and producing mock data, respectively. Once a user provides ProtoMock with a directory of protobuf schemas, it uses protobufjs to recursively crawl through imported schemas to obtain the “leaf” nodes of each schema. Leaf nodes are fields of any of the supported protobuf primitive data types. Once the leaf nodes have been identified, random data is generated for each field based on its data type and encoded into the protobuf wire format before being produced to a user-specified Kafka broker. If a foreign key relationship is enabled, the generated mock data is also cached for later use.
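The recursive crawl can be sketched with a plain nested object standing in for a parsed schema. The real tool walks types resolved by protobufjs; the plain-object schema shape and the `generate` helper here are assumptions for illustration:

```javascript
// Supported primitive ("leaf") protobuf types in this sketch.
const PRIMITIVES = new Set(["string", "int32", "bool"]);

// Recursively walk a nested schema description: generate random data
// for leaf fields, and recurse into nested message types.
function generate(schema) {
  const message = {};
  for (const [name, type] of Object.entries(schema)) {
    if (PRIMITIVES.has(type)) {
      // Leaf node: generate random data based on the field's type.
      message[name] =
        type === "string" ? Math.random().toString(36).slice(2)
        : type === "bool" ? Math.random() < 0.5
        : Math.floor(Math.random() * 1e6);
    } else {
      // Nested message type: recurse until we reach the leaves.
      message[name] = generate(type);
    }
  }
  return message;
}

const mock = generate({
  id: "int32",
  customer: { email: "string", premium: "bool" },
});
console.log(mock);
```

In the real pipeline, the resulting object would then be encoded with the schema's message type (e.g. protobufjs's `Type.encode(...).finish()`) and sent with a KafkaJS producer.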

High Level overview of ProtoMock

Summary

ProtoMock is an easy-to-use tool that requires minimal setup and replicates production-like data and workloads. It lets developers focus on developing and testing their Kafka applications locally with no external dependencies, significantly increasing our developers’ agility.
