Kafka is an open-source framework for streaming, processing, exchanging and storing data. As a publish-subscribe messaging system, it powers a wide range of use cases, including analytics consumption, system maintenance, real-time services, data matching, and more. However, Kafka Schemas don't have an established, widely agreed-upon structure, which can lead to communication errors and broken services. This blog post explains the challenge and presents four solutions engineering teams use to document unstructured Kafka Schemas.

The Challenge: Documenting Kafka Schemas

There are several Kafka message formats (data serializations) that can be used to structure message data. These include Apache Avro (the original choice in the Kafka ecosystem), Google's Protocol Buffers (aka Protobuf), and JSON Schema (validation annotations on top of JSON).
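
For illustration, here is how the same hypothetical user-signup event might be described in two of these schema languages, shown as Python dictionaries (the field names are made up for this example):

```python
# The same hypothetical "user_signup" event, described two ways.

# Avro schema: a record with named, typed fields.
AVRO_USER_SIGNUP = {
    "type": "record",
    "name": "UserSignup",
    "fields": [
        {"name": "user_id", "type": "string"},
        {"name": "email", "type": "string"},
        {"name": "signup_ts", "type": "long"},  # epoch milliseconds
    ],
}

# JSON Schema: validation annotations layered on top of plain JSON.
JSON_SCHEMA_USER_SIGNUP = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "user_id": {"type": "string"},
        "email": {"type": "string"},
        "signup_ts": {"type": "integer"},
    },
    "required": ["user_id", "email", "signup_ts"],
}
```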

However, Kafka doesn't monitor which schema structure is used, nor does it restrict users to a certain type. Developers could just as easily send XML or plain text over the wire. Out of the box, Kafka events aren't validated at all.

As a result, users can send just about any data structure to Kafka and get an "ok" back. But what actually happened? Was the message formatted correctly, in a way the consuming service can use? Since Kafka doesn't enforce payload verification or return an error on a bad format, the burden lies elsewhere, usually on the developers' shoulders. They are responsible for ensuring that messages are consumed accurately and that services remain compatible.
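
A minimal sketch of this behavior, using the confluent-kafka Python client (the broker address and topic name are placeholders): the broker acknowledges all three payloads, regardless of format.

```python
from confluent_kafka import Producer

# Placeholder broker address; adjust for your cluster.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Kafka only reports whether the bytes were written, not whether they
    # make any sense to downstream consumers.
    if err is None:
        print(f"'ok': delivered to {msg.topic()} [{msg.partition()}]")
    else:
        print(f"delivery failed: {err}")

# All three of these are happily accepted by the broker.
producer.produce("orders", value=b'{"order_id": 42}', on_delivery=on_delivery)
producer.produce("orders", value=b"<order><id>42</id></order>", on_delivery=on_delivery)
producer.produce("orders", value=b"whatever, really", on_delivery=on_delivery)
producer.flush()
```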

This can create friction, data loss, bugs, regressions and broken services, not to mention the wasted time and the significant drop in productivity that follows.

4 Solutions for Kafka Schema Documentation

Let's look at four different solutions engineering teams can implement to overcome the Kafka Schema management challenge.

1. Encode the Schema With the Message

So how do two parties share the schema they want to comply with? The most straightforward solution is to pass the schema definition, in other words the service contract, in each message. This is the simplest approach because it ensures that anyone reading the message, now or in the future, will understand its structure.
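
A rough sketch of such a self-describing message, assuming JSON payloads and made-up field names:

```python
import json

# A self-describing message: the schema definition travels with every payload.
message = {
    "schema": {
        "type": "object",
        "properties": {
            "user_id": {"type": "string"},
            "email": {"type": "string"},
        },
        "required": ["user_id", "email"],
    },
    "payload": {
        "user_id": "u-123",
        "email": "jane@example.com",
    },
}

# The consumer can validate the payload against the embedded schema
# (e.g. with the jsonschema package), at the cost of shipping the schema
# on every single message.
value = json.dumps(message).encode("utf-8")
```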

However, this is an inefficient and counterproductive method. It doesn't provide a predefined contract between consumer and producer that anyone can quickly understand. Instead, the producer has to attach the explanation manually, and the consumer has to figure it out on its own, every single time. In addition, it bloats the message size, which can increase latency and overhead.

2. Use an External Schema Registry

At the opposite end of the spectrum are the auxiliary Schema Registry solutions, such as the Confluent Schema Registry, the AWS Glue Schema Registry, or homegrown alternatives. With this approach, both parties agree to document the schemas in an external source. This source, the registry, defines the structure, and both parties comply with it.

For example, a consumer can read the message type and version from the message headers and fetch the corresponding schema definition from the Schema Registry before deserializing the payload.
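
A minimal sketch of that flow, assuming JSON payloads, a Confluent-style Schema Registry REST endpoint, and header keys that the producer and consumer have agreed on for this example (these header conventions are not built into Kafka):

```python
import json
import requests
from confluent_kafka import Consumer
from jsonschema import validate, ValidationError

REGISTRY_URL = "http://localhost:8081"  # placeholder registry address

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "orders-validator",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

def fetch_schema(subject: str, version: str) -> dict:
    # Confluent-style Schema Registry REST lookup; in practice you would
    # cache this rather than call the registry for every message.
    resp = requests.get(f"{REGISTRY_URL}/subjects/{subject}/versions/{version}")
    resp.raise_for_status()
    return json.loads(resp.json()["schema"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    headers = dict(msg.headers() or [])
    # "schema-subject" and "schema-version" are conventions assumed for
    # this example; both sides must agree on them.
    subject = headers.get("schema-subject", b"").decode()
    version = headers.get("schema-version", b"latest").decode()
    payload = json.loads(msg.value())
    try:
        validate(instance=payload, schema=fetch_schema(subject, version))
    except ValidationError as err:
        print(f"message does not match {subject} v{version}: {err.message}")
```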

But even with a Schema Registry in place, we're still not fully in the clear. The registry-aware producers and consumers can now verify that a message matches a registered schema and can be parsed, but this requires a lot of communication and agreement between both parties, as well as upgrades to the supporting services. This can result in inefficiency and malfunctions.

3. Fully Embrace a Schemaless Architecture

The third option is to not build robust schema validation into the Kafka cluster at all. Instead, engineering teams rely on internal definitions and hope for the best.

The most common reason for not using a specific schema for Kafka events is that the application's data models already adhere to a higher-order schema. The application handles the data holistically, whether it is passed through Kafka or stored in a database. So even if this seems like a bit of a "YOLO" approach at first, it does make some sense.
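
As a sketch of that idea, assuming a hypothetical UserSignup model, the application's own data model acts as the single source of truth for anything that flows through Kafka or into a database:

```python
from dataclasses import dataclass, asdict
import json

# The application's data model is the "higher-order schema": the same
# definition is used whether the record goes to the database or onto a
# Kafka topic. Names here are made up for the example.
@dataclass
class UserSignup:
    user_id: str
    email: str
    signup_ts: int

def to_kafka_value(event: UserSignup) -> bytes:
    # No broker-side or registry validation; correctness depends entirely
    # on every producer and consumer sharing this same class.
    return json.dumps(asdict(event)).encode("utf-8")

def from_kafka_value(value: bytes) -> UserSignup:
    return UserSignup(**json.loads(value))
```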

However, one of the caveats of skipping schema validation is that it tightens the coupling between the application and its auxiliary services (e.g. Kafka). This violates the separation of concerns needed to build true microservices. It can also lead to misinterpreted messages and errors that might not surface until a customer sends an email complaining about a bug.

4. Validate Kafka With Schema Contracts

No matter which strategy you choose, inferring, tracking and validating schemas is still a challenge. This is where tools like UP9 can help, by automatically inferring schema contracts for JSON, gRPC (Protobuf), and Avro messages. Once schema contracts have been generated, UP9 continuously monitors the Kafka topics and alerts when a message does not conform to the schema or pattern. With this level of observability, developers and DevOps engineers know about an issue immediately and gain the context needed to fix it faster.
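
To make the idea of a schema contract concrete, here is a generic sketch (not UP9's API) of a watcher that checks each message on a hypothetical topic against a previously inferred JSON Schema contract and flags violations:

```python
import json
from confluent_kafka import Consumer
from jsonschema import validate, ValidationError

# A previously inferred contract for the (hypothetical) "orders" topic.
ORDERS_CONTRACT = {
    "type": "object",
    "properties": {
        "order_id": {"type": "integer"},
        "amount": {"type": "number"},
        "currency": {"type": "string", "pattern": "^[A-Z]{3}$"},
    },
    "required": ["order_id", "amount", "currency"],
}

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder broker
    "group.id": "contract-watcher",
    "auto.offset.reset": "latest",
})
consumer.subscribe(["orders"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        validate(instance=json.loads(msg.value()), schema=ORDERS_CONTRACT)
    except (ValidationError, json.JSONDecodeError) as err:
        # In a real setup this would raise an alert rather than print.
        print(f"contract violation at offset {msg.offset()}: {err}")
```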

Furthermore, since Kafka is part of a bigger system, UP9 can automatically test and validate the entire business flow to prevent regressions. This improves engineering productivity by providing visibility, troubleshooting, testing, and mocking tools around Kafka and cloud-native systems.

No matter which option you choose, gaining control over your schemas is important for a well-architected system. Such a system will be easier to modify, test and grow. To try UP9, sign up for free, or request a demo.