How does your data get to its destination? Is it on time? Can you identify the correct data strategy for different use cases in your organization? Being able to answer these questions is a major part of designing a sound data strategy.
There are many different ways to analyze data flow; today we’re going to look at three that we’ll call “Declare,” “Desire,” and “Distribute.” You may recognize these patterns by different names, but today we’ll use these terms to help break any old patterns we might be falling into. For this exercise we’ll think in terms of communication in a large group of people. For each strategy, we’ll look at who initiates the communication, and whether there are one or many Speakers and one or many Listeners.
A Declare architecture relies on the Speaker to initiate communication. Think of this like someone making an announcement, or coming up to tell you you’re needed in the other room. Communication occurs when the Speaker knows something and wants to share. In the data world this would be called “event-driven.”
Technologies that use the Declare strategy include streaming platforms such as Kafka, and integration platforms such as MuleSoft. The focus of these technologies is speed, redundancy, and reproducibility. Integration platforms are known for their ability to connect disparate systems and translate their data into a common format. Streaming platforms are more useful for use cases with high bandwidth requirements, or where data needs to be replayed and reprocessed after a code change.
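To make the Declare pattern concrete, here is a minimal sketch in plain Python. It is not Kafka or MuleSoft code; the Topic class and its declare, subscribe, and replay methods are hypothetical names invented for this illustration. The sketch shows the two ideas above: the Speaker initiates communication, and a retained log lets a Listener reprocess past events.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class Topic:
    """Hypothetical in-memory stand-in for a streaming topic (not a real Kafka API)."""

    def __init__(self) -> None:
        self.log: List[Event] = []         # append-only history, kept so events can be replayed
        self.listeners: List[Handler] = []

    def subscribe(self, handler: Handler) -> None:
        """Register a Listener; it will be called for every future event."""
        self.listeners.append(handler)

    def declare(self, event: Event) -> None:
        """The Speaker initiates: record the event, then push it to every Listener."""
        self.log.append(event)
        for handler in self.listeners:
            handler(event)

    def replay(self, handler: Handler) -> None:
        """Feed the full history to a handler, e.g. after a code change."""
        for event in self.log:
            handler(event)


orders = Topic()
received: List[Event] = []
orders.subscribe(received.append)

# The Speaker announces changes as they happen; nobody had to ask.
orders.declare({"order_id": 1, "status": "created"})
orders.declare({"order_id": 1, "status": "shipped"})

# A new version of the processing logic can be run over the same history.
reprocessed: List[str] = []
orders.replay(lambda event: reprocessed.append(str(event["status"]).upper()))
```

In a real deployment the log and the delivery to Listeners would be handled by the streaming platform itself; the point here is only the direction of the conversation.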
A Desire architecture depends on the needs of the Listener. Consider reaching out to someone to ask for their thoughts on an article you are writing, or a company commissioning a focus group to drive an upcoming brand campaign.
The Desire strategy is like a traditional API, where data is requested from the source only when it is needed. This has the advantage of only pulling the required data, but may miss certain insights if the caller is not aware of all the data available.
Components that take advantage of this strategy are increasingly built as microservices, and are frequently run on platforms such as Kubernetes or AWS Lambda. This allows for the creation of lightweight applications that can prioritize processing power over data storage, leveraging many small calls to quickly provide information to a client or user.
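The trade-off in the Desire pattern can be sketched in a few lines of Python. Everything here is assumed for illustration: the CUSTOMERS store, the get_customer function, and the field names do not come from any real API. The Listener asks for exactly what it wants and gets nothing more, which also means it never sees fields it did not know to ask for.

```python
from typing import Dict, Iterable

# Hypothetical source system holding more data than any one caller needs.
CUSTOMERS: Dict[int, Dict[str, object]] = {
    42: {
        "name": "Ada",
        "email": "ada@example.com",
        "lifetime_value": 1800,
        "churn_risk": "low",
    },
}


def get_customer(customer_id: int, fields: Iterable[str]) -> Dict[str, object]:
    """Return only the requested fields, the way a request/response API might."""
    record = CUSTOMERS[customer_id]
    return {field: record[field] for field in fields if field in record}


# The Listener initiates: a small, targeted call made only when data is needed.
reply = get_customer(42, ["name", "email"])

# The source also knows the customer's churn risk, but the caller never asked.
missed_insight = "churn_risk" not in reply
```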
A Distribute architecture doesn’t rely on either the Speaker or the Listener, but acts as a third party to ensure that data is synchronized between all parties. Consider a daily bulletin or newspaper, which aggregates data from many Speakers and makes it available to all Listeners, whether they have requested it or not.
The Distribute strategy is what’s at play in a more traditional batch-oriented or ETL structure, where the data is moved and copied to all the various systems on a schedule, without regard to when the data was created or when it is needed. This is more common in systems that rely on having the data immediately available locally, or where a single central warehouse holds all data so that the same data does not have to be sent point-to-point again and again.
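As a rough sketch of the Distribute pattern, the Python below copies a full snapshot of every source into a central warehouse in one pass. The run_batch function and the sample "crm" and "billing" tables are hypothetical; a real ETL job would add scheduling, transformation, and error handling. Note that neither a Speaker nor a Listener triggers the copy: the job simply runs and synchronizes everything.

```python
from typing import Dict, List

Table = List[Dict[str, object]]


def run_batch(sources: Dict[str, Table], warehouse: Dict[str, Table]) -> int:
    """Copy a full snapshot of every source table into the warehouse.

    Returns the number of rows copied. Runs the same way whether or not
    anything changed and whether or not anyone is waiting for the data.
    """
    copied = 0
    for name, rows in sources.items():
        warehouse[name] = [dict(row) for row in rows]  # independent copy of each row
        copied += len(rows)
    return copied


# Hypothetical source systems.
sources: Dict[str, Table] = {
    "crm": [{"id": 1, "name": "Ada"}],
    "billing": [{"id": 1, "balance": 20}, {"id": 2, "balance": 0}],
}
warehouse: Dict[str, Table] = {}

# In practice this call would be triggered by a scheduler, e.g. nightly.
rows_copied = run_batch(sources, warehouse)
```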
Integration and Streaming Strategy
Hopefully at this point it is clear that Declare, Desire, and Distribute strategies are not competitors, but rather distinct parts of a holistic approach to managing your data. The specific applications and types of data you are using will determine the best combination of technologies, not to mention practical factors such as cost, latency, and environment.
The idea of Declare/Desire/Distribute strategies helps us to break down our data architecture into specific problems, which can ensure that we’re using the right tool for the job. Most technologies can be adapted to solve any given problem, tempting us to find a one-size-fits-all solution. This leads to unnecessary complexity, a higher likelihood of updates introducing breaking changes, and overall higher maintenance costs. Analyzing our data architecture in terms of these strategies can help us to use tools suited to the particular problem, which keeps us from reinventing the wheel and improves the maintainability of the system.
For example, an integration in MuleSoft can enable rapid two-way communication between key components, utilizing their APIs through native MuleSoft connectors. This integration could additionally drop data on a Kafka topic, which feeds a streaming analytics process. The results from that analysis land in a staging database, and an automated ETL process collects them into the warehouse nightly. Each tool fills a particular role, and together they enable a flexible, high-bandwidth streaming strategy.
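The flow described above can be sketched end to end in plain Python. None of this is MuleSoft or Kafka code: the topic, staging, and warehouse containers and the integrate, analyze_stream, and nightly_etl functions are hypothetical stand-ins, and the record fields are invented for the example. The aim is only to show each stage handing off to the next.

```python
from collections import Counter, deque
from typing import Deque, Dict, List

topic: Deque[Dict[str, object]] = deque()   # stand-in for a streaming topic
staging: List[Dict[str, object]] = []       # stand-in for the staging database
warehouse: List[Dict[str, object]] = []     # stand-in for the central warehouse


def integrate(source_record: Dict[str, str]) -> None:
    """Integration step: normalize a record to a common format and publish it."""
    event = {
        "customer": source_record["cust"].strip().lower(),
        "amount": float(source_record["amt"]),
    }
    topic.append(event)


def analyze_stream() -> None:
    """Streaming analytics step: total the amounts per customer, write to staging."""
    totals: Counter = Counter()
    while topic:
        event = topic.popleft()
        totals[event["customer"]] += event["amount"]
    staging.extend({"customer": c, "total": t} for c, t in totals.items())


def nightly_etl() -> None:
    """Batch step: move everything in staging to the warehouse."""
    warehouse.extend(staging)
    staging.clear()


# Two source systems format the same customer differently.
integrate({"cust": " Ada ", "amt": "10.50"})
integrate({"cust": "ada", "amt": "4.50"})
analyze_stream()
nightly_etl()
```

Each function maps to one role from the example: the integration declares events, the analytics process consumes them as they arrive, and the ETL job distributes the results on a schedule.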
Reach out to Zirous to make the most of your data and get it to where it’s needed. Our experts are versed both in integration and streaming strategy and in technologies like Kafka, Kubernetes, AWS, MuleSoft, and more.