Apache Kafka – High-velocity streaming data enrichment with low-velocity/slowly changing data
My system consists of:
- High velocity telemetry data generated by IoT devices.
- Relatively static/slowly changing reference/lookup data – Alarm Rules
Each IoT device has 0 to 1 Alarm Rules. An Alarm Rule has an average size of 1-2 KB.
Most Alarm Rules, once set, stay the same for weeks, months, or even a year or more.
Eventual consistency of Alarm Rules is also acceptable – if an Alarm Rule is edited, it is acceptable for the change to take effect within 15-30 minutes.
Question – What would be the best approach to enrich the device telemetry stream with alarm rules?
Option 1 – RichAsyncFunction + in memory cache
Each time I receive a telemetry message from a device, I execute a RichAsyncFunction. It first checks whether the in-memory cache has an Alarm Rule for that device. If no Alarm Rule is found in the cache, a request is sent to the database. Cache items expire after 30 minutes.
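The cache logic of Option 1 can be sketched in plain Java, independent of Flink. This is only a minimal model under stated assumptions: `AlarmRuleCache`, the `loader` function (standing in for the database request) and the injectable `clock` are hypothetical names, and rules are represented as plain strings. Since a device may have no rule at all, the sketch also caches the "no rule" answer so that such devices do not hit the database on every message.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical read-through cache for alarm rules with a fixed time-to-live. */
class AlarmRuleCache {
    /** One cached lookup result, including "device has no rule". */
    private static final class Entry {
        final Optional<String> rule;
        final long loadedAtMillis;

        Entry(Optional<String> rule, long loadedAtMillis) {
            this.rule = rule;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final Function<String, Optional<String>> loader; // stands in for the database lookup
    private final LongSupplier clock;                         // injectable so expiry can be tested
    private final long ttlMillis;

    AlarmRuleCache(Function<String, Optional<String>> loader, LongSupplier clock, Duration ttl) {
        this.loader = loader;
        this.clock = clock;
        this.ttlMillis = ttl.toMillis();
    }

    /** Returns the cached rule, reloading it when the entry is missing or older than the TTL. */
    Optional<String> get(String deviceId) {
        long now = clock.getAsLong();
        Entry entry = entries.get(deviceId);
        if (entry == null || now - entry.loadedAtMillis >= ttlMillis) {
            entry = new Entry(loader.apply(deviceId), now);
            entries.put(deviceId, entry);
        }
        return entry.rule;
    }
}
```

In a real job the lookup would be asynchronous and the cache would live inside the operator instance; a library cache with built-in expiry could replace this hand-rolled map.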
Option 2 – KeyedProcessFunction + state object
Same logic as Option 1, except that instead of an in-memory cache, I store the Alarm Rule for each IoT device in a ValueState<> and periodically refresh it using a ctx.timerService().register… timer. (What happens if this gets called multiple times? Will the onTimer function also be triggered multiple times, or just once?)
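On the timer question: as far as the Flink documentation describes it, timers are deduplicated per key and timestamp, so registering the same timestamp several times for one key leads to a single onTimer call, while registrations with different timestamps each fire. The toy model below is plain Java and is not Flink's TimerService; `TimerModel` and `nextRefresh` are hypothetical names. It illustrates why rounding the refresh time to a fixed interval makes repeated registrations collapse into one timer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy model of per-key timers that are deduplicated by (key, timestamp). */
class TimerModel {
    private final Map<String, TreeSet<Long>> timersByKey = new HashMap<>();

    /** Registering an already registered (key, timestamp) pair has no additional effect. */
    void register(String key, long timestamp) {
        timersByKey.computeIfAbsent(key, k -> new TreeSet<>()).add(timestamp);
    }

    /** Fires and removes every timer with timestamp <= now; returns the number of callbacks. */
    int advanceTo(long now) {
        int fired = 0;
        for (TreeSet<Long> perKey : timersByKey.values()) {
            NavigableSet<Long> due = perKey.headSet(now, true);
            fired += due.size();
            due.clear(); // the view is backed by perKey, so this removes the fired timers
        }
        return fired;
    }

    /** Rounds up to the next multiple of the interval so that nearby calls share one timestamp. */
    static long nextRefresh(long now, long intervalMillis) {
        return (now / intervalMillis + 1) * intervalMillis;
    }
}
```

With this model, registering `now + interval` on every telemetry message produces one callback per distinct arrival time, whereas registering `nextRefresh(now, interval)` produces one callback per interval and key.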
Option 3 – CoProcessFunction/KeyedCoProcessFunction + 2 streams, one for telemetry, second for alarms
This option offers the highest throughput and lowest latency. I would consume a Kafka topic for alarm rules and update a ValueState<> with the data from that stream.
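The two-stream idea of Option 3 can be modeled in plain Java as one handler per input, sharing per-device state. This is a sketch of the logic only, not a Flink operator: `EnrichmentModel`, `onRule` and `onTelemetry` are hypothetical names, the map stands in for keyed ValueState<>, rules are plain strings, and treating a null rule as a deletion is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Plain-Java model of enriching telemetry from a second stream of alarm rules. */
class EnrichmentModel {
    /** Telemetry reading paired with the rule known for the device at that moment, if any. */
    static final class Enriched {
        final String deviceId;
        final double reading;
        final Optional<String> rule;

        Enriched(String deviceId, double reading, Optional<String> rule) {
            this.deviceId = deviceId;
            this.reading = reading;
            this.rule = rule;
        }
    }

    // Stands in for keyed state: at most one rule per device.
    private final Map<String, String> ruleByDevice = new HashMap<>();

    /** Handler for the alarm-rule stream: the latest rule wins; null is assumed to mean "deleted". */
    void onRule(String deviceId, String rule) {
        if (rule == null) {
            ruleByDevice.remove(deviceId);
        } else {
            ruleByDevice.put(deviceId, rule);
        }
    }

    /** Handler for the telemetry stream: a local state lookup, no external call. */
    Enriched onTelemetry(String deviceId, double reading) {
        return new Enriched(deviceId, reading, Optional.ofNullable(ruleByDevice.get(deviceId)));
    }
}
```

Because the lookup is local to the key, no database request sits on the telemetry path, which is where the throughput and latency advantage of this option comes from.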
What’s stopping me from implementing this solution is the Kafka topic's message retention time. By default, Kafka messages are retained for 7 days.
If I have alarm rule A for device B and send it to the Kafka topic, and alarm rule A does not change over the next 7 days, the alarm rule will no longer be visible on the 8th day. Basically, from the 8th day on, when consuming messages from device B, the system won’t see any alarm rule for that device.
I could increase the retention time to a longer period, but that does not seem like a reasonable approach. An alternative would be an external service that periodically re-emits all alarm rules to the Kafka topic, say, every 6-7 days.
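The periodic re-emit alternative could look roughly like the sketch below. It is plain Java with no Kafka client: `RuleRepublisher`, the `allRules` supplier (standing in for reading every rule from the database) and the `publish` consumer (standing in for a Kafka producer send) are all hypothetical names.

```java
import java.util.Collection;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical service that re-publishes every alarm rule on a fixed schedule. */
class RuleRepublisher {
    private final Supplier<Collection<String>> allRules; // stands in for "load all rules"
    private final Consumer<String> publish;              // stands in for "send to the topic"

    RuleRepublisher(Supplier<Collection<String>> allRules, Consumer<String> publish) {
        this.allRules = allRules;
        this.publish = publish;
    }

    /** Publishes every current rule once and returns how many were sent. */
    int emitAll() {
        int sent = 0;
        for (String rule : allRules.get()) {
            publish.accept(rule);
            sent++;
        }
        return sent;
    }

    /** Runs emitAll immediately and then every periodDays days; the caller shuts the executor down. */
    ScheduledExecutorService start(long periodDays) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                emitAll();
            } catch (RuntimeException e) {
                // Swallow so one failed run does not cancel all later runs.
            }
        }, 0, periodDays, TimeUnit.DAYS);
        return scheduler;
    }
}
```

Choosing a period shorter than the retention time, for example `start(6)` against 7-day retention, means every rule always has at least one unexpired copy in the topic.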
