Always check your lag when writing a Kafka consumer.
I've been working on some code for the last two months, and I started as I typically do: I wrote my code and deployed it in Fargate. This particular code was a Kafka consumer that made several API calls to enrich some data, then made a final call to a third-party API. The total processing time was just a little over a second per message, which I thought was fine for the amount of traffic I was receiving.
Wrong! After a week of running this, I noticed I was nearly a day behind in processing the data. I went back and looked at our traffic and saw we'd had a busy couple of days with more volume than I expected. When I dug in, I found my Kafka lag was nearly 25,000 messages, so I ended up changing my code to read from Kafka and write to SQS, with a Lambda (via an event source mapping) doing the processing.
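In hindsight, the number itself is simple arithmetic: lag is the gap between each partition's latest (end) offset and the consumer group's committed offset, summed across partitions — the same per-partition figures that `kafka-consumer-groups.sh --describe` reports. A minimal sketch of that calculation (the helper name and the offset maps are mine, not part of any Kafka client; the sample numbers are made up to land near the 25,000 from the story):

```python
def total_lag(end_offsets, committed_offsets):
    """Sum (end offset - committed offset) across partitions.

    Both arguments map a partition id to an offset, e.g. the LOG-END-OFFSET
    and CURRENT-OFFSET columns of `kafka-consumer-groups.sh --describe`.
    A partition with no committed offset counts as fully behind from 0.
    """
    lag = 0
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        lag += max(0, end - committed)
    return lag

# Three partitions; the consumer group is well behind on each of them.
end = {0: 120_000, 1: 95_000, 2: 80_000}
committed = {0: 110_000, 1: 85_000, 2: 75_000}
print(total_lag(end, committed))  # → 25000
```

Watching this one number trend up (rather than just timing a single message) would have surfaced the problem on day one instead of a week in.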
On a side note, my throughput also went through the roof, because I could have many concurrent Lambdas writing to the third-party API.
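One wrinkle of a Kafka-to-SQS bridge worth noting: SQS's `SendMessageBatch` accepts at most 10 messages per call, so consumed records have to be chunked before forwarding. A sketch of that chunking, assuming records are already serialized strings (the batch-entry shape matches boto3's `send_message_batch`, but the function and record names here are hypothetical):

```python
SQS_BATCH_MAX = 10  # hard limit on entries per SendMessageBatch call

def to_sqs_batches(records):
    """Chunk serialized Kafka records into SQS-sized lists of batch entries."""
    entries = [
        {"Id": str(i), "MessageBody": body}  # Id must be unique per batch
        for i, body in enumerate(records)
    ]
    return [
        entries[i:i + SQS_BATCH_MAX]
        for i in range(0, len(entries), SQS_BATCH_MAX)
    ]

batches = to_sqs_batches([f"record-{n}" for n in range(23)])
print([len(b) for b in batches])  # → [10, 10, 3]

# Each batch would then be forwarded with boto3, e.g.:
#   sqs.send_message_batch(QueueUrl=queue_url, Entries=batch)
```

From there, Lambda's concurrency scales the downstream API calls horizontally, which is where the throughput gain came from.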
Lesson learned: always be cognizant of lag when designing the architecture and writing the code for a Kafka consumer (or any streaming consumer).