Always check your lag when writing a Kafka consumer.

I've been working on some code for the last two months, and I started as I typically do: writing my code and then deploying it in Fargate. This particular code was a Kafka consumer that made several API calls to enrich some data, then made a final call to a third-party API. The total processing time per message was just a little over a second, which I thought was fine for the amount of traffic I was receiving.
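A second per message sounds harmless until you do the math: a sequential consumer can only drain one message at a time per partition, so per-message latency is a hard cap on daily throughput. A quick sketch (the ~1.1 s figure is an assumption standing in for "a little over a second"):

```python
# Back-of-envelope capacity for a sequential consumer.
# 1.1 s/message is an assumed value, not a measured one.
SECONDS_PER_MESSAGE = 1.1  # enrichment calls + final third-party call
MESSAGES_PER_DAY = int(24 * 60 * 60 / SECONDS_PER_MESSAGE)
print(MESSAGES_PER_DAY)  # roughly 78k messages/day, per partition, best case
```

Anything above that rate piles up as lag, and a "busy couple of days" never gets worked back down.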

Wrong!  After a week of running this, I noticed I was nearly a day behind when processing the data. I went back and looked at our traffic: we'd had a busy couple of days, with more volume than I expected. Digging in, I found my Kafka lag was nearly 25,000 messages, so I ended up changing my code to read from Kafka and write to SQS, where a Lambda (triggered via an event source mapping) did my processing.
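Lag is just the log-end offset minus your group's committed offset, summed across partitions. A minimal sketch of computing it, assuming kafka-python-style offsets; `compute_lag` itself is pure, and the topic and group names in the comments are hypothetical:

```python
# Lag per partition = log-end offset minus last committed offset.
def compute_lag(end_offsets, committed_offsets):
    return {
        tp: end_offsets[tp] - committed_offsets.get(tp, 0)
        for tp in end_offsets
    }

# Against a live cluster you'd fill the dicts from the client,
# e.g. with kafka-python (names here are assumptions):
#   consumer = KafkaConsumer(group_id="enricher", bootstrap_servers=...)
#   end_offsets = consumer.end_offsets(partitions)
#   committed = {tp: consumer.committed(tp) for tp in partitions}

if __name__ == "__main__":
    ends = {("events", 0): 30_000, ("events", 1): 28_000}
    committed = {("events", 0): 18_000, ("events", 1): 15_000}
    print(sum(compute_lag(ends, committed).values()))  # total lag: 25000
```

Graphing that number over time (or alerting on it) would have surfaced the problem in hours instead of a week.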

On a side note, my throughput also went through the roof, because I could have concurrent Lambdas writing to my third-party API.

Lesson learned: always be cognizant of consumer lag when designing your architecture and writing your code for a Kafka consumer (or any streaming consumer).