Application monitoring on GKE
Development
Over the last year, I have been spending a lot of time with Google Cloud Platform, and the more I used it, the more I liked it. I’m considering it now part of my standard stack to deploy projects I develop. Chiefly, it started with Google Kubernetes Engine, but there is so much more.
What I especially like about GCP is how it has PaaS like qualities with the seamless integrations of the various services. For example, GKE automatically puts STDOUT logs automatically to Stackdriver Logging that one can then one-click setup to BigQuery. You also get automatic performance metrics for your containers as well in Stackdriver Monitoring. If you use STDERR correctly, you get errors in Stackdriver Errors.
One thing that wasn’t as obvious to me was how to get application monitoring going on GKE. It’s nice to know system-metrics, but more often than not, several application metrics would be very valuable to be able to chart and get alerted on. Such metrics can include high-level KPIs such as user signups and user actions, but also more lower-level ones such as queue lengths, response times or errors.
Of course one can always self-host various solutions, but as said above, what I like above GCP is that there are some great 80/20 solutions available that come at very little (time-) cost. In this case, getting application metrics turned out to be very easy:
All you have to do is you get a container to expose your relevant metrics in the Prometheus format. But instead of self-hosting your own Promethus server, you can just use the excellent prometheus-to-sd image to pull the metrics from your container and push it into stack driver. Put of of those containers in a deployment like the one linked, and you are done!
You can leverage one of the many existing prometheus exporters, or build one easily yourself. See here for an example of mine where I am exporting the queue lengths from Resque: https://github.com/nambrot/resque-prometheus-exporter
With this method, you can easily get your metrics into Stackdriver Monitoring, from which you can build nice dashboards and alerts!