Startup Resources: Monitoring Tools
Learn how to keep track of all your server metrics, business metrics, and alerts. See the "Software Delivery" chapter in Part II, Technologies, for more info.
- Logging Tools
- Availability Metrics
- Business Metrics
- Application Metrics
- Process Monitoring
- Code Metrics
- Server Metrics
- Alerting
- Further Reading
These startup resources are based on the book Hello, Startup: A Programmer's Guide to Building Products, Technologies, and Teams by Yevgeniy Brikman. These resources are a work in progress. They are also open source, so you can add your contributions by submitting a pull request to the Hello, Startup GitHub Repository. To see how these resources fit into the bigger picture, check out The Startup Checklist, which is a comprehensive collection of everything you need to do to launch a startup.
Logging Tools
Logging is your first layer of monitoring. Make sure you understand log levels, log formats, and log aggregation.
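To make log levels, log formats, and log aggregation concrete, here is a minimal sketch using Python's standard `logging` module. It assumes you want one JSON object per line, which is one common format that aggregators such as logstash can parse; the `JsonFormatter` class, the `make_logger` helper, and the field names are hypothetical choices for illustration, not a required schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical schema)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style arguments
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def make_logger(name: str, stream=None, level: int = logging.INFO) -> logging.Logger:
    """Return a logger that writes JSON lines to `stream` (stderr if None)."""
    logger = logging.getLogger(name)
    logger.setLevel(level)  # records below this level are dropped
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # avoid duplicate handlers on repeated calls
    logger.propagate = False     # don't also emit through the root logger
    return logger


if __name__ == "__main__":
    log = make_logger("checkout")
    log.debug("cart contents loaded")  # below INFO: dropped
    log.info("order placed: %s", "A-1001")
    log.warning("payment retry %d", 2)
```

Because every line is self-describing JSON, an aggregator can index fields like `level` directly instead of guessing at a free-text format.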
logstash
http://logstash.net/
logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use, such as searching. For searching, logstash comes with a web interface that lets you search and drill into all of your logs. It is free and open source.
Availability Metrics
The most basic metric: can a user access your site / product or not?
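The core of an availability check is just "request the page and see whether it answers." Below is a minimal sketch of such a probe using only the standard library; the `is_available` name, the 5-second default timeout, and treating any 2xx/3xx status as "up" are assumptions for illustration. In practice you would run a check like this on a schedule, from outside your own network.

```python
import urllib.request


def is_available(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers with a 2xx or 3xx status within `timeout` seconds."""
    try:
        # urlopen follows standard redirects and raises HTTPError for 4xx/5xx
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError/HTTPError, DNS failures, refused connections, and timeouts
        # are all OSError subclasses: any of them means "not available"
        return False
```

Recording each probe result with a timestamp gives you an uptime history you can alert on.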
Business Metrics
Tools to monitor what your users are actually doing in the product. These are the metrics the CEO and product team look at.
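As one example of a business metric, daily active users (DAU) can be computed from a stream of user events. This sketch assumes a hypothetical event format of `(user_id, unix_timestamp)` pairs and buckets days in UTC; real analytics tools typically capture richer events and handle time zones for you.

```python
from collections import defaultdict
from datetime import datetime, timezone


def daily_active_users(events):
    """Count distinct users per UTC day.

    `events` is an iterable of (user_id, unix_timestamp) pairs (hypothetical format).
    Returns a dict mapping ISO dates to distinct-user counts, sorted by date.
    """
    users_by_day = defaultdict(set)
    for user_id, ts in events:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date().isoformat()
        users_by_day[day].add(user_id)  # a set ignores repeat visits on the same day
    return {day: len(users) for day, users in sorted(users_by_day.items())}
```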
Application Metrics
Tools to monitor your application code, both on the server side (QPS, latency, throughput, error counts) and on the client side (load time, payload size, crashes).
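To show what server-side latency and error counts look like in code, here is a minimal in-memory sketch. The `RequestMetrics` class and its methods are hypothetical; a real setup would ship these numbers to a metrics tool rather than keep them in a list, and would compute QPS by dividing the request count by the length of the measurement window.

```python
import math
import time
from functools import wraps


class RequestMetrics:
    """In-memory latency and error counters for one endpoint (illustrative only)."""

    def __init__(self):
        self.latencies_ms = []
        self.errors = 0

    def track(self, func):
        """Decorator that times each call and counts calls that raise."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            except Exception:
                self.errors += 1
                raise  # still propagate the error to the caller
            finally:
                self.latencies_ms.append((time.perf_counter() - start) * 1000)
        return wrapper

    def percentile(self, p):
        """Nearest-rank percentile of recorded latencies, for 0 < p <= 100."""
        if not self.latencies_ms:
            return 0.0
        ordered = sorted(self.latencies_ms)
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    def error_rate(self):
        """Fraction of tracked calls that raised an exception."""
        n = len(self.latencies_ms)
        return self.errors / n if n else 0.0
```

Percentiles such as p99 are usually more useful than averages, because a small fraction of very slow requests can hide behind a healthy-looking mean.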
boomerang
http://www.lognormal.com/boomerang/doc/
boomerang is an open source piece of JavaScript that you add to your web pages, where it measures the performance of your website from your end users' point of view. It can send this data back to your server for further analysis. With boomerang, you can find out exactly how fast your users think your site is.
CoScale
http://www.coscale.com/
CoScale provides full stack web performance monitoring, combining server and application metrics, page load times, and custom metrics and events. CoScale simplifies monitoring and troubleshooting with automated anomaly detection and contextual insights, so you can act proactively on performance changes that impact your business.
Process Monitoring
Tools to keep processes up and running after a crash or reboot.
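The basic idea behind a process monitor is a loop that restarts a process when it dies. This sketch covers only the crash case; surviving a reboot requires hooking into your init system or a dedicated supervisor. The `supervise` function, its restart limit, and the linear backoff are hypothetical choices for illustration.

```python
import subprocess
import time


def supervise(cmd, max_restarts=5, backoff_s=1.0):
    """Run `cmd`, restarting it whenever it exits with a non-zero code.

    Stops on a clean exit (code 0) or after `max_restarts` restarts,
    and returns the list of exit codes observed.
    """
    exit_codes = []
    restarts = 0
    while True:
        code = subprocess.run(cmd).returncode  # blocks until the process exits
        exit_codes.append(code)
        if code == 0:
            return exit_codes  # clean exit: nothing to restart
        if restarts >= max_restarts:
            return exit_codes  # give up so a broken process doesn't loop forever
        restarts += 1
        time.sleep(backoff_s * restarts)  # back off to avoid a tight crash loop
```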
Code Metrics
Tools to measure code coverage, bug counts, lines of code, etc.
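Of the code metrics listed, lines of code is the simplest to compute yourself. This is a rough sketch, assuming Python source files and a simple rule (count non-blank lines that don't start with `#`); the `count_loc` name is hypothetical, and dedicated tools handle many languages and edge cases such as docstrings far more accurately.

```python
from pathlib import Path


def count_loc(root, suffix=".py"):
    """Return (file_count, code_line_count) for files under `root` ending in `suffix`.

    A code line is any non-blank line that is not a `#` comment. Docstrings
    and multi-line strings are counted as code by this simple rule.
    """
    files = 0
    code = 0
    for path in Path(root).rglob(f"*{suffix}"):
        if not path.is_file():
            continue  # skip directories whose names happen to match
        files += 1
        for line in path.read_text(encoding="utf-8", errors="replace").splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                code += 1
    return files, code
```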
Server Metrics
Tools to measure how the hardware is doing: CPU usage, memory usage, hard drive usage, and network traffic.
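To give a feel for the raw numbers a server metrics agent collects, here is a minimal sketch using only the standard library. It is limited by what the stdlib exposes: it reports disk usage, CPU count, and (on Unix) load average, but not memory or network traffic, which a real agent would read from the operating system. The `server_snapshot` name and the returned keys are hypothetical.

```python
import os
import shutil


def server_snapshot(path="/"):
    """Collect a few basic host metrics with the standard library only.

    Load average is only available on Unix-like systems; memory and network
    counters are not exposed by the stdlib and are omitted here.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    snapshot = {
        "cpu_count": os.cpu_count(),  # may be None if it can't be determined
        "disk_used_pct": round(100 * usage.used / usage.total, 1),
    }
    if hasattr(os, "getloadavg"):
        # average number of runnable processes over 1, 5, and 15 minutes
        snapshot["load_1m"], snapshot["load_5m"], snapshot["load_15m"] = os.getloadavg()
    return snapshot
```

Sampling a snapshot like this every few seconds and sending it to a time-series store is, at its core, what a server monitoring agent does.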
Alerting
Tools to alert you when your metrics are out of line.
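"Out of line" usually means a metric crossed a threshold. A common refinement, sketched below, is to fire only when the threshold is breached for several consecutive samples, so a single noisy reading doesn't page anyone. The `ThresholdAlert` class, the 3-sample window, and the message text are hypothetical; a real alerting tool would deliver the message to a pager, email, or chat channel.

```python
from collections import deque


class ThresholdAlert:
    """Fire when a metric stays above `threshold` for `window` consecutive samples."""

    def __init__(self, name, threshold, window=3):
        self.name = name
        self.threshold = threshold
        self.recent = deque(maxlen=window)  # only the last `window` breach flags
        self.firing = False

    def observe(self, value):
        """Record one sample; return a message only when the alert state changes."""
        self.recent.append(value > self.threshold)
        breached = len(self.recent) == self.recent.maxlen and all(self.recent)
        if breached and not self.firing:
            self.firing = True
            return f"ALERT {self.name}: above {self.threshold} for {self.recent.maxlen} samples"
        if self.firing and not self.recent[-1]:
            # latest sample is back under the threshold: clear the alert
            self.firing = False
            return f"RESOLVED {self.name}"
        return None
```

For example, feeding CPU readings of 95, 50, 95, 96, 97, 98, 40 into `ThresholdAlert("cpu", 90)` fires on the fifth reading (the first time three in a row exceed 90) and resolves on the last.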
Further Reading
More reading on monitoring and metrics
Agility Requires Safety
https://www.ybrikman.com/writing/2016/02/14/agility-requires-safety/
To go faster in a car, you need not only a powerful engine, but also safety mechanisms like brakes, air bags, and seat belts. This talk discusses the safety mechanisms that let you build software faster.