It’s hard to get a good measure of organizational productivity; what counts as productive is often company-, project-, and goal-specific. But when it comes to engineering organizations, there are at least some metrics you can collect to help investigate a hypothesis or serve as red flags warranting a closer look. None of these is enough on its own to diagnose success or failure, but they can be a good starting point.
Technical metrics to track in engineering organizations.
Service uptime. What’s the service uptime? The correct number depends on various factors — is it a customer-facing service? Is it in the critical path? Or is it an offline batch job?
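As a rough illustration, uptime is just the fraction of a window the service was available. The sketch below (plain Python, with made-up numbers) shows how a downtime budget translates into the usual "number of nines".

```python
from datetime import timedelta

def uptime_percentage(window: timedelta, downtime: timedelta) -> float:
    """Percentage of a time window the service was up."""
    return 100.0 * ((window - downtime) / window)

# Roughly 22 minutes of downtime in a 30-day month is about 99.95% uptime
# ("three and a half nines").
print(round(uptime_percentage(timedelta(days=30), timedelta(minutes=22)), 3))
```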
Number of flaky tests. Do integration tests return false positives, reporting failures even though the application works correctly or no changes have been made to that area of the code? Flaky tests consume significant developer attention and can slow down several critical pipelines (CI/CD, production deployments, and more).
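One way to surface these, sketched below, is to look for tests that both passed and failed on the same commit. The test-report shape here is hypothetical, so adapt it to whatever your CI emits.

```python
from collections import defaultdict

def find_flaky_tests(runs: list[dict]) -> list[str]:
    """Flag tests that both passed and failed on the same commit.

    Each run is a dict like {"commit": "abc123", "test": "test_checkout",
    "passed": True} (a hypothetical shape; adapt to your CI's test reports).
    """
    outcomes = defaultdict(set)
    for run in runs:
        outcomes[(run["commit"], run["test"])].add(run["passed"])
    # The same code producing both results is the definition of flaky.
    return sorted({test for (_, test), results in outcomes.items() if len(results) == 2})

runs = [
    {"commit": "abc123", "test": "test_checkout", "passed": True},
    {"commit": "abc123", "test": "test_checkout", "passed": False},
    {"commit": "abc123", "test": "test_login", "passed": True},
]
print(find_flaky_tests(runs))  # ['test_checkout']
```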
Developer ramping time. How long does it take a new developer to run the software locally in development mode? To set up a new machine with the required dependencies and configuration? To get the permissions and authorization needed to function in their role?
Code review timing. How long does it take to get a change request reviewed by the relevant developers? Review time should (for the most part) be a function of change size, and consistently long turnaround is a red flag.
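If your code lives on GitHub, a first approximation is the gap between a pull request being opened and its first submitted review. The sketch below uses the public REST API; the owner, repo, and token values are placeholders.

```python
from datetime import datetime
import requests

API = "https://api.github.com"

def hours_to_first_review(owner: str, repo: str, number: int, token: str) -> float | None:
    """Hours between a pull request being opened and its first submitted review."""
    headers = {"Authorization": f"Bearer {token}"}
    pr = requests.get(f"{API}/repos/{owner}/{repo}/pulls/{number}", headers=headers).json()
    reviews = requests.get(f"{API}/repos/{owner}/{repo}/pulls/{number}/reviews", headers=headers).json()
    submitted = [r["submitted_at"] for r in reviews if r.get("submitted_at")]
    if not submitted:
        return None  # still unreviewed, which is arguably the more interesting signal
    opened = datetime.fromisoformat(pr["created_at"].replace("Z", "+00:00"))
    first = min(datetime.fromisoformat(t.replace("Z", "+00:00")) for t in submitted)
    return (first - opened).total_seconds() / 3600
```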
Production deployment frequency. How often does code make it to production? This needs to be read in the context of the organization’s development cadence, but it should match the intended tempo.
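Counting is the easy part once you have deployment timestamps from your deploy tooling; a minimal sketch:

```python
from collections import Counter
from datetime import datetime

def deploys_per_week(deploy_times: list[datetime]) -> Counter:
    """Bucket production deployments by ISO week."""
    return Counter(t.strftime("%G-W%V") for t in deploy_times)

print(deploys_per_week([datetime(2024, 3, 4), datetime(2024, 3, 6), datetime(2024, 3, 12)]))
# Counter({'2024-W10': 2, '2024-W11': 1})
```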
Time to deployment. Given a change committed to the main branch, what’s the fastest it can show up in production? It’s not always necessary to track, but if the delta is too long, fixing it is essential. A related metric is CI/CD pipeline time: setup, unit tests, integration tests, and teardown. A long pipeline can complicate development.
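A back-of-the-envelope version: read the commit timestamp from git and subtract it from the deployment timestamp your deploy system records (the `deployed_at` argument below is a placeholder for that).

```python
import subprocess
from datetime import datetime

def commit_to_production_hours(sha: str, deployed_at: datetime) -> float:
    """Hours between a commit landing in the repository and reaching production.

    The commit time comes from local git history; `deployed_at` would come from
    your deployment system. Both timestamps should be timezone-aware.
    """
    out = subprocess.run(
        ["git", "show", "-s", "--format=%cI", sha],
        capture_output=True, text=True, check=True,
    )
    committed = datetime.fromisoformat(out.stdout.strip())
    return (deployed_at - committed).total_seconds() / 3600
```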
Code reviews (per developer). This number can easily be gamed and is probably only relevant within teams and product groups. Most developers are expected to contribute code reviews (both peer and downward), and it can arguably be helpful for junior developers to contribute them as well. It’s mostly a concern when a developer or team deviates from the norm by a statistically significant margin.
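A simple way to spot those deviations without turning the number into a target is to look at z-scores within a team. The sketch below assumes you already have per-developer review counts from your code host.

```python
from statistics import mean, pstdev

def review_z_scores(reviews_per_dev: dict[str, int]) -> dict[str, float]:
    """Z-score of each developer's review count within their team.

    Large absolute values (say, beyond 2) are a prompt for a conversation,
    not a conclusion.
    """
    counts = list(reviews_per_dev.values())
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return {dev: 0.0 for dev in reviews_per_dev}
    return {dev: (count - mu) / sigma for dev, count in reviews_per_dev.items()}
```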
Lines of code written/deleted. A controversial one, but a measure I think can be helpful at the extremes and when taken in context. Earlier in their career, developers should be expected to ship at least some code, and any developer or product team not writing any code should prompt further investigation. The more senior the developer, the less this rule holds (senior developers' responsibilities might differ). Code deleted can be equally important.
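If you want the raw numbers, git already has them. The sketch below aggregates lines added and deleted per author over a window using `git log --numstat`, skipping binary files.

```python
import subprocess
from collections import defaultdict

def loc_by_author(since: str = "3 months ago") -> dict[str, tuple[int, int]]:
    """(lines added, lines deleted) per author, from `git log --numstat`."""
    log = subprocess.run(
        ["git", "log", f"--since={since}", "--numstat", "--format=@%aN"],
        capture_output=True, text=True, check=True,
    ).stdout
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    author = None
    for line in log.splitlines():
        if line.startswith("@"):
            author = line[1:]
        elif line.strip() and author:
            added, deleted, _path = line.split("\t", 2)
            if added != "-":  # numstat reports binary files as "-"
                totals[author][0] += int(added)
                totals[author][1] += int(deleted)
    return {author: (added, deleted) for author, (added, deleted) in totals.items()}
```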
Pull requests. Another controversial metric. A large number doesn’t necessarily correlate with quality, but a small number (or zero) can be an opportunity for improvement. Even for long-term projects and features, it’s better to get code reviewed and merged incrementally, earlier in the cycle rather than later.