Breaking and Rebuilding KPIs

An engineer’s perspective on KPIs and all that jazz.

Introduction

It's that time of year again - the season of holidays, family time, year-end assessments, and those dreaded conversations with managers. It also means one thing: the season of Key Performance Indicators, or KPIs, is upon us. KPIs are essential for evaluating employee performance but can quickly become meaningless if not designed correctly. The truth is, there are many ways to get it wrong. The problem with KPIs is that they can be all-consuming, making us lose sight of the bigger picture. We've all experienced this before, whether it's obsessing over "caloric deficits" with notebooks and logarithm tables or fixating too much on GPA in school. Ultimately, KPIs are just that - some metric we defined.

A Day In The Life Of A Developer

Engineers universally pick up tickets, estimate story points, write and test code, conduct code reviews, and participate in calls. All these tasks tie into their KPIs. Let's examine these factors before devising a way to break them.

Story Points

Story points capture the relative complexity of a ticket, measuring how tickets compare in difficulty. In its simplest form, it's often tied to the days needed to complete a ticket.

However, some argue against this. Story points measure a ticket's complexity relative to other tickets. Once a team standardizes the definition of unit complexity, k, the time taken to complete a task of complexity k can be used to compare performance. Greater expertise reduces completion time, while the complexity k itself remains standard and independent of engineer expertise.

This decoupling of the complexity definition from an engineer's expertise is what allows for comparison. Linking the days needed to complete a ticket with its story points adds subjective opinion, introducing another variable: engineer expertise. You cannot compare effectively if the measurement unit itself is subjective. For example, you can compare a car's speed with a bicycle's using the time taken to travel a fixed distance (the analogue of complexity). The distance is independent and objective, which is what enables the comparison.
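
To make this concrete, here is a minimal sketch (the engineer names and numbers are hypothetical) of how normalizing completion time by a standardized complexity k makes engineers comparable even when their ticket mixes differ:

```python
from statistics import mean

# Hypothetical (ticket_complexity_k, days_taken) pairs per engineer.
completed = {
    "engineer_a": [(3, 4.5), (5, 7.0), (2, 2.5)],
    "engineer_b": [(1, 2.0), (3, 7.5), (3, 6.0)],
}

for engineer, tickets in completed.items():
    # Days per unit of complexity k: lower means faster delivery
    # at the same standardized complexity.
    days_per_k = mean(days / k for k, days in tickets)
    print(f"{engineer}: {days_per_k:.2f} days per complexity unit")
```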

Code Reviews

Code reviews ensure code quality and offer valuable feedback to engineers. They typically aren't included in KPIs, and when they are, the assessment isn't thorough enough to fully capture an engineer's contribution to code reviews. Typical metrics include pull request approvals and comments made. Reviews fall into three categories, ranked by increasing importance and value:

Syntax Specific (Tier 3): This is the entry point for engineers delving into code reviews: checking syntactical elements like null checks, empty-string checks, naming conventions, or anything else that improves code readability. These checks are domain agnostic, so engineers at any expertise level can make them. "All you need is willingness."

Optimization Specific (Tier 2): Programming expertise comes into play here. It involves checking for code optimizations, database operations, indexes, blocking workflows, and other moderately advanced programming concepts. Even as a new engineer, you can look for items from this list once you overcome the challenge of reading someone else's code.

Domain Specific (Tier 1): Domain expertise and module ownership are crucial here. If you've worked on a module, you can quickly answer questions about that workflow and assess whether a new pull request will cause issues. Experienced engineers typically have the most reviews of this type; the more senior you are, the broader your review scope. The number of reviews in each category can be used to evaluate code review performance, with more reviews in Tiers 1 and 2 indicating greater value to the team.
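
As a rough sketch of how this could be measured, suppose each review comment were labeled with its tier (the data and weighting below are assumptions, not an established tool):

```python
from collections import Counter

# Hypothetical review comments: (engineer, tier) pairs, where the tier
# label would come from the reviewer or a triage step.
review_comments = [
    ("alice", 1), ("alice", 2), ("alice", 3),
    ("bob", 3), ("bob", 3), ("bob", 3),
]

for engineer in {e for e, _ in review_comments}:
    tiers = Counter(t for e, t in review_comments if e == engineer)
    # Weight Tier 1 highest and Tier 3 lowest, reflecting review value.
    score = 3 * tiers[1] + 2 * tiers[2] + 1 * tiers[3]
    print(engineer, dict(tiers), "weighted score:", score)
```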

Quick Calls

Let's clarify something before proceeding: Loch Ness monsters don't exist, and quick calls aren't quick. Paul Graham, Y Combinator co-founder, discusses this in "Maker's Schedule, Manager's Schedule". For managers, meetings are routine. For makers, they disrupt the deep work where the real work happens. If a developer has two meetings an hour apart, they probably won't try to enter deep work mode, and that's how developer productivity suffers. To say there should be no such calls is an overstatement, but without robust documentation for new engineers, more calls become necessary, hurting developer productivity. They're often not reflected in KPIs; after all, how do you capture them? Overlooking them can significantly skew the incentives for collaboration compared to quantifiable KPIs like story points, PRs, and so on.

Documentation

This is another often-overlooked factor, both behaviorally and from the KPI perspective. Documentation is crucial for two reasons. First, it can answer a question, provide more context, or defer it, reducing the need to involve another developer. Second, identifying patterns across modules is easier when they are written down, as our brain doesn't need to process everything simultaneously. Documentation reflects Web 2.0's asynchronous nature: you read others' work and use it to create your own. Calls, however, are synchronous; participants can't multitask. Robust documentation greatly aids team collaboration and onboarding.

Tests

Everyone believes in the importance of unit and integration tests. Yet this belief seems to operate on the same level as belief in the existence of a deity: it is there, but really, it isn't. That is the only explanation for how little code ships with unit tests. Tests are necessary not just for their end goal but for what they signify: the systematic nature of writing tests is the first step in avoiding bugs. The process is much more important than the actual result. If you get the process right, having fewer bugs will take care of itself. If tests are not part of developer behaviour in an organization, the testing framework needs revisiting. Yesterday.

Breaking the KPIs

Now that we've discussed the factors in a developer's daily life and their KPI coverage, let's try to break them. The strategy is to understand what KPIs incentivize and what they overlook. Typically, KPIs measure things such as story points delivered, code output, and code reviews. So we must stay in our lane and maximize apparent output without providing much real value. Let's see how:

Story Points

The standard advice when assigning story points to a ticket is to choose what feels comfortable. This often correlates with the days needed to complete the ticket, which we'll use to our advantage: we simply pick the next story point in the series above the ticket's actual value. If we'd estimate 1, we choose 3; for 3, we pick 5. No one audits this; more is better, and most importantly, more is safer. This builds in an ample buffer without compromising the story point metric, letting us "finish a 5-point task" in the time a 3-point task actually takes, thus improving the metrics.
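
A quick back-of-the-envelope sketch, with hypothetical tickets, shows how much this trick inflates reported velocity:

```python
# Hypothetical tickets: (true_points, claimed_points, days_taken).
# Claimed points are the next value up in the 1-3-5-8 series.
tickets = [(1, 3, 1.5), (3, 5, 4.0), (3, 5, 3.5)]

true_points = sum(t for t, _, _ in tickets)
claimed_points = sum(c for _, c, _ in tickets)
days = sum(d for _, _, d in tickets)

print(f"true velocity:    {true_points / days:.2f} points/day")
print(f"claimed velocity: {claimed_points / days:.2f} points/day")
```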

Reviews

Reviews are easy to game. Be a secondary approver: provide approvals after someone else has reviewed the pull request and sorted out the issues. We can limit ourselves to Tier-3 comments, or ones that barely qualify as Tier-2, so we never need to understand the code's logic. We can repeat the same two comments about sorting imports and null-check styles on every PR and still maintain decent code review metrics.

PR/Commits

We can split our work into numerous small commits and PRs to demonstrate significant engineering involvement.

Collaboration | Documentation

KPIs don't capture peer reviews, so only our Stash and JIRA output matter. Why waste time helping others if it doesn't benefit our bottom line?

Tests

The same applies to tests. The sole metric we must monitor is that our tickets don't cause further issues. After basic testing, we can hand the ticket off to QA/Product and trust them to ensure our code is issue-free. This way, we avoid the extra work of adding unit tests.

Redefining Metrics

With the current KPI framework, it's easy to make metrics look good without adding real value to the team. Having discussed how to break KPIs, it's only fair to share how to redesign them. Let's first redefine the metrics to ensure they capture and encourage the behaviour that leads to the intended output.

Story Points

Decouple the time to complete a task from the story point assignment. If the other factors function, time metrics self-regulate. To define a task's story points, engineers should collectively assess its complexity during sprint meetings, considering its components and the effort required. Clearer breakdowns yield clearer effort estimates, and collective assignment encourages discussion, enhancing clarity from an implementation standpoint.

Peer Reviews

In a world of stats and numbers, peer reviews add subjectivity: the eye test. If someone collaborates closely with other engineers, helping complete tasks and supporting the team, it can be captured here. Reviews offer engineers fresh perspectives and essential feedback. They should be anonymous while still allowing engineers to respond to the feedback, capturing the human elements the stats miss.

Code Contribution

While it's tempting to evaluate code contribution by the number of lines, the number of commits, or even fancy measures such as cyclomatic complexity, all these factors should be taken with a pinch of salt. For example, a code cleanup that took ten commits because of review comments creates additional noise; it could have been compressed into a single commit once the comments were addressed. Using the CQRS pattern increases the number of lines but decreases complexity, since it stays separate from existing workflows. Package updates can make a PR look extremely complex by the lines-of-code metric, but they aren't. The takeaway is that the eye test should carry more weight here than the other metrics.

PR Comments

Analyzing comments manually to assess their importance is difficult. Natural language processing (NLP), a branch of AI that processes and understands text, can help. Yet the simplest method to distinguish urgent comments (which might pose a problem) from non-critical ones is to add an identifier, such as !IMPORTANT. Comments with this tag can be quickly identified by a script or a database query, clearly showing how many such comments an engineer provides. If a PR receives a comment suggesting the solution is incorrect and needs blocking, the author or reviewer can add a !BLOCKER tag, indicating it must not be merged. This comment is extremely valuable, as it prevents a potential hotfix. The more an engineer provides such feedback, the more valuable and reliable they become.
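
A minimal sketch of such a scan, assuming comments are available as plain text from the review tool (the data shape here is made up):

```python
import re
from collections import Counter

# Hypothetical PR comments pulled from the code review tool.
comments = [
    {"author": "alice", "body": "!BLOCKER this breaks the billing flow"},
    {"author": "alice", "body": "!IMPORTANT missing index on user_id"},
    {"author": "bob", "body": "nit: sort the imports"},
]

TAG_PATTERN = re.compile(r"!(IMPORTANT|BLOCKER)")

tag_counts = Counter()
for comment in comments:
    match = TAG_PATTERN.search(comment["body"])
    if match:
        tag_counts[(comment["author"], match.group(1))] += 1

for (author, tag), count in tag_counts.items():
    print(f"{author}: {count} x !{tag}")
```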

Tests per PR

Tests are not just important for preventing bugs; they provide two other benefits. First, they are a form of dynamic documentation that describes the accompanying code and helps during the review process. Second, they are important in what they signify: a methodical effort to test the code from an engineer's perspective. This methodical approach, when perfected, helps achieve the "fewer bugs" goal.
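
As an illustration of tests doubling as documentation, consider this small example; the function under test is hypothetical, but the test names read like a specification:

```python
import unittest

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test."""
    return round(price * (1 - percent / 100), 2)

class ApplyDiscountTests(unittest.TestCase):
    # The test names document behaviour the code alone doesn't state.
    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(100.0, 0), 100.0)

    def test_result_is_rounded_to_cents(self):
        self.assertEqual(apply_discount(9.99, 33), 6.69)

if __name__ == "__main__":
    unittest.main()
```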

Story Points Time vs. Review Evolution Time

As an engineer grows familiar with the code base, the time to complete a ticket of complexity k decreases. At the same time, the number of Tier-2 and Tier-1 review comments should increase. Something is amiss if the relationship is not inverse (less time per ticket, more reviews).
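
One rough way to sanity-check the relationship, using synthetic quarterly numbers and a plain Pearson correlation:

```python
from statistics import mean

def pearson(xs, ys):
    # Plain Pearson correlation coefficient, no external libraries.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical quarterly data for one engineer.
days_per_k_ticket = [4.0, 3.2, 2.5, 2.1]  # trending down
tier_1_2_reviews = [2, 5, 9, 12]          # trending up

r = pearson(days_per_k_ticket, tier_1_2_reviews)
print(f"correlation: {r:.2f}")  # strongly negative => inverse, as expected
```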

Documentation

Documentation directly improves onboarding and collaboration. Every team should document its modules. When facing unfamiliar issues, documentation proves invaluable. Some prefer calls over documentation, yet without it, a module requires repeated calls with various engineers. This is a fact. Questions will persist, newcomers will ask, and veterans must respond. Calls should always be considered in terms of cost. If three engineers join a one-hour call with a senior engineer, and we assume the engineers earn $40 per hour and the senior engineer $80 per hour, the call costs the company $200. Factor in the additional time spent preparing for and recovering from the call, and the cost is even higher. This process repeats annually with staff turnover. Creating documentation, by contrast, is a one-time effort of a few hours, thus saving costs.
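
Here is the same arithmetic as a small sketch you can re-run with your own rates; all figures, including the overhead estimate, are illustrative assumptions:

```python
def call_cost(engineers: int, rate: float, senior_rate: float,
              hours: float = 1.0, overhead_hours: float = 0.5) -> float:
    """Cost of one call, including prep/recovery overhead for everyone."""
    hourly = engineers * rate + senior_rate
    return hourly * (hours + overhead_hours)

# The article's example: 3 engineers at $40/h plus 1 senior at $80/h.
per_call = call_cost(engineers=3, rate=40, senior_rate=80)
docs_once = 4 * 80  # a senior spending ~4 hours writing docs, once

print(f"per call (with overhead): ${per_call:.0f}")
print(f"one-time documentation:   ${docs_once:.0f}")
# A couple of repeated calls already exceeds the one-time docs cost.
```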

Rebuilding KPIs

Tracing Backwards

Now that we've redefined the metrics, it's time to rebuild the KPI framework. Inspired by Amazon's customer obsession values, let's adopt the end user's perspective and list the things that we'd want:

  • More features delivered
  • Fewer bugs
  • Faster bug fix response time, especially for the urgent ones

Seems straightforward. Now, let's work backwards from that. How do you deliver more features?

  • Increased developer productivity, ensuring faster delivery.
  • Enhanced collaboration, enabling quicker contributions to new modules.
  • Easier onboarding due to improved collaboration.

How do you prevent bugs?

  • More unit tests.
  • More reliable reviews (Tier 1 and 2).

Finally, how do you ensure faster response times for urgent issues?

  • Employ sufficient reliable engineers to resolve issues quickly.

Constructing Metrics

Your new metrics are simply how you measure each of those; a sketch of how they might be pulled together follows the lists below.

Measures to evaluate developer productivity:

  • Time to complete complexity-k tasks (captured in JIRA)
  • Bugs created (captured in Related Bugs in JIRA)
  • Quick calls entertained (captured in Peer Reviews)

Ways to assess collaboration and onboarding contribution:

  • Documentation (captured in Pages contributed, access rate)
  • Quick calls entertained (captured in Peer Reviews)

Bug-related metrics would be:

  • Related Bugs (captured in JIRA)
  • Tests per PR (captured in Stash)
  • Tier 1 and 2 reviews (captured in Stash)

Response time metrics would be:

  • Number of Priority 1 or Priority 2 tickets worked on (captured in JIRA)
  • Number of hotfix tickets worked on (captured in JIRA)
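
As a sketch, the measurements above could be collected into one simple per-engineer structure for a dashboard; the field names and defaults are assumptions mirroring the lists:

```python
from dataclasses import dataclass

@dataclass
class EngineerMetrics:
    # Productivity (JIRA)
    days_per_complexity_k: float = 0.0
    bugs_created: int = 0
    # Collaboration and onboarding (peer reviews, wiki)
    quick_calls_entertained: int = 0
    docs_pages_contributed: int = 0
    # Bug prevention (JIRA, Stash)
    tests_per_pr: float = 0.0
    tier_1_2_reviews: int = 0
    # Response time (JIRA)
    p1_p2_tickets: int = 0
    hotfix_tickets: int = 0

metrics = EngineerMetrics(days_per_complexity_k=2.4, tests_per_pr=3.0,
                          tier_1_2_reviews=7, hotfix_tickets=2)
print(metrics)
```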

Metrics Summarized

The less time it takes to complete a task of complexity k, the better. The fewer bugs created, the better. An engineer's collaboration efforts that leave no written record (i.e., quick calls) can only be captured through peer reviews. Documentation and calls should have an inverse relationship; more documentation means fewer required call minutes. Finally, higher-priority tickets and hotfix contributions indicate higher engineer value.

Parting Thoughts

Using KPIs to understand the development process can be incredibly valuable, but all those statistics and fancy dashboards can very quickly become a distraction. It must be reiterated that KPIs are not a way to evaluate individual engineers, nor should they be used as a means of control, a Big Brother of sorts. Instead, they should be used to identify areas for improvement and provide insights into the development process so that teams can focus on delivering projects faster.
