Saturday 7 March 2020

Lead Time Driven Delivery - Part 2, Learning from data

In part 1 of this series we have explored the basic idea of Lead Time Driven Delivery. The main idea is to minimise Wait Time, Disruption Time, Task Time and thus Lead Time for the work that is flowing through the organisation. In this blog post we are going to explore metrics and processes that will help you to:

  • Verify if process changes that you are implementing are minimising Lead Time
  • Identify how much Lead Time can be further removed

Before we explore the core idea, let me introduce you to the Little's Law.

Little's Law

Imagine a busy, but a small web agency in Edinburgh building web products for it's clients. This web agency employees UX Designers, Web Developers, Testers and Cloud Developers. They work as one team. This web agency has made commitments to deliver number of features for a very important client, this work is committed to the queue, which means this work is "work in progress" (WIP). All work that is being done will need some time from all of the team members (UX, Web Devs, Testers, etc). As soon as the team takes the work of the queue the timer starts and the timer stops when team has stopped working on the task and the work is done. This is called Cycle Time. Finally there is Lead Time, Lead Time timer starts from the moment that the work is committed to the queue and timer stops when work is done.

Remember the Hot Feature A from last blog post? Well it has spent 1 week in the queue, then finally it was picked up by the UX, however Web Devs, Testers, etc were all busy working on other work. So it has slowly made its way from one person to another until it was completed 3 weeks later. So it has taken 1 month to complete overall, but it required only 12 hours worth of work. This is one poorly managed web agency!

Relationship in the above diagram can be described with Little’s Law:

Lead Time = WIP / Throughput

Web agency team on average completes 0.3 of a task per day. Team on average has 9 commitments in the backlog that they need to get through. That means (9 / 0.3) = 30 days lead time. To improve this delivery situation team has two options:

1. Reduce the amount of committed work in the queue. If team reduced committed queue size from 9 to arbitrary lower number such as 3, this would mean that lead time would go to (3 / 0.3) = 10 days.

OR

2. Team needs to improve the Throughput (Cycle time).

For more information around Little’s Law do check out this awesome article. This entire blog series focuses on improving Cycle Time and not reducing the WIP.

Core idea

Accelerate book suggests that one of the important metrics that should be tracked is Lead Time. This totally makes sense as this is what customer experience's and it impacts recovery time, experimentation speed, etc. This has very much inspired this entire blog post series. Accelerate book does recommend to track other additional metrics, if you are interested in knowing what they are then check out this summary / review of Accelerate book.

As a software delivery practitioner I find that Lead Time is a start, however it is very high level and it does not provide much detail for me to make the improvements or scheduling decisions.

Lead Time Driven Delivery is suppose to help by exposing Wait, Disruption and Task time. If you know how much Wait and Disruption there is in the system and where it is coming from then you can do something about it. By this point you might be wondering, how can I extract this information from my Application Lifecycle Management System (ALM)? Is it even possible to automatically get metrics for Wait Time, Disruption Time and Task Time variables? Answer is that you will need to use quantitative and qualitative techniques to extract data.

https://www.slideshare.net/Intellspot/qualitative-vs-quantitative-data-infographic

Quantitative analysis

This is the easy part. If your team is storing dev data in some ALM system then you can get this data in many different ways. You just need to make sure:
  • Only actual development time against the work is logged. In this web agency they have terrible internet speed, they love meetings and build server takes forever to run tests. So developer has taken 2 hours to do the actual work, but between all of the waiting, meetings and random requests the whole day passes (8 hours). In this instance developer should log only 2 hours of actual dev time and not 8 hours.
  • All work that needs to be done is grouped in a logical way so that it is possible to identify wait states between tasks for the deliverable.

Now you can create a two metrics Lead Time Resistance and Lead Time Spent Idle.


Lead Time Resistance

Lead Time Resistance measures how difficult it is to get work done. In the above digram Sam (UX) might take only 2 hours to do the design work (blue), however for the rest the day he is disrupted (orange) and before he knows two days have gone by. Lead Time Resistance calculation takes total actual time for the work and divides it by the total elapsed time for the work.

Feature in the diagram has taken 12 hours actual time, but it took 5 days total elapsed time. Lead Time Resistance for this is 1-[12 (hours) / [5 (days) * 8 (hours per day)]] = 70%, 70% was spent on Disruption and Wait Time i.e. stuff getting in the way, creating resistance.

Lead Time Spent Idle

Lead Time Spent Idle measures how well the work was planned. Sam (UX) has completed his work in 2 days. Work waited for ~1.5 days before it was picked up by John (Dev). After John was done, it has waited for another ~3.5 days before Dan (Test) picked it up. Total idle time of no activity is divided by the total elapsed time. Feature in the diagram has taken 20 (5 days * 4 weeks) business days to complete. Out of 20 business days it has spent 8.5 business days (1.5 days + 3.5 days + 3.5 days) in idle state, this means 8.5/20 = 42.5%, 42.5% was spent in idle state.

Now let's bring it all together. The feature in the diagram has encountered 70% of resistance and it has spent 42.5% of the time in the idle state. I don't know about you, but this is really useful information. Now that this is known, team can move on to the qualitative analysis and do some deeper analysis on what can they do to improve this situation.

Qualitative analysis

This is the hard part, this is where team needs to actually continuously question existing working practices and get creative about improvements. Quantitative analysis will expose a lot of variables, just take a look at this:


However, numbers alone will not tell you if Lead Time Resistance is large due to waiting around or chronic disruption culture. Additionally, it will not tell you how much time is lost due to poor work distribution, poor designs, lack of standards / components, staff turnover, etc. This is where teams should keep a daily log of all of the Wait Time, Disruption Time and Task Time. They should then use this information during retrospectives to review their current workflow setup and figure out how the future workflow should look like to improve the Lead Time.

Side Note

In Lean there is strong focus on waste elimination, just Google 8 wastes or check out my older blog post. Problem is that you need to look up what waste means in order to understand it. Then you need to translate it to knowledge work so that it is relevant. Personally I don’t think it is that relatable to knowledge work and once translated it does not stay in your mind for that long. Software pactioners have introduced "Waste Snake” while I like the concept, problem is still the same. "Waste" is a vague name. From my personal experience, I have seen teams use it for a while showing mainly disruptions. I don't know why but they have not focused on other more hidden wastes such as Wait and Task Time. It might sound less cool, but instead of "Waste Snake" create a "Lead Time Wall" and just stick on to it anything that impacts Wait, Disruption and Task Time with the amount of time lost.

Conclusion

Your team should automate the following metrics:

  • Lead Time
  • Cycle Time
  • Work in progress (WIP)
  • Lead Time Resistance
  • Lead Time Spent Idle

These automatic metrics are useful as they will expose Wait Time and they will tell you if you are going in the right direction with your process changes. To actually figure out what needs to change to improve Lead Time, your team will need to conduct constant qualitative analysis where you manually review Disruption and Task Time.