Saturday, 7 March 2020

Lead Time Driven Delivery - Metrics

In part 1 of this series we explored the basic idea of Lead Time Driven Delivery. The main idea is to minimise Wait Time, Disruption Time and Task Time, and thus Lead Time, for the work that is flowing through the organisation. In this blog post we are going to explore metrics and processes that will help you to:

  • Verify if process changes that you are implementing are minimising Lead Time
  • Identify how much Lead Time can be further removed

Before we explore the core idea, let me introduce you to Little's Law.

Little's Law

Imagine a small but busy web agency in Edinburgh building web products for its clients. This web agency employs UX Designers, Web Developers, Testers and Cloud Developers. They work as one team. The agency has made commitments to deliver a number of features for a very important client. This work is committed to the queue, which means it is "work in progress" (WIP). All work that is being done will need some time from all of the team members (UX, Web Devs, Testers, etc). As soon as the team takes the work off the queue the timer starts, and it stops when the team has stopped working on the task and the work is done. This is called Cycle Time. Finally there is Lead Time: the Lead Time timer starts the moment the work is committed to the queue and stops when the work is done.

Remember Hot Feature A from the last blog post? Well, it spent 1 week in the queue, then it was finally picked up by UX. However, the Web Devs, Testers, etc were all busy working on other work, so it slowly made its way from one person to another until it was completed 3 weeks later. It took 1 month to complete overall, but it required only 12 hours' worth of work. This is one poorly managed web agency!
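To make the two timers concrete, here is a minimal Python sketch of how Cycle Time and Lead Time could be derived from three timestamps on a piece of work. The dates and variable names are made up for illustration (the story only gives durations: 1 week in the queue, then 3 weeks until done), and your ALM system will name these fields differently.

```python
from datetime import date

# Hypothetical dates chosen to match the story: 1 week in the queue,
# then 3 weeks from pick-up to done.
committed_on = date(2020, 2, 3)   # work is committed to the queue
started_on = date(2020, 2, 10)    # team takes the work off the queue
done_on = date(2020, 3, 2)        # work is done


def elapsed_days(start: date, end: date) -> int:
    """Calendar days between two dates."""
    return (end - start).days


queue_time = elapsed_days(committed_on, started_on)  # waiting in the queue
cycle_time = elapsed_days(started_on, done_on)       # picked up -> done
lead_time = elapsed_days(committed_on, done_on)      # committed -> done

print(f"Queue time: {queue_time} days")  # 7
print(f"Cycle time: {cycle_time} days")  # 21
print(f"Lead time:  {lead_time} days")   # 28
```

Note that this counts calendar days. If you would rather report business days, you would need to swap in your own working-day calendar.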

The relationship in the above diagram can be described with Little’s Law:

Lead Time = WIP / Throughput

On average the web agency team completes 0.3 of a task per day and has 9 commitments in the backlog that it needs to get through. That means a Lead Time of (9 / 0.3) = 30 days. To improve this delivery situation the team has two options:

1. Reduce the amount of committed work in the queue. If the team reduced the committed queue size from 9 to an arbitrary lower number such as 3, the Lead Time would drop to (3 / 0.3) = 10 days.


2. Improve the Throughput, i.e. reduce the Cycle Time.
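The arithmetic for both options can be sketched in a few lines of Python. This is only an illustration of the formula above. The function name is mine, and the doubled throughput of 0.6 tasks per day is a made-up figure to show the effect of option 2.

```python
def lead_time_days(wip: float, throughput_per_day: float) -> float:
    """Little's Law: average Lead Time = average WIP / average Throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be greater than zero")
    return wip / throughput_per_day


# Figures from the web agency example.
current = lead_time_days(wip=9, throughput_per_day=0.3)

# Option 1: reduce the committed queue from 9 to 3.
smaller_queue = lead_time_days(wip=3, throughput_per_day=0.3)

# Option 2: improve Throughput (0.6 is a hypothetical figure).
faster_team = lead_time_days(wip=9, throughput_per_day=0.6)

print(f"Current:        {current:.0f} days")
print(f"Smaller queue:  {smaller_queue:.0f} days")
print(f"Faster team:    {faster_team:.0f} days")
```

Keep in mind that Little's Law works on long-run averages, so feed it averages taken over a reasonable period rather than a single day's snapshot.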

For more information on Little’s Law do check out this awesome article. This entire blog series focuses on improving Cycle Time rather than on reducing WIP.

Core idea

The Accelerate book suggests that one of the important metrics to track is Lead Time. This makes total sense, as it is what the customer experiences, and it impacts recovery time, experimentation speed, etc. This has very much inspired this entire blog post series. Accelerate does recommend tracking additional metrics; if you are interested in knowing what they are, check out this summary / review of the Accelerate book.

As a software delivery practitioner I find that Lead Time is a start; however, it is very high level and does not give me enough detail to make improvements or scheduling decisions.

Lead Time Driven Delivery is supposed to help by exposing Wait, Disruption and Task Time. If you know how much Wait and Disruption there is in the system and where it is coming from, then you can do something about it. By this point you might be wondering: how can I extract this information from my Application Lifecycle Management (ALM) system? Is it even possible to automatically get metrics for Wait Time, Disruption Time and Task Time? The answer is that you will need to use both quantitative and qualitative techniques to extract the data.

Quantitative analysis

This is the easy part. If your team is storing dev data in some ALM system then you can get this data in many different ways. You just need to make sure:
  • Only actual development time against the work is logged. In this web agency they have terrible internet speed, they love meetings and the build server takes forever to run tests. So a developer has taken 2 hours to do the actual work, but between all of the waiting, meetings and random requests the whole day passes (8 hours). In this instance the developer should log only 2 hours of actual dev time and not 8 hours.
  • All work that needs to be done is grouped in a logical way so that it is possible to identify wait states between tasks for the deliverable.

Now you can create two metrics: Lead Time Resistance and Lead Time Spent Idle.

Lead Time Resistance

Lead Time Resistance measures how difficult it is to get work done. In the above diagram Sam (UX) might take only 2 hours to do the design work (blue); however, for the rest of the day he is disrupted (orange), and before he knows it two days have gone by. The Lead Time Resistance calculation takes the total actual time for the work, divides it by the total elapsed time for the work and subtracts the result from 1.

The feature in the diagram has taken 12 hours of actual time, but 5 days of total elapsed time. Lead Time Resistance for this is 1 - [12 (hours) / [5 (days) * 8 (hours per day)]] = 70%. In other words, 70% was spent on Disruption and Wait Time, i.e. stuff getting in the way, creating resistance.
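As a rough sketch, the same calculation in Python could look like this. The function and parameter names are my own, and the 8-hour working day is the assumption used in the example above.

```python
def lead_time_resistance(actual_hours: float,
                         elapsed_days: float,
                         hours_per_day: float = 8.0) -> float:
    """Share of elapsed working time NOT spent on actual work (0.0 to 1.0)."""
    elapsed_hours = elapsed_days * hours_per_day
    if elapsed_hours <= 0:
        raise ValueError("elapsed time must be greater than zero")
    return 1 - actual_hours / elapsed_hours


# Feature from the diagram: 12 hours of actual work over 5 elapsed days.
resistance = lead_time_resistance(actual_hours=12, elapsed_days=5)
print(f"Lead Time Resistance: {resistance:.0%}")
```

A value close to 0% means that almost all of the elapsed time went into the work itself, while a value close to 100% means that the work was mostly blocked or disrupted.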

Lead Time Spent Idle

Lead Time Spent Idle measures how well the work was planned. Sam (UX) completed his work in 2 days. The work then waited for ~1.5 days before it was picked up by John (Dev). After John was done, it waited for another ~3.5 days before Dan (Test) picked it up. The total idle time with no activity is divided by the total elapsed time. The feature in the diagram has taken 20 business days (5 days * 4 weeks) to complete. Out of those 20 business days it has spent 8.5 business days (1.5 days + 3.5 days + 3.5 days) in an idle state. This means 8.5 / 20 = 42.5% of the time was spent idle.
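Here is a minimal Python sketch of the same calculation, assuming you can already extract the idle gaps between tasks (in business days) from your ALM system. The function name and the input format are my own invention.

```python
def lead_time_spent_idle(idle_gaps_days: list[float],
                         elapsed_days: float) -> float:
    """Share of the elapsed time in which nobody was working on the item."""
    if elapsed_days <= 0:
        raise ValueError("elapsed time must be greater than zero")
    return sum(idle_gaps_days) / elapsed_days


# Idle gaps from the example: 1.5 + 3.5 + 3.5 business days out of 20.
idle_share = lead_time_spent_idle([1.5, 3.5, 3.5], elapsed_days=20)
print(f"Lead Time Spent Idle: {idle_share:.1%}")
```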

Now let's bring it all together. The feature in the diagram has encountered 70% resistance and has spent 42.5% of the time in an idle state. I don't know about you, but I find this really useful information. Now that this is known, the team can move on to the qualitative analysis and dig deeper into what they can do to improve this situation.

Qualitative analysis

This is the hard part. This is where the team needs to continuously question existing working practices and get creative about improvements. Quantitative analysis will expose a lot of variables, just take a look at this:

However, numbers alone will not tell you whether Lead Time Resistance is large due to waiting around or due to a chronic disruption culture. Additionally, they will not tell you how much time is lost due to poor work distribution, poor designs, lack of standards / components, staff turnover, etc. This is where teams should keep a daily log of all of the Wait Time, Disruption Time and Task Time. They should then use this information during retrospectives to review their current workflow setup and figure out what the future workflow should look like to improve the Lead Time.
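A daily log like this does not need to be anything fancy. As a sketch, assuming each entry records a category, a cause and the hours lost, it could be tallied up in Python as shown below. The entry structure is hypothetical, and the sample figures are only an illustrative split of the 8-hour day from the developer example earlier (2 hours of actual work, the rest lost to waiting, meetings and random requests).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    category: str  # "wait", "disruption" or "task"
    cause: str     # free-text reason, useful in retrospectives
    hours: float


def hours_by_category(entries: list[LogEntry]) -> dict[str, float]:
    """Sum the logged hours for each category."""
    totals: dict[str, float] = defaultdict(float)
    for entry in entries:
        totals[entry.category] += entry.hours
    return dict(totals)


# Illustrative figures only: one developer's 8-hour day.
day_log = [
    LogEntry("task", "actual feature work", 2.0),
    LogEntry("wait", "slow build server", 1.5),
    LogEntry("disruption", "meetings", 3.0),
    LogEntry("disruption", "random requests", 1.5),
]

totals = hours_by_category(day_log)
for category, hours in sorted(totals.items()):
    print(f"{category}: {hours} hours")
```

Grouping by cause instead of category works the same way and tells you which specific problem to bring to the next retrospective.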

Side Note

In Lean there is a strong focus on waste elimination, just Google 8 wastes or check out my older blog post. The problem is that you need to look up what waste means in order to understand it. Then you need to translate it to knowledge work so that it is relevant. Personally I don’t think it is that relatable to knowledge work, and once translated it does not stay in your mind for that long. Software practitioners have introduced the "Waste Snake". While I like the concept, the problem is still the same: "Waste" is a vague name. In my personal experience, I have seen teams use it for a while, mainly to show disruptions. I don't know why, but they have not focused on other, more hidden wastes such as Wait and Task Time. It might sound less cool, but instead of a "Waste Snake" create a "Lead Time Wall" and just stick on to it anything that impacts Wait, Disruption and Task Time, along with the amount of time lost.


Your team should automate the following metrics:

  • Lead Time
  • Cycle Time
  • Work in progress (WIP)
  • Lead Time Resistance
  • Lead Time Spent Idle

These automatic metrics are useful as they will expose Wait Time and they will tell you if you are going in the right direction with your process changes. To actually figure out what needs to change to improve Lead Time, your team will need to conduct constant qualitative analysis where you manually review Disruption and Task Time.

Tuesday, 28 January 2020

Lead Time Driven Delivery - Basics

TL;DR Your organisation needs to minimise Wait, Disruption and Task Time so that work gets delivered quickly all the way to the customer. This means you need to minimise Lead Time.

I am very lucky, as I have been an Agile and Lean practitioner for a while. I have spent time reading about different delivery methodologies such as XP, Scrum, Kanban, etc and I have had an opportunity to work in organisations that used these different methodologies. One thing that constantly stood out to me was how they all prescribe best practice with very shallow reasoning behind these prescriptions. It looks like they are based on the experience and environment where they were created. This complicates things, as approaches end up being open to interpretation on what is Agile and what is someone's subjective opinion on the matter.

For example, if someone came up to you and said, I want to change our existing process from A to B, how would you test/verify that this new process is more "Agile"? Would you refer to the Agile Manifesto? Use your experience/training? As far as I know there is no testable/verifiable way to measure Agile. This creates a communication and expectation problem with new and seasoned practitioners.

If you promote someone to be a team lead or if you onboard a new member of staff, you need to explain to them why you are doing something in a certain way. Saying to them "please read this Agile book and follow it" is not going to work. If you tell your new members of staff to just do "what we do", well, that’s flawed because they don’t understand the reasoning behind the intents. Also, why should they follow it? This means that the moment you need to change how your company works, you end up with an organisation that starts to make inconsistent decisions between different teams and departments, as no one really understands what behaviour and KPIs they are trying to minimise or maximise.

In this blog post (and eventually series) I am going to attempt to break down the core reasoning behind Agile practice so that it is more verifiable. Hopefully this will mean that core Agile ideas can be explained more quickly to people around you and that you and your team can confidently mature your own delivery practice. Let’s get started.

1. Anatomy of you sitting down and trying to do some work

Three factors that make up your work:

  • Wait Time - This is when you are waiting around for some knowledge that you don’t have, decisions that you can’t make and finally you are waiting around for someone else to complete some work before you can start yours.
  • Disruption Time - This is when you have to expedite some work, rework some work, deal with corporate interruptions or cope with the mental health impact.
  • Task Time - Finally, this is the actual work that you are doing, pure sitting down and getting things done.

Imagine you are working alone on your own start-up. You will have very little Wait and Disruption Time. You are on your own, so you can make all of the decisions. Also, if you are lucky enough to work in a quiet environment you should experience very little or no disruption. You get things done fast, your users are impressed with your company, new features just come out all the time. However, this changes the moment you hire your first employee. The moment you do that, you create an organisation, and that means you have created a system. In the system work no longer gets done by a single individual, it gets done by many individuals. You as the founder are unlikely to feel much impact from hiring this new person (apart from the knowledge transfer burden), but if you are not careful your new employee will have to wait for your decisions, knowledge and task allocation. Their Wait Time will grow as they wait for you, and they will probably be disrupted by you. You will wonder why they are not as productive as you. It might be because they have not got enough autonomy to make decisions (maybe they don’t know your values, so they don’t know what decisions to make on your behalf), or because they are not getting enough clarity about the desired outcomes. Most people are not founders. They are employees, and they struggle to do their best because they don’t understand the reasoning framework and don’t get enough autonomy.

2. Anatomy of your Waterfall company trying to deliver some value to the customer

Imagine a company that does not follow any Agile process and instead has departments of people per discipline. So Web Devs are in one department, API Devs are in another department, you get the point. Each department will have its own backlog, which means everyone has their own Lead Time. On top of that, all individuals will experience disruptions (team meetings, urgent requests, you know the drill) and there will be many handovers from one department to another. Work will also end up travelling backwards due to misunderstandings. So if a customer has requested "Hot Feature A", they will have to wait a long time for this work to travel through this type of organisation (system). Actual Task Time for "Hot Feature A" might be 12 hours of work in total; however, given all of the Wait Time (handovers and lead times) and disruptions, it might take up to 1 month before it gets shipped. So there is a big difference between 1 month of Lead Time and 12 hours of Task Time. However, your customer will not care about the 12 hours of Task Time, they will just care that you took 1 month of Lead Time. Overall, in this type of organisation Lead Time for most work will be very high, fewer projects will be shipped, projects will very rarely go out on time and individuals will feel frustrated as there will be a lot of firefighting.

3. Anatomy of your Agile company trying to deliver some value to the customer

Now imagine another company that understands the importance of Lead Time and works to remove as much Wait, Disruption and Task Time (more on Task Time later) as possible from the overall delivery process. They have decided to sit people together for a limited amount of time to deliver certain features and projects. They have done this because they want to remove handovers, the amount of project management required, competing agendas, waiting for decisions and knowledge, and organisational dependencies. They work as a team on one story at a time and their main job is to push that one story through the system as fast as possible. Now, that story that took 1 month to deliver will take 12 hours or even less in this new system. This is because you have removed all of the waiting around and the disruptions (team leads and product owners act as defenders), and because this team is sitting together they can expose the unknowns faster, tame complexity, share their experience and share the burden of the work, so they can deliver the work faster.

4. Anatomy of Task Time

I have left this till last for a reason. The management team needs to improve the overall system before they look at improving Task Time. Why? It is much healthier to focus on fixing the overall organisation before looking at how to improve individual performance. In companies where Lead Time is high, good talent might become disengaged. As you fix the systemic problems, you might find that people who were not performing that well start to really surprise you and that Task Time reduces naturally.

The actual Task Time is made up of eight factors, which are dynamic:

  • Volume of work - This is just you sitting and typing, copy and pasting.
  • Unknowns - This is you identifying stuff that you did not consider when you were estimating the work.
  • Complexity - This is you figuring out an algorithm to solve a problem, the main thinking part.
  • Risk - This is how much testing you have to do given the risk level that is acceptable for the task at hand.
  • Skill - This is you improving your hard/soft transferable skills (programming, math, architecture, algorithm design, management, etc) or using your existing skills to get work done quicker.
  • Domain - This is you gaining new domain knowledge (HR, Logistics, Financial Trading, etc) or using your existing domain knowledge to get work done faster.
  • Attitude - This is how you perceive your work environment and tasks.
  • Aptitude - This is you having developed skills for, or a natural predisposition towards, the work that you are doing.

I know this is obvious, but I would like to stress one point. Most people will take different amounts of time to get a task done. Why? They are different people, with different mindsets, skills, domain knowledge, aptitude and attitude. All of these things impact overall Task Time.

It will not surprise anyone that experienced teams (high skill and domain knowledge), are more likely to identify unknowns, reduce volume of work through some automation, tame complexity and as a result deliver high quality work quickly. If your organisation wants to improve Task Time then it needs to ensure that people stick around.

What does it all mean?

In priority order, everyone in your organisation should be working hard to minimise Wait, Disruption and Task Time and thus minimise Lead Time. Organisations will never achieve perfect Lead Time; however, they need to constantly work towards it. To me this is what DevOps, Slack, XP, Scrum, Kanban, Lean, etc are all about.

If you take anything away from this blog post, it should simply be this: start to measure Lead Time for work that is travelling through your organisation and find ways to minimise it.