Saturday, 26 September 2020

Lead Time Driven Delivery - Part 5, Practical and closing thoughts

It is time to close this mini-series with some practical and personal ideas.

Work in Progress - Reducing lead time through queue control

If your team has a backlog, then applying Little’s Law to the backlog itself will not make much sense. Little’s Law can be applied only to a queue, and in a queue system something needs to come in and then come out. If something stays there and never comes out, then Little’s Law will not hold. From my experience a backlog is not a queue, as work enters the backlog and can stay there indefinitely. When someone enters a queue, they are committed; slowly or quickly they make their way through it and eventually leave. If you are using Sprints then you can apply Little’s Law to the Sprint: the team commits to the work, work enters the Sprint and then leaves the Sprint. If you have a number of committed projects going through the system, then you can use Little’s Law for those as well.

Little’s Law is very significant to LTDD, as average Lead Time = Work in Progress / Throughput. This means the more things we commit to the queue, the higher the average lead time grows. So, it makes sense to pick a strategy where it is possible to quickly make space for new high priority work. This way customers don’t experience long lead times while software teams deliver their long-term commitments. This also tells us something very interesting. Average delivery lead time can be brought under control if your work scheduling is respected. This stabilises the average delivery lead time, which makes it more predictable and enables planning and forecasting. However, for an organisation to benefit from low lead time, it needs to understand what work is important to it and give that work priority.
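To make the formula concrete, here is a minimal sketch of Little’s Law as a function. The function name and the numbers are hypothetical and only illustrate the relationship: with throughput held constant, committing more work to the queue raises the average lead time proportionally.

```python
def average_lead_time(work_in_progress: float, throughput: float) -> float:
    """Little's Law: average lead time = work in progress / throughput.

    Throughput is items completed per unit of time, so the result is
    expressed in that same unit of time.
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return work_in_progress / throughput


# Hypothetical team that completes 5 committed items per week.
small_commitment = average_lead_time(work_in_progress=10, throughput=5)
large_commitment = average_lead_time(work_in_progress=30, throughput=5)

print(small_commitment)  # 2.0 (weeks)
print(large_commitment)  # 6.0 (weeks)
```

Tripling the committed work triples the average wait, which is why keeping some of the queue free for high priority work matters.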

Understanding what work is important

Most of us have heard the famous thought experiment: “If a tree falls in a forest and no one is around to hear it, does it make a sound?”. Say your software team delivers a brand-new “feature A” that customers don’t care about (as they don’t use it right now), but you don’t fix their bugs, resolve their customer cases or answer their questions. Will they perceive your organisation to be responsive (low lead time) or not? In this scenario bugs, customer cases and questions are visible; that is what the customer cares about, and the new “feature A” is the tree that has fallen in the forest.

Prioritisation will depend on what is the most important thing for your company at that point in time. Maybe other customers have been told that “feature A” is coming and it is in the contract. Maybe “feature A” is amazing and most customers will upgrade to a new tier to get that feature. So, revenue might play a critical role. Maybe your company believes in quality above everything else. This means the company might be OK with delivering new functionality a little bit slower while it prioritises customer cases, bugs and answering questions. There are two personas here: internal stakeholders (investors and sponsors) and external stakeholders (customers and partners). Personally, I don’t think internal and external stakeholders are at all mutually exclusive. However, your organisation needs to decide which stakeholders’ lead time needs to be minimised, internal or external. Of course, there are more levers than that and it is more nuanced, but priority needs to be established.

I think that there is something that we can all learn from manufacturing here. Manufacturers have found that health and safety correlates with productivity; in fact Lockheed Martin stated that by focusing on health and safety they experienced a 24% productivity increase and a 20% reduction in factory costs. Think about that for a second. Does that mean we can focus on delivering a quality service and be even more productive? There is no trade-off? The answer seems to be yes. Now, when it comes to software products, some companies choose to sweep bugs under the rug, accept security risks and not deal with quality problems in their products in order to get more features out. This creates more escalations, customer cases, late night calls, bugs, and an endless cycle of firefighting and hardship. Everyone has to work harder just to keep the lights on. What if software companies, on average, consistently prioritised "Operational Quality" over new features? Is it possible that software companies would get a ~24% productivity increase by working this way?

Minimising customer facing lead time

The more work there is in progress, the higher lead time grows. Let’s say your team is working on an important strategic project that will increase company revenue. You are about to commit to a deadline. Before you do, it is important to remember that if you commit without leaving any space for customer requests, bugs, or small features, customers will have to wait until you have completed your important strategic project. The only way for you to minimise customer or contractual (security, SLA, GDPR) lead time is by factoring in some “operational slack” to address customer or contractual concerns. Also, please don’t confuse your contingency with “operational slack”. Project contingency deals with project-based risk (discovering unknowns, someone needing a few unexpected days off), while “operational slack” is space to deal with day-to-day operational concerns. This does mean that the important strategic project will experience a longer lead time overall, but not at the cost of customer facing lead time.
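The trade-off described above can be sketched with some simple arithmetic. This is only an illustration under assumed numbers: the function name, the idea of expressing slack as a fraction of sprint capacity, and the figures themselves are all hypothetical.

```python
import math


def sprints_to_finish(project_size: float, capacity_per_sprint: float,
                      slack_fraction: float) -> int:
    """Sprints needed for a project when part of each sprint is reserved.

    slack_fraction is the share of capacity held back as "operational slack"
    for customer requests, bugs and contractual work (an assumed model).
    """
    if capacity_per_sprint <= 0:
        raise ValueError("capacity_per_sprint must be positive")
    if not 0 <= slack_fraction < 1:
        raise ValueError("slack_fraction must be at least 0 and below 1")
    project_capacity = capacity_per_sprint * (1 - slack_fraction)
    return math.ceil(project_size / project_capacity)


# Hypothetical project of 120 units of work, team capacity of 20 per sprint.
no_slack = sprints_to_finish(120, 20, slack_fraction=0.0)
with_slack = sprints_to_finish(120, 20, slack_fraction=0.25)

print(no_slack)    # 6
print(with_slack)  # 8
```

In this made-up case the strategic project takes two sprints longer, but a quarter of every sprint stays available for customer facing work instead of customers waiting for the whole project to finish.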

Throughput - Ideas to reduce your cycle time

As you look to improve throughput, you will want some explicit examples of techniques that can be used to actually achieve this. Here is a cheat sheet of what you can do for each factor:

Wait Time

Minimise knowledge, decisions and work dependencies. Try the following:

  • Reducing handovers
  • Prioritised Backlog
  • Planning and Scheduling
  • Prioritised Product Portfolios
  • Project Management
  • Removing dependencies on others
  • Training and knowledge sharing
  • Empowering to make local decisions and create knowledge
  • Single piece flow
  • Self-service
  • Just-In-Time
  • Minimise supporting teams WIP

Disruption Time

Minimise expedite (reactive work), rework, interruptions and mental health impact. Try the following:

  • Building quality in to the development process and product
  • Prioritised Backlog
  • Planning and Scheduling
  • Prioritised Product Portfolios
  • Empowering to make local decisions and create knowledge
  • Aligning work to an individual and company objectives
  • Protecting team from disruptions and being proactive

Task Time

Minimise / Maximise volume of work, unknowns, complexity, experience, attitude, aptitude and risk. Try the following:

  • Training and knowledge sharing
  • Components re-use
  • Involve subject experts
  • Increase talent retention
  • Aligning work to an individual and company objectives
  • Maximise team’s strengths and minimise weaknesses
  • Continual learning and experimentation

Please remember that you can make some quick wins in throughput, however really important breakthrough improvements will take time. These improvements will never really stop either, as you make some improvements, you will find new improvements that were hiding. Then you will make those and find new ones, and so on. It is important to stress that process improvements don’t have to be expensive to implement and most likely you don’t even need to build any additional software to make these improvements happen. From my experience, process changes mostly require a lot of thinking and communication.

I strongly recommend that you log all of the Wait, Disruption and Task Time during a Sprint so that your team can discuss it during the retrospective. Your team needs to take just one improvement away (ideally the one that will make the biggest impact) and actually make the change; that is the key. If after each retrospective the team actually reviews the issues and implements just one improvement, then after a while there will be no stopping this team.
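The log does not need special tooling. As a rough sketch, assuming the team records each entry with a story, a kind ("wait", "disruption" or "task"), hours and a short note, a few lines of Python are enough to total the time per kind and surface the single biggest non-task cost to bring to the retrospective. All names and sample entries below are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TimeEntry:
    story: str
    kind: str        # "wait", "disruption" or "task"
    hours: float
    note: str = ""   # short reason, e.g. what the team was waiting for


def summarise(entries):
    """Return total hours per kind of time."""
    totals = defaultdict(float)
    for entry in entries:
        totals[entry.kind] += entry.hours
    return dict(totals)


def biggest_loss(entries):
    """Return (note, hours) for the wait/disruption reason costing the most."""
    losses = defaultdict(float)
    for entry in entries:
        if entry.kind in ("wait", "disruption"):
            losses[entry.note] += entry.hours
    if not losses:
        return None
    return max(losses.items(), key=lambda item: item[1])


# Hypothetical sprint log.
sprint_log = [
    TimeEntry("STORY-1", "task", 6),
    TimeEntry("STORY-1", "wait", 8, "waiting for code review"),
    TimeEntry("STORY-2", "disruption", 3, "production incident"),
    TimeEntry("STORY-2", "wait", 5, "waiting for code review"),
    TimeEntry("STORY-2", "task", 4),
]

print(summarise(sprint_log))     # {'task': 10.0, 'wait': 13.0, 'disruption': 3.0}
print(biggest_loss(sprint_log))  # ('waiting for code review', 13.0)
```

In this made-up log, waiting for code review is the one improvement to take away from the retrospective.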

Constraints - Remove constraints, reduce wait time

I am a big fan of the theory of constraints and Eliyahu M. Goldratt’s work. After years of use, I have realised that the theory of constraints model does not translate literally into knowledge work (this debate is outside the scope of this blog post). However, I believe that there are a few useful mental shortcuts (heuristics) that can be applied to get the benefit of the theory of constraints in knowledge work.

Dependency constraint heuristic

If you would like to know if there is an operational constraint in the system, just listen to what people say: “We are constantly waiting for X”, “How can X be so slow?”, “I can never get hold of X”, “They are just so busy, but we really need X”, “They keep promising that they will get it done but it never happens”, “Quality from X is never good enough”, “X is constantly down, this really slows us down”, etc. These constraints slow down the whole system, as it can’t perform at its optimal level. This usually means that the individuals involved (the constraint can also be technology, e.g. build servers) are not delegating enough (usually managers) or not saying no often enough (anyone who overcommits), or that there are not enough people to do the work (hands-on people, i.e. not managers) to remove this constraint.

Change constraint heuristic

If you don’t see enough change in the process, it normally means that the people who are supposed to be implementing the change are not treating it as a top priority (they are either overcommitted, or can’t delegate or prioritise). This creates a big problem with opportunity cost. While this individual is a constraint on a process change, you cannot enjoy the benefits of the change and you will not get to the desired destination sooner. Assuming that these individuals need to make the change, you need to implement a "change circuit breaker", in which the individual prioritises the change over everything else for a short while (gracefully, without impacting the customer lead time of course). If this does not happen, then the opportunity cost will just keep growing.

Conclusion

Lead time is determined by the number of items in the queue and your team’s throughput. It is possible to provide low lead time to your customers by leaving space in the queue for customer requests (Little’s Law). However, your investors and sponsors will most likely want you to also focus on improving your throughput so that they get more for their investment. Throughput is made up of three factors: wait, disruption and task time. By eliminating wait and disruption and minimising task time, you can increase throughput. Once you start to eliminate wait and disruption and to minimise task time, it might force you to go beyond your existing agile framework methods. As you focus on results and not methods, you might end up questioning your long-term beliefs about what actually makes your team and your organisation productive.

Monday, 21 September 2020

Lead Time Driven Delivery - Part 0, Introduction

The Lead Time Driven Delivery (LTDD) approach has emerged from a personal need to improve software delivery teams’ speed. LTDD is an extension of your Agile framework and it attempts to fix the following problems:

  • Agile frameworks tend to be a collection of methods from industry practitioners. Most of these methods do not have any real evidence behind them that they actually work. Agile frameworks don’t necessarily have a clear focus on what result they are trying to achieve, apart from a vague “delivering value to customer”, which is hard to measure.
  • Once organisations roll out an Agile framework, it is not happily ever after. Some organisations start to deliver software slower, some speed up. However, no matter what happens, the organisation's sponsors expect continuous improvement, so what’s the next improvement? How do you know what you can and can’t change? Are you bound to the Agile framework methods?
  • New practitioners and managers starting in the industry should not need years of experience learning (often arbitrary) methods before they can understand the main delivery concepts: why they are following some method, how it is applied and how they can make further improvements.
  • Certain scientific manufacturing management paradigms and models such as Theory of Constraints, 8 Wastes, etc. don’t translate well into knowledge work. In fact, some aspects are hurtful and damaging to knowledge work.
  • The software engineering department is not the only department in your company. How do you integrate your Agile framework with Sales? Customer support? Implementations?

As a practitioner, if you have identified similar problems then you might be happy to know that you are not alone. Maybe this short series will give you some ideas on how you can further improve your team and your organisation overall. LTDD is not a collection of specific delivery methods such as pair programming, sitting together, using story points, etc.; this is already covered in abundance. LTDD is a framework and a way of thinking: it frees you from the Agile method and allows you and your organisation to choose the methods that minimise your organisation’s lead time.

The name "Lead time driven delivery" comes from a research book called “ACCELERATE Building and Scaling High Performing Technology Organisations”. This book identified KPIs that seem to correlate with the profitability of organisations, and lead time is one of them. This is hardly surprising. Our sponsors and customers don’t care that you have taken 5 minutes to make a software change if you have taken 3 months to ship this change to production; all your customers see is the 3 months of elapsed time and not those 5 minutes. So, if lead time makes your organisation respond to market changes faster and provide a better customer experience, then why is this not your number one KPI?

This short series will attempt to give you some tools to make a change, and a really great thing is that it does not matter where you are and it does not matter how long it will take you to reduce that elapsed time from 3 months to 5 minutes, what matters is that you make a start and work with your peers through these problems, that collaboration is the real transformation.

Thursday, 3 September 2020

Lead Time Driven Delivery - Part 4, Stabilise through embedded testing

Before you read this, please read the prerequisite blog post “Focus on results, not methods”, as it briefly explains the basic scientific thinking that will be used here.

How do you know if a piece of process, software, hardware, concept or idea will behave in a correct way? Also, how do you know if this thing will meet the required quality, performance and reliability levels? Well, it is all about knowing how this thing will behave under certain conditions and more specifically it is about knowing when something will work and when it will fail.

To make this a bit more concrete, let’s imagine that a customer with a lot of money went to two different software houses, one called Henry’s Software and the other called Adam’s Apps. The customer asked them to develop identical software. To keep things simple, we will focus on a specific requirement. Here is what these two companies have written down for an identical requirement.

Henry’s Software: As a user when I open the mobile app for the first time, I would like to be able to quickly and easily sign into my company’s account.

Acceptance Criteria:
  • User enters information that he/she knows
  • User is directed to the relevant login screen
  • User puts in username and password
  • User successfully is directed to company account.

Adam’s Apps: As a user when I open the mobile app for the first time, I would like to be able to quickly and easily sign into my company’s account.

Acceptance Criteria:
  • User knows the name of the (1) company that he/she works for or their (2) corporate email address.
  • Story needs to deliver an experience that facilitates login with just one piece of information, there is no need for 2.
  • If email address is used, full valid email address needs to be provided before the company is looked up.
  • If company name is used at least 3 letters need to be entered before company is looked up, this is done to slow down the enumeration attack.
  • If email address or company name detects more than 1 company, then list of companies is given so that user can select which one they will be given a login screen for.
  • Once user clicks on the company, user gets redirected to the company’s configured authentication provider for login.
  • If user cancels out of the login screen, then they will get redirected back to the company selection screen.
  • After 6 attempts to provide company information or email user will be asked to wait for 30 seconds, then 1 minute then 2 minutes, following [30 seconds * num of attempts], all the way to 24 attempts.
  • Company look up approach must be discussed with senior customer support team member(s) to ensure that it will result in least amount of customer support calls. Their feedback needs to be documented.
  • All web requests must not take more than 2 seconds.

Remember, it is exactly the same requirement. Henry’s Software requirement documentation is vague, it does not provide specific information that can be used to verify and test what was delivered. While Adam’s Apps does capture user behaviour, expectation, delivery options, prerequisites and system performance. Adam’s Apps can establish specific test criteria for this feature and verify it when it is delivered.
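To show how specific criteria turn directly into checks, here is a minimal sketch covering just two of Adam’s Apps criteria: a full valid email address is required before lookup, and a company name needs at least 3 letters before lookup. The function name, the choice to treat any input containing "@" as an email attempt, and the deliberately simple email pattern are all assumptions made for illustration, not part of the original requirement.

```python
import re

# Deliberately simple pattern for illustration only; a real product would
# need a more careful definition of a "full valid email address".
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
MIN_COMPANY_NAME_LETTERS = 3


def should_look_up_company(user_input: str) -> bool:
    """Decide whether the company lookup may run for the given input."""
    text = user_input.strip()
    if "@" in text:
        # Assumption: anything containing "@" is an email attempt and must
        # be complete before the company is looked up.
        return EMAIL_PATTERN.fullmatch(text) is not None
    # Otherwise treat the input as a company name: at least 3 letters.
    letters = sum(1 for ch in text if ch.isalpha())
    return letters >= MIN_COMPANY_NAME_LETTERS


# Each acceptance criterion becomes a concrete, repeatable check.
assert should_look_up_company("ac") is False
assert should_look_up_company("acm") is True
assert should_look_up_company("jane@") is False
assert should_look_up_company("jane@example.com") is True
```

Henry’s Software criteria offer nothing comparable to assert against, which is the point: "User enters information that he/she knows" cannot be turned into a pass or fail check.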

Henry’s Software might say that Adam’s Apps documentation is heavy and too specific, their argument might be:

  1. The customer is always available, requirements will iteratively emerge or will be discussed (see below).
  2. Conversation over documentation, team knows what was discussed so we don’t need to be explicit also remember that customer is always available, it is possible to reconfirm.
  3. Team is trusted to make a right decision, there is no need to be so specific.

Of course, we do want customers or business analysts (who represent the customer) to be always available, but they are not, due to holidays, meetings and competing corporate priorities. Individuals might have discussed this requirement with the customer; however, these individuals might leave, go on holiday, get sick or have to do other work, which means the story might be picked up by someone who has no context. This individual will not be able to fill in the assumptions and will in turn make mistakes, which will cause rework. If business analysts, or whoever writes the story, know the criteria, they should write them down and not assume they are known. Finally, on number 3, trust does not mean that the team should not take the vague requirement and refine it to be testable; at the end of the day they will still need to test it.

The main argument of this section is not about requirements documentation, it is about testing. Regardless of the format of how we write requirements down, I hope we can agree that when you know under what conditions something will work and fail then it is possible to test the piece of process, feature, concept, idea or hardware under specific test criteria.

As software people we all want to deliver great software experiences on time to our users. However, so many projects overrun, things go wrong for millions of reasons. Normally something goes wrong early in the process and then it cascades issues downstream, these issues are normally systemic i.e. they are bugs in your delivery process. They normally emerge because process is: non existent, not followed, opaque, regressed, out of date or poorly designed.

Volatility: liability to change rapidly and unpredictably, especially for the worse.

It can also be hard for managers to see where an issue has actually stemmed from, as people are looking at the whole thing and not the individual parts. One of the ways to increase predictability and transparency in your process is to break it down into component parts and then expose each component part to test criteria. Why would you want to do this? Remember, the reason why we are doing any of this is that we are trying to reduce the lead time. Additionally, the further a faulty feature travels through your development lifecycle, the more costly it is to fix: it adds more lead time to the faulty feature, and it adds additional lead time to other features in the queue! This means we want to catch bugs as soon as possible and release work into the next stage only if it has passed all of the relevant test criteria. This way you will stabilise your delivery and reduce lead time across your entire development lifecycle.

You might be thinking, but we have documentation for all of this stuff. We know how things should work. We have a definition of done, a definition of ready, test plans and so on. Yes, but the problem is that normally this documentation sits outside of your development lifecycle. It is a separate document that you need to read and, let’s face it, you probably don’t read it often enough. You probably refresh your memory once in a while, just before the auditors knock on your door. The real challenge is to embed this process documentation into your development lifecycle so that the system becomes self-checking, self-testing and self-auditing. To make what I am saying more concrete, here are a few examples of what you can do to enable this:

  • Embedded checklist - If you are using some sort of Application Lifecycle Management software, then consider embedding a version of your definition of done / ready into the Epic, Feature, Story or Task as a checklist or validation rules. When an individual fills in the Epic, they have to confirm that they have done X, Y and Z, or they can’t complete the work until something is done.
  • Public self-accountability - I don’t know about you, but when I have to email a large group of people an update on the project, I 100% want to get my facts right. When we publicly report something, we tend to be more transparent, accountable and self-governing. Normally, we don’t want to lose face. This means it can be a good idea to get teams to send out fortnightly project updates to senior stakeholders. There are many ways that this can be used.
  • KPIs - If you know how you and your peers are being evaluated, then you will change your behaviour around that evaluation. In this case you and your team should be evaluated against Lead Time.
  • Automation - It should be no surprise that when processes are automated correctly, the reliability and speed of these processes drastically improve. Conceptually, what you want is to have your entire "Development Lifecycle As Code". This means that if a process does not need human creativity, it should be standardised and automated away (where appropriate).
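As a rough illustration of the embedded checklist and "Development Lifecycle As Code" ideas above, the sketch below refuses to complete a story until every definition of done item has been confirmed. The checklist items and function names are hypothetical; in practice this kind of rule would normally live in your Application Lifecycle Management tool as a validation rule rather than in a standalone script.

```python
# Hypothetical definition of done; replace with your team's own items.
DEFINITION_OF_DONE = (
    "acceptance criteria verified",
    "code reviewed",
    "automated tests passing",
)


def missing_items(confirmed):
    """Return the definition of done items that have not been confirmed."""
    return [item for item in DEFINITION_OF_DONE if item not in confirmed]


def complete_story(title, confirmed):
    """Mark a story as done only if the whole checklist is confirmed."""
    missing = missing_items(confirmed)
    if missing:
        raise ValueError(f"Cannot complete '{title}': missing {', '.join(missing)}")
    return f"{title}: done"


# A story with only one item confirmed is blocked from moving on.
try:
    complete_story("Sign-in story", {"code reviewed"})
except ValueError as error:
    print(error)

# With every item confirmed the story can be released to the next stage.
print(complete_story("Sign-in story", set(DEFINITION_OF_DONE)))
```

The design choice here is that the check is part of the act of completing the work, so nobody has to remember to open a separate document.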

Testing in the development lifecycle is not just about writing down testable requirements like we did for Henry’s Software and Adam’s Apps above. It is about embedding quality controls throughout the process, and these quality controls might look nothing like you would expect. You might not even think of them as quality controls. Do you think of automation as a quality control? A routine email being sent out by an individual? Your weekly stakeholder update meeting? What about your morning stand-up? These are all forms of quality control designed to catch faults and problems in your process.