Test coach versus test specialist, impact on queues

I recently had a very interesting conversation with a group of skilled testers on whether or not there should always be a test specialist in a cross-functional team.

A lot of people say yes; my personal experience says no.

My personal experience tells me that the most effective and efficient approach uses a test coach that slowly makes himself scarce. My experience focuses on creating competencies for completing an activity (testing) in a collaborative context.

One aspect I am finding difficult to explain is the impact a test coach approach has on queues, so I will use a series of scenarios and pictures to support my reasoning.

If you feel like substituting activity A = development and activity B = testing, feel free, but in my experience this approach is activity agnostic: I have applied it to analysis and to development with similar results.

Scenario 1 – Test specialist

In the presence of one specialist on Activity B and many specialists on activity A.
Problem: Long queues form in Ready for Activity B (Case 1) or, worse, the specialist multitasks on many activities at the same time, slowing down the flow of work as per Little’s law (Case 2).
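To make the Little’s law point concrete, here is a minimal sketch. Little’s law says that average lead time equals average work in progress divided by average throughput. The function name and the numbers below are hypothetical, chosen only for illustration; they are not measurements from any team.

```python
def average_lead_time(wip: float, throughput: float) -> float:
    """Little's law: average lead time = average WIP / average throughput."""
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return wip / throughput


# Hypothetical figures: the single specialist completes 2 items per week.
THROUGHPUT_PER_WEEK = 2.0

# Specialist keeps 2 items in progress versus juggling 8 at the same time.
focused_weeks = average_lead_time(wip=2, throughput=THROUGHPUT_PER_WEEK)
multitasking_weeks = average_lead_time(wip=8, throughput=THROUGHPUT_PER_WEEK)

print(f"Focused: {focused_weeks} weeks per item on average")
print(f"Multitasking: {multitasking_weeks} weeks per item on average")
```

With throughput unchanged, four times the work in progress means four times the average lead time. In practice multitasking tends to lower throughput as well, which would make the gap wider.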


Scenario 2 – Test coach

Stage 1: In the presence of one coach on Activity B and many specialists on activity A
Initially the coach pairs on activity B with people who do activity A. This brings several benefits:

  1. Queue in “Waiting for Activity B” is reduced as one person normally performing activity A is busy pairing with coach on one activity B task
  2. By pairing on activity B feedback loops are shortened
  3. Person with activity A acquires new skills to perform activity B from pairing with coach
  4. Quality of activity B increases as it is a paired activity
  5. Flow improves because of 1 and 2


Stage 2: When cross-pollination of the skills required for activity B starts to pay off, we get further benefits:

  1. Normally some activity A person will show particular ability with activity B. This person can then pair with another, less skilled activity A person to perform activity B
  2. Queue in “Waiting for Activity B” is reduced as more people with activity A skills are performing activity B
  3. Flow of value improves and lead time decreases
  4. More activity A people get skills on activity B


Stage 3: All activity A people are able to perform activity B
The activity B coach can leave the team, returning only occasionally to check on progress. Benefits:

  1. Activity A and activity B can be performed by every member of the team
  2. The WIP limit can be changed to obtain maximum flow and eliminate the queue in Ready for Activity B
  3. The flow of value is maximised
  4. The lead time is minimised


WARNING: I have applied this approach to help teams with testing for many years. It has worked in my context, giving me massive improvements in throughput and reductions in lead time. This is not a recipe for every context and it might not work in yours, but before you say it won’t work, please run an experiment and see if it is possible.

This is not the only activity that a good test coach can help a team with. There are many shift left and shift right activities that will also reduce the dependency on activity B.

I have been told a million times “it will never work”. I never believed the people who told me and tried anyway; that’s why it worked.

Try it for yourself. If it doesn’t work, you will have learned something anyway.


The Resource Utilisation Paradox – On why less is more

Trad Mgmt: aim for 100% resource utilisation

Traditional management approaches have often put high resource utilisation at the top of managers’ agendas as a recipe for effective management, on the assumption that any resource that is not fully utilised is waste. When managers do capacity planning, they focus on utilising all the resources they have to the max, and this approach has often been considered a universal best practice among managers.

For the purpose of this article a resource[1] is a person that works within a team on a product.

Now, let’s have a look at a system where 100% utilisation works well:

Think about an oil pipeline. It would be really wasteful to build a massive pipeline and just push a little quantity of oil through it. Engineers will calculate the optimal pressure and volume of oil that can be moved and the oil will fill the pipe in a constant stream. That’s what I refer to as 100% resource utilisation.

Based on a similar principle, managers believe that they should utilise all their resources at 100% so that they can push through a constant stream of value to be delivered to their customers.

Oil Pipeline

So what’s the problem, you might say? It is easy to agree with that approach, but the issue is that by applying the laws of fluid dynamics to people delivering value through software, we underestimate the impact of variability. Human beings introduce levels of complexity and variability that are an order of magnitude higher than those found in an oil pipeline. Oil will flow in a pipe at a steady pace at 100% utilisation. A system made of human beings delivering value through software, on the other hand, is affected by countless variables and will not work the same way at 100% resource utilisation.
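One hedged way to put numbers on the effect of variability is the textbook M/M/1 queue: a single worker, random arrivals and random service times. This model is my assumption for illustration, not something taken from a real team, and the function name is hypothetical. In that model the average time an item spends in the system is the average service time divided by (1 - utilisation).

```python
def mean_time_in_system(utilisation: float, service_time: float = 1.0) -> float:
    """Average time in an M/M/1 queue: service_time / (1 - utilisation).

    Assumes one worker with random (Poisson) arrivals and exponentially
    distributed service times, which is a simplification of real teams.
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in the range [0, 1)")
    return service_time / (1.0 - utilisation)


# How lead time (in multiples of the pure work time) grows with utilisation.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilisation {rho:.0%}: {mean_time_in_system(rho):.1f}x the work time")
```

The formula has no finite answer at exactly 100% utilisation, which is why the function rejects it: as utilisation approaches 100%, the waiting time grows without bound.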

So, for a better comparison with software development, let’s use a system that is slightly more similar to a software delivery team than an oil pipe is because it includes human variability. Let’s use traffic flow.

100% utilisation = traffic jam

Let’s now look at what happens when we try to get 100% utilisation, which in traffic terms means all lanes in use with a flow of vehicles that is as close to constant as possible. This time human variability comes into play: a driver gets distracted and doesn’t move when there is space in front of him, a car breaks down, a driver tries to overtake in between lanes, or a driver even falls asleep at the wheel. The result is something like the picture to the right.

What’s efficiency for a traffic system? I am simplifying, but the throughput of cars seems to me a good metric. So let’s say that our measure of efficiency is C/h = the number of cars that go through a section of the road in one hour. A traffic jam implies low throughput and high cycle time.

We have all been in a traffic jam and we know that it is certainly not the most efficient way of using the capacity of the road. When you are in a queue and not moving, you are not being efficient; you are wasting time and petrol. Starting and stopping is not efficient either: when braking, you waste the momentum you gained when accelerating, burning petrol and wearing out your brakes.

Less than 100% utilisation = Highway flow

Now look at the picture on the left and tell me, is that more efficient than the traffic jam? Will the C/h of the second picture be higher or lower than the one in the first one?

Believe it or not, the answer is higher.

But, we are not using all our resources! Look at that unused space between cars, surely this can’t be right!

Now look at the two pictures below and see if you can identify similarities with the traffic system.

Comparison 1
Comparison 2


This is a simple visual representation of how 2 similar systems can obtain higher throughput by limiting the work in progress (cars in transit). It is a counterintuitive idea, but limiting the number of cars that can be on a road at the same time can increase throughput. Similarly, you can increase your software delivery throughput by limiting work in progress. For the moment I am going to leave it to your imagination to speculate on whether this is true or not. In the next chapter of this blog post I will use mathematics to demonstrate it. Stay tuned.

[1] I generally don’t like referring to people as resources, but in this case I will make an exception because it makes the story I am about to tell you easier to understand.

On empowerment and the 2 types of downtime


In a recent LinkedIn discussion, the topics of testing downtime and what testers should do during it were examined. I read and reread the answers a few times and became convinced that I have a completely different view of what downtime is and how it should be used.

The first thing I noticed was that test managers and leads had a substantial list of things to ask testers to do during their downtime, but none of them seemed interested in asking the testers themselves what they thought they should do in such a situation.

This made me reflect on the different levels of tester empowerment and the corresponding levels of trust from their managers. Asking a tester to do something during downtime is not a bad thing “tout court”; I can see some situations where it could be helpful. On the other hand, fostering an environment where we assume people are going to do nothing unless asked to do something can be a self-fulfilling prophecy.

I believe that by fostering and rewarding a culture of awareness and collaboration while assuming people’s good intent, leaders can help and motivate people to do things because they want to, not because they are asked to do them.

The second observation that made me reflect is that most of the activities proposed are things that the testers who work with me do as part of their normal day-to-day job, not as a one-off during downtime. Among things of this type, LinkedIn users suggested:

analyse coverage and the root cause of issues, learn automation, cross-skill with subject matter experts, review tests, document procedures on wikis, read a work-related book, blog, watch a video, self-paced training, knowledge transfer, and other “improving” activities.

This point made me think that maybe my downtime is different from other people’s downtime. We do most of the activities suggested in the thread (apart from some wasteful ones) as part of our normal work, so let me tell you what my 2 types of downtime are.

They are good downtime, which we need to encourage, and bad downtime, which we need to fight.

Good downtime: this is planned downtime that we introduce into our process by limiting resource utilisation, for example by setting WIP limits and allowing for slack. This downtime is designed so that people don’t burn out. During this time people can do whatever they want, from searching for funny memes on the web, to going for a walk in the park, to studying something they want to learn. There is no limit to what people can do during good downtime. In my experience this downtime generally comes in blocks of 1 to 3 hours, with a cadence that depends on the specific context and can be fine-tuned to the needs of the project and the people.

Bad downtime

Bad downtime: this is unexpected downtime caused by waiting for something to do because the flow of work is blocked somewhere else in the system. Say, for example, that the build is broken and can’t be progressed for us to do exploratory testing, or that the business analyst is on holiday and there is a shortage of new user stories coming through. This is bad downtime because it affects the flow of value towards our customer. In this case T-shaped testers can help a lot to fix the issue, because they are able to support the developer stuck with the broken build or help create new user stories. When issues like these happen, tools like Kanban can be extremely helpful, because you will be able to see the issue immediately in the form of a bottleneck. The next thing to do is for the team (including but not limited to the people in downtime) to swarm around the bottleneck, reduce it and restart the flow.

If you want to continuously improve your flow, swarming and resolving bottlenecks is necessary but not sufficient. It is important that you also resolve the root cause of the downtime and the related bottleneck. One effective way to expose bad downtime, so that we can identify patterns and fine-tune our process, is the waste snake. It is extremely easy to set up and use, and I’d strongly recommend it.