Thursday, October 1, 2009

Large Team, 6-Week Release Cycle: Transition Between Releases

In the 5 years that I have been working with Agile teams (10-25 members; 4-7 month release cycles), I never had to think much about transitioning a team from release n to release n+1. The transition has been quite simple so far. A release typically consists of a number of iterations followed by a regression and/or UAT phase. During the regression period the team fixes defects and stabilizes the application for deployment to production. As we progress through the regression and/or UAT phase, the number of defects decreases. At that point, we could branch off for the release. A small part of the team would continue to fix defects on the release branch, and the number of defects fixed after branching was not large. With regular merges from branch to trunk, we could easily make sure that all changes from the branch were available in trunk as well.

The team I am currently working with is 190 people strong. We have been working on this code base for more than 2 years. We follow a 6-8 week release cycle, i.e. a release is deployed to production every 6-8 weeks. The model described above fails because of the volume of changes this team makes per day. Since the team and the code base are big, we may fix up to 30 defects per day. If we replicate the process described above for a small/mid-size team, the merge becomes very painful, because the changes on both trunk and the branch are too large to be merged effectively. Most merges result in some defects getting reopened or some new defects being introduced. To reduce this problem, we should branch as late as possible. At the same time, if the entire team is fixing defects, it is impossible to find work for all the dev pairs on the team. So, from this perspective, we should branch as early as possible.

To make the transition as smooth as possible, we needed a period in which we could fix defects and stabilize the build while continuing to develop new functionality without impacting the release. Rajiv and I started looking through our story backlog to check whether we could find such stories. We soon found that such stories do exist: any story that impacts or changes a feature not going live in the current release falls into this category. In our case, the application has 15 portals. If a portal is not being delivered as part of release n, we can continue to make changes to it without impacting the stability of the application. Another easy example: if a feature is hidden behind a permission and will not be delivered as part of release n, we can play stories that change or evolve this feature. While identifying these stories, we need to evaluate whether a story has the potential to impact other features as well. If it does, it should not be played before branching.
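The selection rule above can be written down as a simple filter over the backlog. The following is a minimal, hypothetical Python sketch of that rule, not a description of any tool we used: the `Story` and `Release` types, the portal and permission names, and the `may_impact_other_features` flag (standing in for the team's judgement about ripple effects) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


# Hypothetical model of a backlog story; field names are illustrative only.
@dataclass(frozen=True)
class Story:
    name: str
    portal: str                      # portal the story changes
    permission: Optional[str]        # permission gating the feature, if any
    may_impact_other_features: bool  # team's judgement from story analysis


# Hypothetical description of what goes live in release n.
@dataclass(frozen=True)
class Release:
    portals: FrozenSet[str]      # portals delivered in release n
    permissions: FrozenSet[str]  # permission-gated features delivered in release n


def safe_before_branch(story: Story, release: Release) -> bool:
    """True if the story only touches functionality not going live in release n."""
    if story.may_impact_other_features:
        # Anything with a potential ripple effect waits until after branching.
        return False
    if story.portal not in release.portals:
        # The whole portal is not delivered in release n.
        return True
    # The portal is delivered, but the feature may still be hidden behind
    # a permission that is not part of release n.
    return story.permission is not None and story.permission not in release.permissions


def stories_to_play(backlog: List[Story], release: Release) -> List[str]:
    """Names of backlog stories that can be played before the release branch is cut."""
    return [s.name for s in backlog if safe_before_branch(s, release)]


# Illustrative data (hypothetical portals, permissions and stories).
release_n = Release(portals=frozenset({"customer", "admin"}),
                    permissions=frozenset({"view_reports"}))
backlog = [
    Story("Redesign partner dashboard", "partner", None, False),
    Story("Bulk export", "admin", "bulk_export", False),
    Story("Change login flow", "customer", None, False),
    Story("Tweak shared pricing engine", "partner", None, True),
]

print(stories_to_play(backlog, release_n))
```

Here the partner-portal story qualifies because that portal is not in release n, and the bulk export story qualifies because its permission is not delivered; the login change touches a live portal, and the pricing change is held back because it may affect other features.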

This approach provides an optimal balance between the need to branch as early as possible and the need to branch as late as possible. Using it, we can keep stories flowing smoothly through the pipeline even while the team transitions from release n to release n+1. The team never starts from an empty pipeline, so the whole system stays geared to maximize throughput.

In addition to this approach, Git has helped us considerably reduce the effort and time required to merge from a branch to trunk or across branches. I will share our experience of working with Git in my next post.

5 comments:

Pawan Shah said...

Awesome blog sirjee, waiting to read about GIT from a manager's perspective

Anonymous said...

ThoughtWorks going to the dogs:

"Most of the merges will result in some defects either getting reopened or some new defects getting introduced."

I'm unsure why that happens ? Not enough test coverage ?

Anonymous said...

Good thinking...I would like to see some more coming on regular basis :) - MB

Unknown said...

nitin ..nice one...

Oliver Smith said...

I'm unsure why that happens ? Not enough test coverage ?

The old adage about blaming the test coverage, eh! In my experience, on a project of this scale moving at the pace you are describing, the solution is never as simple as "not enough test coverage". I suspect that we are looking at a collection of problems, and that this is really a symptom of something more fundamental in your process.

The problems you need to think about, in my opinion, are:

1. How to limit the time between the branch and merge
2. Limiting the defect escape into your branch - finding the defects earlier and fixing them prior to the branching
3. Find a mechanism for identifying the defects that you are fixing in the branch, and then validate each fix as part of your merge (or add a test case against it in the trunk which will fail until that defect is fixed, although this second part can be wasted effort in reality)
4. In the short term create a physical card wall in your office to track the defects and ensure that they are retested at the point of the merge (manual or automated)