Wednesday, November 23, 2016

The Mythical Man-Month revisited.

One of the most intriguing stories published about culture’s influence on collaborative engineering projects can be found in “The Mythical Man-Month” by Professor Frederick P. Brooks. His book is a classic within Computer Science and also goes by the moniker “TMMM”.

Numerous empirical studies on software engineering projects have referenced TMMM, and seemingly every research paper in the relatively new discipline of Empirical Software Engineering [ESE 2011] starts by quoting it. How come?

Published in 1975, TMMM still provides valuable insight into why certain software projects are destined to fail from the beginning and what we can do to avoid a letdown.

A prime example of the success that can come from applying the lessons of this book is the path from Microsoft Windows Vista to Windows 7.

Microsoft Research’s ESE group used Brooks’s wisdom to study what affects the quality of software produced by globally dispersed, culturally diverse teams. They not only studied the problem but also applied their findings, and the success attained by Windows 7 far outweighs the negativity that surrounded Windows Vista.

Let’s take a look at TMMM Chapter 7, “Why Did the Tower of Babel Fail?”:
According to the Genesis account, the tower of Babel was man’s second major engineering undertaking, after Noah’s ark. Babel was the first engineering fiasco.
The story is deep and instructive on several levels. Let us, however, examine it purely as an engineering project, and see what management lessons can be learned.
How well was their project equipped with the prerequisites for success?
Did they have:
  • A clear mission? Yes, although naively impossible. The project failed long before it ran into this fundamental limitation.
  • Manpower? Plenty of it.
  • Materials? Clay and asphalt are abundant in Mesopotamia.
  • Enough time? Yes, there is no hint of any time constraint.
  • Adequate technology? Yes, the pyramidal or conical structure is inherently stable and spreads the compressive load well.
Clearly masonry was well understood. The project failed before it hit technological limitations.
Well, if they had all of these things, why did the project fail? Where were they lacking? In two respects – communication, and consequently, organization.
They were unable to talk to each other, which led to a failure in coordination and a subsequent break in workflow. 
From these events, we gather that lack of communication led to disputes, bad feelings, and group jealousies. Shortly after, the clans began to move apart, preferring isolation to wrangling.
The Tower of Babel project failed due to a lack of collaboration. Cultural differences between the teams working on the project led to a lack of communication and, consequently, a lack of the organization required to get the job done.

So how does this relate to large global corporations and their employees? How does it relate to different "corporate clans" like IT, Marketing, Sales, etc., which may be spread across the globe?

Simply put: Communication and organization are two sides of the same coin. One cannot expect good communication within an inadequately structured organization.

According to Conway’s Law, organizations are constrained to produce designs that mirror their own communication structures.

Good collaboration tools can ease the situation, but cannot resolve it. One cannot improve communication without imposing changes in organization.

When Microsoft researchers found that physical distance doesn’t affect post-release fault rates, but distance in the organizational chart does [Nagappan et al (2008), Bird et al (2009)], Microsoft changed the structure of its development organization, which led to a product of superior quality, Windows 7.
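
To make "distance in the organizational chart" a bit more concrete, here is a minimal sketch in Python. It is only one simple way to define such a distance (the number of reporting links between two people via their lowest common manager) and is not necessarily the measure used in the cited studies. The org chart, the names in it, and the functions chain_to_root and org_distance are all made up for illustration.

```python
# Hypothetical org chart: each person maps to their manager (None for the root).
MANAGER_OF = {
    "ceo": None,
    "vp_eng": "ceo",
    "vp_sales": "ceo",
    "dev_lead": "vp_eng",
    "alice": "dev_lead",
    "bob": "dev_lead",
    "carol": "vp_sales",
}


def chain_to_root(person, manager_of):
    """Return [person, manager, manager's manager, ..., root]."""
    chain = []
    while person is not None:
        chain.append(person)
        person = manager_of[person]
    return chain


def org_distance(a, b, manager_of):
    """Count reporting links between a and b via their lowest common manager."""
    # Map everyone on b's management chain to their number of steps above b.
    steps_from_b = {p: i for i, p in enumerate(chain_to_root(b, manager_of))}
    # Walk up from a; the first person also on b's chain is the common manager.
    for steps_from_a, person in enumerate(chain_to_root(a, manager_of)):
        if person in steps_from_b:
            return steps_from_a + steps_from_b[person]
    raise ValueError("no common manager found")


if __name__ == "__main__":
    # Two developers under the same lead versus a developer and a salesperson.
    print(org_distance("alice", "bob", MANAGER_OF))
    print(org_distance("alice", "carol", MANAGER_OF))
```

With a measure like this in hand, one could compare the organizational distance between the people who changed a component against that component's post-release fault count, assuming such data is available.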

What would it take to conduct similar empirical studies at your corporation and find the organizational structure that drives down post-release fault rates in your software systems? And what do you do to ensure the clans in your globally dispersed workforce keep talking to each other?

Tl;dr

Have fun, -joke

[ESE 2011] "Empirical Software Engineering" in: American Scientist, 2011. http://www.americanscientist.org/issues/feature/2011/6/empirical-software-engineering/1

[Nagappan et al (2008)] "The Influence of Organizational Structure on Software Quality: An Empirical Case Study", January 1, 2008.
https://www.microsoft.com/en-us/research/publication/the-influence-of-organizational-structure-on-software-quality-an-empirical-case-study/

[Bird et al (2009)] "Does Distributed Development Affect Software Quality? An Empirical Case Study of Windows Vista", August 1, 2009.
https://www.microsoft.com/en-us/research/publication/does-distributed-development-affect-software-quality-an-empirical-case-study-of-windows-vista/