Note that in this case I decided
to use a standard line chart, since the information I wanted to present is not
really that complex; I certainly don’t want the “cool charts” to become my
golden hammer. It is also worth noting that after using this data for some time
I decided to normalize the Y-axis, so that I can compare the metric across
different classes at a glance. See, for instance, the second example below, where
you can clearly see that over time the number of both modifications and defects
decreased. It’s not as sharply marked as on the stock market, but if you look at a
number of these charts (the one above being an example), the existence of a trend
becomes obvious.
It’s still hard to say whether
this trend is good or bad. It is obvious that this class rarely gets changed, but
that can still mean that certain problems exist, e.g. the class has grown too big
and a modification can rarely be justified by a business reason. To nail down the
cause, you’d need other metrics to give you more context. I’ve already been
thinking of possible extensions that could exploit more sophisticated ways of
visualization, where, for example, one could see how the size of a class changes
on the same timeline.
Theoretically it would be awesome to see coverage there as well, but
unfortunately this is not possible in our case; if you can do it, however, go for
it, I’d be thrilled to see that. And if I were to choose how to visualize it, I’d
probably go for a GapMinder-style chart, which would, more or less out of the
box, give a combined view of metrics for many classes at the same time. Anyhow,
I’m not sure which information would benefit you the most, but it’s very much
worth exploring :)
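The text doesn’t say how the Y-axis was normalized, so here is a minimal Python sketch of one possible way, assuming per-class monthly modification counts and a simple divide-by-peak scaling. The function names and sample data are hypothetical:

```python
from typing import Dict, List


def normalize_series(counts: List[int]) -> List[float]:
    """Scale a per-period count series into [0, 1] by dividing by its peak.

    Assumption: divide-by-maximum is just one possible normalization;
    the original post does not say which one was used.
    """
    peak = max(counts, default=0)
    if peak == 0:
        # A class that was never touched stays flat at zero.
        return [0.0 for _ in counts]
    return [c / peak for c in counts]


def normalize_all(per_class: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Normalize every class independently so all charts share a 0..1 Y-axis."""
    return {name: normalize_series(series) for name, series in per_class.items()}


# Hypothetical monthly modification counts for two classes.
monthly = {
    "OrderService": [12, 9, 6, 4, 2],
    "Invoice": [3, 2, 2, 1, 0],
}
for name, series in normalize_all(monthly).items():
    print(name, [round(v, 2) for v in series])
```

Scaling each class by its own peak makes the shapes of the trends comparable at a glance, at the cost of hiding absolute volumes: a class touched a few times a year can look just as “busy” as one touched every week.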
I was not planning to explore
this problem in any orderly fashion, because when I research a topic I like to
jump from one thing to another, which helps me get a better grasp on all aspects
of the particular problem. I decided that as a next step I wanted to go deeper
into the correlations of classes across issues (again, I got inspired by Michael
Feathers: http://michaelfeathers.typepad.com/michael_feathers_blog/2011/09/temporal-correlation-of-class-changes.html).
The first visualization that came out of this concept was a graph of all
correlations (i.e. classes that get changed together as part of the same issue)
above a certain threshold for the whole project, and it looked like this:
The size of a node (representing
a class) is proportional to the aggregate number of issues in which the class was
changed together with another class, and a correlation is depicted as a link
between the two classes. This visualization certainly looks cool and is also
interactive: you can pan the area, zoom, move nodes around… actually, look for
yourself on the Protovis site.
What’s the downside, then? There’s just too much information. It does give you
an overview of areas with strong coupling (see yellow) and it highlights the
boundaries of application modules (see blue), but it’s almost impossible to get
more specific information out of it. So it’s good as a start, but you need a
next step here, something that would let you dig into the details of why the
situation is what it is and whether you should do something about it.
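At its core, the correlation graph boils down to counting, for every pair of classes, the issues in which both were changed. A minimal Python sketch, assuming the history has already been grouped into a mapping from issue id to the classes touched by that issue; the names `co_change_counts` and `build_graph`, the input shape and the sample data are all hypothetical. Node size is taken here as the sum of a class’s kept link weights, which is only one reading of “aggregate number of issues”:

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def co_change_counts(issues: Dict[str, Iterable[str]]) -> Counter:
    """Count, for every pair of classes, the number of issues that changed both."""
    pairs: Counter = Counter()
    for classes in issues.values():
        # set() counts a class once per issue; sorted() makes (A, B) and (B, A)
        # the same key.
        for a, b in combinations(sorted(set(classes)), 2):
            pairs[(a, b)] += 1
    return pairs


def build_graph(issues: Dict[str, Iterable[str]], min_count: int):
    """Return (node sizes, links), keeping pairs changed together >= min_count times."""
    links: Dict[Pair, int] = {
        pair: n for pair, n in co_change_counts(issues).items() if n >= min_count
    }
    sizes: Counter = Counter()
    for (a, b), n in links.items():
        # Assumption: node size = sum of the weights of the node's kept links.
        sizes[a] += n
        sizes[b] += n
    return dict(sizes), links


# Hypothetical issue -> changed classes mapping.
history = {
    "BUG-1": ["Order", "Invoice"],
    "BUG-2": ["Order", "Invoice", "Customer"],
    "FEAT-3": ["Order", "Customer"],
}
nodes, edges = build_graph(history, min_count=2)
print(nodes)
print(edges)
```

The resulting nodes and links can then be handed to whatever force-directed layout you use; the threshold is the main knob for keeping the picture readable.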
I’m planning on describing ways
of resolving this in Part 3, so for now let me just show you another way of
visualizing the same information. What I’m going to present is IMHO much more
useful when you need to focus on the correlations (especially to identify where
they don’t make any sense) rather than on the classes (correlations lower than 3
were filtered out):
The concept is quite similar to the
previous one: nodes are classes, links are correlations. This time, however, the
classes are positioned around the whole circle in a specific sort order: by
package name. Having them in such an order lets you apply a simple heuristic:
whenever there’s a link between two distant points on the circle, there is
potentially unnecessary coupling between two separate packages… and while there
may be a relationship in the code, it’s at least suspicious if these classes get
changed together too often (change frequency is represented by color, increasing
from green to red). On the other hand, even if the correlated classes are close
together (in the same package), a large number of such links can still be a bad
sign; for example, the package may be too large. I didn’t play much with this
visualization, so there may be many other ways of analyzing it and getting
valuable information out of it. Moreover, there’s an amazing tool for doing much
more powerful visualizations of this kind, and as soon as I learn how to use it,
I’ll write more about its potential.
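The data behind the circular view can be sketched in Python as well. This is only an illustration under stated assumptions: class names are dotted, Java-style fully qualified names; a link’s weight is the number of shared issues; the minimum of 3 matches the filter mentioned above; and the linear green-to-red ramp is a made-up palette. All function names are hypothetical, and the code only computes positions and colors, it does not draw anything:

```python
import math
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def package_of(qualified_name: str) -> str:
    """'com.shop.order.Order' -> 'com.shop.order' (assumes dotted, Java-style names)."""
    return qualified_name.rsplit(".", 1)[0] if "." in qualified_name else ""


def circle_positions(classes: Iterable[str]) -> Dict[str, float]:
    """Spread classes evenly around a circle, ordered by package name; angles in radians."""
    ordered = sorted(set(classes), key=lambda name: (package_of(name), name))
    n = len(ordered)
    return {name: 2 * math.pi * i / n for i, name in enumerate(ordered)}


def count_to_color(count: int, lo: int, hi: int) -> str:
    """Linear green -> red ramp as a hex string; the exact palette is an assumption."""
    t = 0.0 if hi == lo else (count - lo) / (hi - lo)
    return f"#{round(255 * t):02x}{round(255 * (1 - t)):02x}00"


def describe_links(links: Dict[Pair, int], min_count: int = 3) -> List[dict]:
    """Drop weak correlations, place classes on the circle, flag cross-package links."""
    kept = {pair: n for pair, n in links.items() if n >= min_count}
    if not kept:
        return []
    angles = circle_positions(name for pair in kept for name in pair)
    lo, hi = min(kept.values()), max(kept.values())
    return [
        {
            "source": a,
            "target": b,
            "angles": (angles[a], angles[b]),
            "count": n,
            # Proxy for the "distant points on the circle" heuristic: since the
            # classes are grouped by package, a chord between two packages is
            # the kind of link the text treats as suspicious.
            "cross_package": package_of(a) != package_of(b),
            "color": count_to_color(n, lo, hi),
        }
        for (a, b), n in sorted(kept.items())
    ]


# Hypothetical correlations: (class, class) -> number of shared issues.
correlations = {
    ("com.shop.order.Order", "com.shop.order.OrderLine"): 5,
    ("com.shop.billing.Invoice", "com.shop.order.Order"): 3,
    ("com.shop.billing.Invoice", "com.shop.billing.Tax"): 2,
}
for link in describe_links(correlations):
    print(link["source"], link["target"], link["color"], link["cross_package"])
```

Counting the links that are *not* cross-package, per package, would be one way to approach the other case from the text, where a package has so many internal correlations that it may simply be too large.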
In the next part… right, I’m not
gonna lie to you, I have absolutely no clue about the next part, other than that
there’s going to be one. Maybe I’ll go into more detail about how I decided to
present information for a single class… or maybe I’ll describe possible use cases
you can employ these charts for… or something totally different. Not sure; stay
tuned.
1. If the length and scope of your issues vary widely, this metric, which counts all changes within a single issue as one, will overestimate the importance of “quick fixes” and underestimate “long enhancements”. Because of that, I recently modified it so that it no longer counts all modifications within a single issue as one, but works on a per-day basis instead: if a file is modified many times on different days, the number of days on which it was modified is the number we’re looking for.
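To make the difference between the two counting schemes concrete, here is a minimal Python sketch, assuming each change is available as a hypothetical (file, issue, date) record; the function names and the sample log are made up for illustration:

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record shape: (file path, issue id, commit date).
Change = Tuple[str, str, date]


def modifications_per_issue(changes: Iterable[Change]) -> Dict[str, int]:
    """Original metric: all changes to a file within one issue count as one."""
    issues: Dict[str, Set[str]] = defaultdict(set)
    for path, issue, _day in changes:
        issues[path].add(issue)
    return {path: len(ids) for path, ids in issues.items()}


def modifications_per_day(changes: Iterable[Change]) -> Dict[str, int]:
    """Revised metric: count the distinct days on which a file was modified."""
    days: Dict[str, Set[date]] = defaultdict(set)
    for path, _issue, day in changes:
        days[path].add(day)
    return {path: len(d) for path, d in days.items()}


# A long enhancement touching Order.java on three days vs. two quick fixes.
log = [
    ("Order.java", "FEAT-1", date(2011, 9, 1)),
    ("Order.java", "FEAT-1", date(2011, 9, 1)),
    ("Order.java", "FEAT-1", date(2011, 9, 5)),
    ("Order.java", "FEAT-1", date(2011, 9, 12)),
    ("Util.java", "BUG-7", date(2011, 9, 2)),
    ("Util.java", "BUG-8", date(2011, 9, 20)),
]
print(modifications_per_issue(log))
print(modifications_per_day(log))
```

Per issue, the long enhancement on Order.java scores 1 against 2 for the quick fixes on Util.java; per day, it scores 3, which reflects the work that actually went into it.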