Reading Nathan Yau's recent post about the Rise of the Data Scientist inspired me to take a look at Ben Fry's dissertation on Computational Information Design, in which he describes the process for understanding data as follows:
- acquire – the matter of obtaining the data, whether from a file on a disk or from a source over a network.
- parse – providing some structure around what the data means, ordering it into categories.
- filter – removing all but the data of interest.
- mine – the application of methods from statistics or data mining, as a way to discern patterns or place the data in mathematical context.
- represent – determination of a simple representation, whether the data takes one of many shapes such as a bar graph, list, or tree.
- refine – improvements to the basic representation to make it clearer and more visually engaging.
- interact – the addition of methods for manipulating the data or controlling what features are visible.
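To make the sequence a little more concrete, here is a minimal Python sketch of how the seven steps might chain together. The sample rainfall data, the function names, and the plain-text bar chart are all assumptions made up for illustration; none of them come from Fry's dissertation, and a real project would swap in its own data source and drawing code.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a file on disk or a network source.
RAW = (
    "city,year,rainfall_mm\n"
    "Oslo,2008,763\n"
    "Cairo,2008,25\n"
    "Lima,2008,16\n"
    "Oslo,2007,802\n"
)


def acquire():
    """Acquire: obtain the raw data (here, just an in-memory string)."""
    return RAW


def parse(text):
    """Parse: give the raw text structure by turning it into typed records."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {
            "city": row["city"],
            "year": int(row["year"]),
            "rainfall_mm": float(row["rainfall_mm"]),
        }
        for row in rows
    ]


def filter_rows(records, year):
    """Filter: keep only the records for the year of interest."""
    return [r for r in records if r["year"] == year]


def mine(records):
    """Mine: apply a simple statistic (the mean) to put the data in context."""
    return statistics.mean(r["rainfall_mm"] for r in records)


def represent(records, width=20):
    """Represent: choose a basic form, here one text bar per city."""
    peak = max(r["rainfall_mm"] for r in records)
    return [
        (r["city"], "#" * max(1, round(width * r["rainfall_mm"] / peak)))
        for r in records
    ]


def refine(bars):
    """Refine: sort the longest bar first and align the labels."""
    label_width = max(len(city) for city, _ in bars)
    ordered = sorted(bars, key=lambda bar: len(bar[1]), reverse=True)
    return [f"{city:<{label_width}} {bar}" for city, bar in ordered]


def interact(records, year):
    """Interact: let the viewer control which year is visible."""
    return refine(represent(filter_rows(records, year)))


if __name__ == "__main__":
    records = parse(acquire())
    print("Mean rainfall in 2008:", mine(filter_rows(records, 2008)))
    for line in interact(records, 2008):
        print(line)
```

Calling `interact(records, 2007)` instead redraws the chart for a different year, which is about the simplest form the interact step can take.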
I took his process and created a diagram that maps my own skill set onto it, adding Interaction Design (my current profession), which I believe covers the represent, refine, and interact steps.
While I don't disagree that these steps represent the process for understanding data for the individual creating the data visualization, they leave out a step needed to create a design that other people can readily understand or find persuasive.
User research and testing of the design are needed to verify that the representation is clear and appropriate. Although this could be considered part of the refine step, it may be needed at other points in the process (e.g., represent or interact). For anyone who is interested in creating data visualizations for other people, it should be considered an important part of the design process.