A Culture of Checking: Open Climate Models (4)

This is a continuation of my series on community approaches to climate science and, specifically, climate models. I am exploring the ideas of 
open innovation and open source communities. This article is about verification, testing, evaluation, and validation. I know that this is, perhaps, a bit off the path of what I imagine is my main audience, but these ideas are at the core of climate science, and also part of the politicization (or de-politicization) of science. Before I dig into the arcane, I wanted to reference Jeff Masters’ post 
on the major floods of the past few months. This combination of warm seas, warm air, water vapor and rain and geography and people is difficult to ignore, and I plan to revisit these floods in the context of the 
recent guest blog by Christine Shearer. Now, deeper into the world of how we build climate models. Here are links to the previous articles in the series: 
#1 in series, 
#2 in series, 
#3 in series

Verification, Testing, Evaluation, Validation: Validation is an important part of the 
scientific method. The scientific method always relies on observations, in concert with the development of testable hypotheses, and experiments or predictions which provide the tests for the hypotheses. In practice the validation process includes not only a scientist evaluating the results of the experiment or prediction, but providing written documentation which describes their work and their methods of validation. Using this documentation other scientists are able to evaluate the work and design independent experiments or predictions and methods of validation. It is not until there is independent confirmation of a scientist’s work that the work is accepted into the scientific body of knowledge. Once in the body of knowledge, work is not recognized as fact - rather it sits as a contribution that is, often, continuously challenged. It is a harsh, competitive process – not sublime.
By the very definition of scientific practice, a certain level of transparency is required. The transparency allows those who are, essentially, competitors to examine, reproduce, and independently confirm or refute the work of the scientist who initiated the original study. This general process of validation, reporting, and independent certification is a remarkably conservative process. It is slow to introduce major shifts and changes into the knowledge base. This conservatism strongly affects the way scientists speak to each other and to the public; there are always nuances of uncertainty and equivocation (again, 
 see this entry). If you think about this culture of validation, a culture of checking, it has a lot in common with 
how markets are run and governance. Governance? A set of rules, buy-in by participants, and a system of checks and balances. I have a number of previous articles on validation and transparency in a more general sense: 
Opinions and Anecdotal Evidence, 
Trust, but Verify, 
Uncertainty.
Verification, testing, evaluation, validation: I am grouping these words together to describe the culture of checking that is pervasive in the good practice of science. When I went to school, the idea of checking my arithmetic was taught again and again. It had to be taught again and again, because I was not smart enough to understand the value of checking as an abstract concept. As problems get more and more complex, strategies for checking get more and more sophisticated. When I used to hire a lot of new graduates at NASA, during the interviews we explored how they checked their work. During the first year of employment we were often coaching, teaching, and training people in how to check. 
How do we translate the skills of checking to climate models? I have been laboring over the words verification, testing, evaluation, and validation. What is the difference? To start, it is important to realize that climate models are computer codes, programs, software. And though many scientists object to the following categorization, the product that climate modelers produce is software. It is software for a purpose – the scientific investigation of climate. As suggested by the previous entries in this series, the software is complex: it represents individual sub-processes (like cloud formation); it represents an approximation to those sub-processes; and it is developed by geographically dispersed individuals, who are generally not managed as a coordinated group.
I will start with the easy one. Verification is the process of assuring that the software you have written or implemented is doing what you intended it to do. Suppose you are writing a simple computer program to calculate how long it will take you to drive from Limon, Colorado to DeKalb, Illinois. This is a simple equation of motion: distance traveled equals speed multiplied by time traveled. You might check your program (your model) by seeing how long it takes to drive 100 miles at 20 miles per hour – a simple problem for which you can confidently state the answer (5 hours). Maybe you try different pairs of speeds and distances. If you are thoroughly scientific, you might collect some data with your car. Of course, that might raise questions about the accuracy of your speedometer and your odometer, another requirement for checking. With some confidence you can develop a program that, without your driving from Limon to DeKalb, gives a very good approximation of how long the trip will take at a given speed. You can perhaps add another question – how fast do I have to drive to make the trip in 24 hours? 18 hours? With this example you can imagine the process of verification, checking that your program is doing what it is supposed to do. You might also say that you are testing your code. Testing is another word in the culture of checking, which takes on more specific meaning in different processes.
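To make the driving example concrete, here is a minimal sketch in Python of what such a program and its verification checks might look like. The function names and the 1000-mile trip distance are my own placeholders for illustration; the distance is not a measured value for Limon to DeKalb. The only known answer taken from the text is 100 miles at 20 miles per hour.

```python
import math


def travel_time_hours(distance_miles: float, speed_mph: float) -> float:
    """Time traveled, from distance = speed * time."""
    if speed_mph <= 0:
        raise ValueError("speed must be positive")
    return distance_miles / speed_mph


def required_speed_mph(distance_miles: float, time_hours: float) -> float:
    """Speed needed to cover a distance in a given time."""
    if time_hours <= 0:
        raise ValueError("time must be positive")
    return distance_miles / time_hours


def verify() -> None:
    """Check the program against problems whose answers we already know."""
    # Known-answer check from the text: 100 miles at 20 mph takes 5 hours.
    assert travel_time_hours(100, 20) == 5.0
    # A few more pairs of distance and speed with obvious answers.
    assert travel_time_hours(60, 60) == 1.0
    assert travel_time_hours(0, 50) == 0.0
    # Consistency check: the two functions should invert each other.
    trip_miles = 1000.0  # hypothetical distance, not a measured value
    for hours in (18.0, 24.0):
        speed = required_speed_mph(trip_miles, hours)
        assert math.isclose(travel_time_hours(trip_miles, speed), hours)


if __name__ == "__main__":
    verify()
    print("all verification checks passed")
```

Note what these checks do and do not establish. They show that the program does what I intended it to do. They say nothing about whether my speedometer is accurate or whether I can actually hold a constant speed across the plains; those are questions of comparison with observations.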
Evaluation and validation are more difficult to explain. Both words are linked with a comparison with independent information, specifically, observed information. At the risk of being tedious, when I worked at NASA, there were different sub-cultures represented by those who made instruments and those who made models. Validation at NASA often referred to the process by which people who took measurements from space assured that those measurements did, say, measure temperature. This would require deploying different types of temperature measuring devices, like thermometers, to take concurrent measurements at the same place. The point here is that a new way to measure temperature was being evaluated against an accepted, established way to measure temperature. Within NASA, it was a widely held belief that models could not be “validated,” because in general there was not such a clean comparison to a standard of accepted knowledge. Hence, the word “evaluation” emerged as the way to state that the model was being compared with observed information. 
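The instrument comparison described above can be sketched as a small calculation: take concurrent measurements from the new instrument and from the established one, and summarize how they differ. The sketch below uses mean difference (bias) and root-mean-square difference, which are common summary statistics but are my choice for illustration, not a description of any particular NASA procedure. The temperature values are made up.

```python
import math


def bias_and_rmse(new, reference):
    """Mean difference and root-mean-square difference of paired measurements."""
    if len(new) != len(reference) or not new:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [n - r for n, r in zip(new, reference)]
    bias = sum(diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return bias, rmse


# Made-up concurrent temperatures in kelvin, for illustration only.
reference_k = [280.0, 285.0, 290.0, 295.0]  # established thermometer
new_k = [280.5, 285.5, 290.5, 295.5]        # new instrument

bias, rmse = bias_and_rmse(new_k, reference_k)
print(f"bias = {bias:.2f} K, rmse = {rmse:.2f} K")
```

For an instrument, the reference column is an accepted standard. For a climate model, deciding what belongs in that column is exactly where the argument over “validation” versus “evaluation” begins.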
I will ultimately maintain that models can be validated in a formal sense. This remains an assertion that many of my colleagues disagree with. While I accept the nuances of evaluation and validation and testing, it is important that climate science embrace the rigor implied by “validation” of models. Before I go on, I provide links to a couple of papers that were provided to me by 
Doug Post some time ago. These papers drew largely from the experience in the U.S. National Laboratories responsible for assuring the robustness and safety of our nuclear weapons through the use of computational models (think about that application!). These papers generated transient discussion in the climate community: 
Computational Science Demands a New Paradigm and 
Software Project Management …..
As happens with my blogs, they sometimes get a bit long, so in the spirit of the medium, I am going to search for the take-away message. The existence of the semantic arguments concerning the words evaluation and validation suggests that defining measures of quality assurance of climate models is a difficult process. It is not uniquely defined. It depends on what you are trying to do, for example, predict El Niño or how the ocean melts the bases of glaciers in Antarctica. The evaluation or validation process also depends on how a modeling system performs. This system is constructed from sub-components developed by individual scientists, in practice, spread all over the world. The migration of individual components is sometimes performed by those reading the literature and reproducing work as described in the literature; local adaptation of algorithms is often performed.
As I stated above, there is a culture of checking in our field. Individuals check their work at multiple levels. But as the components are brought together, checking gets more and more difficult. Remember, these pieces are, themselves, neither unique nor absolutely accurate. As we consider the Earth’s climate, the question becomes, what do we check against? We get to the question of the quality of measurements - just how good are they? And we get to the social problem: as the climate model is built from pieces developed by individuals, how do we define and codify a process that rises to the standard of validation? As this process becomes more and more complex, we are often moved to using the word “art” to describe the process of building models (see 
The Art of Climate Modeling).
The final point that I want to make in this entry is that the culture of checking that scientists intuitively accept as individuals extends as an essential ingredient to the collective development of complex software systems. There are, in the development of these complex software systems, tensions that are not rationalized by convergent, deductive reasoning. These tensions might represent the choice of the quality of the oceanic circulation versus the quality of the atmospheric circulation in a computationally and human resource constrained environment. These tensions might abstract to conflicts between oceanographers and meteorologists, perhaps even the program managers that fund oceanographers and meteorologists. 
The description and codification of a validation plan for a climate model, therefore, extends far beyond the definition of a set of observations that uniquely or adequately define the Earth’s climate. There are judgments and decisions that need to be made. There are tensions, perhaps conflicts, which need to be reconciled. There are even philosophical discussions about whether or not climate models can be validated. If the open innovation communities I am exploring in this series are to be realized, then the description and codification of the validation process is necessary. Beyond the narrow world of scientists, we need to be able to point to the elements and measurements of validation in order to provide the foundation of the use of models in 
mitigation, adaptation, and geo-engineering.
Figure 1. From 
Online Math Tutor.