*imagines your kitties in their harnesses walking all proud amongst the other leatherfolk at Folsom Fair*
congrats to your Gram on ninety-nine years. that is quite a long time :)
So... unit testing.
Your example "here's how to unit test adding two numbers" is a bit of a strawman. Is that actually what was explained to you? The goal of unit testing is to test functional units. What are the smallest functional units you have in the programs you write? Do you have tests for them?
The principal value I see in unit testing is that tests allow you to code with fewer bugs, but more importantly they allow you to refactor, repurpose, and fix bugs in long-lived code with much greater confidence than if you don't have tests.
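For concreteness, here is what a test of a small functional unit might look like in Python (the function and its behavior are invented for illustration; the point is the shape of the test, not this particular helper):

```python
def clamp(value, lo, hi):
    """Restrict value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))

def test_clamp():
    assert clamp(0.5, 0.0, 1.0) == 0.5   # inside the range: unchanged
    assert clamp(-3.0, 0.0, 1.0) == 0.0  # below: clamped to the lower bound
    assert clamp(7.0, 0.0, 1.0) == 1.0   # above: clamped to the upper bound

test_clamp()
```

With a handful of these in place, you can later rewrite `clamp` however you like and know immediately whether you broke it.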
Googling for "unit testing examples", I have definitely seen adding two things as an example.
I get the basic idea of unit testing and I can see how it's useful if you're writing, say, a general-purpose library. My frustration is that it's not at all clear how it applies to scientific programming, and all the examples I can find are CS-oriented and therefore totally unhelpful.
What's the smallest functional unit in my code? I don't know! This is exactly the problem! I don't know how you decide what the smallest sensible chunk that's worth testing is, and nobody can give me any good guidelines.
I am vaguely starting to think that the answer is that unit testing per se is actually NOT very useful in scientific programming, that it's something that you use for what I'd call infrastructure programming, and that for scientific programming (which is usually not long-lived or refactored) you actually want to apply the same underlying principles but in a different way, doing something analogous that focuses not on the behavior of the code but instead on the structure of the data, because that's where all the longevity and mutation lie. But my ideas are all still very inchoate.
So... what makes "scientific programming" different from "programming"? I don't have any first-hand experience with this. If your code is really not long-lived, and you're only writing it to process a single data set then it seems silly to write unit-tests. Better to validate a number of key examples from your data set by hand.
I certainly don't write tests for all the code I write. However, if I have a module of code that I expect to be long-lived and called from multiple distinct applications, then I absolutely want tests for it that cover all the functionality of each application. Otherwise, it becomes really painful to modify the code to improve application A without worrying about breaking application B, and so on.
Today I wrote a program in NCL (a scripting language) that interpolates data from one grid to another using a particular library method. You give it the name of the file that has your input data in it, the name of the output file to store things in, and the name of a file that has all the weights for going from this grid to that grid. (Generated using scripts I wrote last week.)
This will be used to regrid a half-dozen different variables from a half-dozen different simulations, each of which has its own grid. I could code the looping over those different inputs into the script, but it's the kind of thing that I may re-use in a different context, so instead I make it generic, and just pass in the filenames from the command-line. (Using a convenience shellscript I wrote a long time ago that takes care of all the tedious argument quoting and so on.) I do the looping on the command line, so I have a file named NOTES that has chunks of commands I can cut-and-paste into an xterm. If that gets messy or long and I need to redo it a lot, that will probably mutate into a proper shell-script.
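If the cut-and-paste NOTES file ever did mutate into a proper driver, the loop might look something like this Python sketch (the script name, simulation and variable names, and the file-naming scheme are all made up):

```python
import subprocess

# Hypothetical simulations and variables to regrid; the real names,
# and the weight-file naming scheme, would come from the actual project.
simulations = ["simA", "simB", "simC"]
variables = ["tas", "pr"]

def regrid_command(sim, var):
    """Build the command line for one (simulation, variable) pair."""
    return [
        "ncl", "regrid.ncl",                  # hypothetical script name
        f"infile={sim}.{var}.nc",             # input data file
        f"outfile={sim}.{var}.regridded.nc",  # output file
        f"weights=weights.{sim}.nc",          # per-simulation weight file
    ]

for sim in simulations:
    for var in variables:
        cmd = regrid_command(sim, var)
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually run each job
```

Keeping the command construction in one function also gives you something small and testable, separate from the regridding itself.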
The interpolation script itself is pretty straightforward. It opens up the files and reads in data, hands it off to the library function, takes the results it gets back and cleans them up a little and tacks on some extra metadata, and then writes it all out to a file. If any of those steps goes wrong, the interpreter will probably spew a useful message and die. And I don't think adding tests buys me much -- either it's a problem with the way the code is being used (invocation / inputs), or it's a problem on par with "addition isn't working properly".
If you view the entire program as being the smallest usable unit (taking the view that the real programming resides in the way I compose all these little scripts to apply to datasets, and treating this one script as basically a function), the problem is that generating a synthetic test case would be vastly more work than the analysis I'm trying to perform. And in some cases, it's not clear that I can even generate an automatically-validated synthetic test case. I can easily visually inspect actual data run through the regridding and verify that it looks "the same" before and after -- but I don't know what the numeric values ought to be if I were to gin up fake data to run through it. And if my output is not a datafile, but a visualization... how do you automatically test whether a plot looks sensible? And if the tests aren't automatable, what's the point?
So "read some data, call some library functions you don't own, write some data" seems too simple to test. But even such a simple task might have opportunities for testable modules.
For example, you might discover one day that your input files were corrupted in some way that your program passed through silently, giving you garbage data. And so maybe you want to write a validating input routine which will fail in these cases, that you will share among all modules that read the same input files.
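A sketch of what such a shared validating reader might look like (the record layout, field names, and plausible-value range are all hypothetical):

```python
def validate_input(records):
    """Fail loudly on silently-corrupted input instead of passing garbage through.

    `records` is a list of (time, value) pairs; the checks below are
    invented for illustration -- the real ones would encode whatever
    corruption you have actually been bitten by.
    """
    if not records:
        raise ValueError("input is empty")
    last_time = None
    for i, (time, value) in enumerate(records):
        if last_time is not None and time <= last_time:
            raise ValueError(f"record {i}: time axis not strictly increasing")
        if not (-100.0 <= value <= 100.0):  # hypothetical plausible range
            raise ValueError(f"record {i}: value {value} outside plausible range")
        last_time = time
    return records
```

Because every script that reads these files goes through the same routine, a new kind of corruption only has to bite you once: you add a check here, and every consumer is protected.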
Or maybe you own the library module and will be calling it from multiple scripts. In this case, you probably should write tests for it. In general, if you have a function or class that you expect to be long-lived and call from many different scripts, writing tests will help prevent that module from being brittle and keep the scripts running over time as you improve the module.
An example from my work which might be relevant: I write a lot of command line scripts that all take similar types of arguments. Some things I typically want to pass my scripts are dates, date ranges, sequences of primitive values (ints, floats, dates, strings), file paths, etc. I wrote a library of command-line argument validating routines. So now, in my scripts if I want to say "this argument is a float between 0 and 1" it's a one liner, but if the user supplies an invalid value the script fails immediately and gives a useful error message "argument foo should be between 0 and 1, you gave value X".
I wrote lots of test cases for these argument parsers to ensure that they handled various edge cases of valid and invalid input. I think it improved both the design and underlying code.
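A rough sketch of the kind of validator and edge-case tests described above (names and API invented here; this is not the actual library):

```python
def float_in_range(name, lo, hi):
    """Return a validator: parse a string as a float and check the range."""
    def validate(raw):
        try:
            value = float(raw)
        except ValueError:
            raise SystemExit(f"argument {name} must be a number, you gave {raw!r}")
        if not (lo <= value <= hi):
            raise SystemExit(
                f"argument {name} should be between {lo} and {hi}, "
                f"you gave value {value}"
            )
        return value
    return validate

# In a script, "this argument is a float between 0 and 1" becomes a one-liner:
threshold = float_in_range("foo", 0.0, 1.0)("0.25")

# And the edge cases are easy to pin down in tests:
def test_float_in_range():
    v = float_in_range("foo", 0.0, 1.0)
    assert v("0.0") == 0.0  # boundary values accepted
    assert v("1.0") == 1.0
    for bad in ["-0.1", "1.1", "abc"]:
        try:
            v(bad)
        except SystemExit:
            pass
        else:
            raise AssertionError(f"{bad!r} should have been rejected")

test_float_in_range()
```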
Thank you! That is really helpful!
Yeah, with my work, it's really not the code that needs testing, it's the data. Because that's what breaks things when it changes. Unfortunately it's not practical to validate it on the fly when I read it in from a file. But I am doing a LOT of QC on the data files before they get analyzed, and it seems useful to think about the evolution of that QC process as being, essentially, TDD.
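That evolution could even be captured as code: each data problem you hit becomes a check, the way each bug becomes a test in TDD. A sketch, with invented field names and thresholds:

```python
def qc_data_file(times, values):
    """Accumulated QC checks on one data file; each past data problem adds one.

    The specific checks here are hypothetical examples -- the real list
    would grow from whatever has actually gone wrong with your files.
    """
    problems = []
    if len(times) != len(values):
        problems.append("time and value arrays differ in length")
    if any(t2 <= t1 for t1, t2 in zip(times, times[1:])):
        problems.append("time axis not strictly increasing")
    if any(v != v for v in values):  # NaN is the only value unequal to itself
        problems.append("NaN values present")
    if any(v < -1e30 for v in values):  # e.g. a missing-value sentinel leaked in
        problems.append("missing-value sentinel present in data")
    return problems

# A clean file passes; a file with a repeated timestamp is flagged.
assert qc_data_file([0, 1, 2], [1.0, 2.0, 3.0]) == []
assert "time axis not strictly increasing" in qc_data_file([0, 1, 1], [1.0, 2.0, 3.0])
```

Run over every file before analysis, this plays the role a regression-test suite plays for infrastructure code: the history of past failures, made executable.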
Definitely I will try to keep in mind that random snippets or functions I have that get used in more than one place are good candidates for unit testing, even if it wouldn't make sense to apply the framework to the whole program.