Here's the Deal: The Next Stack Frame: What Exactly Are You Testing Anyway?

Static/dynamic language debates continue, spurred on by the ongoing success of LAMP and RoR. These discussions often end with an admission that the real issues are what is being tested, what is doing the testing, and when. For example, Bruce Eckel (a C++ and Java guru, among other things) wrote a canonical post in 2003 arguing that dynamic languages seemed to him to offer greater programmer productivity, and that although certain things were not being checked before runtime, they were being checked all the same. Compare the long-running debate in the Java world over RuntimeException versus checked (declared) exceptions.

The corollary to this conclusion, of course, is that you don't want your users finding your bugs at runtime, so a test-driven, or at least strongly test-supported, approach is necessary. Testing and TDD are great, just as great as compile-time checking. The problem is that the two do not test the same things, nor with the same coverage.

The specifics: static compile-time checking typically covers 99% of the code in most applications (it normally excludes casts, dynamically generated code, and reflection, among other things). From a user or feature point of view, it covers, well, 0%. So it checks a great many things that are helpful but not directly relevant to the feature set.
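To make that 0% concrete, here is a small Python sketch. The function names and the discount "spec" are hypothetical, invented only for illustration. Both functions have the same signature, `(int, int) -> int`, so a static checker such as mypy would accept either one anywhere the other is expected; only a feature-level check tells them apart.

```python
from typing import Callable, List

# Hypothetical feature spec: "10% off a 1000-cent item costs 900 cents."

def discounted_price(price_cents: int, percent_off: int) -> int:
    """Price after taking percent_off percent off (what the feature wants)."""
    return price_cents - price_cents * percent_off // 100

def discount_amount(price_cents: int, percent_off: int) -> int:
    """Only the amount taken off: same types, different meaning."""
    return price_cents * percent_off // 100

def meets_feature_spec(fn: Callable[[int, int], int]) -> bool:
    """A feature-level check; no type signature can express this."""
    return fn(1000, 10) == 900

# Both candidates satisfy the same static type, so a type checker is happy
# with either; the feature check is what distinguishes them.
candidates: List[Callable[[int, int], int]] = [discounted_price, discount_amount]
results = {fn.__name__: meets_feature_spec(fn) for fn in candidates}
print(results)  # {'discounted_price': True, 'discount_amount': False}
```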

Now look at a solid TDD-inspired test suite. Assume it adheres to best practices: coverage of all (functional) features (the core of TDD), multiple failure cases, non-trivial tests, and so on. That gives us coverage of 100% of features (again, I'm excluding non-functional requirements such as security, performance, and design for the time being). And through sheer exercise of the code, it will also verify a significant share of formal code correctness, including type compatibility.
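Here is a minimal sketch of that "sheer exercise" effect, in Python because it checks nothing before runtime. The rating functions and the form-input scenario are hypothetical. The test is written purely against the feature (three ratings submitted through a form average to 4.0), yet running it surfaces the str/int mismatch that a compiler for a statically typed language would have reported.

```python
def average_rating(ratings):
    """Mean of a list of ratings. No annotations, so nothing checks the
    element type before runtime."""
    return sum(ratings) / len(ratings)

def parse_ratings(form_values):
    """Hypothetical adapter: raw form strings -> ints."""
    return [int(v) for v in form_values]

def run_feature_test(pipeline):
    """Feature test: form ratings "4", "5", "3" should average to 4.0.
    Returns a verdict string instead of raising, so both outcomes print."""
    try:
        result = pipeline(["4", "5", "3"])
    except TypeError as exc:
        return f"fail: TypeError ({exc})"
    return "pass" if result == 4.0 else f"fail: got {result!r}"

# Parsing step forgotten: merely exercising the code path makes sum() add an
# int to a str, and the type mismatch surfaces as a TypeError.
without_parsing = run_feature_test(average_rating)
# Parsing step included: the same feature test passes.
with_parsing = run_feature_test(lambda values: average_rating(parse_ratings(values)))

print(without_parsing)
print(with_parsing)
```

The test never mentions types at all; the type error is a by-product of running the feature.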

At this point, the problem and the solution are obvious (if hard to implement). PROBLEM: The test engine, including test suites, UI test runners like Selenium, components that generate random input data or events, load-testing facilities, security analysis probes, etc., can still only produce an a posteriori analysis of a small set of cases. Those cases are the positive ones (where "positive" includes both success and defined failure modes), plus a tiny fraction of the negative (unspecified failure) cases. Type checking in strongly typed languages, by contrast, provides a comforting a priori check over nearly all cases.
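A toy illustration of the gap, again in Python with a hypothetical function. The hand-picked suite exercises four inputs and passes; only a sweep of the entire input domain finds the unspecified failure. Note that the sweep is brute-force enumeration, not type checking, and it is feasible here only because the domain has 256 values. Real input spaces are vastly larger, which is why something smarter is needed.

```python
def bucket_width(x: int) -> int:
    """Hypothetical helper over the byte range 0..255. Its specified cases
    work, but it has an unspecified failure: ZeroDivisionError at x == 200."""
    return 1000 // (x - 200)

def failures(inputs):
    """Return the inputs on which bucket_width raises."""
    bad = []
    for x in inputs:
        try:
            bucket_width(x)
        except ZeroDivisionError:
            bad.append(x)
    return bad

# A posteriori: the handful of cases a test suite happens to exercise.
example_cases = [0, 1, 128, 255]
suite_failures = failures(example_cases)
# Whole-domain sweep: affordable only because the domain is tiny.
sweep_failures = failures(range(256))

print("example suite failures:", suite_failures)  # []
print("full sweep failures:", sweep_failures)     # [200]
print(f"suite covers {len(example_cases)}/256 = {len(example_cases) / 256:.1%} of inputs")
```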

SOLUTION: We need better analysis tools for all languages, a big step beyond formal syntax checking, type checking, and lint analysis. More sophisticated approaches are going to be necessary. At a minimum, they would likely involve recursive runtime use (GUI design tools have "cheated" like this for years); dictionaries of data types, ranges, precision, etc.; and a probability engine that can break workflows into blocks to keep the number of permutations within the compute capacity of the hardware, then recombine them along with a risk analysis of what has thereby been left out.
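A very rough Python sketch of two of those ingredients: a dictionary of types, ranges, and precision, and a split of one workflow into blocks with a report of what the split leaves out. Everything here is hypothetical: the parameter dictionary, the block names, and the min/midpoint/max sampling rule. It is deterministic bookkeeping rather than a probability engine, and the "risk analysis" is only a list of parameter pairs that are never varied together.

```python
from itertools import product
from math import prod

# Hypothetical dictionary of data types, ranges and precision per input.
PARAM_SPECS = {
    "quantity":   {"type": int,   "min": 0,   "max": 10_000},
    "unit_price": {"type": float, "min": 0.0, "max": 999.99, "precision": 2},
    "currency":   {"type": str,   "values": ["USD", "EUR", "JPY"]},
    "coupon":     {"type": str,   "values": ["", "SAVE10", "EXPIRED"]},
}

# Hypothetical split of one workflow into blocks explored separately.
BLOCKS = {
    "pricing":  ["quantity", "unit_price"],
    "checkout": ["currency", "coupon"],
}

def representative_values(spec):
    """Enumerated values as-is; otherwise min, midpoint and max of the range."""
    if "values" in spec:
        return list(spec["values"])
    lo, hi = spec["min"], spec["max"]
    mid = (lo + hi) / 2
    mid = int(mid) if spec["type"] is int else round(mid, spec.get("precision", 6))
    return [lo, mid, hi]

def plan(specs, blocks):
    values = {name: representative_values(s) for name, s in specs.items()}
    # Within each block, take the full cartesian product of its parameters.
    block_cases = {
        b: [dict(zip(params, combo)) for combo in product(*(values[p] for p in params))]
        for b, params in blocks.items()
    }
    planned = sum(len(c) for c in block_cases.values())
    full = prod(len(v) for v in values.values())
    # Crude "what was left out": parameter pairs in different blocks,
    # whose interactions the block-wise plan never exercises.
    names = list(blocks)
    untested_pairs = [
        (p, q)
        for i, a in enumerate(names) for b in names[i + 1:]
        for p in blocks[a] for q in blocks[b]
    ]
    return block_cases, planned, full, untested_pairs

cases, planned, full, gaps = plan(PARAM_SPECS, BLOCKS)
print(f"block cases to run: {planned}; full cross product: {full}")
for p, q in gaps:
    print(f"untested interaction: {p} x {q}")
```

With more parameters the gap between `planned` and `full` grows multiplicatively, which is the reason for splitting into blocks in the first place, and also why the left-out interactions have to be reported rather than silently dropped.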

A lot of work has already gone into finding the boundaries of this problem, especially in academia and around languages with heavy academic use (e.g., Scheme). Traditional static analysis and, more importantly, some hints from human users will be needed to set up the constraints, so that the testing does not devolve into "testing any string that can be passed to eval".
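As a sketch of how a human hint might constrain the eval problem: suppose someone declares that the string reaching `eval` always has the shape `<variable> <operator> <literal>`. The hint format, the variable names, and the sample environment below are all hypothetical. With the hint, the input space shrinks from "any string" to 18 candidates that can simply be run. (Emptying `__builtins__` just keeps the example tidy; it is not a security sandbox.)

```python
from itertools import product

# Hypothetical human hint describing every string that can reach eval.
HINT = {
    "vars": ["qty", "price"],
    "ops": ["+", "*", "/"],
    "literals": ["0", "1", "100"],
}

def generate(hint):
    """Yield every expression string the hint allows."""
    for var, op, lit in product(hint["vars"], hint["ops"], hint["literals"]):
        yield f"{var} {op} {lit}"

def check(expressions, env):
    """Evaluate each expression against env; collect (expr, error name) pairs."""
    found = []
    for expr in expressions:
        try:
            eval(expr, {"__builtins__": {}}, dict(env))
        except Exception as exc:
            found.append((expr, type(exc).__name__))
    return found

exprs = list(generate(HINT))
found = check(exprs, {"qty": 3, "price": 2.5})
print(len(exprs), "candidate strings")  # 18
print(found)  # [('qty / 0', 'ZeroDivisionError'), ('price / 0', 'ZeroDivisionError')]
```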