
unit_test

The file builtins/unit_test.e (an autoinclude) implements a simple unit testing framework for Phix.

Unit testing should be a vital weapon in any half-decent programmer’s arsenal, and can be key to a fail fast approach.
It should be just as easy to write a (permanent) unit test as it is to perform that test (once) manually.

The theory is simply that if all the components of a system are working correctly, then there is a much higher probability that a system using those components can be made to work correctly. I might also add that debugging a component in isolation is much easier than leaving it until everything else is ready.

Unit tests not only completely eliminate an otherwise extremely tedious phase of the release cycle, but also grant the confidence to make changes that could instead be just far too frightening to even contemplate. I can tell you with absolute certainty and utter seriousness that the phix compiler simply could not have been written without the help of unit testing, full stop.
While they do not actually use these routines, a quick scan of any of the sixty-odd tests\tnn***.exw files (which predate this) will reveal plenty of opportunities for using test_equal() and friends.
If I have to spend five or twenty-five minutes crafting the perfect test, and then never have to worry about it ever again, for me that sounds like an absolute bargain and a long-term timesaver that will repay that effort many times over in the months and years to come.

These routines are fully supported by pwa/p2js, except that, as noted below, there is no way to write a logfile and no way to pause JavaScript execution. Not that it should be particularly difficult to restructure some (console-based) code into "before" and "after" routines, with some (GUI-based) means of kicking off the "after" part, probably however needing some new and as yet unwritten "get_test_results" routine to explicitly enable/disable some buttons and hide/show some messages. The (un-paused) message displays do however all work just fine, and any program that uses these routines with 100% [hidden] success rates probably won’t need any modification at all.

Example:

test_equal(2+2,4,"2+2 is not 4 !!!!")
test_summary()

If all goes well, no output is shown, and the program carries on normally. You can easily force [summary/verbose] output, crash/prompt on fail, etc.

Note that I have used many of the same routine names as Euphoria, but the parameters are all different [esp their order] and therefore they are not compatible...
In particular you have to give every test a name in Euphoria, whereas here such things are optional. Also, Euphoria works by putting tests in files named "t_*" and running eutest, whereas here they are part of the app, and will (eg) start failing on live systems if not properly installed.

constants

The following constants are automatically defined (in psym.e/syminit). TEST_QUIET appears in all three sets below to indicate it is part of each, but obviously it is only actually defined once. The three sets correspond, in order, to the first three routines documented below them:

TEST_QUIET       =  0 -- (summary only when fail)
TEST_SUMMARY     =  1 -- (summary only [/always])
TEST_SHOW_FAILED =  2 -- (summary + failed tests)
TEST_SHOW_ALL    =  3 -- (summary + all tests)

TEST_ABORT       =  1 -- (abort on failure, at summary)
TEST_QUIET       =  0 -- (carry on despite failure)
TEST_CRASH       = -1 -- (crash on failure, immediately)

TEST_PAUSE       =  1 -- (always pause)
TEST_QUIET       =  0 -- (never pause)
TEST_PAUSE_FAIL  = -1 -- (pause on failure)

routines

Apart from test_summary(), these are all optional.

procedure 
set_test_verbosity(integer level) -- set output verbosity

level: one of the first four TEST_XXX constants above; the initial default setting is TEST_QUIET

Note that (even) TEST_SHOW_ALL will not show successful tests with no name.
You can use integer level = get_test_verbosity() to retrieve the current setting.
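
For instance, a quick (untested) sketch of temporarily turning everything on, and reading the setting back:

        set_test_verbosity(TEST_SHOW_ALL)       -- echo every (named) test, pass or fail
        test_equal(2+2,4,"addition")            -- now reported even though it passes
        integer level = get_test_verbosity()    -- == TEST_SHOW_ALL here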

procedure 
set_test_abort(integer abort_test) -- set test failure behaviour

abort_test: TEST_ABORT/TEST_QUIET/TEST_CRASH as follows

if TEST_ABORT then abort(1) on failure, after showing the summary,
if TEST_QUIET then quietly carry on (the default),
if TEST_CRASH then crash("unit test failure (name)"), immediately.
You may, of course, change this at will for critical and not-so-critical tests or test sections/modules.

The default setting of TEST_QUIET makes unit test failures a gentle but persistent reminder.
Setting TEST_CRASH ensures that the ex.err contains a direct link to the (first) offending test, and therefore causes Edita/Edix to jump directly to said source code line (admittedly that may not be so helpful when what you really want to do is undo/fix the last edit just made), but also forces you to fix any new problems immediately, before resuming work on anything else.
Setting TEST_ABORT can likewise inhibit working on anything else if the test_summary() is invoked early on, say as part of the initialisation routines, whereas, of course, if it is invoked as part of the shutdown process it may act more as a gentle reminder.
You can use integer abort_test = get_test_abort() to retrieve the current setting.
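
For instance, a minimal sketch of temporarily tightening things up around a critical batch of tests (tagset() is just a handy builtin to test against):

        set_test_abort(TEST_CRASH)              -- any failure below stops the run immediately
        test_equal(tagset(3),{1,2,3},"tagset")
        set_test_abort(TEST_QUIET)              -- back to the (gentler) default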

procedure 
set_test_pause(integer pause) -- set pause behaviour (at summary)

pause: TEST_PAUSE/TEST_QUIET/TEST_PAUSE_FAIL as follows

if TEST_PAUSE always pause,
if TEST_QUIET never pause,
if TEST_PAUSE_FAIL pause on failure (the default).
You can use integer to_wait = get_test_pause() to retrieve the current setting.
Under pwa/p2js the module always behaves as TEST_QUIET and invoking this routine has no effect whatsoever, since there is no way to pause JavaScript execution for keyboard input (doing so would be at odds with the underlying browser’s [gui] event loop). Not that it should be particularly difficult to restructure some console-based code into something more GUI-friendly.
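
For instance, a brief (desktop-only) sketch of forcing a pause even when everything passes:

        set_test_pause(TEST_PAUSE)              -- always pause at test_summary()
        integer to_wait = get_test_pause()      -- == TEST_PAUSE here
        test_equal(2+2,4,"addition")
        test_summary()                          -- ends with "Press any key to continue..."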

procedure 
set_test_logfile(string filename) -- set a log file

filename: the output file

If this routine is not called, all output is to stderr only.
The file is automatically closed via test_summary().
You can use integer log_fn = get_test_logfile() to retrieve the current open log file handle, or 0 if none,
in case there is something else you want to output to it (as '\n'-terminated plaintext lines).
Under pwa/p2js any calls to this routine are quietly ignored, since a web browser cannot (easily/realistically) write a log file.
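
For instance, a small sketch (the filename is of course entirely up to you, and upper() is just a handy builtin to test against):

        set_test_logfile("mytests.log")
        test_equal(upper("abc"),"ABC","upper")
        integer log_fn = get_test_logfile()
        if log_fn!=0 then
            puts(log_fn,"some additional plaintext notes\n")
        end if
        test_summary()                          -- also closes the log file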

procedure 
set_test_module(string name) -- start a test section (optional)

name: the test section name

The section/module name is simply a hint to the programmer about where to go to fix some problem just introduced/detected, eg after

        set_test_module("logical")
        ...
        set_test_module("relational")
        ...
        set_test_module("regression")
        ...
        test_summary()

if you get (say)
          
        relational:
          test12 failed: 2 expected, got 4
         20 tests run, 19 passed, 1 failed, 95% success
 

then you know to look for the "test12" test in the relational section. When required, test_summary(false) is automatically invoked by set_test_module(), and you must invoke test_summary() at the end, or risk getting no output whatsoever.
Just for my sanity, set_test_section() is an alias of set_test_module() (in psym.e) and therefore obviously behaves absolutely identically.
Obviously you are free to use any appropriate section names, and if you never invoke set_test_module() they are all lumped in together.
Likewise it is perfectly fine if you cannot be bothered to name the individual tests, just potentially a bit more of a hunt should they trigger.

Suppose you have set_test_module("pGUI:paranormalise") - not actually using these routines but you can find a nice little unit test set there - I suggest you terminate it with set_test_module("pGUI:paranormalise ends"). Now, if there are no unit tests before the next call to set_test_module() [or test_summary()] then the "xxx ends" remains empty and is quietly ignored. Should anything land in such, then it is someone else’s problem for not invoking set_test_module() before starting their tests.
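
A minimal sketch of that pattern (with made-up section names, and trim() being just a handy builtin to test against):

        set_test_module("strings")
        test_equal(trim("  abc  "),"abc","trim")
        set_test_module("strings ends")         -- stays empty and is quietly ignored...
        -- ...unless someone later adds tests here without their own set_test_module()
        test_summary()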

procedure 
test_equal(object a, b, string name="", [sequence args={},] bool_or_string bApprox=false, bool eq=true)

test two values for (approximate) equality

a, b: the two values which should be equal.
name: the test name (optional).
args: (optional) if it is not a sequence-but-not-string then params[5..6] := params[4..5], iyswim (ie it is treated as bApprox), otherwise if not {} a name=sprintf(name,args) is performed.
bApprox: use approximate equality, see below (optional).
eq: for internal use only: true means it is test_equal() rather than test_not_equal(), and vice versa.

When bApprox is false, everything must match exactly (the default). A value of true is the same as "%g", and other sprintf() format strings such as "%5.2f" can be specified, in which case it uses a recursive/nested (a==b) or (sprintf(fmt,a)==sprintf(fmt,b)) test; in other words the sprintf()s are only triggered when actually needed, and only apply to pairs of atoms at the same place/depth in (as yet at least) identical shapes/structures.

Omitting the name is probably only sensible when a and/or b already contain sufficient information to identify the failing test.
Normally the lowest-level error logging prints a and b using "%v", however when they are both "\nhelpful strings" that begin with '\n' (which could in fact also be the not-deliberately-planned consequence of a bApprox applied to two atoms) it will use "%s", which effectively hides the '\n' but should make some error messages much easier to read. Conversely there may be cases when you need to deliberately crop or artificially prefix leading '\n' to get the more precise form back.
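
For instance, a brief sketch of the three flavours (recall from above that a non-sequence or a string in the args position is simply treated as bApprox):

        test_equal(2+2,4,"2+2")                             -- exact match (the default)
        test_equal(sqrt(2)*sqrt(2),2,"root 2 squared",true) -- true is the same as "%g"
        test_equal({1/3,2/3},{0.33,0.67},"thirds","%5.2f")  -- explicit format, applied
                                                            --  pairwise to atoms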

procedure 
test_not_equal(object a, b, string name="", [sequence args={},] bool_or_string bApprox=false) -- test two values for inequality

a, b: the two values which should not be (approximately) equal.

Note this simply invokes test_equal(a,b,name,args,bApprox,false).

procedure 
test_true(bool success, string name="", sequence args={}) -- test something is true

success: a value which should be true
name: the test name (optional but recommended)
args: if not {}, name=sprintf(name,args) is performed

Unless set_test_abort(TEST_CRASH) is in force, omitting the name may make it harder to identify the failing test, although when there are only a few tests [in a section/module], that is unlikely to be a significant issue.
Invokes exactly the same private internal routine as the second half of test_equal() and the following three public routines (but all with differing args).
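
For instance, a small sketch using args to embed the value under test in the name (s is obviously just made-up data):

        string s = "abcd"
        test_true(length(s)=4,"length of %s is 4",{s})  -- name becomes "length of abcd is 4"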

procedure 
test_false(bool success, string name="", sequence args={}) -- test something is false

success: a value which should be false

Equivalent to test_true(not success,name,args)

procedure 
test_pass(string name="", sequence args={}) -- a test has succeeded

Equivalent to test_true(true,name,args)

Typically used when some condition does not easily fit on one line; the main/only point of invoking this directly would be to see the number of passed tests (briefly) pop up, or lie waiting as proof in some log file.

procedure 
test_fail(string name="", sequence args={}) -- a test has failed

Equivalent to test_true(false,name,args)

Typically used when some condition does not easily fit on one line.
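
For instance, a little sketch where the pass/fail decision spans several statements, and hence does not comfortably fit into a single test_true() call (res is just made-up data standing in for some real results):

        sequence res = {"1","2","3"}
        bool all_strings = true
        for i=1 to length(res) do
            if not string(res[i]) then
                test_fail(sprintf("res[%d] is not a string",i))
                all_strings = false
                exit
            end if
        end for
        if all_strings then test_pass("all results are strings") end if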

procedure 
test_summary(bool close_log=true) -- show test summary (if appropriate)

close_log: for internal use only (do not provide): allows set_test_module() to keep any log file open.

Optionally prints eg "20 tests run, 19 passed, 1 failed, 95% success\n", along with an optional "Press any key to continue..." pause, closes any log file, and/or aborts.
Should you forget to invoke this routine, and TEST_QUIET or TEST_SUMMARY is in force, it is quite possible that all failures will go completely unnoticed, and even with a higher setting they may flash up on the screen but disappear too quickly to be read, especially after set_test_pause(TEST_QUIET).
Use set_test_module() for all but the last in preference to invoking this several times, except just prior to abort(), crash(), etc.


Regarding BDD, TDD, etc.

Tests are brilliant, they really are, and inadequate tests are, well, just inadequate. TDD/BDD both risk approaching the problem from 100% the wrong angle, not that I have any objection to writing maybe 10% or even 20% of tests up-front.
The very best tests are those that promise (as best anything can) some customer will not re-experience the (apparently) exact same problem this time next week/month - every (non-trivial) maintenance task should result in a new unit test, whether that uses unit_test.e or is simply a crash() (/assert) statement. [Of course unit_test.e is intended to alert you about a problem without unnecessarily hindering development, whereas a crash() can often stop it dead in its tracks.]
Nearly as good are tests that actively assist in the development process by preventing the re-emergence of bugs that just bit you.
Utterly useless tests are the ones that can never trigger, and the big problem with TDD/BDD is that such tests are placed front and centre, and worse still if they in any way actively deter writing tests that utilise knowledge learnt during implementation.
Where TDD/BDD shines is when you know up-front that nothing will be learnt and there will be absolutely no innovation, which is the very definition of soul-destroying. The same applies to some forms of OOP that insist on the creation of entire classes so banal and trivial that TDD/BDD suffices. One good thing about TDD is that it deters the use of toolchains with primitive run-time diagnostics, ie C++, C, and Assembler, as opposed to everything from Java and Python to, of course, Phix, which at least try to be helpful to the developer.