Variable declarations have a type name followed by a list of the variables being declared. For example,
    object a
    global integer x, y, z
    procedure fred(sequence q, sequence r)

The types object, sequence, string, atom, and integer are predefined. Variables of type object may take on any value. Those declared with type sequence must always be sequences. Those declared with type string must always be strings (with every element fitting in a byte). Those declared with type atom must always be atoms.
Those declared with type integer must be atoms with integer values from -1073741824 to +1073741823 inclusive. You can perform exact calculations on larger integer values, up to about 15 decimal digits, but declare them as atom, rather than integer.
To augment the predefined types, you can create user-defined types. All you have to do is define a single-parameter function, but declare it with type ... end type instead of function ... end function. For example,
    type hour(integer x)
        return x>=0 and x<=23
    end type
    hour h1, h2
    h1 = 10 -- ok
    h2 = 25 -- error! program aborts with a message

Variables h1 and h2 can only be assigned integer values in the range 0 to 23 inclusive. After each assignment to h1 or h2 the interpreter will call hour(), passing the new value. The value will first be checked to see if it is an integer (because of "integer x"). If it is, the return statement will be executed to test the value of x (i.e. the new value of h1 or h2). If hour() returns true, execution continues normally. If hour() returns false then the program is aborted with a suitable diagnostic message.
"hour" can be used to declare subroutine parameters as well:
    procedure set_time(hour h)

set_time() can only be called with a reasonable value for parameter h, otherwise the program will abort with a message.
A variable’s type will be checked after each assignment to the variable (except where the compiler can predetermine that a check will not be necessary), and the program will terminate immediately if the type function returns false. Subroutine parameter types are checked each time that the subroutine is called. This checking guarantees that a variable can never have a value that does not belong to the type of that variable.
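The parameter checking described above can be sketched as follows. This is only an illustration: it reuses the hour type from the earlier example, and the body of set_time() is a hypothetical placeholder.

```phix
type hour(integer x)
    return x>=0 and x<=23
end type

procedure set_time(hour h)
    -- hypothetical body: just report the value that passed the check
    printf(1, "time set to %d:00\n", h)
end procedure

set_time(9)         -- accepted: 9 is an integer in the range 0..23
-- set_time(24)     -- would abort with a type check failure on parameter h
```

The failing call is left commented out so that the sketch runs to completion.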
- Notes:
- Technically a user-defined type can only be used to prove an object is definitely not an instance of that type. While 4 is certainly an integer, it could be an hour, or a minute, or a second; there is no way to tell. If a more precise implementation is required, instances must be explicitly tagged, say {HOUR,4} or {MINUTE,4} where HOUR and MINUTE are application-unique constants (you could in theory use the routine_id of the type definitions, or strings). Such a scheme might still yield "false positives" if the application contains any untagged sequences. In my experience such "more advanced" type systems catch very few extra bugs, and indeed sometimes introduce more bugs than they ever catch - but don’t let my opinion dissuade you - properly tagged user-defined types remain a perfectly valid tactic in the war against bugs, and indeed [temporary] human-readable tags (i.e. strings) can greatly simplify debugging.
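As a rough sketch of the tagging scheme just described, a tagged type might look like the following. The name tagged_hour and the values chosen for HOUR and MINUTE are assumptions made for illustration, not part of phix.

```phix
constant HOUR = 1, MINUTE = 2   -- hypothetical application-unique tags

type tagged_hour(object o)
    -- accept only two-element sequences of the form {HOUR, 0..23}
    return sequence(o) and length(o)=2
       and o[1]=HOUR
       and integer(o[2]) and o[2]>=0 and o[2]<=23
end type

tagged_hour h = {HOUR,4}    -- ok
-- h = {MINUTE,4}           -- would fail: wrong tag
-- h = 4                    -- would fail: untagged value
```

An untagged sequence such as {1,4} would still pass, which is the "false positive" risk mentioned above.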
Of course C and C-derived languages force the programmer to specify the exact low-level type of every value over and over and over again, and astonishingly as soon as they make the slightest mistake the compiler quietly "fixes" it for them using something called type coercion, which almost inevitably leads to a catastrophic failure that can be extremely difficult to track down. Obviously a programmer with decades of experience handles such things with ease, but that is clearly not the case for newcomers. Phix turns the whole notion of types on its head: they are there to help the programmer, not punish them, and in no cases whatsoever, under any circumstances, do phix types alter the meaning of a value by discarding precision or treating a slightly negative number as a massive positive number, or any other such nonsense.

Type checking can be turned on or off between subroutines using the with type_check or without type_check special statements. It is initially on by default.
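A minimal sketch of switching type checking off around a single routine and back on afterwards is shown here. The routine total_hours() is hypothetical, and the hour type is repeated from the earlier example so that the fragment stands alone.

```phix
type hour(integer x)
    return x>=0 and x<=23
end type

without type_check          -- checks on user-defined types may be skipped from here
function total_hours(sequence hours)
    hour h
    integer total = 0
    for i=1 to length(hours) do
        h = hours[i]        -- assignment to a variable of a user-defined type
        total += h
    end for
    return total
end function
with type_check             -- checking is back on for the rest of the file

?total_hours({1,2,3})
```

Only the routine between the two special statements is affected; code before and after it keeps the default behaviour.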
- Note to Benchmarkers:
- When comparing the speed of phix programs against programs written in other languages, you should specify without type_check at the top of the file. This gives phix permission to skip run-time type checks, thereby saving some execution time. All other checks are still performed, e.g. subscript checking, uninitialized variable checking etc. Even when you turn off type checking, phix reserves the right to make checks at strategic places, since this can actually allow it to run your program faster in many cases. So you may still get a type check failure even when you have turned off type checking. Whether type checking is on or off, you will never get a machine-level exception. You will always get a meaningful message from phix when something goes wrong. (This might not be the case when you poke directly into memory, or call routines written in C or machine code.)
For small programs, there is little advantage to defining new types, and beginners may wish to stick with the five predefined types, or even declare all variables as object.
For larger programs, strict type definitions can greatly aid the process of debugging. Logic errors are caught closer to their source and are not allowed to propagate in subtle ways throughout the rest of the program. Furthermore, it is much easier to reason about the misbehavior of a section of code when you know the variables involved always have a legal/plausible value, albeit perhaps not precisely that desired.
Types also provide meaningful, machine-checkable documentation about your program, making it easier for you or others to understand your code at a later date. Combined with the subscript checking, uninitialized variable checking, and other checking that is always present, strict run-time type checking makes debugging much easier in phix than in most other languages. It also increases the reliability of the final program since many latent bugs that would have survived the testing phase in other languages will have been caught by phix.
- Anecdote 1:
- In porting a large C program to Euphoria (on which Phix is based), a number of latent bugs were discovered. Although this C program was believed to be totally "correct", Rob found: a situation where an uninitialized variable was being read; a place where element number "-1" of an array was routinely written and read; and a situation where something was written just off the screen. These problems resulted in errors that were not easily visible to a casual observer, so they had survived testing of the C code.
- Anecdote 2:
- The Quick Sort algorithm presented on page 117 of Writing Efficient Programs by Jon Bentley has a subscript error! The algorithm will sometimes read the element just before the beginning of the array to be sorted, and will sometimes read the element just after the end of the array. Whatever garbage is read, the algorithm will still work - this is probably why the bug was never caught. But what if there isn’t any (virtual) memory just before or just after the array? Bentley later modifies the algorithm such that this bug goes away -- but he presented this version as being correct. Even the experts need subscript checking!
- Performance Note:
- When typical user-defined types are used extensively, type checking adds only 20 to 40 percent to execution time. Leave it on unless you really need the extra speed. You might also consider turning it off for just a few heavily-executed routines. Profiling can help with this decision.