
Using Types

So far you have seen some examples of variable types, but now we will define types more precisely.

Variable declarations have a type name followed by a list of the variables being declared. For example,
    object a
    global integer x, y, z
    procedure fred(sequence q, sequence r)
The types object, sequence, string, atom, and integer are predefined. Variables of type object may take on any value. Those declared with type sequence must always be sequences. Those declared with type string must always be strings (with every element fitting in a byte). Those declared with type atom must always be atoms.

Those declared with type integer must be atoms with integer values from -1,073,741,824 to +1,073,741,823 inclusive on 32 bit, or -4,611,686,018,427,387,904 to +4,611,686,018,427,387,903 on 64 bit. You can perform exact calculations on larger integer values, up to about 15 decimal digits, but declare them as atom, rather than integer.
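One way to see these limits for yourself is the builtin integer() test, which can be applied to any value. The sketch below relies only on the builtins integer() and machine_bits(); the variable names are illustrative.

```phix
-- 1,073,741,823 is the largest integer on 32 bit, so it passes on both
?integer(1073741823)            -- 1
-- one more only fits in an integer on 64 bit (0 on 32 bit, 1 on 64 bit)
?integer(1073741824)
?machine_bits()                 -- 32 or 64, depending on this build

-- larger whole numbers are still exact when held in an atom
atom big = 123456789012345      -- 15 digits
atom big1 = big+1               -- exactly 123456789012346
-- integer i = big              -- would abort on 32 bit: too big for integer
```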

To augment the predefined types, you can create user-defined types. All you have to do is define a single-parameter function, but declare it with type ... end type instead of function ... end function. For example,
    type hour(integer x)
        return x>=0 and x<=23
    end type
    hour h1, h2
    h1 = 10      -- ok
    h2 = 25      -- error! program aborts with a message
Variables h1 and h2 can only be assigned integer values in the range 0 to 23 inclusive.
After each assignment to h1 or h2 the interpreter will call hour(), passing the new value.
The value will first be checked to see if it is an integer (because of "integer x").
If it is, the return statement will be executed to test the value of x (i.e. the new value of h1 or h2).
If hour() returns true, execution continues normally.
If hour() returns false then the program is aborted with a suitable diagnostic message.

"hour" can be used to declare subroutine parameters as well:
     procedure set_time(hour h)
set_time() can only be called with a reasonable value for parameter h, otherwise the program will abort with a message.
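Putting those pieces together, a minimal runnable sketch might look like the following; the body of set_time() is made up purely for illustration. It also shows that a type can be invoked directly, like a function returning true or false, which can be handy for testing a value before assigning it.

```phix
type hour(integer x)
    return x>=0 and x<=23
end type

procedure set_time(hour h)
    -- h is guaranteed to be an integer in 0..23 by the time we get here
    printf(1,"time set to %02d:00\n",{h})
end procedure

set_time(9)         -- prints "time set to 09:00"
-- set_time(24)     -- would abort: 24 is not an hour

-- a type can also be called like a function returning true or false
?hour(23)           -- 1
?hour(24)           -- 0
```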

A variable’s type is checked after each assignment to the variable, except where the compiler can predetermine that such a check is not necessary, and the program terminates immediately should the type function return false. Subroutine parameter types are checked each time that the subroutine is called.
This checking guarantees that a variable can never have a value that does not belong to the type of that variable.
pwa/p2js:
Note that JavaScript is a dynamically typed language: you cannot even specify that a variable is, or should be, (say) an integer. Hence both the builtin and any user defined types are "in name only" (or perhaps more accurately "in comment only") under pwa/p2js: you can declare and explicitly check variables, but they are not automatically checked on assignment or modification, and hence (except in a tiny number of extremely rare cases) do not trigger runtime errors. Generally speaking the developer is expected to iron out any typecheck errors and other bugs on the desktop first, before attempting to run the program in a web browser; should you skip that and adopt a more direct edit/browser development cycle, that’s on you.
Notes:
Technically a user defined type can only be used to prove an object is definitely not an instance of that type.
While 4 is certainly an integer, it could be an hour, or a minute, or a second; there is simply no way to tell.

If a more precise implementation is required, instances must be explicitly tagged, say {HOUR,4} or {MINUTE,4} where HOUR and MINUTE are application-unique constants (you could in theory use the routine_id of the type definitions, or strings).
Such a scheme might still yield "false positives" if the application contains any untagged sequences.
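A minimal sketch of such a tagging scheme, assuming HOUR and MINUTE are simple integer constants (those names, their values, and tagged_hour itself are all hypothetical). The parameter is declared object rather than sequence, so that calling the type as a function on an untagged atom returns false instead of triggering a typecheck error on the parameter itself. The last line shows exactly the "false positive" just mentioned: an untagged {1,4} happens to look identical to {HOUR,4}.

```phix
-- hypothetical tags: any application-unique values would do
constant HOUR = 1, MINUTE = 2

type tagged_hour(object x)
    -- accept only {HOUR,n} where n is an integer from 0 to 23
    if sequence(x) and length(x)=2 and x[1]=HOUR and integer(x[2]) then
        return x[2]>=0 and x[2]<=23
    end if
    return false
end type

tagged_hour h = {HOUR,4}    -- ok
?h[2]                       -- 4
?tagged_hour({MINUTE,4})    -- 0: right value, wrong tag
?tagged_hour(4)             -- 0: untagged, so rejected
?tagged_hour({1,4})         -- 1: an untagged {1,4} is a "false positive"
```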

In my experience such "more advanced" type systems catch very few extra bugs, and indeed sometimes introduce more bugs than they ever catch, but don’t let my opinion dissuade you: properly tagged user defined types remain a perfectly valid tactic in the war against bugs, and indeed [temporary] human-readable string tags can greatly simplify debugging. Besides, your coding habits may very well differ from mine sufficiently to tip the balance of usefulness quite dramatically. One case where I did properly tag types was mpfr/gmp, and that proved to be very useful indeed.

Of course, should you use classes or structs to hold your hours, minutes, and seconds, then precisely that kind of tagging and more is all performed automatically. However some other human-readability aspects are lost, as ex.err typically shows references/pointers instead of the actual values, and there can be significant additional overheads compared against the simpler, more direct native types.
Unlike in many other languages, the type of a variable does not affect any calculations on the variable. Only the value of the variable matters in an expression. The type just serves as an error check to prevent any "corruption" of the variable.
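For example, declaring a value as integer rather than atom changes nothing about how it is divided. The short sketch below (variable names are illustrative) shows that / never truncates, and that an explicit floor() is needed when a whole number is wanted.

```phix
integer i = 7
atom a = 7
?i/2                    -- 3.5: dividing an integer does not truncate
?a/2                    -- 3.5: same value, same result
-- integer j = i/2      -- would abort: 3.5 is not an integer
integer j = floor(i/2)  -- truncate explicitly if that is what you want
?j                      -- 3
```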
Of course C and C-derived languages, quite literally and conservatively speaking, have hundreds of millions of types, forcing the programmer to specify the exact low-level type of every value over and over and over again. Typically a programmer is expected to perform several unsafe type casts on almost every programming statement. Astonishingly, as soon as they make the slightest mistake the compiler quietly "fixes" it for them using something called type coercion, aka applying an implicit or incorrectly specified unsafe cast, which rams the thing into place with a hammer if necessary and almost inevitably leads to a catastrophic failure that can be extremely difficult to track down. Obviously programmers with decades of experience handle such things with ease, but clearly not so for newcomers. Phix turns the whole notion of types on its head: they are there to help the programmer, not punish them. In fact it does not even have any real notion of type casting since, as you may recall, it has just five core builtin data types, and in no cases whatsoever, under any circumstances, do Phix types alter the meaning of a value by discarding precision or treating a slightly negative number as a massive positive number, or cause say 1,000,000 * 1,000,000 to be stored as -727,379,968 or any other such nonsense. Fear not however, it is perfectly straightforward to explicitly replicate such behaviour should you really need to, say in a hash function.
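As one way to do that, the hypothetical wrap32() helper below (not a builtin) reproduces 32-bit two's complement wraparound using only remainder() and ordinary atom arithmetic. It assumes its argument is an exact whole number, which on 32 bit means staying below roughly 2^53; a real hash function that multiplies two full 32-bit values would need to split the multiplication to stay exact.

```phix
constant TWO32 = power(2,32),
         TWO31 = power(2,31)

-- wrap32 is a hypothetical helper, not a builtin: it maps a whole number
-- onto the range a 32-bit signed C int can hold, wrapping as C would.
function wrap32(atom x)
    x = remainder(x,TWO32)              -- |x| < 2^32, same sign as before
    if x<0 then x += TWO32 end if       -- unsigned view: 0 <= x < 2^32
    if x>=TWO31 then x -= TWO32 end if  -- signed view: -2^31 <= x < 2^31
    return x
end function

atom p = 1000000*1000000                -- Phix keeps the true value, 1e12
atom wrapped = wrap32(p)                -- -727379968, as a 32-bit C int gives
integer low_byte = and_bits(100*100,#FF) -- 16: #2710 truncated to a byte
```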
It might not be entirely unfair to say that Phix user defined types are an explicit embodiment of duck typing:
type duck(object x)
    return gait(x)=WADDLE and noise(x)=QUACK and shape(x)=DUCK
end type
duck d = wolf // crashes, eg `typecheck error: d is "wolf"`
              // aka you cannot put a wolf in a duck house
However duck typing usually means the polar opposite: taking control away from the programmer and bravely/stupidly carrying on long after things go wrong. A typical duck-typed language such as Python will quietly build a duck house bigger than the pond it is meant to be sitting on, and come Sunday lunch (assuming you are carnivorous) it will happily let you try to wring that wolf’s neck and pluck it, which certainly ain’t gonna end well for you. On the other hand, statically typed languages (such as anything C-based) may give you a clear compile-time error, which is great (and Phix often will too), but anything that gets past the compiler will [as per 100*100 resulting in (byte) 16, from 1e4 being #2710 in hex] perform some horrendous cartoon-level violence to ram that big old hairy wolf into that tiny little duck house, and in the debugger things will likely be completely unrecognisable and all rather messy. In contrast, Phix types are designed to stop (leaving the poor old wolf unharmed and your good self uneaten) with a clear and readable (and catchable/loggable) runtime error, as soon after and as close as possible to the initial error/mistake.

Type checking can be turned on or off between subroutines using the with type_check or without type_check special statements. The latter means that the hour() routine above does not get called and the x>=0 and x<=23 test is not performed; however it does not, for instance, allow a string to be stored in a variable declared as integer, nor does it disable the internal checking (or fatal errors) for that sort of thing. Type checking is initially on by default.
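A minimal sketch of switching type checking off around a single routine (the function names are made up for illustration). Inside unchecked_hour() the hour() routine is normally not called, so 25 slips through, whereas checked_hour() would abort on the same value. Bear in mind that Phix may still choose to perform some checks anyway, so treat the unchecked behaviour as typical rather than guaranteed.

```phix
type hour(integer x)
    return x>=0 and x<=23
end type

without type_check
function unchecked_hour(integer v)
    hour h = v      -- hour() is not called here, so 25 is normally accepted
    return h
end function
with type_check

function checked_hour(integer v)
    hour h = v      -- hour() is called: checked_hour(25) would abort
    return h
end function

?checked_hour(10)       -- 10
-- ?checked_hour(25)    -- would abort with a typecheck error
```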
Note to Benchmarkers:
When comparing the speed of Phix programs against programs written in other languages, specify without type_check at the top of the file, which gives Phix permission to skip run-time type checks, thereby saving some execution time.
All other checks are still performed, e.g. subscript checking, uninitialized variable checking etc.
Even when you turn off type checking, Phix reserves the right to make checks at strategic places, since this can actually allow it to run your program faster in many cases.
So you may still get a type check failure even when you have turned off type checking.
Whether type checking is on or off, you will never get a machine-level exception.
You will always get a meaningful message from Phix when something goes wrong. (This might not be the case when you poke directly into memory, or call routines written in C or machine code.)
The Phix method of defining types is much simpler than what you will find in most other languages, yet it provides the programmer with greater flexibility in defining the legal values for a type of data. Any algorithm can be used to include or exclude values. You can even declare a variable to be of type object, which will allow it to take on any value. Routines can be written to work with very specific types, or very general types.
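For instance, a type need not be a simple range test; any loop or algorithm will do. The sorted_seq type below is a hypothetical example (not part of Phix) that accepts only sequences whose elements are in non-decreasing order, using the builtin compare().

```phix
type sorted_seq(object x)
    -- reject anything that is not a sequence at all
    if not sequence(x) then return false end if
    -- each element must compare <= the one after it
    for i=2 to length(x) do
        if compare(x[i-1],x[i])>0 then return false end if
    end for
    return true
end type

sorted_seq s = {1,2,2,5}    -- ok
s = append(s,9)             -- still sorted, ok
-- s = append(s,3)          -- would abort: {1,2,2,5,9,3} is not sorted
?sorted_seq("abc")          -- 1: strings are sequences too
?sorted_seq(7)              -- 0: not a sequence
```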

For small programs, there is little advantage to defining new types, and beginners may wish to stick with the five predefined types, or even declare all variables as object.

For larger programs, strict type definitions can greatly aid the process of debugging. Logic errors are caught closer to their source and are not allowed to propagate in subtle ways throughout the rest of the program. Furthermore, it is much easier to reason about the misbehavior of a section of code when you know the variables involved always have a legal/plausible value, albeit perhaps not precisely that desired.

Types also provide meaningful, machine-checkable documentation about your program, making it easier for you or others to understand your code at a later date. Combined with the subscript-limit, uninitialized-variable and other checks that are always present, strict run-time type checking makes debugging much easier in Phix than in most other languages. It also increases the reliability of the final program, since many latent bugs that would have survived the testing phase in other languages will have been caught by Phix.
Anecdote 1:
In porting a large C program to Euphoria (on which Phix is based), a number of latent bugs were discovered. Although this C program was believed to be totally "correct", Rob found: a situation where an uninitialized variable was being read; a place where element number "-1" of an array was routinely written and read; and a situation where something was written just off the screen. These problems resulted in errors that were not easily visible to a casual observer, so they had survived testing of the C code.
Anecdote 2:
The Quick Sort algorithm presented on page 117 of Writing Efficient Programs by Jon Bentley has a subscript error! The algorithm will sometimes read the element just before the beginning of the array to be sorted, and will sometimes read the element just after the end of the array. Whatever garbage is read, the algorithm will still work - this is probably why the bug was never caught. But what if there isn’t any (virtual) memory just before or just after the array? Bentley later modifies the algorithm such that this bug goes away -- but he presented this version as being correct. Even the experts need subscript checking!
Performance Note:
When typical user-defined types are used extensively, type checking adds only 20 to 40 percent to execution time. Leave it on unless you really need the extra speed. You might also consider turning it off for just a few heavily-executed routines. Profiling can help with this decision.