Glossary

Let me know of anything else that should be clarified here, or anything that needs further simplification for those completely new to computers and programming, obviously within reason and assuming they have access to several other sources of educational material. Of particular importance are any words and phrases used elsewhere in this document and/or the sources of Phix that might be slightly ambiguous.

abstraction The simple act of giving something a clear and intuitive name. Often this term is explained in the most baroque, confusing, and unhelpful way possible, especially layers within layers to hide and protect things. I would argue it is all about careful choices that will avoid distraction and let you focus on other matters, and that the real benefits are not usually immediate but months or years from now.
allocation Can mean reserving a block of memory (see allocate()) or a register.
assignment The simple act of storing a value in a variable.
atom A variable that can hold an integer or a floating point value.
BOM A Byte Order Mark is a special byte sequence at the start of a text file which indicates UTF8, UTF16LE/BE, or UTF32. See edita.exw/readFile() and ptok/loadFile() for supported values.
calling convention Most entry points in the run-time VM (virtual machine) require parameters in specific registers and precise stack content. These are documented in the sources (builtins\VM). I use builtins\VM\pHeap.e\::pGetMem as my go-to place for a quick recap on the various operating-system-api-specific calling conventions.
compilation In Phix, this specifically means creating an executable file. See also interpretation.
compiler directive A command line option or programming statement that does not directly generate code but instead instructs the compiler how to subsequently generate code. Examples include "with trace", "-c", and "format".
dword-sequence Specifically used to indicate a sequence which is not a string but (on 32-bit) 4 bytes per element. On 64-bit, they are (of course) technically qword-sequences, 8 bytes per element, but are still usually referred to as dword-sequences. You cannot natively declare a variable as dword-sequence-but-not-string, however the following user defined type has the same effect:
    type dword_sequence(object o)
        return (sequence(o) and not string(o))
    end type

You can (sometimes) alternatively use a #isginfo{} statement to instruct the compiler to perform an equivalent compile-time check.
era The effective return address, a low-level implementation detail of how the debugger actually works. Normally, in day-to-day use of Phix, the era is correctly translated to a hll line number and is of no concern to you.

Quite often the era is just the same as a usual return address. However, suppose that opApnd invokes opDealloc, and something goes wrong in the latter. Rather than using the real return address into the middle of opApnd, we want an era that can be converted to a hll line number. Or perhaps during the statement s[1][2][3][4][5][6] = x, we might find that s[1][2][3][4] is 0, and need to issue an "attempt to subscript an atom" error when 4 out of 6 subscripts have been processed (popped from the stack). While it may have little or no impact on runtime performance, correctly maintaining an era (propagating it through nested calls) may consume a not entirely insignificant amount of effort when modifying the low-level back end. If an error message shows line number -1, your first suspicion should be that some low level code is not getting the era quite right.

Note that era often have a -1 applied. Consider the following (fictional) listing fragment
                    112     n = 
                                ...
                                call :%pStoreFlt        ;#004020145
                    113     s = t
                                add [ebx+esi*4-16],1    ;#00402014A
            
ie pStoreFlt preserves all registers, and the first instruction for line 113, assuming the compiler knows that [t] is already in esi and that t is not unassigned, is to increase the reference count on it. If we catch an exception at #402014A, we want to point the user at line 113, however if there is a problem in pStoreFlt, then rather than use the actual return address of #402014A we use an era of #4020149, to point the error message at line 112. Naturally the above instructions would normally work just fine, but if n/t/esi have been corrupted somehow, we need to say something.

All memory allocations also have an era stored against them for use in memory leak checking and suchlike.
false positives When an antivirus(AV) program proclaims there is a problem in a file, but in fact there is nothing wrong with it. The ever increasing rate of malware production has forced AV makers to adopt "heuristic" and "reputation" based mechanisms, with an unavoidable hike in false positives, and is the lastest bane of many a developer. One in particular has caused me problems: Avast/evo-gen. The good news is that once a program "matures", or reaches a certain size, the problem stops. Obviously I report these as and when I see fit, but response times are painfully slow. While submission may or may not achieve miracles, not doing so almost guarantees that things will simply never get fixed. See also Recommended Tools for links to several on-line scanners, which obviously would not exist if all or even any AV were perfect.

Some of the (smaller) included demos have a trailing "--/**/include ..\test\t02parms.exw" or similar as that can make some false positives go away. My hope is that any such measures prove to be temporary.
float Specifically used to indicate an atom which is not an integer but (on 32-bit) a 64 bit floating point value, ranging from approximately -1.79e308 to +1.793308 with around 15 decimal digits of precision. On 64-bit, floats are 80-bit (tbyte), ranging from roughly -1.18e4932 to +1.18e4932 with about 19 decimal digits of precision. You cannot natively declare a variable as float-but-not-integer, however the following user defined type has the same effect:
    type float(object o)
        return (atom(o) and not integer(o))
    end type

You can (sometimes) alternatively use a #isginfo{} statement to instruct the compiler to perform an equivalent compile-time check. Note however that the result of 1.5+1.5 is the integer 3, not the float 3.0, hence the above type is highly likely to cause a wholly unnecessary type check failure at some point and rather rudely terminate the application.
gvar Internal compiler/back-end term. Constants, file-level, global, and any unnamed temporary variables in top-level code are all gvars. They are stored in a static table at the start of the data section, see pemit2.e/filedump.exw for more details. See also/contrast with tvar.
integer A variable that (on 32-bit) can hold a 31-bit signed integer value in the range -1073741824..1073741823 (#C0000000..#3FFFFFFF). On 64-bit, the 63-bit signed integer range is -4611686018427387904..4611686018427387903 (#C000000000000000..#3FFFFFFFFFFFFFFF).
interpretation Running a program "from source". Technically speaking, interpretation is in fact compilation, but without creating an executable file. Some other steps are also omitted, in particular populating the symbol table with actual names instead of ternary tree indexes, unless there is an error that needs reporting, and the re-use of some builtin routines that are already available as part of the compiler. In many cases an interpreted program runs just as fast as a compiled one, quite unlike other interpreted programming languages.
isginfo Not recommended for general use. An internal compile-time type check. Statements of the form #isginfo{name,0b0101,MIN,MAX,integer,-2} are checked at the end of compilation in an attempt to ensure a variable is being assigned the right kinds of things and will require the minimum of run-time type-checks. They were introduced to streamline the compiler development, where they are still quite widely used, but proved less than spectacularly successful. All too often the only sensible way to create a new statement is to copy an existing one and then copy the actual details from the resulting compiler error message. They can often trigger in subtle and irrelevant ways and can be extremely difficult to debug. That said, sometimes the compiler ones do indeed catch stupid coding errors in a fairly helpful and immediate manner. There is also a full (t49, 800+ lines) test file, run as part of p -test, dedicated entirely to #isginfo{}. By the time you know enough to use these in anger, you won’t need me to tell you what each of the individual fields mean, and if you have to ask you’re probably not ready.
lint The command line -lint option is not formally supported. It performs a few extra checks which may be helpful, but if anything untoward happens my advice is simply going to be "well, don't use -lint then".

When you compile a program, some extra analysis takes place which gives scope for a few more error messages. By far the biggest use of the internal flag behind the -lint option is to make an interpret perform the same checks as a compile, and obviously that is not massively helpful - you may as well just use -c instead.

Otherwise, -lint causes additional warnings when files are auto-included, routines are implicitly forward referenced, or when functions may modify something being modified on return, and disables "without warning". Occasionally implicit forward references can go wrong (especially wrt local/global assumptions) and adding the appropriate include/explicit forward declarations can help. Sometimes a statement such as "table[i] = modify_table()" is going to use an outdated index when the function call returns, and of course a misjudged "without warning" may be hiding something you really ought to see.

More often than not, "fixing" such messages will achieve absolutely nothing. Wasting time getting a "clean lint" is unlikely to be very productive, except for ensuring that the next time you use -lint it only shows new findings.
namespace A namespace allows the programmer to explicitly specify in which source file, or source sub-tree, a particular identifier is declared. This is most useful when a particular global identifier occurs in more than one file, but might just be there to clarify intent. Traditionally a namespace is declared using "as <namespace>" on an include statement, but it can also be declared at the start of the included file. Note that a namespace is always deemed local rather than global and may therefore require a source file to be re-included by any source files that want to qualify a reference with it. See scope.
qword-sequence See dword-sequence. The use of qword-sequence implies a 64-bit-only situation.
register In a 32-bit x86 application, one of eight temporary storage places in the heart of the physical CPU (Central Processing Unit), or 16 in a 64-bit X64 application. Careful selection of these ("register allocation") can significantly improve performance.

The existing method (a naive, most recently used affair, in pilx86.e) is adequate rather than exceptional and is overdue for a complete rewrite (to a suitably lightweight linear scan or something similar, but definitely not graph colouring).

There are also several floating-point and SSE (etc) registers, however the use of an unqualified "register" usually means one of the integer-only registers in the main CPU.
scope The term scope is a general concept used when describing how the compiler resolves references to identifiers. In most cases it is simple and intuitive, involving nothing more than plain old common sense. It may be helpful to think of "in scope" as simply meaning "does not cause a compilation error".

The scope of an identifier is where, in terms of which lines of code, it can be referenced.
The scope of a variable starts at the point of declaration and ends at the end of the declaring block, routine, file, or in the case of globals, on the last line of the main file.
The scope of a routine is anywhere in the file it is declared in, or in the case of global routines, anywhere in the entire application.
Namespaces and explicit forward declarations can be used to further qualify and clarify matters. See scope.
sequence A variable-length array of elements, which can be integer, float, string, or a nested sub-sequence, to any depth. A variable declared as sequence can also hold a string but not vice versa.
string A variable-length array of 8-bit characters/bytes/integers with values from 0 to 255. Strings can be stored in variables declared as sequence, however variables declared as string cannot hold a dword-sequence. Strings can also be used to hold raw binary data, though you would normally use allocate() for that. Likewise unicode text other than UTF8 is normally best kept in raw memory or a dword-sequence.
tbyte When I use tbyte, I really mean ten-byte, ie an 80-bit float. This is the same meaning as FASM, and has been copied into the inline assembler of Phix. Thankfully it does not cause any known conflicts, but in the crazy C/C++-land, tbyte or TBYTE is often used to mean "an 8 or 16-bit byte" (I kid thee not). We are probably all wrong.
TCB Thread Control Block. Used in and private to builtins/VM/pHeap.e, contains tables of owned and non-owned lists of free blocks, which minimises lock contention by allocate and free and their low-level internal equivalents (:%pAllocStr etc).
technicalia A made-up word loosely meaning one of "technical details you do not need to know for day-to-day use" or "some useless trivia" or "extremely unlikely scenario" or "covering my ass", or all four. Several pages in this document end with a technicalia drop-down. They are initially hidden to avoid breaking the flow of my (or more acurately Rob Craig’s) otherwise achingly beautiful prose. They can be skipped on first reading, may help when you hit a difficult problem, and above all are not worth getting all hot and bothered about, he says hopefully.
threadstack An outdated internal compiler/back-end term, from before making tvars ebp-relative. Where this is still used, it probably just means the gvar table, or quite likely should just be completely ignored.
tvar Internal compiler/back-end term. Routine parameters, local variables, and any unnamed temporaries required inside a routine are all tvars. They are stored relative to ebp, with storage for them created by opFrame and destroyed by opRetf, see the comments in builtins/VM/pStack.e for more details. See also/contrast with gvar. Note that while gvars can have an associated compile-time value for constant and assignment-on-declaration purposes, that is not the case for tvars, which only have run-time values.
UTF8 An international standard for holding non-ascii text. Phix source files may be stored as UTF8 (but not UTF16) and Edita can edit UTF8/16 files seamlessly, as long as they begin with a proper BOM. Note that UTF8 string characters can be difficult to process using normal subscripts; since unicode characters can be composed of more than one byte, the third character (for instance) is not necessarily stored at s[3]. However substring matching, and subsequent replacement, generally works perfectly well, without any known issues.
VM The term Virtual Machine has several meanings in the computer world; quite commonly it is used for a software imitation of an entire operating system running inside another. However in Phix it is simply a collection of software components that augment the physical hardware and wrap many (but not all) OS-specific requirements. These components can be found in the builtins\VM directory. For example the hardware add can only operate on two integers and fadd on two floats, whereas :%opAdd (in builtins\VM\pMath.e) can operate on any combination of integer and float, and yield either an integer or floating point result.
word When I use "word", I usually mean a 16-bit value (and dword/qword for 32/64 bits). Many academic papers use it to mean "machine word", which of course these days means "32 or 64 depending on how you compiled it", as opposed to the physical hardware.