abstraction |
The simple act of giving something a clear and intuitive name. Often this term is explained in the most baroque,
confusing, and unhelpful way possible, especially as layers within layers that hide and protect things. I would argue
it is all about careful choices that avoid distraction and let you focus on other matters, and that the real
benefits are not usually immediate but arrive months or years from now.
|
allocation |
Can mean reserving a block of memory (see allocate()) or a register.
|
assignment |
The simple act of storing a value in a variable.
|
atom |
A variable that can hold an integer or a floating point value.
|
BOM |
A Byte Order Mark is a special byte sequence at the start of a text file which indicates UTF8, UTF16LE/BE,
or UTF32. See edita.exw/readFile() and ptok/loadFile() for supported values.
|
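As a rough illustration of the BOM entry above, here is a minimal sketch of classifying the leading bytes of a file. The routine name bom_kind is hypothetical and not part of phix, and the byte values are the standard Unicode ones rather than anything taken from readFile() or loadFile(), which remain the authority on what is actually supported.

```phix
-- Hypothetical helper, not part of phix: classify the leading bytes of a file.
-- Takes a sequence so that both strings and plain byte lists are accepted.
function bom_kind(sequence s)
    integer n = length(s)
    if n>=3 and s[1]=#EF and s[2]=#BB and s[3]=#BF then
        return "UTF8"
    elsif n>=4 and s[1]=#FF and s[2]=#FE and s[3]=0 and s[4]=0 then
        return "UTF32LE"    -- tested before UTF16LE, which shares the first two bytes
    elsif n>=4 and s[1]=0 and s[2]=0 and s[3]=#FE and s[4]=#FF then
        return "UTF32BE"
    elsif n>=2 and s[1]=#FF and s[2]=#FE then
        return "UTF16LE"
    elsif n>=2 and s[1]=#FE and s[2]=#FF then
        return "UTF16BE"
    end if
    return "none"
end function

?bom_kind({#EF,#BB,#BF}&"hello")    -- UTF8
?bom_kind("hello")                  -- none
```

Note that the UTF32LE test is inherently ambiguous with a UTF16LE file whose first character is a nul, so treat the result as a best guess.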
calling convention |
Most entry points in the run-time VM (virtual machine) require parameters in specific registers and precise stack content.
These are documented in the sources (builtins\VM). I use builtins\VM\pHeap.e\::pGetMem as my go-to place for a quick
recap on the various operating-system-api-specific calling conventions.
|
compilation |
In phix, this specifically means creating an executable file. See also interpretation.
|
compiler directive |
A command line option or programming statement that does not directly generate code but instead instructs the compiler how to
subsequently generate code. Examples include "with trace", "-c", and "format".
|
dword-sequence |
Specifically used to indicate a sequence which is not a string but (on 32-bit) 4 bytes per element.
On 64-bit, they are (of course) technically qword-sequences, 8 bytes per element, but are still usually referred to as dword-sequences.
You cannot natively declare a variable as dword-sequence-but-not-string, however the following user defined type has the same effect
(in fact pGUI.e has a very similar private dword_seq type).
    type dword_sequence(object o)
        return (sequence(o) and not string(o))
    end type
You can (sometimes) alternatively use a #isginfo{} statement to instruct the compiler to perform an equivalent compile-time check. |
era |
The effective return address, a low-level implementation detail of how the debugger actually works. Normally, in day-to-day use of
phix, the era is correctly translated to a hll line number and is of no concern to you.
Quite often the era is just the same as a usual return address. However, suppose that opApnd invokes opDealloc, and something goes wrong in the latter. Rather than using the real return address into the middle of opApnd, we want an era that can be converted to a hll line number. Or perhaps during the statement s[1][2][3][4][5][6] = x, we might find that s[1][2][3][4] is 0, and need to issue an "attempt to subscript an atom" error when 4 out of 6 subscripts have been processed (popped from the stack).
While it may have little or no impact on runtime performance, correctly maintaining an era (propagating it through nested calls) may consume a not entirely insignificant amount of effort when modifying the low-level back end. If an error message shows line number -1, your first suspicion should be that some low level code is not getting the era quite right.
Note that an era often has a -1 applied. Consider the following (fictional) listing fragment
112 n =
ie pStoreFlt preserves all registers, and the first instruction for line 113, assuming the compiler knows that [t] is already in esi and that t is not unassigned, is to increase the reference count on it. If we catch an exception at #402014A, we want to point the user at line 113, however if there is a problem in pStoreFlt, then rather than use the actual return address of #402014A we use an era of #4020149, to point the error message at line 112. Naturally the above instructions would normally work just fine, but if n/t/esi have been corrupted somehow, we need to say something.
All memory allocations also have an era stored against them for use in memory leak checking and suchlike. |
false positives |
When an antivirus (AV) program proclaims there is a problem in a file, but in fact there is nothing wrong with it.
The ever increasing rate of malware production has forced AV makers to adopt "heuristic" and "reputation" based
mechanisms, with an unavoidable hike in false positives, which is the latest bane of many a developer.
One in particular has caused me problems: Avast/evo-gen. The good news is that once a program "matures", or
reaches a certain size, the problem stops. Obviously I report these as and when I see fit, but response times
are painfully slow. While submission may or may not achieve miracles, not doing so almost guarantees that things
will simply never get fixed. See also Recommended Tools for links to
several on-line scanners, which obviously would not exist if all or even any AV were perfect.
Some of the (smaller) included demos have a trailing "--/**/include ..\test\t02parms.exw" or similar as that can make some false positives go away. My hope is that any such measures prove to be temporary. |
float |
Specifically used to indicate an atom which is not an integer but (on 32-bit) a
64-bit floating point value, ranging from approximately -1.79e308 to +1.79e308 with around 15 decimal digits of precision.
On 64-bit, floats are 80-bit (tbyte), ranging from roughly -1.18e4932 to +1.18e4932 with about 19 decimal digits of precision.
You cannot natively declare a variable as float-but-not-integer, however the following user defined type has the same effect:
    type float(object o)
        return (atom(o) and not integer(o))
    end type
You can (sometimes) alternatively use a #isginfo{} statement to instruct the compiler to perform an equivalent compile-time check. Note however that the result of 1.5+1.5 is the integer 3, not the float 3.0, hence the above type is highly likely to cause a wholly unnecessary type check failure at some point and rather rudely terminate the application. |
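To make the 1.5+1.5 caveat in the float entry concrete, here is a small sketch using only the builtin integer() test. The variable names are just for illustration.

```phix
atom a = 1.5 + 1.5
?integer(a)         -- 1 (true): the result is stored as the integer 3
atom b = 1.5 + 1.25
?integer(b)         -- 0 (false): 2.75 has to be stored as a float

-- With the float type from the entry above declared, the second line here
-- would trigger exactly the unnecessary type check failure described:
--  float f = 1.5
--  f += 1.5
```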
gvar |
Internal compiler/back-end term. Constants, file-level, global, and any unnamed temporary variables in top-level code are
all gvars. They are stored in a static table at the start of the data section, see pemit2.e/filedump.exw for more details.
See also/contrast with tvar.
|
integer |
A variable that (on 32-bit) can hold a 31-bit signed integer value in the range -1073741824..1073741823 (#C0000000..#3FFFFFFF). On 64-bit, the 63-bit signed integer range is -4611686018427387904..4611686018427387903 (#C000000000000000..#3FFFFFFFFFFFFFFF). |
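The limits in the integer entry can be probed with a few lines. This is only a sketch, and it assumes the builtin machine_bits() returns 32 or 64 to match whichever range applies; the variable name maxint is mine.

```phix
-- 2^30-1 on 32-bit, 2^62-1 on 64-bit
atom maxint = power(2,machine_bits()-2)-1
?integer(maxint)        -- 1 (true): the largest value an integer can hold
?integer(maxint+1)      -- 0 (false): one more has to be stored as a float, in an atom
?integer(-maxint-1)     -- 1 (true): the most negative integer
```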
interpretation |
Running a program "from source". Technically speaking, interpretation is in fact compilation, but without creating an executable file.
Some other steps are also omitted, in particular populating the symbol table with actual names instead of ternary tree indexes, unless
there is an error that needs reporting, and the re-use of some builtin routines that are already available as part of the compiler.
In many cases an interpreted program runs just as fast as a compiled one, quite unlike other interpreted programming languages.
|
isginfo |
Not recommended for general use. An internal compile-time type check. Statements of the form
#isginfo{name,0b0101,MIN,MAX,integer,-2} are checked at the end of compilation in an attempt
to ensure a variable is being assigned the right kinds of things and will require the minimum
of run-time type-checks. They were introduced to streamline the compiler development, where
they are still quite widely used, but proved less than spectacularly successful. All too often
the only sensible way to create a new statement is to copy an existing one and then copy the
actual details from the resulting compiler error message. They can often trigger in subtle and
irrelevant ways and can be extremely difficult to debug. That said, sometimes the compiler ones
do indeed catch stupid coding errors in a fairly helpful and immediate manner. There is also a
full (t49, 800+ lines) test file, run as part of p -test, dedicated entirely to #isginfo{}.
By the time you know enough to use these in anger, you won’t need me to tell you what
each of the individual fields means, and if you have to ask you’re probably not ready.
|
lint |
The command line -lint option is not formally supported. It performs a few extra
checks which may be helpful, but if anything untoward happens my advice is simply
going to be "well, don't use -lint then".
When you compile a program, some extra analysis takes place which gives scope for a few more error messages. By far the biggest use of the internal flag behind the -lint option is to make interpretation perform the same checks as compilation, and obviously that is not massively helpful - you may as well just use -c instead. Otherwise, -lint causes additional warnings when files are auto-included, routines are implicitly forward referenced, or when functions may modify something being modified on return, and disables "without warning". Occasionally implicit forward references can go wrong (especially wrt local/global assumptions) and adding the appropriate include/explicit forward declarations can help. Sometimes a statement such as "table[i] = modify_table()" is going to use an outdated index when the function call returns, and of course a misjudged "without warning" may be hiding something you really ought to see. More often than not, "fixing" such messages will achieve absolutely nothing. Wasting time getting a "clean lint" is unlikely to be very productive, except for ensuring that the next time you use -lint it only shows new findings. |
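The "table[i] = modify_table()" case from the lint entry is perhaps easier to see in code. The following is a made-up sketch, with invented names and values, in which the risky statement is left commented out and the function call is split from the store so that the subscript is only evaluated once all the side effects are complete.

```phix
sequence table = {1,2,3}
integer i = 3

function modify_table()
    table = {0}         -- the table shrinks to a single element...
    i = 1               -- ...and the index is changed to match
    return 42
end function

-- table[i] = modify_table()    -- the kind of statement -lint grumbles about
integer res = modify_table()    -- call first, so the side effects are complete,
table[i] = res                  -- then subscript using the up-to-date i
?table                          -- {42}
```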
namespace |
A namespace allows the programmer to explicitly specify in which source file, or source sub-tree, a particular identifier is declared.
This is most useful when a particular global identifier occurs in more than one file, but might just be there to clarify intent.
Traditionally a namespace is declared using "as <namespace>" on an include statement, but it can also be declared at the start
of the included file. Note that a namespace is always deemed local rather than global and may therefore require a source file to be
re-included by any source files that want to qualify a reference with it. See scope.
|
qword-sequence |
See dword-sequence. The use of qword-sequence implies a 64-bit-only situation.
|
register |
In a 32-bit x86 application, one of eight temporary storage places in the heart of the physical CPU (Central Processing Unit),
or 16 in a 64-bit X64 application. Careful selection of these ("register allocation") can significantly improve performance.
The existing method (a naive, most recently used affair, in pilx86.e) is adequate rather than exceptional and is overdue for a complete rewrite (to a suitably lightweight linear scan or something similar, but definitely not graph colouring). There are also several floating-point and SSE (etc) registers, however the use of an unqualified "register" usually means one of the integer-only registers in the main CPU. |
scope |
The term scope is a general concept used when describing how the compiler resolves references to identifiers. In most cases it
is simple and intuitive, involving nothing more than plain old common sense. It may be helpful to think of "in scope" as simply
meaning "does not cause a compilation error".
The scope of an identifier is where, in terms of which lines of code, it can be referenced. The scope of a variable starts at the point of declaration and ends at the end of the declaring block, routine, file, or in the case of globals, on the last line of the main file. The scope of a routine is anywhere in the file it is declared in, or in the case of global routines, anywhere in the entire application. Namespaces and explicit forward declarations can be used to further qualify and clarify matters. See scope. |
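A short sketch of the scope rules just described. All names are invented for the example, and the commented-out lines are the ones that would fail to compile.

```phix
integer count = 0           -- file-level: in scope from here to the end of the file

procedure bump()
    integer step = 1        -- local: in scope until "end procedure"
    for i=1 to 3 do         -- i is only in scope inside the loop
        count += step
    end for
--  ?i                      -- compilation error: i is no longer in scope
end procedure

bump()
?count                      -- 3
--?step                     -- compilation error: step is private to bump()
```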
sequence |
A variable-length array of elements, which can be integer, float, string, or a nested sub-sequence, to any depth.
A variable declared as sequence can also hold a string but not vice versa.
|
string |
A variable-length array of 8-bit characters/bytes/integers with values from 0 to 255. Strings can be stored in
variables declared as sequence, however variables declared as string cannot hold a dword-sequence.
Strings can also be used to hold raw binary data, though you would normally use allocate() for that.
Likewise unicode text other than UTF8 is normally best kept in raw memory or a dword-sequence.
|
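The one-way relationship between string and sequence, as described in the two entries above, can be sketched as follows. The variable names are arbitrary, and the final assignment is left commented out because it is an error.

```phix
sequence s = "abc"              -- fine: a sequence variable can hold a string
?string(s)                      -- 1 (true)
s = {1, 2.5, "three", {4,{5}}}  -- integer, float, string, and nested sub-sequence
?string(s)                      -- 0 (false): now a dword-sequence
?length(s)                      -- 4

string t = "abc"
?sequence(t)                    -- 1 (true): every string is also a sequence
--t = {1,2,3}                   -- error: a string variable cannot hold a dword-sequence
```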
tbyte |
When I use tbyte, I really mean ten-byte, ie an 80-bit float. This is the same meaning as FASM, and has been
copied into the inline assembler of phix. Thankfully it does not cause any known conflicts, but in the crazy
C/C++-land, tbyte or TBYTE is often used to mean "an 8 or 16-bit byte" (I kid thee not).
We are probably all wrong.
|
TCB |
Thread Control Block. Used in and private to builtins/VM/pHeap.e, contains tables of owned and non-owned
lists of free blocks, which minimises lock contention by allocate and free and their low-level internal
equivalents (:%pAllocStr etc).
|
technicalia |
A made-up word loosely meaning one of "technical details you do not need to know for day-to-day use" or
"some useless trivia" or "extremely unlikely scenario" or "covering my ass", or all four. Several pages in
this document end with a technicalia drop-down. They are initially hidden to avoid breaking the flow of my
(or more accurately Rob Craig’s) otherwise achingly beautiful prose. |
threadstack |
An outdated internal compiler/back-end term, from before making tvars ebp-relative. Where this is
still used, it probably just means the gvar table, or quite likely should just be completely ignored.
|
tvar |
Internal compiler/back-end term. Routine parameters, local variables, and any unnamed temporaries required
inside a routine are all tvars. They are stored relative to ebp, with storage for them created by opFrame
and destroyed by opRetf, see the comments in builtins/VM/pStack.e for more details.
See also/contrast with gvar. Note that while gvars can have an associated compile-time value for constant
and assignment-on-declaration purposes, that is not the case for tvars, which only have run-time values.
|
UTF8 |
An international standard for holding non-ascii text. Phix source files may be stored as UTF8 (but not UTF16)
and Edita can edit UTF8/16 files seamlessly, as long as they begin with a proper BOM. Note that UTF8 string
characters can be difficult to process using normal subscripts; since unicode characters can be composed of
more than one byte, the third character (for instance) is not necessarily stored at s[3]. However substring
matching, and subsequent replacement, generally works perfectly well, without any known issues.
|
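To illustrate the subscripting caveat in the UTF8 entry, here is a small sketch. It assumes the source file itself is saved as UTF8, so that the euro sign in the literal occupies three bytes, and uses the builtin utf8_to_utf32() to obtain one element per character.

```phix
string s = "a€b"                -- assumes this source file is saved as UTF8
?length(s)                      -- 5: bytes, not characters
sequence u = utf8_to_utf32(s)   -- one element per unicode code point
?length(u)                      -- 3
?u[3]=='b'                      -- 1 (true), whereas s[3] is the middle byte of the euro sign
?match("€",s)                   -- 2: substring matching works fine on the raw bytes
```

Converting to a dword-sequence like this is one way to get reliable per-character subscripts. Whether it is worth the extra memory depends on how much subscripting you actually need to do.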
VM |
The term Virtual Machine has several meanings in the computer world; quite commonly it is used for a software
imitation of an entire operating system running inside another. However in phix it is simply a collection of
software components that augment the physical hardware and wrap many (but not all) OS-specific requirements.
These components can be found in the builtins\VM directory. For example the hardware add can only operate on
two integers and fadd on two floats, whereas :%opAdd (in builtins\VM\pMath.e) can operate on any combination
of integer and float, and yield either an integer or floating point result.
|
word |
When I use "word", I usually mean a 16-bit value (and dword/qword for 32/64 bits). Many academic papers use it to mean "machine word", which of course these days means "32 or 64 bits depending on how you compiled it", as opposed to the physical hardware.
|