Subscripts
A single element of a sequence may be selected by giving the element number
in square brackets. Element numbers start at 1. Many programmers weaned on 0-based indexes may automatically
and strenuously object to 1-based indexes, but personally I find that s[0] always being a subscript error
is a massive bonus in that it catches probably at least half of all those pesky off-by-1 bugs from the very
first time I run my program.
Phix uses 1-based indexing because:
- the first item of a (non-empty) sequence s is 1 and the last is length(s)
- numbering from the head 1..n is symmetrical to numbering from the tail -n..-1
- when searching a result of 0 means "item not found" (and certainly not -1 == "last")
- a slice is inclusive from the first index to the last index (and not half-in-half-out)
- there are no off-by-one complications as found with 0-based indexing
- there is no need to explain 1-based indexing to anyone and you certainly don’t need a diagram
- avoiding fencepost errors via the clumsy math of half-open intervals is just plain stupid

With 1-based indexes a single "idx<len" test cannot be used for checking both bounds; instead "idx>=1 and idx<=len" must be used, plus abs() to permit negative indexes.

In phix, s[1..1] is a length-1 slice containing the first element. In other (0-based) languages the equivalent is s[0:1] and while I am sure you can quickly get used to it, and it makes some sense in terms of length calculation and when viewing slice extents as the gaps between elements (without implying that fairness applies to the indexes to the elements themselves), the idea that start is 0-based and end is 1-based is clearly the very definition of deliberate insanity. Apart from matching that madness, I certainly cannot think of anything nice to say about the logic behind something like
for i in range(0,7)
stopping when i is 6. (In phix you would just use
for i=0 to 6 do
or maybe
for i in {0,1,2,3,4,5,6} do
and with some irony I accept that tagstart(0,7) gets you that length-7 {0,..6}, and at least that 7 is explicitly a length and not an "end of range", but otherwise there is no such "unexpected cropping".)

Both a phix "i=1 to 5" and a C "i=0; i<5;" iterate 5 times; the "stupid" in the last point of the list above refers to claims that "5-0" is easy but "(5-1)+1" is hard. The fencepost errors mentioned are usually caused by performing mental arithmetic when none is needed at all... or worse, thinking you can do that kind of math without thinking about it...

There have been many debates about the use of 0-based compared to 1-based indexing:
- Zero-based indexing considered harmful
- Again on 0-based vs. 1-based indexing
- Lua, a misunderstood language
- Is Index Origin 0 a Hindrance? Roger Hui
- Thread on Julia google groups
Sequences with mixed values and strings work the same; both sequences and strings are mutable.
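As a minimal sketch of that point, the snippet below subscripts and updates a string and a mixed sequence in exactly the same way. The variable names (name, mixed, initial) are made up for this illustration and do not come from the manual.

```phix
-- hypothetical names, chosen only for this sketch
string name = "phix"
name[1] = 'P'                      -- strings are mutable: name is now "Phix"

sequence mixed = {1, "two", 3.5}
mixed[2] = 2                       -- same syntax: mixed is now {1,2,3.5}

integer initial = name[1]          -- 'P'
?{name, mixed, initial}
```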
Non-integer subscripts are rounded down to an integer.
For example, if x contains {5, 7.2, 9, 0.5, 13} then x[2] is 7.2. Suppose we assign something different to x[2]:
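x[2] = {11,22,33}
Then x becomes: {5, {11,22,33}, 9, 0.5, 13}. Now if we ask for x[2] we get {11,22,33} and if we ask for x[2][3] we get the atom 33. If you try to subscript with a number that is outside of the range 1 to the number of elements, you will get a subscript error. For example x[0], x[-99] or x[6] will cause errors, as will x[1][3] since x[1] is not a sequence. There is no limit to the number of subscripts that may follow a variable, but the variable must contain sequences that are nested deeply enough.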
The two dimensional array, common in other languages, can be easily represented with a sequence of sequences:
x = { {5, 6, 7, 8, 9},  -- x[1]
      {1, 2, 3, 4, 5},  -- x[2]
      {0, 1, 0, 1, 0} } -- x[3]
where we have written the numbers in a way that makes the structure clearer.
An expression of the form x[i][j] can be used to access any element, or if you prefer x[i,j] has exactly the same meaning.
Personally, I use "," when the subscripts have some equivalence, eg points in 3D space deserve [x,y,z], whereas I use "][" when they are logically distinct - almost always the case when one is variable and the other constant, such as shape[idx][COLOUR].
Dimensions are not symmetric however, since an entire "row" can be selected with x[i], whereas you need to use a library routine to select an entire column, such as vslice(), or perhaps columnize() with columns as a plain integer, aka phix is row-major.
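The row/column asymmetry can be sketched with a hand-written loop. The column() function below is a hypothetical helper written for this illustration only; in practice the library routines named above would normally be used instead.

```phix
-- column() is a hypothetical helper for this sketch, not a phix builtin
function column(sequence m, integer j)
    sequence res = repeat(0, length(m))
    for i=1 to length(m) do
        res[i] = m[i][j]           -- pick element j out of every row
    end for
    return res
end function

sequence x = {{5, 6, 7, 8, 9},
              {1, 2, 3, 4, 5},
              {0, 1, 0, 1, 0}}
sequence row2 = x[2]               -- an entire row needs only one subscript: {1,2,3,4,5}
sequence col2 = column(x, 2)       -- an entire column needs a routine: {6,2,1}
?{row2, col2}
```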
Other logical structures, such as n-dimensional arrays, arrays of strings, structures, arrays of structures, and similar can also be handled easily and flexibly:
3-D array:
y = { {{ 1,1}, {3,3}, {5,5}},
      {{ 0,0}, {0,1}, {9,1}},
      {{-1,7}, {1,1}, {2,2}} }
-- y[2][3][1] is 9
Array of strings:
s = {"Hello", "World", "Phix", "", "Last One"}
-- s[3] is "Phix"
-- s[3][2] is 'h'
A Structure:
employee = { {"John","Smith"}, 45000, 27, 185.5 }
To access "fields" or elements within a structure it is good programming style to make up a set of constants that name the various fields. This will make your program easier to read. For the example above you might have:
constant NAME = 1
constant FIRST_NAME = 1, LAST_NAME = 2
constant SALARY = 2
constant AGE = 3
constant WEIGHT = 4
You could then access the person’s name with employee[NAME], or if you wanted the last name you could say employee[NAME][LAST_NAME].
Array of structures:
employees = { {{"John","Smith"}, 45000, 27, 185.5}, -- employees[1]
              {{"Bill","Jones"}, 57000, 48, 177.2}, -- employees[2]
              -- .... etc.
            }
-- employees[2][SALARY] is 57000
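To show how named field constants read in practice, here is a small sketch that walks an array of structures and totals one field. It assumes just the two employee records shown above, and the variable total is introduced only for this illustration.

```phix
constant NAME = 1, SALARY = 2      -- field positions, as suggested above
constant LAST_NAME = 2             -- position within the NAME field

sequence employees = {{{"John","Smith"}, 45000, 27, 185.5},
                      {{"Bill","Jones"}, 57000, 48, 177.2}}

atom total = 0                     -- hypothetical accumulator for this sketch
for i=1 to length(employees) do
    total += employees[i][SALARY]
end for
-- total is now 102000
printf(1, "%s earns %d, total payroll %d\n",
       {employees[2][NAME][LAST_NAME], employees[2][SALARY], total})
```

Using SALARY rather than a bare 2 keeps the loop readable and means only the constant needs to change if the field layout does.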
Accessing Sequence Elements:
The number of elements in a sequence can be found by calling the length() function. For example:
s = {'a','b','c','d','e'}
j = length(s) -- j is now 5
Individual elements of s can be referenced using an expression which returns a single positive integer from 1 to length(s), for example:
x = s[1] -- x is now 'a'
s[3] = 'Z' -- s is now {'a','b','Z','d','e'}
Attempts to reference s[0] or s[6] (and above) cause an index out of bounds error.
Elements of s can also be referenced using -1 to -length(s), counting backwards from the end of the sequence, for example:
x = s[-1] -- x is now 'e'
s[-3] = 'c' -- s is now {'a','b','c','d','e'}
In this way, negative indexes are simply an exact mirror image (right to left) of the more common (left to right) positive indexes.
Attempts to reference s[-6] and below cause an index out of bounds error.
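The mirror-image relationship can be checked mechanically: for a sequence of length n, s[-i] and s[n-i+1] should name the same element for every i from 1 to n. The following is a minimal sketch of that check; the names n and mirrored are made up for this illustration.

```phix
sequence s = {'a','b','c','d','e'}
integer n = length(s)
bool mirrored = true               -- hypothetical flag for this sketch
for i=1 to n do
    -- s[-i] counts from the tail, s[n-i+1] is the same element counted from the head
    if s[-i]!=s[n-i+1] then
        mirrored = false
    end if
end for
integer last = s[-1]               -- 'e', the same element as s[n]
?{mirrored, last}
```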
You can also use the $ shorthand to refer to the last element, for example
x = {{1,2},{3,4}}
y = x[$] -- y is now {3,4}
y = x[end] -- ditto
z = x[$][$] -- z is now 4
The last line is equivalent to z = x[length(x)][length(x[length(x)])], and is obviously much clearer and shorter.
Exactly the same result is produced by x[-1][-1] and x[$][$].
Some earlier releases also allowed x[end][end], technically still available if you set constant ORAC in pmain.e to 0 and rebuild the compiler, but obviously it is just much easier to simply use -1 or $ instead.
Dot subscripts are also allowed, eg s.i.j is equivalent to s[i][j] (provided the constant ORAC in pmain.e is set to 1).
Using subscripts to alter a variable defined as a string to contain a non-character will cause a typecheck error. For a variable defined as a sequence that is currently assigned a string, the same operation will quietly auto-expand the string (one byte per character) to a dword-sequence so that the substitution can take place.
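A rough sketch of the second case follows: a variable declared as sequence starts out holding a string and stops being one after a non-character is stored in it. The names q, was_string and still_string are invented for this illustration, and the string-typed case is left as a comment because, per the paragraph above, it would halt with a typecheck error.

```phix
sequence q = "cat"                 -- a sequence variable currently holding a string
bool was_string = string(q)        -- true at this point

q[2] = {1,2}                       -- not a character, so q quietly stops being a string
bool still_string = string(q)      -- now false; q is {'c',{1,2},'t'}

-- By contrast, with "string t" in place of "sequence q", the same
-- assignment is the typecheck error described above.
?{was_string, still_string, q}
```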
Phix data structures are almost infinitely flexible. Arrays in other languages are constrained to have a fixed number of elements, and those elements must all be of the same type. Phix eliminates both of those restrictions. You can easily add a new structure to the employee sequence above, or store an unusually long name in the NAME field and phix will take care of it for you. If you wish, you can store a variety of different employee "structures", with different sizes, all in one sequence.
Not only can a phix program easily represent all conventional data structures but you can create very useful, flexible structures that would be extremely hard to declare in a conventional language. See Phix vs Conventional Languages.
In general any expression may be subscripted, for example it would be perfectly legal to write
"AEIOU"[vowel_number]
However, this is not compatible with Euphoria, plus it could be quite wasteful to write something like {5+2,6-1,7*8,8+1}[3]. There are some exceptions, for example you may not immediately follow a slice (see next) with a subscript.
(You could use parentheses as a workaround, eg (s[2..4])[2] is equivalent to s[3], however the latter would be significantly faster, as well as being much easier to read and write in the first place.)