utf8_to_utf32
Definition: | sequence utf32 = utf8_to_utf32(string utf8, integer fail_flag=0) |
Description: | Convert a UTF-8 string to a sequence of UTF-32 code points.
Returns a string if the input was pure ASCII (all characters in the range 0..127), otherwise a sequence.
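For instance (a minimal illustrative sketch; the byte values are chosen here for illustration rather than taken from the manual):

    ?utf8_to_utf32("hello")          -- "hello" (pure ASCII in, string out)
    ?utf8_to_utf32(x"F0 9F 98 80")   -- {#1F600}, ie {128512} (U+1F600, as a sequence)
|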
pwa/p2js: | Note this is in fact a no-op under pwa/p2js, since JavaScript supports Unicode strings natively. |
Comments: |
The output never contains elements outside the range 0..#10FFFF, or values in the range #D800..#DFFF
(that range being reserved, universally, for UTF-16 surrogate pairs); any such values are replaced
with #FFFD, the Unicode replacement character.
If the optional fail_flag is -1, the routine returns -1 rather than performing such substitutions, in which case the result variable should obviously be declared as type object. Note, however, that the input can legally contain the substring "\#EF\#BF\#BD", which maps directly to #FFFD without error.
String results offer two potential savings: a four- or eight-fold reduction in space, and no need to invoke utf32_to_utf8() when you are done.
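The failure handling might be exercised like this (a hedged sketch; x"C0" is a deliberately truncated UTF-8 lead byte, chosen purely for illustration):

    ?utf8_to_utf32(x"C0")            -- {#FFFD}: the invalid byte is replaced
    object r = utf8_to_utf32(x"C0",-1)
    ?r                               -- -1: a fail_flag of -1 suppresses substitution
|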
See Also: | utfconv, utf32_to_utf8 |