utf8_to_utf32

Definition: sequence utf32 = utf8_to_utf32(string utf8, integer fail_flag=0)
Description: Convert a UTF-8 string to a sequence of UTF-32 code points.

Returns a string if the input is pure ASCII (all characters in the range 0..127), otherwise a sequence.
pwa/p2js: Note this is actually a null operation in JavaScript, since that supports utf8 strings natively.
Comments: The output should not contain any elements outside the range 0..#10FFFF, nor any values in the range #D800..#DFFF (which is reserved, across the board, for UTF-16 surrogate pairs). Any such value is replaced with #FFFD.

If the optional fail_flag is -1, the routine returns -1 instead of performing such substitutions, in which case the result variable should be declared as type object rather than sequence.

Note, however, the input can legally contain the substring "\#EF\#BF\#BD", which maps directly to #FFFD without error.
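The substitution rule described above can be modelled outside Phix. The following Python sketch is illustrative only: it is not the Phix implementation, the helper name sanitise_code_points is hypothetical, and it applies the rule to already-decoded values rather than reproducing how utf8_to_utf32() handles malformed byte sequences.

```python
REPLACEMENT = 0xFFFD  # the element value substituted for invalid code points


def is_valid_code_point(cp):
    """True if cp is within 0..#10FFFF and not a UTF-16 surrogate."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def sanitise_code_points(code_points, fail_flag=0):
    """Hypothetical model of the documented rule (an assumption, not Phix code).

    Invalid values are replaced with #FFFD, unless fail_flag is -1,
    in which case -1 is returned instead of a list.
    """
    result = []
    for cp in code_points:
        if not is_valid_code_point(cp):
            if fail_flag == -1:
                return -1
            cp = REPLACEMENT
        result.append(cp)
    return result


# A surrogate value is replaced by default, or rejected when fail_flag is -1.
print(sanitise_code_points([0x41, 0xD800, 0x20AC]))
print(sanitise_code_points([0x41, 0xD800, 0x20AC], fail_flag=-1))

# The bytes EF BF BD are the legitimate UTF-8 encoding of #FFFD itself,
# so a genuine #FFFD in the input is not an error.
genuine = ord(b"\xEF\xBF\xBD".decode("utf-8"))
print(sanitise_code_points([genuine], fail_flag=-1))
```

Because the return value is either a list or -1 in this model, the caller has to check for -1 before using the result, which mirrors the need to declare the Phix result variable as object.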

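String returns represent two possible savings: a potential four-fold or eight-fold space saving (one byte per character, as opposed to a 4-byte or 8-byte sequence element on 32-bit and 64-bit respectively), and no requirement to invoke utf32_to_utf8() when you’re done.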
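The choice between a string and a sequence result can be pictured with a small Python sketch. It only models the documented rule (pure ASCII yields a string, anything else a sequence); the helper name pack_result is hypothetical and nothing here reflects Phix internals.

```python
def pack_result(code_points):
    """Hypothetical model: return a str for pure ASCII, else a list."""
    if all(0 <= cp <= 127 for cp in code_points):
        # Every element fits in one byte, so a compact string suffices.
        return "".join(map(chr, code_points))
    # At least one element needs more than 7 bits: keep full code points.
    return list(code_points)


ascii_only = pack_result([104, 105])        # code points for 'h', 'i'
with_euro = pack_result([104, 0x20AC])      # 'h' followed by the euro sign

print(type(ascii_only).__name__, type(with_euro).__name__)
```

In the pure-ASCII case the string is already valid UTF-8 as it stands, which is why no conversion back is needed.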
See Also: utfconv, utf32_to_utf8