
utf32_to_utf8

Definition: string utf8 = utf32_to_utf8(dword_sequence utf32, integer fail_flag=0)
Description: Convert a UTF-32 sequence to a UTF-8 string.

Returns a string, or -1 when fail_flag is -1 and the input contains an invalid element.
pwa/p2js: Note that this is actually a null operation in JavaScript, since JavaScript handles Unicode strings natively.
Comments: The input should not contain any elements outside the range 0..#10FFFF, nor any values in the range #D800..#DFFF (that range is reserved throughout Unicode for UTF-16 surrogate pairs). Any such values are replaced with the substring "\#EF\#BF\#BD", which is the UTF-8 encoding of U+FFFD, the Unicode replacement character.

If the optional fail_flag is -1, the routine returns -1 instead of performing such substitutions, so in that case the result variable must be declared as an object, since it may hold either a string or an integer.
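The substitution and fail_flag=-1 rules can be modelled in Python. This is a sketch of the documented behaviour only, not Phix's own implementation: the names utf32_to_utf8_model and encode_one are hypothetical, a Python list of ints stands in for a dword_sequence, bytes stand in for the resulting string, and string input and fail_flag=+1 are not modelled.

```python
from __future__ import annotations

# UTF-8 encoding of U+FFFD, used in place of any invalid element.
REPLACEMENT = b"\xef\xbf\xbd"


def encode_one(cp: int) -> bytes | None:
    """Return the UTF-8 bytes for one code point, or None if it is invalid."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return None  # out of range, or a reserved UTF-16 surrogate value
    if cp < 0x80:
        return bytes([cp])  # 1 byte: plain ASCII
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf32_to_utf8_model(utf32: list[int], fail_flag: int = 0) -> bytes | int:
    """Hypothetical model: substitute invalid elements, or return -1 if fail_flag == -1."""
    out = bytearray()
    for cp in utf32:
        enc = encode_one(cp)
        if enc is None:
            if fail_flag == -1:
                return -1  # report the problem instead of substituting
            enc = REPLACEMENT
        out += enc
    return bytes(out)


print(utf32_to_utf8_model([0x41, 0xD800, 0x42]))      # prints b'A\xef\xbf\xbdB'
print(utf32_to_utf8_model([0x41, 0xD800, 0x42], -1))  # prints -1
```

The sketch encodes each code point by hand because Python's str.encode("utf-8") refuses lone surrogates and chr() refuses values above 0x10FFFF, so the built-in codec cannot show the substitution step.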

Note, however, that the input can legally contain #FFFD, which maps directly to "\#EF\#BF\#BD" without error.
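Because #FFFD is itself a valid code point, its encoding is byte-for-byte identical to the substitution, so the output alone cannot reveal whether a substitution took place (only fail_flag=-1 can). Python's built-in encoder, used here purely as a cross-check, confirms the three bytes:

```python
# U+FFFD REPLACEMENT CHARACTER encodes to the same bytes used for substitutions.
encoded = "\uFFFD".encode("utf-8")
print(encoded.hex(" "))  # prints "ef bf bd"
```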

If utf8_to_utf32() (or some other routine) returned a string because the original value was pure ASCII (all characters 0..127), and nothing has since auto-expanded it to a dword_sequence, then that value does not need to be passed back through utf32_to_utf8(); if it is, it is simply returned unaltered.
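Returning an all-ASCII string unaltered is safe because UTF-8 encodes every code point 0..127 as a single byte with the same value. A quick Python check of that property (an illustration of the encoding, not of the Phix routine itself):

```python
ascii_bytes = bytes(range(128))        # every 7-bit value
as_text = ascii_bytes.decode("ascii")  # treat each byte as a code point
# UTF-8 leaves code points 0..127 byte-for-byte unchanged.
print(as_text.encode("utf-8") == ascii_bytes)  # prints True
```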

Note that ExpandTabSpecials() in demo\edix\src\tabs.e inserts the space-preserving marks #B7|#B6|#BB|#A7 and later uses <string chunk> = utf32_to_utf8(<string chunk>,+1) to convert them to the correct "\#C2\#B7"|"\#C2\#B6"|"\#C2\#BB"|"\#C2\#A7" before display. If chunk is a utf32 sequence rather than a string, this is perfectly standard behaviour; when it is a string, the +1 in the fail_flag argument prevents both the error that would otherwise occur and the return of -1.
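Each of the four marks is a code point in the range #80..#BF, which UTF-8 encodes as #C2 followed by the original byte value. The Python sketch below reproduces that mapping by treating each byte of a hypothetical chunk as a code point, which is the effect the +1 call above is described as having on a string:

```python
# Hypothetical chunk holding the four space-preserving marks as single bytes.
chunk = bytes([0xB7, 0xB6, 0xBB, 0xA7])
# Latin-1 decoding maps each byte to the code point of the same value.
utf8 = chunk.decode("latin-1").encode("utf-8")
print(utf8.hex(" "))  # prints "c2 b7 c2 b6 c2 bb c2 a7"
```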
See Also: utfconv, utf8_to_utf32