utf32_to_utf8
Definition: | string utf8 = utf32_to_utf8(dword_sequence utf32, integer fail_flag=0) |
Description: | Convert a UTF-32 sequence to a UTF-8 string.
Returns a string, or -1 if fail_flag is -1 and the input contains an invalid value (see Comments). |
pwa/p2js: | Note this is actually a null operation in JavaScript, since that supports utf8 strings natively. |
Comments: |
The input should not contain any elements outside the range 0..#10FFFF, nor any values in the range #D800..#DFFF
(that range is reserved, across the board, for UTF-16 surrogate pairs). Any such value is replaced with the
substring "\#EF\#BF\#BD", which is the UTF-8 encoding of #FFFD, the Unicode replacement character.
If the optional fail_flag is -1, the routine returns -1 instead of performing such substitutions (in which case the result variable should be declared as type object). Note, however, that the input can legally contain #FFFD, which maps directly to "\#EF\#BF\#BD" without error.
If utf8_to_utf32() (or similar) returned a string because the original value was pure ASCII (all characters 0..127), and nothing has since auto-expanded it to a dword_sequence, the result does not need to be passed back through utf32_to_utf8(); if it is, it is simply returned unaltered.
Note that ExpandTabSpecials() in demo\edix\src\tabs.e inserts space-preserving #B7|#B6|#BB|#A7 marks and later uses <string chunk> = utf32_to_utf8(<string chunk>,+1) to convert those to the correct "\#C2\#B7"|"\#C2\#B6"|"\#C2\#BB"|"\#C2\#A7" before display. If chunk is utf32 rather than a string, that is perfectly standard behaviour; when it is a string, the +1 (in the fail_flag argument) prevents both the error and the return of -1. |
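The validity and substitution rules in the Comments above can be illustrated with a small model. This is a minimal Python sketch of the documented behaviour for dword_sequence input only, not the Phix implementation; the names utf32_to_utf8_model, is_valid_scalar and REPLACEMENT are hypothetical, and the string pass-through and fail_flag=+1 cases are deliberately left out.

```python
# Hypothetical model of the documented rules; not the Phix source.
REPLACEMENT = b"\xEF\xBF\xBD"  # UTF-8 encoding of #FFFD


def is_valid_scalar(cp):
    """True for 0..#10FFFF, excluding the surrogate range #D800..#DFFF."""
    return 0 <= cp <= 0x10FFFF and not (0xD800 <= cp <= 0xDFFF)


def utf32_to_utf8_model(utf32, fail_flag=0):
    """Encode a list of code points as UTF-8 bytes.

    Invalid values are replaced with REPLACEMENT, unless fail_flag
    is -1, in which case -1 is returned instead.
    """
    out = bytearray()
    for cp in utf32:
        if is_valid_scalar(cp):
            out += chr(cp).encode("utf-8")
        elif fail_flag == -1:
            return -1
        else:
            out += REPLACEMENT
    return bytes(out)


print(utf32_to_utf8_model([0x48, 0xE9, 0x20AC]))       # b'H\xc3\xa9\xe2\x82\xac'
print(utf32_to_utf8_model([0x41, 0xD800, 0x110000]))   # b'A\xef\xbf\xbd\xef\xbf\xbd'
print(utf32_to_utf8_model([0x41, 0xD800], -1))         # -1
print(utf32_to_utf8_model([0xFFFD], -1))               # b'\xef\xbf\xbd'
```

The last call shows why a returned "\#EF\#BF\#BD" is not proof of an error on its own: #FFFD is legal input and produces the same three bytes, so a caller that needs to detect invalid input has to pass fail_flag=-1 and check for -1.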
See Also: | utfconv, utf8_to_utf32 |