
get_text

Definition: object res = get_text(object fn, integer options=GT_WHOLE_FILE)
Description: Reads an entire file into memory.
pwa/p2js: Not supported.
Comments: Suitable for relatively small text files, such as configuration (ini) files and most of the text files handled by editors and compilers. It is not suitable for large files (over, say, 5MB) or for most kinds of binary file (database, video, executable, etc.), though there is no absolute bar. This routine is deliberately limited to 1GB (see technicalia).

Larger files should be processed one char, byte, or line at a time using getc/gets/seek/puts, which have a (predicted) limit of 8192 TB, thousands of times larger than the biggest currently available hard drives.
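A minimal sketch of the line-at-a-time approach, for files too large for get_text. The function name count_lines is hypothetical, chosen only for illustration; it relies on gets() returning -1 (an atom) at end of file.

```phix
-- Hypothetical helper: count the lines in a file without loading it all into memory.
function count_lines(string filename)
    integer fn = open(filename,"r")
    if fn=-1 then return -1 end if      -- could not open the file
    integer count = 0
    while true do
        object line = gets(fn)          -- one line per call
        if atom(line) then exit end if  -- -1 signals end of file
        count += 1
    end while
    close(fn)
    return count
end function
```

Only one line is held in memory at a time, so the file size is bounded by the file system rather than by available memory.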

If fn is an integer it should be an open file with read access (slight difference for binary/text mode and GT_WHOLE_FILE).
If fn is a string the file is opened and closed automatically, with -1 returned on failure. In 0.8.2 and earlier the default was to open in binary mode; it now opens in text mode by default, unless the GT_BINARY bit is set in options.
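The two calling styles might look like this. It is a sketch only: read_by_name and read_by_handle are hypothetical wrapper names, not part of the library.

```phix
-- Hypothetical wrappers, named here only for illustration.
function read_by_name(string filename)
    object res = get_text(filename)     -- opened and closed automatically
    if res=-1 then
        printf(1,"cannot open %s\n",{filename})
    end if
    return res
end function

function read_by_handle(string filename)
    integer fn = open(filename,"r")     -- the caller owns the handle...
    if fn=-1 then return -1 end if
    object res = get_text(fn)
    close(fn)                           -- ...and so must close it
    return res
end function
```

Either way the result should be held in an object rather than a string, since a failed open yields -1.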

The following constants are automatically defined in psym.e/syminit():

Constant        Value  Description
GT_WHOLE_FILE   0      Get whole file as one long string, plus a final '\n' if missing and fn is a file handle opened in text mode.
GT_LF_STRIPPED  1      Returns a sequence of '\n'-stripped lines.
GT_LF_LEFT      2      Returns a sequence of lines with '\n' left on.
GT_LF_LAST      4      Returns a sequence of lines with '\n' left on, and put on last if missing.
GT_KEEP_BOM     8      Retain leading utf8 byte order mark, see notes below.
GT_BINARY       #10    If fn is a string (no effect otherwise), open in binary mode, else (the default) open in text mode.

 
GT_WHOLE_FILE leaves any embedded CR, LF, CRLF, or LFCR as-is, whereas the other options never return any CR. GT_WHOLE_FILE is, however, the fastest way to read a large file (it is what p.exw uses).

When using GT_LF_STRIPPED, GT_LF_LAST, or GT_WHOLE_FILE with a file handle opened in text mode, there is no way to determine whether the original file had a trailing \n, should that be in any way important.
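To make the line-oriented options concrete, here is a small sketch that writes a two-line scratch file with no trailing '\n' and reads it back three ways. The file name gt_demo.txt is hypothetical, and the comments restate the table above rather than captured output.

```phix
-- Hypothetical scratch file, written in binary mode so it holds exactly "one\ntwo".
string demo_file = "gt_demo.txt"
integer demo_fn = open(demo_file,"wb")
if demo_fn=-1 then ?9/0 end if          -- cannot create the scratch file
puts(demo_fn,"one\ntwo")                -- note: no trailing '\n'
close(demo_fn)

sequence stripped = get_text(demo_file,GT_LF_STRIPPED)  -- lines without '\n'
sequence left     = get_text(demo_file,GT_LF_LEFT)      -- '\n' kept where present
sequence last     = get_text(demo_file,GT_LF_LAST)      -- '\n' kept, and added to the last line
?stripped
?left
?last
```

Comparing left with last is the only way shown here to tell that the file lacked a final '\n'; stripped and last alone do not reveal it.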

GT_KEEP_BOM: by default a leading utf8 byte order mark is automatically removed, as most applications do not need to treat utf8 any differently from ascii (all bytes below #80 have identical meaning, and no multi-byte encoding contains any bytes below #80). If, however, you need to preserve it, write it back, display things differently, etc., then you should specify GT_KEEP_BOM and test for and handle it yourself. Note that no such handling occurs for utf16 (be/le), utf32, or any other byte order marks, quite deliberately, as those will always require very different treatment from plain ascii files.
Note that GT_KEEP_BOM is meant to be added to one of the other constants; since GT_WHOLE_FILE is 0, GT_KEEP_BOM on its own happens to behave (by accident rather than by design) exactly the same as GT_WHOLE_FILE+GT_KEEP_BOM.
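One way to test for and handle the mark yourself is sketched below. The function name read_with_bom_flag is hypothetical; the three bytes checked (#EF, #BB, #BF) are the standard utf8 byte order mark.

```phix
-- Hypothetical helper: returns {has_bom, text} with the mark stripped, or -1 on failure.
function read_with_bom_flag(string filename)
    object text = get_text(filename,GT_WHOLE_FILE+GT_KEEP_BOM)
    if text=-1 then return -1 end if
    bool has_bom = false
    if length(text)>=3 then
        has_bom = (text[1]=#EF and text[2]=#BB and text[3]=#BF)
    end if
    if has_bom then
        text = text[4..$]   -- strip it ourselves, but remember it was there
    end if
    return {has_bom,text}
end function
```

The caller can then write the mark back when saving if has_bom was true, leaving files that never had one untouched.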
Example:
object r = get_text("myapp.ini",GT_LF_STRIPPED)  -- r is eg {"debug=1","Font=Courier","Window Position=160,200"}, or -1 on failure
object p = get_text("image.png",GT_WHOLE_FILE+GT_BINARY) -- p is an unmangled binary string (can contain '\0' etc.), or -1 on failure
Implementation: via :%opGetText / fget_text() in builtins\VM\pfileioN.e (an autoinclude) - be warned however it is low-level complicated stuff that you do not need to know.
Update: this routine has been split, with an outer wrapper in builtins\pfile.e, and the now slightly smaller internal fget_text() remaining in pfileioN.e
Compatibility: There is no equivalent routine in Euphoria, though it does have a read_lines() routine offering somewhat similar functionality, which is partly replicated.