get_text
Definition: | object res = get_text(object fn, integer options=GT_WHOLE_FILE) |
Description: | Reads an entire file into memory. |
pwa/p2js: | Not supported. |
Comments: |
Suitable for relatively small text files, such as configuration (ini) files and most of the text files used by editors and
compilers. It is not suitable for large files (over about 5MB) or for most kinds of binary file
(database, video, executable, etc.), though there is no absolute bar. This routine is deliberately limited to 1GB
(see technicalia).
Larger files should be processed one char, byte, or line at a time using getc/gets/seek/puts, which have a (predicted) limit of 8192 TB, thousands of times larger than the biggest currently available hard drives.
If fn is an integer, it should be an open file with read access (there is a slight difference between binary and text mode with GT_WHOLE_FILE).
If fn is a string, the file is opened and closed automatically, with -1 returned on failure.
In 0.8.2 and earlier the default was to open in binary mode; it now opens in text mode unless the GT_BINARY bit is set in options.
The following constants are automatically defined in psym.e/syminit():
GT_WHOLE_FILE leaves any embedded CR, LF, CRLF, or LFCR as-is, whereas the other options never return any CR. GT_WHOLE_FILE is, however, the fastest way to read a large file (it is what p.exw uses).
There is no way to determine whether the original file had a trailing \n when using GT_LF_STRIPPED, GT_LF_LAST, or GT_WHOLE_FILE with a file handle opened in text mode, should that be in any way important.
GT_KEEP_BOM: by default a leading utf8 byte order mark is automatically removed, as most applications do not need to treat utf8 any differently to ascii (all bytes below #80 have identical meaning, and no multi-byte utf8 sequence contains any byte below #80). If, however, you need to preserve it, write it back, or display things differently, specify GT_KEEP_BOM and test for and handle it yourself.
No such handling occurs for utf16 (be/le), utf32, or any other byte order mark, quite deliberately, as those always require very different treatment to plain ascii files.
GT_KEEP_BOM is meant to be added to one of the other constants; on its own it happens to behave, by accident rather than by design, exactly the same as GT_WHOLE_FILE+GT_KEEP_BOM. |
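For files beyond the size get_text is intended for, the line-at-a-time approach mentioned above might look something like the following. This is only a minimal sketch: the file name "huge.log" is hypothetical, and counting lines stands in for whatever per-line processing is actually needed.

```phix
-- Sketch only: process a large text file one line at a time,
-- rather than loading it all into memory with get_text().
-- "huge.log" is a hypothetical file name.
integer fn = open("huge.log","r")   -- text mode, read access
if fn=-1 then
    puts(1,"cannot open huge.log\n")
else
    integer count = 0
    while true do
        object line = gets(fn)      -- a string, or -1 at end of file
        if atom(line) then exit end if
        count += 1                  -- replace with real per-line work
    end while
    close(fn)
    printf(1,"%d lines\n",{count})
end if
```

Only one line is held in memory at any time, so the file size is not constrained by the 1GB limit on get_text.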
Example: |
r = get_text("myapp.ini",GT_LF_STRIPPED)
-- r is eg {"debug=1","Font=Courier","Window Position=160,200"}
p = get_text("image.png",GT_WHOLE_FILE+GT_BINARY)
-- p is an unmangled binary string (can contain '\0' etc.) |
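A slightly fuller sketch, covering the -1 failure result and the do-it-yourself handling that GT_KEEP_BOM implies, is given below. The file name "notes.txt" and the variable names are hypothetical; the three bytes #EF, #BB, #BF are the standard utf8 byte order mark.

```phix
-- Sketch only: "notes.txt" is a hypothetical file name.
object text = get_text("notes.txt",GT_WHOLE_FILE+GT_KEEP_BOM)
if atom(text) then
    -- get_text() returned -1: the file could not be opened
    puts(1,"cannot open notes.txt\n")
else
    bool has_bom = false
    -- a utf8 byte order mark is the three bytes #EF, #BB, #BF
    if length(text)>=3
    and text[1]=#EF and text[2]=#BB and text[3]=#BF then
        has_bom = true
        text = text[4..$]   -- strip it ourselves, but remember it was there
    end if
    printf(1,"bom:%d, %d bytes of content\n",{has_bom,length(text)})
end if
```

Testing the result with atom() before using it avoids treating the -1 failure value as file content. The has_bom flag could later be used to decide whether to write the mark back out.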
Implementation: |
via :%opGetText / fget_text() in builtins\VM\pfileioN.e (an autoinclude). Be warned, however, that it is complicated low-level code that you do not need to know.
Update: this routine has been split, with an outer wrapper in builtins\pfile.e and the now slightly smaller internal fget_text() remaining in pfileioN.e. |
Compatibility: |
There is no equivalent routine in Euphoria, though it does have a read_lines() routine offering somewhat similar
functionality, which is partly replicated. |
