regex.e

The file builtins\regex.e (not an autoinclude) provides routines for handling regular expressions.

Those seeking full compatibility with PCRE (or any other library) are, well, advised to use PCRE (or that other library).

These routines implement a fast pikevm, as well as a (potentially exponentially slower) recursive backtracking vm.
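
To make the "potentially exponentially slower" remark concrete, here is a toy sketch in Python (not Phix, and not the code in builtins\regex.e). It handles only patterns made of plain and optional single characters, and compares a recursive backtracking matcher against the lockstep "set of states" idea that a pikevm is built on. The names make_tokens, backtrack_match and lockstep_match are invented for this illustration, and the pattern a?a?...aa... on a string of a's is the usual worst case for backtracking.

```python
def make_tokens(n):
    """Build the pattern a?{n}a{n} as a list of (char, optional) pairs."""
    return [("a", True)] * n + [("a", False)] * n


def backtrack_match(tokens, text):
    """Recursive backtracking: try to consume first, then try skipping an optional token."""
    steps = 0

    def go(i, j):
        nonlocal steps
        steps += 1
        if i == len(tokens):
            return j == len(text)          # whole text must be consumed
        ch, optional = tokens[i]
        if j < len(text) and text[j] == ch and go(i + 1, j + 1):
            return True
        return optional and go(i + 1, j)   # back up and skip the optional token

    return go(0, 0), steps


def lockstep_match(tokens, text):
    """Advance every live pattern position together, one input character at a time."""
    steps = 0

    def closure(states):
        # Add every position reachable by skipping optional tokens.
        seen = set()
        stack = list(states)
        while stack:
            i = stack.pop()
            if i in seen:
                continue
            seen.add(i)
            if i < len(tokens) and tokens[i][1]:
                stack.append(i + 1)
        return seen

    current = closure({0})
    for c in text:
        nxt = set()
        for i in current:
            steps += 1
            if i < len(tokens) and tokens[i][0] == c:
                nxt.add(i + 1)
        current = closure(nxt)
    return len(tokens) in current, steps


n = 12
tokens = make_tokens(n)
text = "a" * n
bt_ok, bt_steps = backtrack_match(tokens, text)
ls_ok, ls_steps = lockstep_match(tokens, text)
print("backtracking:", bt_ok, bt_steps, "steps")
print("lockstep:    ", ls_ok, ls_steps, "steps")
```

Both matchers agree on the answer. The backtracker's step count roughly doubles each time n grows by one, whereas the lockstep version never does more than (number of pattern positions) x (length of text) steps. A real pikevm also tracks capture groups per thread, which this sketch leaves out.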

Whilst I might not recommend fiddling about with the innards of (any) regular expression engine, I would rather not prohibit anyone from doing so.
Those interested in such matters may also like to peruse demo\regex_dfa.exw and regex_bc.exw (but be warned they currently lack 95% of features).

Regular expressions are normally written in `backtick strings`, to avoid standard escape interpretation (eg `\w` rather than "\\w").

regex syntax - a table summarising the supported elements
regex_options - set regular expression handling options
regex_compile - compile a regular expression for repeated use (optional)
regex - apply a regular expression to a target string
gsub - (draft) global substitution routine
gmatch - (draft) global match routine

 
builtins\regex.e also declares the global procedure regexp_list, strictly for the benefit of test\t63regex.exw.

Long story short:
I am not an advocate of regular expressions, but I get that they have their uses. For instance, antivirus engines and network intrusion systems can hardly be anything other than the highly optimised and simultaneous application to files and data streams of millions of regular expressions (albeit very carefully crafted ones, completely unambiguous and without any messy backtracking, or for that matter any greediness hints).

Not finding anything adequate in the archives (although pattern.e by dcuny came close), I stumbled across a series of articles by Russ Cox and immediately (as no doubt intended) fell in love with the pikevm. This component is an implementation of that, built for simplicity, speed, and ease of experimentation, rather than as a head-on competitor to the likes of PCRE. Credit is also due to regexp_pikevm.rb as found here, and I also recommend reading this gamedev article. Fanboys of regular expressions might want to consider a wrapper around a pcre.dll/so instead, but for my target audience that would mean forgoing any ability to tinker.

Caution

Some people, when confronted with a problem, think 'I know - I’ll use regular expressions.' Now they have two problems.
Jamie Zawinski (flame war on alt.religion.emacs)
Sure, there are cases where a regular expression is the best option. However, they should not be adopted as a "weapon of choice" in the same way that newbies of the past latched onto goto.
Beyond a certain point it becomes faster to write, execute, and in particular debug longhand code, with loops/find/match/etc.

It should not be forgotten that you cannot in any real sense debug a regular expression, other than to throw yet more test cases at it.
Long term maintenance quickly becomes much harder as regular expressions grow in length and complexity.
Regular expressions are often at their best when thrown away immediately after use, or if you prefer,
the perfect band-aid for minor duties, but just not meant for anything more serious.

TIP: HTML cannot be parsed with regular expressions - you are wasting your time trying.
Regular expressions can match regular languages - but HTML is not a regular language.
It is a context-free language, which is a language that regular expressions are not fit to parse.

Example: `<TAG\b[^>]*>(.*?)</TAG>` matches the opening and closing pair of a specific HTML tag. However, it will not match nested tags properly, such as <TAG>one<TAG>two</TAG>three</TAG>. Hence while `<(\w+)\b[^>]*>.*</\1>` or similar may look just dandy for HTML processing, it really is not.
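
The nested-tag failure is easy to see for yourself. The sketch below uses Python's re module rather than regex.e (so nothing here is the regex.e API), applies the first pattern above to the nested example, and then shows one possible longhand alternative that simply counts nesting depth. The helper name balanced_inner is made up for this illustration, and it is still nowhere near a real HTML parser.

```python
import re

TAG = re.compile(r"<TAG\b[^>]*>(.*?)</TAG>")
nested = "<TAG>one<TAG>two</TAG>three</TAG>"

# The lazy .*? stops at the FIRST </TAG>, so the pairing is wrong.
m = TAG.search(nested)
regex_inner = m.group(1)
print(regex_inner)      # one<TAG>two


def balanced_inner(text, tag="TAG"):
    """Return the content of the first outermost <tag>...</tag> pair, or None.

    Longhand approach: scan the open/close tags in order and keep a depth counter.
    """
    token = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag))
    depth = 0
    start = None
    for t in token.finditer(text):
        if t.group(1) == "":            # opening tag
            if depth == 0:
                start = t.end()
            depth += 1
        else:                           # closing tag
            depth -= 1
            if depth < 0:
                return None             # close without a matching open
            if depth == 0:
                return text[start:t.start()]
    return None                         # ran out of input while still nested


longhand_inner = balanced_inner(nested)
print(longhand_inner)   # one<TAG>two</TAG>three
```

The counter is exactly the piece of state that a regular expression has no way to express, which is why the longhand loop gets the pairing right where the pattern cannot.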

One last quote:
I think regular expressions are overused in scripting languages (like Perl, Python, Ruby, ...) because their C/ASM powered implementation is usually more optimized than those languages itself, but Go isn’t such a language. Regular expressions are usually quite slow and are often not suited for the problem at all.
I hear that. It is vitally important to know when to stop using regular expressions and go with normal longhand code instead.

Here is your shiny new hammer. Now please just remember that not everything is a nail.