diff --git a/README.md b/README.md index b6c36e6..57ddecd 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -# XRegExp 5.0.0-next +# XRegExp 5.0.1 [![Build Status](https://github.com/slevithan/xregexp/workflows/Node.js%20CI/badge.svg)](https://github.com/slevithan/xregexp/actions) diff --git a/docs/api/index.html b/docs/api/index.html new file mode 100644 index 0000000..20a31f9 --- /dev/null +++ b/docs/api/index.html @@ -0,0 +1,1218 @@ + + + + + API :: XRegExp + + + + +
+ +
+ + + + + + + +

API

+ +

XRegExp(pattern, [flags])

+ +

Creates an extended regular expression object for matching text with a pattern. Differs from a + native regular expression in that additional syntax and flags are supported. The returned object + is in fact a native RegExp and works with all native methods.

+ + + + + + + + + + + + +
Parameters: +
    +
  • pattern {String|RegExp}
    + Regex pattern string, or an existing regex object to copy. +
  • +
  • [flags] {String}
    + Any combination of flags.
    + Native flags: +
      +
    • g - global
    • +
    • i - ignore case
    • +
    • m - multiline anchors
    • +
    • u - unicode (ES6)
    • +
    • y - sticky (Firefox 3+, ES6)
    • +
    + Additional XRegExp flags: +
      +
    • n - explicit capture
    • +
    • s - dot matches all (aka singleline) - works even when not natively supported
    • +
    • x - free-spacing and line comments (aka extended)
    • +
    • A - astral (requires the Unicode Base addon)
    • +
    + Flags cannot be provided when constructing one RegExp from another. +
  • +
+
Returns: + {RegExp}
+ Extended regular expression object. +
+ +

Example

+
// With named capture and flag x
+XRegExp(`(?<year>  [0-9]{4} ) [-\\s]?  # year
+         (?<month> [0-9]{2} ) [-\\s]?  # month
+         (?<day>   [0-9]{2} )          # day`, 'x');
+
+// Providing a regex object copies it. Native regexes are recompiled using native (not
+// XRegExp) syntax. Copies maintain extended data, are augmented with `XRegExp.prototype`
+// properties, and have fresh `lastIndex` properties (set to zero).
+XRegExp(/regex/);
+
+ +

For details about the regular expression just shown, see Syntax: Named capture and Flags: Free-spacing.

+ +

Regexes, strings, and backslashes

+

JavaScript string literals (as opposed to, e.g., user input or text extracted from the DOM) use a backslash as an escape character. The string literal '\\' therefore contains a single backslash, and its length property's value is 1. However, a backslash is also an escape character in regular expression syntax, where the pattern \\ matches a single backslash. When providing string literals to the RegExp or XRegExp constructor functions, four backslashes are therefore needed to match a single backslash—e.g., XRegExp('\\\\'). Only two of those backslashes are actually passed into the constructor function. The other two are used to escape the backslashes in the string before the function ever sees the string. The exception is when using ES6 raw strings via String.raw or XRegExp.tag.

+ +

The same issue is at play with the \\s sequences in the example code just shown. XRegExp is provided with the two characters \s, which it in turn recognizes as the metasequence used to match a whitespace character.
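The escaping rules above can be checked with plain JavaScript. This sketch uses only the native `RegExp` constructor and `String.raw` (no XRegExp), so it shows the string-literal half of the problem rather than any XRegExp-specific syntax.

```javascript
// A string literal consumes one level of backslash escaping.
'\\'.length; // -> 1
'\\s'.length; // -> 2 (the characters \ and s)

// Four backslashes in the literal reach the RegExp constructor as two (\\),
// which the regex engine reads as "match one literal backslash".
const viaLiteral = new RegExp('\\\\');
viaLiteral.test('C:\\temp'); // -> true

// String.raw skips string-level escaping, so two backslashes are enough.
const viaRaw = new RegExp(String.raw`\\`);
viaRaw.source === viaLiteral.source; // -> true

// '\\s' passes \s through to the regex engine, which matches whitespace.
new RegExp('a\\sb').test('a b'); // -> true
```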

+ + +

XRegExp.addToken(regex, handler, [options])

+

Extends XRegExp syntax and allows custom flags. This is used internally and can be used to create XRegExp addons. If more than one token can match the same string, the last added wins.

+ + + + + + + + + + + + +
Parameters:
    +
  • regex {RegExp}
    + Regex object that matches the new token. +
  • +
  • handler {Function}
    + Function that returns a new pattern string (using native regex syntax) + to replace the matched token within all future XRegExp regexes. Has access to persistent + properties of the regex being built, through this. Invoked with three arguments: +
      +
    1. The match array, with named backreference properties.
    2. +
    3. The regex scope where the match was found: 'default' or 'class'.
    4. +
    5. The flags used by the regex, including any flags in a leading mode modifier.
    6. +
    + The handler function becomes part of the XRegExp construction process, so be careful not to + construct XRegExps within the function or you will trigger infinite recursion. +
  • +
  • [options] {Object}
    + Options object with optional properties: +
      +
    • scope {String} Scopes where the token applies: 'default', 'class', or 'all'.
    • +
    • flag {String} Single-character flag that triggers the token. This also registers the + flag, which prevents XRegExp from throwing an 'unknown flag' error when the flag is used.
    • +
    • optionalFlags {String} Any custom flags checked for within the token handler that are + not required to trigger the token. This registers the flags, to prevent XRegExp from + throwing an 'unknown flag' error when any of the flags are used.
    • +
    • reparse {Boolean} Whether the handler function's output should be reparseable by other tokens (including the current token) rather than treated as final. Allows token chaining or deferring.
    • +
    • leadChar {String} Single character that occurs at the beginning of any successful match + of the token (not always applicable). This doesn't change the behavior of the token unless + you provide an erroneous value. However, providing it can increase the token's performance + since the token can be skipped at any positions where this character doesn't appear.
    • +
    +
  • +
+
Returns: + {undefined}
+ Does not return a value. +
+ +

Example

+
// Basic usage: Add \a for the ALERT control code
+XRegExp.addToken(
+  /\\a/,
+  () => '\\x07',
+  {scope: 'all'}
+);
+XRegExp('\\a[\\a-\\n]+').test('\x07\n\x07'); // -> true
+
+ +
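To illustrate what the handler in the example above contributes (a native-syntax replacement for a custom token), here is a hypothetical preprocessor, `compileWithAlert`, that is not part of XRegExp and is not its implementation. It rewrites `\a` to `\x07` before calling the native `RegExp` constructor, and it ignores the scopes, flags, and other options that real XRegExp tokens support.

```javascript
// Hypothetical helper, not XRegExp's code: rewrite the custom token \a into the
// native escape \x07, then compile with the native RegExp constructor.
function compileWithAlert(pattern, flags = '') {
  // Visit every escape sequence (a backslash plus the next character), so an
  // escaped backslash followed by "a" (\\a) is left alone and only a real \a changes.
  const nativePattern = pattern.replace(/\\[\s\S]/g, (escape) =>
    escape === '\\a' ? '\\x07' : escape
  );
  return new RegExp(nativePattern, flags);
}

const alertRegex = compileWithAlert('\\a[\\a-\\n]+');
alertRegex.source; // -> the text \x07[\x07-\n]+
alertRegex.test('\x07\n\x07'); // -> true

// An escaped backslash followed by "a" still means a literal backslash, then "a".
compileWithAlert('\\\\a').test('\\a'); // -> true
```

Scanning escape pairs from left to right is what keeps `\\a` from being misread as `\a`; a plain search for the two characters `\a` would get that case wrong.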


+ + + + +

XRegExp.build(pattern, subs, [flags])

+ +

Requires the XRegExp.build addon, which is bundled in xregexp-all.js.

+ +

Builds regexes using named subpatterns, for readability and pattern reuse. Backreferences in the + outer pattern and provided subpatterns are automatically renumbered to work correctly. Native + flags used by provided subpatterns are ignored in favor of the flags argument.

+ + + + + + + + + + + + +
Parameters:
    +
  • pattern {String}
    + XRegExp pattern using {{name}} for embedded subpatterns. Allows ({{name}}) as shorthand for (?<name>{{name}}). Patterns cannot be embedded within character classes. +
  • +
  • subs {Object}
    + Lookup object for named subpatterns. Values can be strings or regexes. A leading ^ and trailing unescaped $ are stripped from subpatterns, if both are present. +
  • +
  • [flags] {String}
    + Any combination of XRegExp flags. +
  • +
+
Returns:
    +
  • {RegExp}
    + Regex with interpolated subpatterns. +
  • +
+
+ +

Example

+
const time = XRegExp.build('(?x)^ {{hours}} ({{minutes}}) $', {
+  hours: XRegExp.build('{{h12}} : | {{h24}}', {
+    h12: /1[0-2]|0?[1-9]/,
+    h24: /2[0-3]|[01][0-9]/
+  }, 'x'),
+  minutes: /^[0-5][0-9]$/
+});
+
+time.test('10:59'); // -> true
+XRegExp.exec('10:59', time).groups.minutes; // -> '59'
+
+ +

See also: Creating Grammatical Regexes Using XRegExp.build.

+ + +

XRegExp.cache(pattern, [flags])

+ +

Caches and returns the result of calling XRegExp(pattern, flags). On any subsequent call with + the same pattern and flag combination, the cached copy of the regex is returned.

+ + + + + + + + + + + + +
Parameters:
    +
  • pattern {String}
    + Regex pattern string. +
  • +
  • [flags] {String}
    + Any combination of XRegExp flags. +
  • +
+
Returns: + {RegExp}
+ Cached XRegExp object. +
+ +

Example

+
let match;
+while (match = XRegExp.cache('.', 'gs').exec('abc')) {
+  // The regex is compiled once only
+}
+
+const regex1 = XRegExp.cache('.', 's');
+const regex2 = XRegExp.cache('.', 's');
+// regex1 and regex2 are references to the same regex object
+
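The caching idea can be sketched with a `Map` keyed by pattern and flags. `cachedRegExp` is a hypothetical helper that uses native `RegExp` syntax only; it is meant to show why repeated calls return the same object, not to reproduce XRegExp's implementation.

```javascript
// Hypothetical memoizing factory: one compiled RegExp per pattern/flags pair.
const regexCache = new Map();

function cachedRegExp(pattern, flags = '') {
  // Flags never contain '/', so the first '/' in the key separates the two parts
  // and different inputs cannot collide.
  const key = `${flags}/${pattern}`;
  let regex = regexCache.get(key);
  if (!regex) {
    regex = new RegExp(pattern, flags);
    regexCache.set(key, regex);
  }
  return regex;
}

cachedRegExp('.', 's') === cachedRegExp('.', 's'); // -> true
cachedRegExp('.', 's') === cachedRegExp('.', 'g'); // -> false

// The same object is returned each time, so a /g regex keeps its lastIndex
// between calls. That is what lets a loop like the one above advance.
const chars = [];
let m;
while ((m = cachedRegExp('.', 'gs').exec('abc'))) {
  chars.push(m[0]);
}
// chars -> ['a', 'b', 'c']
```

The flip side of sharing one object is shared `lastIndex` state: if a cached global regex is abandoned mid-loop, the next caller starts where the last one stopped unless `lastIndex` is reset to 0.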
+ + +

XRegExp.escape(str)

+ +

Escapes any regular expression metacharacters, for use when matching literal strings. The result + can safely be used at any position within a regex that uses any flags.

+ +

The escaped characters are [, ], {, }, (, ), -, *, +, ?, ., \, ^, $, |, ,, #, and whitespace (see free-spacing for the list of whitespace characters).

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to escape. +
  • +
+
Returns: + {String}
+ String with regex metacharacters escaped. +
+ +

Example

+
XRegExp.escape('Escaped? <.>');
+// -> 'Escaped\?\u0020<\.>'
+
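A simplified escaping function in the same spirit is sketched below. `escapeForRegex` is hypothetical. It produces the same output as the example above by backslash-escaping regex syntax characters and writing `-`, `,`, `#`, and whitespace as `\uNNNN`. Those `\uNNNN` forms stay valid with native flag `u`, where identity escapes such as `\,` or `\#` are syntax errors. Which whitespace characters XRegExp escapes is assumed here to be JavaScript's `\s` set.

```javascript
// Hypothetical sketch, not XRegExp's source.
function escapeForRegex(str) {
  return String(str)
    // Backslash-escape characters that have special meaning in regex syntax.
    .replace(/[\\\[\]{}()*+?.^$|]/g, '\\$&')
    // Write -, ',', # and whitespace as \uNNNN escapes (safe with or without flag u).
    .replace(/[\s#\-,]/g, (ch) => '\\u' + ch.charCodeAt(0).toString(16).padStart(4, '0'));
}

const escaped = escapeForRegex('Escaped? <.>');
// escaped -> 'Escaped\?\u0020<\.>'

// The result matches only the literal text, with or without flag u.
new RegExp(`^${escaped}$`, 'u').test('Escaped? <.>'); // -> true
new RegExp(`^${escaped}$`).test('EscapedX <?>'); // -> false

// Inside a character class, the escaped "-" cannot form a range.
new RegExp(`^[${escapeForRegex('a-z')}]$`, 'u').test('m'); // -> false
```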
+ + +

XRegExp.exec(str, regex, [pos], [sticky])

+ +

Executes a regex search in a specified string. Returns a match array or null. If the provided + regex uses named capture, named capture properties are included on the match array's groups + property. Optional pos and sticky arguments specify the search start position, and whether + the match must start at the specified position only. The lastIndex property of the provided + regex is not used, but is updated for compatibility. Also fixes browser bugs compared to the + native RegExp.prototype.exec and can be used reliably cross-browser.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to search. +
  • +
  • regex {RegExp}
    + Regex to search with. +
  • +
  • [pos=0] {Number}
    + Zero-based index at which to start the search. +
  • +
  • [sticky=false] {Boolean|String}
    + Whether the match must start at the specified position only. The string 'sticky' is accepted as an alternative to true. +
  • +
+
Returns: + {Array}
+ Match array with named capture properties on the groups object, or null. If the namespacing feature is off, named capture properties are directly on the match array. +
+ +

Example

+
// Basic use, with named backreference
+let match = XRegExp.exec('U+2620', XRegExp('U\\+(?<hex>[0-9A-F]{4})'));
+match.groups.hex; // -> '2620'
+
+// With pos and sticky, in a loop
+let pos = 3, result = [];
+while (match = XRegExp.exec('<1><2><3><4>5<6>', /<(\d)>/, pos, 'sticky')) {
+  result.push(match[1]);
+  pos = match.index + match[0].length;
+}
+// result -> ['2', '3', '4']
+
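The `pos`/`sticky` behavior can be approximated with native flags `g` and `y` (ES2015) plus `lastIndex`. `execAt` is a hypothetical helper, not XRegExp's code. Unlike `XRegExp.exec`, it searches a copy and does not update the `lastIndex` of the regex passed in. The second call below shows why the loop must start at index 3: index 2 is `>`, so a sticky match fails immediately.

```javascript
// Hypothetical helper: search `str` from `pos`, optionally requiring the match
// to start exactly at `pos` (sticky).
function execAt(str, regex, pos = 0, sticky = false) {
  // Copy with g (search forward) or y (anchor at lastIndex) so lastIndex is honored.
  const flags = regex.flags.replace(/[gy]/g, '') + (sticky ? 'y' : 'g');
  const copy = new RegExp(regex.source, flags);
  copy.lastIndex = pos;
  return copy.exec(str);
}

function collectDigits(start) {
  const result = [];
  let pos = start;
  let match;
  while ((match = execAt('<1><2><3><4>5<6>', /<(\d)>/, pos, true))) {
    result.push(match[1]);
    pos = match.index + match[0].length;
  }
  return result;
}

collectDigits(3); // -> ['2', '3', '4'] (index 3 is the "<" of "<2>")
collectDigits(2); // -> [] (index 2 is ">")
```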
+ + +

XRegExp.forEach(str, regex, callback)

+ +

Executes a provided function once per regex match. Searches always start at the beginning of the string and continue until the end, regardless of the state of the regex's global property and initial lastIndex.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to search. +
  • +
  • regex {RegExp}
    + Regex to search with. +
  • +
  • callback {Function}
    + Function to execute for each match. Invoked with four arguments: +
      +
    1. The match array, with named backreference properties.
    2. +
    3. The zero-based match index.
    4. +
    5. The string being traversed.
    6. +
    7. The regex object being used to traverse the string.
    8. +
    +
  • +
+
Returns: + {undefined}
+ Does not return a value. +
+ +

Example

+
// Extracts every other digit from a string
+const evens = [];
+XRegExp.forEach('1a2345', /\d/, function (match, i) {
+  if (i % 2) evens.push(+match[0]);
+});
+// evens -> [2, 4]
+
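A rough native counterpart uses `String.prototype.matchAll` (ES2020). `forEachMatch` is hypothetical. Because `matchAll` requires flag `g`, the sketch builds a fresh global copy of the regex. The copy starts at `lastIndex` 0, so the caller's `global` flag and `lastIndex` do not affect where the search begins.

```javascript
// Hypothetical helper, not XRegExp's implementation.
function forEachMatch(str, regex, callback) {
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  const globalCopy = new RegExp(regex, flags); // fresh copy, lastIndex 0
  let i = 0;
  for (const match of str.matchAll(globalCopy)) {
    // Same argument order as documented above: match, index, string, regex.
    callback(match, i++, str, regex);
  }
}

const evenPositions = [];
forEachMatch('1a2345', /\d/, (match, i) => {
  if (i % 2) evenPositions.push(+match[0]);
});
// evenPositions -> [2, 4]

// A stale lastIndex on the caller's regex is ignored.
const digits = /\d/g;
digits.lastIndex = 5;
const all = [];
forEachMatch('12', digits, (match) => all.push(match[0]));
// all -> ['1', '2']
```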
+ + +

XRegExp.globalize(regex)

+ +

Copies a regex object and adds flag g. The copy maintains extended data, + is augmented with XRegExp.prototype properties, and has a fresh lastIndex property (set to + zero). Native regexes are not recompiled using XRegExp syntax.

+ + + + + + + + + + + + +
Parameters:
    +
  • regex {RegExp}
    + Regex to globalize. +
  • +
+
Returns: + {RegExp}
+ Copy of the provided regex with flag g added. +
+ +

Example

+
const globalCopy = XRegExp.globalize(/regex/);
+globalCopy.global; // -> true
+
+function parse(str, regex) {
+  regex = XRegExp.globalize(regex);
+  let match;
+  while (match = regex.exec(str)) {
+    // ...
+  }
+}
+
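For native regexes, the copying step can be sketched with the ES2015 form `new RegExp(regex, flags)`, which copies a regex's pattern with a new flags string. `globalizeNative` is a hypothetical helper; unlike `XRegExp.globalize`, it does not carry over XRegExp's extended data.

```javascript
// Hypothetical helper: copy a regex and ensure flag g is present.
function globalizeNative(regex) {
  const flags = regex.flags.includes('g') ? regex.flags : regex.flags + 'g';
  return new RegExp(regex, flags); // the copy starts with lastIndex 0
}

const original = /a/i;
original.lastIndex = 3;
const globalCopy2 = globalizeNative(original);
globalCopy2.flags; // -> 'gi'
globalCopy2.lastIndex; // -> 0
original.global; // -> false (the original is unchanged)
```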
+ + +

XRegExp.install(options)

+ +

Installs optional features according to the specified options. Can be undone using XRegExp.uninstall.

+ + + + + + + + + + + + +
Parameters:
    +
  • options {Object|String}
    + Options object or string. +
  • +
+
Returns: + {undefined}
+ Does not return a value. +
+ +

Example

+
// With an options object
+XRegExp.install({
+  // Enables support for astral code points in Unicode addons (implicitly sets flag A)
+  astral: true,
+
+  // Adds named capture groups to the `groups` property of matches
+  // On by default in XRegExp 5
+  namespacing: true
+});
+
+// With an options string
+XRegExp.install('astral namespacing');
+
+ + +

XRegExp.isInstalled(feature)

+ +

Checks whether an individual optional feature is installed.

+ + + + + + + + + + + + +
Parameters:
    +
  • feature {String}
    + Name of the feature to check. One of: +
      +
    • astral
    • +
    • namespacing
    • +
    +
  • +
+
Returns: + {Boolean}
+ Whether the feature is installed. +
+ +

Example

+
XRegExp.isInstalled('astral');
+
+ + +

XRegExp.isRegExp(value)

+ +

Returns true if an object is a regex; false if it isn't. This works correctly for regexes + created in another frame, when instanceof and constructor checks would fail.

+ + + + + + + + + + + + +
Parameters:
    +
  • value {*}
    + Object to check. +
  • +
+
Returns: + {Boolean}
+ Whether the object is a RegExp object. +
+ +

Example

+
XRegExp.isRegExp('string'); // -> false
+XRegExp.isRegExp(/regex/i); // -> true
+XRegExp.isRegExp(RegExp('^', 'm')); // -> true
+XRegExp.isRegExp(XRegExp('(?s).')); // -> true
+
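The cross-frame problem can be reproduced in Node.js, where the built-in `vm` module creates a separate realm with its own `RegExp` constructor. The check shown, `Object.prototype.toString`, is one common technique. It is an illustration and not a claim about how `XRegExp.isRegExp` is implemented. `isRegExpLike` is a hypothetical name.

```javascript
// Node.js only: vm.runInNewContext evaluates code in a different realm.
const vm = require('vm');

const isRegExpLike = (value) => Object.prototype.toString.call(value) === '[object RegExp]';

const foreign = vm.runInNewContext('/regex/i'); // a regex from another realm
foreign instanceof RegExp; // -> false (different RegExp constructor)
isRegExpLike(foreign); // -> true
isRegExpLike('string'); // -> false
```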
+ + +

XRegExp.match(str, regex, [scope])

+ +

Returns the first matched string, or in global mode, an array containing all matched strings. + This is essentially a more convenient re-implementation of String.prototype.match that gives + the result types you actually want (string instead of exec-style array in match-first mode, + and an empty array instead of null when no matches are found in match-all mode). It also lets + you override flag g and ignore lastIndex, and fixes browser bugs.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to search. +
  • +
  • regex {RegExp}
    + Regex to search with. +
  • +
  • [scope='one'] {String}
    + Use 'one' to return the first match as a string. Use 'all' to + return an array of all matched strings. If not explicitly specified and regex uses flag g, + scope is all. +
  • +
+
Returns: + {String|Array}
+ In match-first mode: First match as a string, or null. In match-all + mode: Array of all matched strings, or an empty array. +
+ +

Example

+
// Match first
+XRegExp.match('abc', /\w/); // -> 'a'
+XRegExp.match('abc', /\w/g, 'one'); // -> 'a'
+XRegExp.match('abc', /x/g, 'one'); // -> null
+
+// Match all
+XRegExp.match('abc', /\w/g); // -> ['a', 'b', 'c']
+XRegExp.match('abc', /\w/, 'all'); // -> ['a', 'b', 'c']
+XRegExp.match('abc', /x/, 'all'); // -> []
+
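The two result modes can be sketched on top of native `exec` and `String.prototype.match`. `matchScope` is a hypothetical helper, not XRegExp's code. It makes flag `g` irrelevant by always working on a copy whose flags are chosen from `scope`.

```javascript
// Hypothetical helper: 'one' returns a string or null; 'all' returns an array (never null).
function matchScope(str, regex, scope = regex.global ? 'all' : 'one') {
  const baseFlags = regex.flags.replace('g', '');
  if (scope === 'all') {
    // With flag g, native match returns all matched strings, or null if none.
    return str.match(new RegExp(regex, baseFlags + 'g')) || [];
  }
  const match = new RegExp(regex, baseFlags).exec(str);
  return match ? match[0] : null;
}

matchScope('abc', /\w/); // -> 'a'
matchScope('abc', /x/g, 'one'); // -> null
matchScope('abc', /\w/, 'all'); // -> ['a', 'b', 'c']
matchScope('abc', /x/, 'all'); // -> []
```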
+ + +

XRegExp.matchChain(str, chain)

+ +

Retrieves the matches from searching a string using a chain of regexes that successively search + within previous matches. The provided chain array can contain regexes and/or objects with regex + and backref properties. When a backreference is specified, the named or numbered backreference + is passed forward to the next regex or returned.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to search. +
  • +
  • chain {Array}
    + Regexes that each search for matches within preceding results. +
  • +
+
Returns: + {Array}
+ Matches by the last regex in the chain, or an empty array. +
+ +

Example

+
// Basic usage; matches numbers within <b> tags
+XRegExp.matchChain('1 <b>2</b> 3 <b>4 a 56</b>', [
+  XRegExp('(?is)<b>.*?</b>'),
+  /\d+/
+]);
+// -> ['2', '4', '56']
+
+// Passing forward and returning specific backreferences
+const html = `<a href="http://xregexp.com/api/">XRegExp</a>
+              <a href="http://www.google.com/">Google</a>`;
+XRegExp.matchChain(html, [
+  {regex: /<a href="([^"]+)">/i, backref: 1},
+  {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
+]);
+// -> ['xregexp.com', 'www.google.com']
+
+ + +

XRegExp.matchRecursive(str, left, right, [flags], [options])

+ +

Requires the XRegExp.matchRecursive addon, which is bundled in xregexp-all.js.

+ +

Returns an array of match strings between outermost left and right delimiters, or an array of + objects with detailed match parts and position data. An error is thrown if delimiters are + unbalanced within the data.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to search. +
  • +
  • left {String}
    + Left delimiter as an XRegExp pattern. +
  • +
  • right {String}
    + Right delimiter as an XRegExp pattern. +
  • +
  • [flags] {String}
    + Any combination of XRegExp flags, used for the left and right delimiters. +
  • +
  • [options] {Object}
    + Lets you specify valueNames and escapeChar options. +
  • +
+
Returns:
    +
  • {Array}
    + Array of matches, or an empty array. +
  • +
+
+ +

Example

+
// Basic usage
+let str = '(t((e))s)t()(ing)';
+XRegExp.matchRecursive(str, '\\(', '\\)', 'g');
+// -> ['t((e))s', '', 'ing']
+
+// Extended information mode with valueNames
+str = 'Here is <div> <div>an</div></div> example';
+XRegExp.matchRecursive(str, '<div\\s*>', '</div>', 'gi', {
+  valueNames: ['between', 'left', 'match', 'right']
+});
+/* -> [
+{name: 'between', value: 'Here is ',       start: 0,  end: 8},
+{name: 'left',    value: '<div>',          start: 8,  end: 13},
+{name: 'match',   value: ' <div>an</div>', start: 13, end: 27},
+{name: 'right',   value: '</div>',         start: 27, end: 33},
+{name: 'between', value: ' example',       start: 33, end: 41}
+] */
+
+// Omitting unneeded parts with null valueNames, and using escapeChar
+str = '...{1}.\\{{function(x,y){return {y:x}}}';
+XRegExp.matchRecursive(str, '{', '}', 'g', {
+  valueNames: ['literal', null, 'value', null],
+  escapeChar: '\\'
+});
+/* -> [
+{name: 'literal', value: '...',  start: 0, end: 3},
+{name: 'value',   value: '1',    start: 4, end: 5},
+{name: 'literal', value: '.\\{', start: 6, end: 9},
+{name: 'value',   value: 'function(x,y){return {y:x}}', start: 10, end: 37}
+] */
+
+// Sticky mode via flag y
+str = '<1><<<2>>><3>4<5>';
+XRegExp.matchRecursive(str, '<', '>', 'gy');
+// -> ['1', '<<2>>', '3']
+
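The core of a recursive match is a depth counter. `matchOutermost` is a hypothetical, much-reduced sketch, not XRegExp's algorithm. It supports only distinct single-character literal delimiters and returns the text between outermost pairs, as in the "Basic usage" call above. Like `matchRecursive`, it throws on unbalanced delimiters.

```javascript
// Hypothetical helper: contents between outermost left/right delimiter pairs.
function matchOutermost(str, left, right) {
  const results = [];
  let depth = 0;
  let start = -1;
  for (let i = 0; i < str.length; i++) {
    if (str[i] === left) {
      if (depth === 0) start = i + 1; // content starts after the outermost left delimiter
      depth++;
    } else if (str[i] === right) {
      if (depth === 0) throw new Error(`Unbalanced right delimiter at index ${i}`);
      depth--;
      if (depth === 0) results.push(str.slice(start, i));
    }
  }
  if (depth !== 0) throw new Error('Unbalanced left delimiter');
  return results;
}

matchOutermost('(t((e))s)t()(ing)', '(', ')'); // -> ['t((e))s', '', 'ing']
```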
+ + +

XRegExp.replace(str, search, replacement, [scope])

+ +

Returns a new string with one or all matches of a pattern replaced. The pattern can be a string + or regex, and the replacement can be a string or a function to be called for each match. To + perform a global search and replace, use the optional scope argument or include flag g if + using a regex. Replacement strings can use $<n> or ${n} for named and numbered backreferences. + Replacement functions can use named backreferences via the last argument. Also fixes browser + bugs compared to the native String.prototype.replace and can be used reliably cross-browser.

+ +

For the full details of XRegExp's replacement text syntax, see Syntax: Replacement text.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to search. +
  • +
  • search {RegExp|String}
    + Search pattern to be replaced. +
  • +
  • replacement {String|Function}
    + Replacement string or a function invoked to create it.
    + Replacement strings can include special replacement syntax: +
      +
    • $$ - Inserts a literal $ character.
    • +
    • $&, $0 - Inserts the matched substring.
    • +
    • $` - Inserts the string that precedes the matched substring (left context).
    • +
    • $' - Inserts the string that follows the matched substring (right context).
    • +
    • $n, $nn - Where n/nn are digits referencing an existing capturing group, inserts + backreference n/nn.
    • +
    • $<n>, ${n} - Where n is a name or any number of digits that reference an existing capturing + group, inserts backreference n.
    • +
    + Replacement functions are invoked with three or more arguments: +
      +
    • args[0] - The matched substring (corresponds to $& above). If the namespacing feature is off, named backreferences are accessible as properties of this argument.
    • +
    • args[1..n] - One argument for each backreference (corresponding to $1, $2, etc. above). If the regex has no capturing groups, no arguments appear in this position.
    • +
    • args[n+1] - The zero-based index of the match within the entire search string.
    • +
    • args[n+2] - The total string being searched.
    • +
    • args[n+3] - If the search pattern is a regex with named capturing groups, the last argument is the groups object. Its keys are the backreference names and its values are the backreference values. If the namespacing feature is off, this argument is not present.
    • +
    +
  • +
  • [scope] {String}
    + Use 'one' to replace the first match only, or 'all'. Defaults to 'one'. Defaults to 'all' if using a regex with flag g. +
  • +
+
Returns: + {String}
+ New string with one or all matches replaced. +
+ +

Example

+
// Regex search, using named backreferences in replacement string
+const name = XRegExp('(?<first>\\w+) (?<last>\\w+)');
+XRegExp.replace('John Smith', name, '$<last>, $<first>');
+// -> 'Smith, John'
+
+// Regex search, using named backreferences in replacement function
+XRegExp.replace('John Smith', name, (...args) => {
+  const groups = args[args.length - 1];
+  return `${groups.last}, ${groups.first}`;
+});
+// -> 'Smith, John'
+
+// String search, with replace-all
+XRegExp.replace('RegExp builds RegExps', 'RegExp', 'XRegExp', 'all');
+// -> 'XRegExp builds XRegExps'
+
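For comparison, native `String.prototype.replace` in ES2018+ engines supports `$<name>` and passes the groups object as the last callback argument when the regex has named groups. XRegExp's replacement syntax additionally accepts `${name}`, as listed above. The sketch uses only built-in methods; `replaceAll` requires ES2021.

```javascript
const nameRegex = /(?<first>\w+) (?<last>\w+)/;

'John Smith'.replace(nameRegex, '$<last>, $<first>'); // -> 'Smith, John'

const swapped = 'John Smith'.replace(nameRegex, (...args) => {
  const groups = args[args.length - 1]; // the groups object comes last
  return `${groups.last}, ${groups.first}`;
});
// swapped -> 'Smith, John'

// A native string search replaces only the first occurrence; replaceAll replaces all.
'RegExp builds RegExps'.replaceAll('RegExp', 'XRegExp'); // -> 'XRegExp builds XRegExps'
```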
+ + +

XRegExp.replaceEach(str, replacements)

+ +

Performs batch processing of string replacements. Used like XRegExp.replace, but + accepts an array of replacement details. Later replacements operate on the output of earlier + replacements. Replacement details are accepted as an array with a regex or string to search for, + the replacement string or function, and an optional scope of 'one' or 'all'. Uses the XRegExp + replacement text syntax, which supports named backreference properties via $<name> or ${name}.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to search. +
  • +
  • replacements {Array}
    + Array of replacement detail arrays. +
  • +
+
Returns: + {String}
+ New string with all replacements. +
+ +

Example

+
str = XRegExp.replaceEach(str, [
+  [XRegExp('(?<name>a)'), 'z$<name>'],
+  [/b/gi, 'y'],
+  [/c/g, 'x', 'one'], // scope 'one' overrides /g
+  [/d/, 'w', 'all'],  // scope 'all' overrides lack of /g
+  ['e', 'v', 'all'],  // scope 'all' allows replace-all for strings
+  [/f/g, (match) => match.toUpperCase()]
+]);
+
+ + +

XRegExp.split(str, separator, [limit])

+ +

Splits a string into an array of strings using a regex or string separator. Matches of the + separator are not included in the result array. However, if separator is a regex that contains + capturing groups, backreferences are spliced into the result each time separator is matched. + Fixes browser bugs compared to the native String.prototype.split and can be used reliably + cross-browser.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to split. +
  • +
  • separator {RegExp|String}
    + Regex or string to use for separating the string. +
  • +
  • [limit] {Number}
    + Maximum number of items to include in the result array. +
  • +
+
Returns: + {Array}
+ Array of substrings. +
+ +

Example

+
// Basic use
+XRegExp.split('a b c', ' ');
+// -> ['a', 'b', 'c']
+
+// With limit
+XRegExp.split('a b c', ' ', 2);
+// -> ['a', 'b']
+
+// Backreferences in result array
+XRegExp.split('..word1..', /([a-z]+)(\d+)/i);
+// -> ['..', 'word', '1', '..']
+
+ + +

XRegExp.tag([flags])`pattern`

+ +

Requires the XRegExp.build addon, which is bundled in xregexp-all.js.

+ +

Provides tagged template literals that create regexes with XRegExp syntax and flags. The + provided pattern is handled as a raw string, so backslashes don't need to be escaped.

+ +

Interpolation of strings and regexes shares the features of XRegExp.build. Interpolated + patterns are treated as atomic units when quantified, interpolated strings have their special + characters escaped, a leading ^ and trailing unescaped $ are stripped from interpolated + regexes if both are present, and any backreferences within an interpolated regex are + rewritten to work within the overall pattern.

+ + + + + + + + + + + + +
Parameters: +
    +
  • [flags] {String}
    + Any combination of XRegExp flags. +
  • +
  • pattern {String}
    + Regex pattern as a raw string, optionally with interpolation. +
  • +
+
Returns: + {RegExp}
+ Extended regular expression object. +
+ +

Example

+
XRegExp.tag()`\b\w+\b`.test('word'); // -> true
+
+const hours = /1[0-2]|0?[1-9]/;
+const minutes = /(?<minutes>[0-5][0-9])/;
+const time = XRegExp.tag('x')`\b ${hours} : ${minutes} \b`;
+time.test('10:59'); // -> true
+XRegExp.exec('10:59', time).groups.minutes; // -> '59'
+
+const backref1 = /(a)\1/;
+const backref2 = /(b)\1/;
+XRegExp.tag()`${backref1}${backref2}`.test('aabb'); // -> true
+
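Two of the interpolation rules described above can be sketched as a tiny tag function. `re` is hypothetical and far simpler than `XRegExp.tag`. It reads the raw template text, wraps interpolated regexes in `(?:...)` so a following quantifier applies to the whole subpattern, and escapes interpolated strings. It does not strip `^`/`$` or renumber backreferences.

```javascript
// Hypothetical tag function using native RegExp syntax only.
function re(strings, ...subs) {
  let source = strings.raw[0];
  subs.forEach((sub, i) => {
    const part = sub instanceof RegExp
      ? sub.source
      : String(sub).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    source += `(?:${part})` + strings.raw[i + 1];
  });
  return new RegExp(source);
}

re`\b\w+\b`.test('word'); // -> true (raw text, so no doubled backslashes)

const ab = re`^${/a|b/}+$`;
ab.source; // -> '^(?:a|b)+$'
ab.test('ac'); // -> false (an unwrapped ^a|b+$ would match)

re`^${'1+1'}$`.test('1+1'); // -> true (the string is matched literally)
re`^${'1+1'}$`.test('11'); // -> false
```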
+ + +

XRegExp.test(str, regex, [pos], [sticky])

+ +

Executes a regex search in a specified string. Returns true or false. Optional pos and + sticky arguments specify the search start position, and whether the match must start at the + specified position only. The lastIndex property of the provided regex is not used, but is + updated for compatibility. Also fixes browser bugs compared to the native + RegExp.prototype.test and can be used reliably cross-browser.

+ + + + + + + + + + + + +
Parameters:
    +
  • str {String}
    + String to search. +
  • +
  • regex {RegExp}
    + Regex to search with. +
  • +
  • [pos=0] {Number}
    + Zero-based index at which to start the search. +
  • +
  • [sticky=false] {Boolean|String}
    + Whether the match must start at the specified position only. The string 'sticky' is accepted as an alternative to true. +
  • +
+
Returns: + {Boolean}
+ Whether the regex matched the provided value. +
+ +

Example

+
// Basic use
+XRegExp.test('abc', /c/); // -> true
+
+// With pos and sticky
+XRegExp.test('abc', /c/, 0, 'sticky'); // -> false
+XRegExp.test('abc', /c/, 2, 'sticky'); // -> true
+
+ + +

XRegExp.uninstall(options)

+ +

Uninstalls optional features according to the specified options. Used to undo the actions of XRegExp.install.

+ + + + + + + + + + + + +
Parameters:
    +
  • options {Object|String}
    + Options object or string. +
  • +
+
Returns: + {undefined}
+ Does not return a value. +
+ +

Example

+
// With an options object
+XRegExp.uninstall({
+  // Disables support for astral code points in Unicode addons (unless enabled per regex)
+  astral: true,
+
+  // Don't add named capture groups to the `groups` property of matches
+  namespacing: true
+});
+
+// With an options string
+XRegExp.uninstall('astral namespacing');
+
+ + +

XRegExp.union(patterns, [flags], [options])

+ +

Returns an XRegExp object that is the union of the given patterns. Patterns can be provided as + regex objects or strings. Metacharacters are escaped in patterns provided as strings. + Backreferences in provided regex objects are automatically renumbered to work correctly within the larger combined pattern. Native + flags used by provided regexes are ignored in favor of the flags argument.

+ + + + + + + + + + + + +
Parameters:
    +
  • patterns {Array}
    + Regexes and strings to combine. +
  • +
  • [flags] {String}
    + Any combination of XRegExp flags. +
  • +
  • [options] {Object}
    + Options object with optional properties: +
      +
    • conjunction {String} Type of conjunction to use: 'or' (default) or 'none'.
    • +
    +
  • +
+
Returns: + {RegExp}
+ Union of the provided regexes and strings. +
+ +

Example

+
XRegExp.union(['a+b*c', /(dogs)\1/, /(cats)\1/], 'i');
+// -> /a\+b\*c|(dogs)\1|(cats)\2/i
+
+XRegExp.union([/man/, /bear/, /pig/], 'i', {conjunction: 'none'});
+// -> /manbearpig/i
+
+ + +

XRegExp.version

+ +

The XRegExp version number as a string containing three dot-separated parts. For example, '5.0.1'.

+ + +

<regexp>.xregexp.source

+ +

The original pattern provided to the XRegExp constructor. Note that this differs from the <regexp>.source property which holds the transpiled source in native RegExp syntax and therefore can't be used to reconstruct the regex (e.g. <regexp>.source holds no knowledge of capture names). This property is available only for regexes originally constructed by XRegExp. It is null for native regexes copied using the XRegExp constructor or XRegExp.globalize.

+ + +

<regexp>.xregexp.flags

+ +

The original flags provided to the XRegExp constructor. Differs from the ES6 <regexp>.flags property in that it includes XRegExp's non-native flags and is accessible even in pre-ES6 browsers. This property is available only for regexes originally constructed by XRegExp. It is null for native regexes copied using the XRegExp constructor or XRegExp.globalize. When regexes originally constructed by XRegExp are copied using XRegExp.globalize, the value of this property is augmented with 'g' if not already present. Flags are listed in alphabetical order.

+ + + + + +
+
+ + + diff --git a/docs/assets/index.css b/docs/assets/index.css new file mode 100644 index 0000000..f0f85a2 --- /dev/null +++ b/docs/assets/index.css @@ -0,0 +1,82 @@ +body {font-family:Calibri, Tahoma, Verdana, Arial, Helvetica, sans-serif; font-size:85%; margin:0; padding:0; background:#fff;} +a:link, a:visited {color:#296e31; text-decoration:none;} +a:hover, a:active {color:#0a3716; text-decoration:underline;} +#header {padding:15px 15px 10px; border-bottom:3px solid #e3e3e3; background:#f3f3f3;} +#logoX {color:#999;} +#body {height:100%; padding:15px;} +#navBar {height:100%; width:200px; float:left;} +#main {height:100%; margin-left:200px;} +#footer {clear:both; border-top:3px solid #e3e3e3; padding:0 15px 20px;} +#footnotes {margin-top:25px;} +#tocContainer {float:right; background:#fff; padding:5px 0 20px 20px;} +#toc {border:1px solid #aaa; padding:0 20px 8px;} +#toc h2 {margin-top:15px;} +#toc ul {padding-left:15px;} +.small {font-size:80%;} +.plain {font-weight:normal;} +.alert {color:#900; font-weight:bold;} +.todo {color:#c00; font-weight:bold;} +.standout {background:#ffc;} +.clear {clear:both;} +h1 {margin-bottom:0; font-family:Cambria, Tahoma, Verdana, Arial, Helvetica, sans-serif;} +h1 a:link, h1 a:visited, h1 a:active, h1 a:hover {color:#000; text-decoration:none;} +h1.subtitle {margin-top:0; font-size:1.2em; font-weight:normal; font-family:Calibri, Tahoma, Verdana, Arial, Helvetica, sans-serif;} +h2 {border-bottom:1px solid #aaa; margin-top:25px; font-family:Cambria, "Times New Roman", Times, serif; font-size:145%;} +h2 code {border-bottom:0;} +h2 code span.plain {font-size:90%;} +h3 {margin:15px 0 10px; font-family:Cambria, "Times New Roman", Times, serif; font-size:125%; font-weight:normal;} +pre {background:#fafafa; white-space:pre-wrap; font-family:Monaco, Consolas, "Courier New", Courier, monospace; border:1px solid #e3e3e3; padding:5px;} +code {font-family:Monaco, Consolas, "Courier New", Courier, monospace; border:1px solid #eee; 
background:#f3f3f3;} +cite {font-style:normal;} +q {font-style:italic;} +q:before, q:after {content:"";} +li {margin-bottom:1px; line-height:130%;} +table {border-collapse:collapse; border-color:#888;} +table ul {padding-left:20px; margin:0;} +thead {background:#333; color:#f3f3f3;} +th, td {border:solid #888; border-width:0 1px 1px 0; padding:5px;} +tr.alt {background:#f3f3f3;} +tr.alt code {background:#fafafa;} +table.api {margin-left:20px;} +table.api th, table.api td {border:0;} +table.api tr.alt {background:#fff;} +table.api tr.alt td {border-top:1px solid #ddd;} +table.api tbody th {vertical-align:top; text-align:left; border-right:1px solid #ddd;} +div.aside {border:3px double #ddd; background:#f6f6f6; padding:0 15px 15px; margin-bottom:15px;} +div.aside p {margin:15px 0 0;} +div.aside code {border:1px solid #ddd; background:#f6f6f6; padding:0 2px;} +div.right {float:right; clear:right;} +div.aside.right {width:300px; margin-left:15px;} +a.footnoteLink {font-size:80%; color:#999;} +tr.highlight {background:#bfdcff;} +tr.highlight code {border-color:#99b9df; background:#b3ceef;} + +.menu { + width:180px; +} +.menu ul { + list-style-type:none; + margin:0; + padding:0 0 10px 0; + border:0 solid #a0df99; + border-width:0 1px 1px 0; +} +.menu li a { + font:italic 15px Georgia, "Times New Roman", Times, serif; + display:block; + height:24px; + padding:4px 0 4px 10px; + line-height:24px; + text-decoration:none; +} +.menu li a:link, .menu li a:visited { + color:#296e31; +} +.menu li a:hover { + color:#0a3716; + text-decoration:underline; +} +.menu li a.selected { + color:#333; font-weight:bold; +} +a img {border:0;} diff --git a/docs/flags/index.html b/docs/flags/index.html new file mode 100644 index 0000000..3999ecb --- /dev/null +++ b/docs/flags/index.html @@ -0,0 +1,216 @@ + + + + + New flags :: XRegExp + + + + +
+ +
+ + + + + + + +

New flags

+ +

About flags

+ +

XRegExp provides four new flags (n, s, x, A), which can be combined with native flags and arranged in any order. Unlike native flags, non-native flags do not show up as properties on regular expression objects.

+ + + + +

Explicit capture (n)

+ +

Specifies that the only valid captures are explicitly named groups of the form (?<name>…). This allows unnamed (…) parentheses to act as noncapturing groups without the syntactic clumsiness of the expression (?:…).
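For comparison, here is a minimal sketch that uses native JavaScript syntax only (no XRegExp). It shows the numbering problem that flag n avoids. Without the flag, every unnamed (…) group captures, so you have to write (?:…) to keep only the named groups.

```javascript
// Native regexes only (no XRegExp): unnamed (…) groups always capture.
const withPlainGroup = /(?<year>\d{4})(-|\/)(?<month>\d{2})/.exec('2021/02');
// withPlainGroup.length -> 4 (full match plus three groups; the separator is group 2)

// Rewriting the unnamed group as (?:…) leaves only the named captures,
// which is the effect flag n has on plain (…) groups.
const withNoncapturing = /(?<year>\d{4})(?:-|\/)(?<month>\d{2})/.exec('2021/02');
// withNoncapturing.length -> 3 (full match plus two named groups)
// withNoncapturing[2] -> '02' (the month is now group 2)
```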

+ +

Annotations

+ + + +

Dot matches all (s)

+ + + + + +

Usually, a dot does not match newlines. However, a mode in which dots match any code unit (including newlines) can be as useful as one where dots don't. The s flag allows the mode to be selected on a per-regex basis. Escaped dots (\.) and dots within character classes ([.]) are always equivalent to literal dots. The newline code points are as follows:

+ + + +

Annotations

+ + +
+

When using XRegExp's Unicode Properties addon, you can match any code point without using the s flag via \p{Any}.

+
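ES2018 engines also support flag s natively. The following sketch uses native regex literals, not XRegExp, to show the behavior described above. A dot matches a newline only when s is set, while \. and [.] always stay literal.

```javascript
// Native ES2018 flag s (no XRegExp), for comparison with XRegExp's flag s.
const text = 'first line\nsecond line';

// Without s, a dot does not match the newline, so the match stops at the first line.
const withoutS = /first.+line/.exec(text)[0]; // -> 'first line'

// With s, a dot also matches the newline.
const withS = /first.+line/s.exec(text)[0]; // -> 'first line\nsecond line'

// Escaped dots and dots inside a character class stay literal, even with s.
const literalDot = /a\.b/s.test('a\nb'); // -> false
const classDot = /a[.]b/s.test('a\nb');  // -> false
```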
+ + +

Free-spacing and line comments (x)

+ +

This flag has two complementary effects. First, it causes most whitespace to be ignored, so you can free-format the regex pattern for readability. Second, it allows comments with a leading #. Specifically, it turns most whitespace into an "ignore me" metacharacter, and # into an "ignore me, and everything else up to the next newline" metacharacter. They aren't taken as metacharacters within character classes (which means that classes are not free-format, even with x), and as with other metacharacters, you can escape whitespace and # that you want to be taken literally. Of course, you can always use \s to match whitespace.

+ +
+

It might be better to think of whitespace and comments as do-nothing (rather than ignore-me) metacharacters. This distinction is important with something like \12 3, which with the x flag is taken as \12 followed by 3, and not \123. However, quantifiers following whitespace or comments apply to the preceding token, so x + is equivalent to x+.

+
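The two effects of flag x can be approximated with a small preprocessing pass. stripFreeSpacing is a hypothetical helper written for illustration, not XRegExp's implementation. It simply deletes whitespace and comments outside character classes. That means it ignores the do-nothing subtlety described above: unlike XRegExp, it would turn \12 3 into \123.

```javascript
// Hypothetical helper (not XRegExp's implementation): strips free-spacing
// whitespace and # line comments from a pattern, outside character classes.
function stripFreeSpacing(pattern) {
  let out = '';
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\') {
      // Copy escapes verbatim, so escaped whitespace and \# stay literal.
      out += ch + (i + 1 < pattern.length ? pattern[i + 1] : '');
      i++;
    } else if (inClass) {
      // Character classes are not free-format, even with flag x.
      if (ch === ']') inClass = false;
      out += ch;
    } else if (ch === '[') {
      inClass = true;
      out += ch;
    } else if (ch === '#') {
      // Skip the comment up to the next newline (or the end of the pattern).
      while (i + 1 < pattern.length && pattern[i + 1] !== '\n') i++;
    } else if (!/\s/.test(ch)) {
      // Whitespace outside classes is dropped; everything else is kept.
      out += ch;
    }
  }
  return out;
}

const freeSpacedSource = stripFreeSpacing(`(?<year>  [0-9]{4} ) -?  # year
                                           (?<month> [0-9]{2} )     # month`);
// freeSpacedSource -> '(?<year>[0-9]{4})-?(?<month>[0-9]{2})'
const freeSpacedMatch = new RegExp(freeSpacedSource).exec('2021-02');
// freeSpacedMatch.groups.month -> '02'
```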
+ +

The ignored whitespace characters are those matched natively by \s. ES3 whitespace is based on Unicode 2.1.0 or later. ES5 whitespace is based on Unicode 3.0.0 or later, plus U+FEFF. Following are the code points that should be matched by \s according to ES5 and Unicode 4.0.1–6.1.0 (not yet updated for later versions):

+ + + +

Annotations

+ + +
+

Unicode 1.1.5–4.0.0 assigned code point U+200B (ZWSP) to the Zs (Space separator) category, which means that some browsers or regex engines might include this additional code point in those matched by \s, etc. Unicode 4.0.1 moved ZWSP to the Cf (Format) category.

+ +

Unicode 1.1.5 assigned code point U+FEFF (ZWNBSP) to the Zs category. Unicode 2.0.14 moved ZWNBSP to the Cf category. ES5 explicitly includes ZWNBSP in its list of whitespace characters, even though this does not match any version of the Unicode standard since 1996.

+ +

U+180E (Mongolian vowel separator) was introduced in Unicode 3.0.0, which assigned it the Cf category. Unicode 4.0.0 moved it into the Zs category, and Unicode 6.3.0 moved it back to the Cf category.

+
+ +
+

JavaScript's \s is similar but not equivalent to \p{Z} (the Separator category) from regex libraries that support Unicode categories, including XRegExp's own Unicode Categories addon. The difference is that \s includes code points U+0009–U+000D and U+FEFF, which are not assigned the Separator category in the Unicode character database.

+ +

JavaScript's \s is nearly equivalent to \p{White_Space} from the Unicode Properties addon. The differences are: 1. \p{White_Space} does not include U+FEFF (ZWNBSP). 2. \p{White_Space} includes U+0085 (NEL), which is not assigned the Separator category in the Unicode character database.

+ +

Aside: Not all JavaScript regex syntax is Unicode-aware. According to JavaScript specs, \s, \S, ., ^, and $ use Unicode-based interpretations of whitespace and newline, while \d, \D, \w, \W, \b, and \B use ASCII-only interpretations of digit, word character, and word boundary. Many browsers get some of these details wrong.

+ +

For more details, see JavaScript, Regex, and Unicode.

+
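The differences described in the notes above about \p{Z} and \p{White_Space} can be checked with native regexes. This sketch assumes an ES2018+ engine, where \p{…} is available natively with the u flag. It uses native syntax rather than XRegExp's Unicode addons.

```javascript
// Native comparison (no XRegExp): \s versus native \p{Z} and \p{White_Space}.
const isSpace = (ch) => /^\s$/.test(ch);
const isSeparator = (ch) => /^\p{Z}$/u.test(ch);
const isWhiteSpace = (ch) => /^\p{White_Space}$/u.test(ch);

// U+0009 (tab): matched by \s and White_Space, but not in the Separator category.
const tab = [isSpace('\t'), isSeparator('\t'), isWhiteSpace('\t')]; // -> [true, false, true]
// U+FEFF (ZWNBSP): matched by \s only.
const zwnbsp = [isSpace('\uFEFF'), isSeparator('\uFEFF'), isWhiteSpace('\uFEFF')]; // -> [true, false, false]
// U+0085 (NEL): matched by White_Space only.
const nel = [isSpace('\u0085'), isSeparator('\u0085'), isWhiteSpace('\u0085')]; // -> [false, false, true]
```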
+ + +

Astral (A)

+ +

Requires the Unicode Base addon.

+ +

By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e. code points up to U+FFFF). You can opt in to full 21-bit Unicode support (with code points up to U+10FFFF) on a per-regex basis by using flag A. In XRegExp, this is called astral mode. You can automatically add flag A for all new regexes by running XRegExp.install('astral'). When in astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.

+ +
// Using flag A to match astral code points
+XRegExp('^\\pS$').test('💩'); // -> false
+XRegExp('^\\pS$', 'A').test('💩'); // -> true
+XRegExp('(?A)^\\pS$').test('💩'); // -> true
+// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
+XRegExp('(?A)^\\pS$').test('\uD83D\uDCA9'); // -> true
+
+// Implicit flag A
+XRegExp.install('astral');
+XRegExp('^\\pS$').test('💩'); // -> true
+
+ +

Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use e.g. (\pL|[0-9_])+ instead of [\pL0-9_]+.
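Astral mode exists because JavaScript strings are sequences of UTF-16 code units. The sketch below uses native regexes with the u flag, not XRegExp's flag A, to show the distinction between code units and code points that astral mode addresses.

```javascript
// Native regexes only (no XRegExp); the u flag here stands in for astral mode.
const pileOfPoo = '\u{1F4A9}';
const codeUnitCount = pileOfPoo.length;               // -> 2
const isSurrogatePair = pileOfPoo === '\uD83D\uDCA9'; // -> true

// Without u, a dot matches a single code unit, so one dot can't match the pair.
const dotWithoutU = /^.$/.test(pileOfPoo);            // -> false
// With u, a dot matches the whole code point.
const dotWithU = /^.$/u.test(pileOfPoo);              // -> true
// Native \p{…} (ES2018) requires u and also matches whole code points.
const symbolWithU = /^\p{S}$/u.test(pileOfPoo);       // -> true
```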

+ +

Annotations

+ + + + + + +
+
+ + + diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 0000000..36e2b42 --- /dev/null +++ b/docs/index.html @@ -0,0 +1,137 @@ + + + + + XRegExp + + + + +
+ +
+ + + + + + + +

What is it?

+ +

XRegExp provides augmented (and extensible) JavaScript regular expressions. You get modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your grepping and parsing easier, while freeing you from regex cross-browser inconsistencies and other annoyances.

+ +

XRegExp supports all native ES6 regular expression syntax. It supports ES5+ browsers (including Internet Explorer 9+), and you can use it with Node.js or as a RequireJS module. Over the years, many of XRegExp's features have been adopted by new JavaScript standards (named capturing, Unicode properties/scripts/categories, flag s, sticky matching, etc.), so using XRegExp can be a way to extend these features into older browsers. It's released under the MIT License.

+ +

XRegExp lets you write regexes like this:

+ +
// Using named capture and flag x (free-spacing and line comments)
+const date = XRegExp(`(?<year>  [0-9]{4} ) -?  # year
+                      (?<month> [0-9]{2} ) -?  # month
+                      (?<day>   [0-9]{2} )     # day`, 'x');
+
+ +

And do cool stuff like this:

+ +
// Using named backreferences...
+XRegExp.exec('2021-02-23', date).groups.year;
+// -> '2021'
+XRegExp.replace('2021-02-23', date, '$<month>/$<day>/$<year>');
+// -> '02/23/2021'
+
+// Finding matches within matches, while passing forward and returning specific backreferences
+const html = `<a href="http://xregexp.com/api/">XRegExp</a>
+              <a href="http://www.google.com/">Google</a>`;
+XRegExp.matchChain(html, [
+  {regex: /<a href="([^"]+)">/i, backref: 1},
+  {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
+]);
+// -> ['xregexp.com', 'www.google.com']
+
+ +

Check out more usage examples on GitHub ⇨.

+ +

Features

+ + + +

Performance

+ +

XRegExp compiles to native RegExp objects. Therefore, regexes built with XRegExp perform just as fast as native regular expressions. There is a tiny extra cost when compiling a pattern for the first time.

+ +

Installation and usage

+ +

In browsers (bundle XRegExp with all of its addons):

+ +
<script src="https://unpkg.com/xregexp/xregexp-all.js"></script>
+
+ +

Using npm:

+ +
npm install xregexp
+
+ +

In Node.js:

+ +
const XRegExp = require('xregexp');
+
+ +

Named Capture Breaking Change in XRegExp 5

+ +

XRegExp 5 introduced a breaking change where named backreference properties now appear on the result's groups object (following ES2018), rather than directly on the result. To restore the old handling so you don't need to update old code, run the following line after importing XRegExp:

+ +
XRegExp.uninstall('namespacing');
+
+ +

XRegExp 4.1.0 and later allow introducing the new behavior without upgrading to XRegExp 5 by running XRegExp.install('namespacing').

+ +

Following is the most commonly needed change to update code for the new behavior:

+ +
// Change this
+const name = XRegExp.exec(str, regexWithNamedCapture).name;
+
+// To this
+const name = XRegExp.exec(str, regexWithNamedCapture).groups.name;
+
+ +

See the README on GitHub ⇨ for more examples of using named capture with XRegExp.exec and XRegExp.replace.

+ + + + + +
+
+ + + diff --git a/docs/syntax/index.html b/docs/syntax/index.html new file mode 100644 index 0000000..9565c32 --- /dev/null +++ b/docs/syntax/index.html @@ -0,0 +1,232 @@ + + + + + New syntax :: XRegExp + + + + +
+ +
+ + + + + + + +

New syntax

+ +

Named capture

+ +

XRegExp includes comprehensive support for named capture. Following are the details of XRegExp's named capture syntax:

+ + + +

Notes

+ + +

Example

+
const repeatedWords = XRegExp.tag('gi')`\b(?<word>[a-z]+)\s+\k<word>\b`;
+// Alternatively: XRegExp('\\b(?<word>[a-z]+)\\s+\\k<word>\\b', 'gi');
+
+// Check for repeated words
+repeatedWords.test('The the test data');
+// -> true
+
+// Remove any repeated words
+const withoutRepeated = XRegExp.replace('The the test data', repeatedWords, '${word}');
+// -> 'The test data'
+
+const url = XRegExp(`^(?<scheme> [^:/?]+ ) ://   # aka protocol
+                      (?<host>   [^/?]+  )       # domain name/IP
+                      (?<path>   [^?]*   ) \\??  # optional path
+                      (?<query>  .*      )       # optional query`, 'x');
+
+// Get the URL parts
+const parts = XRegExp.exec('http://google.com/path/to/file?q=1', url);
+// parts -> ['http://google.com/path/to/file?q=1', 'http', 'google.com', '/path/to/file', 'q=1']
+// parts.groups.scheme -> 'http'
+// parts.groups.host   -> 'google.com'
+// parts.groups.path   -> '/path/to/file'
+// parts.groups.query  -> 'q=1'
+
+// Named backreferences are available in replacement functions as properties of the last argument
+XRegExp.replace('http://google.com/path/to/file?q=1', url, (match, ...args) => {
+  const groups = args.pop();
+  return match.replace(groups.host, 'xregexp.com');
+});
+// -> 'http://xregexp.com/path/to/file?q=1'
+
+ +

Regexes that use named capture work with all native methods. However, you need to use XRegExp.exec and XRegExp.replace for access to named backreferences, otherwise only numbered backreferences are available.
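The restriction above applies to regexes compiled by XRegExp. For comparison, a regex literal that uses native ES2018 named capture syntax exposes its groups through native methods. This sketch assumes an ES2018+ engine and does not use XRegExp.

```javascript
// Native comparison (no XRegExp): ES2018 named groups in a regex literal.
const nativeRepeatedWords = /\b(?<word>[a-z]+)\s+\k<word>\b/gi;

// Native replacement strings use $<name>.
const nativeWithoutRepeated = 'The the test data'.replace(nativeRepeatedWords, '$<word>');
// -> 'The test data'

// Native replacement functions receive the groups object as the last argument.
const nativeDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const usDate = '2021-02-23'.replace(nativeDate, (...args) => {
  const groups = args[args.length - 1];
  return `${groups.month}/${groups.day}/${groups.year}`;
});
// -> '02/23/2021'
```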

+ +

Annotations

+ + + +

Inline comments

+ +

Inline comments use the syntax (?#comment). They are an alternative to the line comments allowed in free-spacing mode.

+ +

Comments are a do-nothing (rather than ignore-me) metasequence. This distinction is important with something like \1(?#comment)2, which is taken as \1 followed by 2, and not \12. However, quantifiers following comments apply to the preceding token, so x(?#comment)+ is equivalent to x+.

+ +

Example

+
const regex = XRegExp('^(?#month)\\d{1,2}/(?#day)\\d{1,2}/(?#year)(\\d{2}){1,2}', 'n');
+const isDate = regex.test('04/20/2008'); // -> true
+
+// Can still be useful when combined with free-spacing, because inline comments
+// don't need to end with \n
+const spacedRegex = XRegExp('^ \\d{1,2}      (?#month)' +
+                            '/ \\d{1,2}      (?#day  )' +
+                            '/ (\\d{2}){1,2} (?#year )', 'nx');
+
+ +

Annotations

+ + + +

Leading mode modifier

+ +

A mode modifier uses the syntax (?imnsuxA), where imnsuxA is any combination of XRegExp flags except g or y. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.

+ +

Example

+
const regex = XRegExp('(?im)^[a-z]+$');
+regex.ignoreCase; // -> true
+regex.multiline; // -> true
+
+ +

When creating a regex, it's okay to include flags in a mode modifier that are also provided via the separate flags argument. For instance, XRegExp('(?s).+', 's') is valid.

+ +

Flags g and y cannot be included in a mode modifier, or an error is thrown. This is because g and y, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. In fact, XRegExp methods provide e.g. scope, sticky, and pos arguments that allow you to use and change such functionality on a per-run rather than per-regex basis. Also consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags g and y only make sense when applied to the regex as a whole. Allowing g and y in a mode modifier might therefore create future compatibility problems.

+ +

The use of unknown flags in a mode modifier causes an error to be thrown. However, XRegExp addons can add new flags that are then automatically valid within mode modifiers.
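The rules above can be sketched as a small parsing step: a single modifier, allowed only at the very start, with no g or y, no unknown flags, and duplicate flags allowed. extractLeadingModifier is a hypothetical helper for illustration. XRegExp's real parser differs, and this sketch hard-codes the flag list rather than accepting flags added by addons.

```javascript
// Hypothetical helper for illustration (not XRegExp's parser): splits a leading
// mode modifier such as (?im) off a pattern and merges it with the flags argument.
const KNOWN_FLAGS = 'imnsuxA'; // assumption: fixed list; XRegExp addons can add more

function extractLeadingModifier(pattern, flags = '') {
  const modifier = /^\(\?([\w$]+)\)/.exec(pattern);
  if (!modifier) {
    return { pattern, flags };
  }
  const modifierFlags = modifier[1];
  for (const flag of modifierFlags) {
    if (flag === 'g' || flag === 'y') {
      throw new SyntaxError(`Cannot use flag ${flag} in mode modifier ${modifier[0]}`);
    }
    if (!KNOWN_FLAGS.includes(flag)) {
      throw new SyntaxError(`Unknown flag ${flag} in mode modifier ${modifier[0]}`);
    }
  }
  // Repeating a flag from the flags argument is allowed, e.g. (?s) with 's'.
  const mergedFlags = [...new Set(flags + modifierFlags)].join('');
  return { pattern: pattern.slice(modifier[0].length), flags: mergedFlags };
}

const parsed = extractLeadingModifier('(?im)^[a-z]+$');
// parsed -> { pattern: '^[a-z]+$', flags: 'im' }
const compiled = new RegExp(parsed.pattern, parsed.flags);
// compiled.ignoreCase -> true, compiled.multiline -> true
```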

+ +

Annotations

+ + + +

Stricter error handling

+ +

XRegExp makes any escaped letters or numbers a SyntaxError unless they form a valid and complete metasequence or backreference. This helps to catch errors early, and makes it safe for future versions of ES or XRegExp to introduce new escape sequences. It also means that octal escapes are always an error in XRegExp. ES3/5 do not allow octal escapes, but browsers support them anyway for backward compatibility, which often leads to unintended behavior.

+ +

XRegExp requires all backreferences, whether written as \n, \k<n>, or \k<name>, to appear to the right of the opening parenthesis of the group they reference.

+ +

XRegExp never allows \n-style backreferences to be followed by literal numbers. To match backreference 1 followed by a literal 2 character, you can use, e.g., (a)\k<1>2, (?x)(a)\1 2, or (a)\1(?#)2.
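To see why this matters, here is a sketch of how native (non-XRegExp) regexes handle the same kind of sequence. Under the legacy web-compatibility rules in Annex B of the ECMAScript spec, \12 in a regex with only one group is read as an octal escape for a line feed, whereas XRegExp throws a SyntaxError. With native syntax, an empty noncapturing group (?:) can separate a backreference from a following digit.

```javascript
// Native comparison (no XRegExp): \12 when the regex has only one group.
// Legacy parsing treats \12 as the octal escape for U+000A (line feed),
// not as backreference 1 followed by a literal 2.
const legacyOctal = /^(a)\12$/;
const matchesNewline = legacyOctal.test('a\n'); // -> true
const matchesA2 = legacyOctal.test('aa2');      // -> false

// An empty noncapturing group separates the backreference from the digit.
const separated = /^(a)\1(?:)2$/.test('aa2');   // -> true
```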

+ + +

Unicode

+ +

XRegExp supports matching Unicode categories, scripts, and other properties via addon scripts. Such tokens are matched using \p{…}, \P{…}, and \p{^…}. See XRegExp Unicode addons for more details.

+ +

XRegExp additionally supports the \u{N…} syntax for matching individual code points. In ES6 this is supported natively, but only when using the u flag. XRegExp supports this syntax for code points 0–FFFF even when not using the u flag, and it supports the complete Unicode range 0–10FFFF when using u.
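The difference is easy to see with native regexes (no XRegExp). With the u flag, \u{1F4A9} is a code point escape. Without it, legacy parsing reads \u as a literal u and the braces as a quantifier, so the pattern silently means something else.

```javascript
// Native comparison (no XRegExp): \u{…} with and without the u flag.
const codePointWithU = /^\u{1F4A9}$/u.test('\u{1F4A9}'); // -> true

// Without u, \u is a literal "u" and {3} is a quantifier, so this matches "uuu".
const quantifierWithoutU = /^\u{3}$/.test('uuu');        // -> true
```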

+ + +

Replacement text

+ +

XRegExp's replacement text syntax is used by the XRegExp.replace function. It adds $0 as a synonym of $& (to refer to the entire match), and adds $<n> and ${n} for backreferences to named and numbered capturing groups (in addition to $1, etc.). When the braces syntax is used for numbered backreferences, it allows numbers with three or more digits (not possible natively) and allows separating a backreference from an immediately-following digit (not always possible natively). XRegExp uses stricter replacement text error handling than native JavaScript, to help you catch errors earlier (e.g., the use of a $ character that isn't part of a valid metasequence causes an error to be thrown).
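For comparison, native String.prototype.replace (no XRegExp) is both more permissive and more limited. $0 and a stray $ are left as literal text rather than causing an error. And when a regex has fewer than ten groups, $10 is read as $1 followed by a literal 0.

```javascript
// Native comparison (no XRegExp): replacement strings in String.prototype.replace.
const wholeMatch = 'abc'.replace(/b/, '[$&]');    // -> 'a[b]c'  ($& is the entire match)
const dollarZero = 'abc'.replace(/b/, '[$0]');    // -> 'a[$0]c' ($0 is not special natively)
const strayDollar = 'abc'.replace(/b/, '$');      // -> 'a$c'    (a lone $ is kept literally)
// With one group, $10 natively means group 1 followed by a literal 0.
const oneGroupTen = 'abc'.replace(/(b)/, '$10');  // -> 'ab0c'
```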

+ +

Following are the special tokens that can be used in XRegExp replacement strings:

+ + + +

XRegExp behavior for $<n> and ${n}:

+ + + +

XRegExp behavior for $n and $nn:

+ + + +

For comparison, following is JavaScript's native behavior for $n and $nn:

+ + + + + + + +
+
+ + + diff --git a/docs/syntax/named_capture_comparison/index.html b/docs/syntax/named_capture_comparison/index.html new file mode 100644 index 0000000..6f0ba0f --- /dev/null +++ b/docs/syntax/named_capture_comparison/index.html @@ -0,0 +1,355 @@ + + + + + Named capture comparison :: XRegExp + + + + +
+ +
+ + + + + +

New syntax » Named capture comparison

+ +

There are several different syntaxes used for named capture. Although Python was the first to implement the feature, most libraries have adopted .NET's alternative syntax.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Library | Capture | Backref in regex | Backref in replacement | Stored at | Backref numbering | Multiple groups with same name |
|---|---|---|---|---|---|---|
| XRegExp | `(?<name>…)`, `(?P<name>…)`¹ | `\k<name>` | `$<name>`², `${name}` | `result.groups.name`³ | Sequential | Error⁴ |
| EcmaScript 2018 | `(?<name>…)` | `\k<name>` | `$<name>` | `result.groups.name` | Sequential | Error |
| .NET | `(?<name>…)`, `(?'name'…)` | `\k<name>`, `\k'name'` | `${name}` | `matcher.Groups('name')` | Unnamed first, then named | Backref to last executed participating group |
| Perl 5.10 | `(?<name>…)`, `(?'name'…)`, `(?P<name>…)` | `\k<name>`, `\k'name'`, `\k{name}`, `\g{name}`, `(?P=name)` | `$+{name}` | `$+{name}` | Sequential | Backref to leftmost participating group |
| PCRE 7 | `(?<name>…)`, `(?'name'…)`, `(?P<name>…)` | `\k<name>`, `\k'name'`, `\k{name}`⁵, `\g{name}`⁵, `(?P=name)` | N/A | N/A | Sequential | Error |
| PCRE 4 | `(?P<name>…)` | `(?P=name)` | N/A | N/A | Sequential | Error |
| Python | `(?P<name>…)` | `(?P=name)` | `\g<name>` | `result.group('name')` | Sequential | Error |
| Oniguruma | `(?<name>…)`, `(?'name'…)` | `\k<name>`, `\k'name'` | `\k<name>`, `\k'name'` | N/A | Unnamed groups default to noncapturing when mixed with named groups | Backref to rightmost participating group. Backrefs within a regex work as alternation of matches of all preceding groups with the same name, in reverse order. |
| Java 7 | `(?<name>…)` | `\k<name>` | `${name}` | `matcher.group('name')` | Sequential | Error |
| JGsoft | `(?<name>…)`, `(?'name'…)`, `(?P<name>…)` | `\k<name>`, `\k'name'`, `(?P=name)` | `${name}`, `\g<name>` | N/A | .NET and Python styles, depending on capture syntax | Same as .NET |
| Boost.Regex | `(?<name>…)`, `(?'name'…)` | `\k<name>`, `\g{name}` | ? | ? | ? | ? |
| RE2 | `(?P<name>…)` | N/A | ? | ? | ? | ? |
| JRegex | `({name}…)` | `{\name}` | `${name}` | `matcher.group('name')` | ? | ? |
+ +

1 As of XRegExp 2. Not recommended for use, because support for the (?P<name>…) syntax may be removed in future versions of XRegExp. It is currently supported only to avoid an octal escape versus backreference issue in old Opera. Opera supported the Python named capture syntax natively, but did not provide full named capture functionality.

+ +

2 As of XRegExp 4.

+ +

3 As of XRegExp 4.1, when the namespacing option is on (it's on by default in XRegExp 5). Stored at result.name when namespacing is off.
+ Note: Within string.replace callbacks, stored at: arguments[arguments.length - 1].name (with namespacing on) or arguments[0].name (with namespacing off).

+ +

4 As of XRegExp 3.

+ +

5 As of PCRE 7.2.

+ +

TODO: Add a column comparing the use of capture names in regex conditionals (not supported by XRegExp).

+ + + + + +
+
+ + + diff --git a/docs/unicode/index.html b/docs/unicode/index.html new file mode 100644 index 0000000..6d62054 --- /dev/null +++ b/docs/unicode/index.html @@ -0,0 +1,86 @@ + + + + + Unicode :: XRegExp + + + + +
+ +
+ + + + + +

Unicode

+ +

Requires the Unicode addons, which are bundled in xregexp-all.js. Alternatively, you can download the individual addon scripts from GitHub. XRegExp's npm package uses xregexp-all.js.

+ +

The Unicode Base script adds base support for Unicode matching via the \p{…} syntax. À la carte token addon packages add support for Unicode categories, scripts, and other properties. All Unicode tokens can be inverted using \P{…} or \p{^…}. Token names are case insensitive, and any spaces, hyphens, and underscores are ignored. You can omit the braces for token names that are a single letter.
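The name matching rules above can be sketched as a normalization step. normalizeTokenName is a hypothetical helper written for illustration, and the alias table is made up. XRegExp's own lookup code may differ.

```javascript
// Hypothetical helper for illustration (XRegExp's own lookup may differ):
// drops spaces, hyphens, and underscores, and lowercases the rest.
function normalizeTokenName(name) {
  return name.replace(/[- _]+/g, '').toLowerCase();
}

// Made-up alias table mapping normalized names to a canonical token.
const tokenAliases = new Map([
  ['sc', 'Sc'],
  ['currencysymbol', 'Sc'],
]);

const spellings = ['Currency_Symbol', 'currency symbol', 'CURRENCY-SYMBOL', 'sc'];
const canonical = spellings.map((name) => tokenAliases.get(normalizeTokenName(name)));
// canonical -> ['Sc', 'Sc', 'Sc', 'Sc']
```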

+ +

Example

+
// Categories
+XRegExp('\\p{Sc}\\pN+'); // Sc = currency symbol, N = number
+// Can also use the full names \p{Currency_Symbol} and \p{Number}
+
+// Scripts
+XRegExp('\\p{Cyrillic}');
+XRegExp('[\\p{Latin}\\p{Common}]');
+// Can also use the Script= prefix to match ES2018: \p{Script=Cyrillic}
+
+// Properties
+XRegExp('\\p{ASCII}');
+XRegExp('\\p{Assigned}');
+
+// In action...
+
+const unicodeWord = XRegExp('^\\pL+$'); // L = letter
+unicodeWord.test('Русский'); // -> true
+unicodeWord.test('日本語'); // -> true
+unicodeWord.test('العربية'); // -> true
+
+XRegExp('^\\p{Katakana}+$').test('カタカナ'); // -> true
+
+ +

By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e. code points up to U+FFFF). You can opt in to full 21-bit Unicode support (with code points up to U+10FFFF) on a per-regex basis by using flag A. In XRegExp, this is called astral mode. You can automatically add flag A for all new regexes by running XRegExp.install('astral'). When in astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.

+ +
// Using flag A to match astral code points
+XRegExp('^\\pS$').test('💩'); // -> false
+XRegExp('^\\pS$', 'A').test('💩'); // -> true
+// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
+XRegExp('^\\pS$', 'A').test('\uD83D\uDCA9'); // -> true
+
+// Implicit flag A
+XRegExp.install('astral');
+XRegExp('^\\pS$').test('💩'); // -> true
+
+ +

Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use e.g. (\pL|[0-9_])+ instead of [\pL0-9_]+.

+ + + + + +
+
+ + + diff --git a/package-lock.json b/package-lock.json index 5b76587..2b8a87e 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,11 +1,11 @@ { "name": "xregexp", - "version": "5.0.0", + "version": "5.0.1", "lockfileVersion": 2, "requires": true, "packages": { "": { - "version": "5.0.0", + "version": "5.0.1", "license": "MIT", "dependencies": { "@babel/runtime-corejs3": "^7.12.1" diff --git a/package.json b/package.json index 3d01284..125d702 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "xregexp", - "version": "5.0.0", + "version": "5.0.1", "description": "Extended regular expressions", "homepage": "http://xregexp.com/", "author": "Steven Levithan ", diff --git a/src/addons/build.js b/src/addons/build.js index ad17507..91be36f 100644 --- a/src/addons/build.js +++ b/src/addons/build.js @@ -1,5 +1,5 @@ /*! - * XRegExp.build 5.0.0 + * XRegExp.build 5.0.1 * * Steven Levithan (c) 2012-present MIT License */ diff --git a/src/addons/matchrecursive.js b/src/addons/matchrecursive.js index c459227..f79f277 100644 --- a/src/addons/matchrecursive.js +++ b/src/addons/matchrecursive.js @@ -1,5 +1,5 @@ /*! - * XRegExp.matchRecursive 5.0.0 + * XRegExp.matchRecursive 5.0.1 * * Steven Levithan (c) 2009-present MIT License */ diff --git a/src/addons/unicode-base.js b/src/addons/unicode-base.js index b40aebb..3bc3e7c 100644 --- a/src/addons/unicode-base.js +++ b/src/addons/unicode-base.js @@ -1,5 +1,5 @@ /*! - * XRegExp Unicode Base 5.0.0 + * XRegExp Unicode Base 5.0.1 * * Steven Levithan (c) 2008-present MIT License */ @@ -15,7 +15,7 @@ export default (XRegExp) => { * - Adds the `XRegExp.addUnicodeData` method used by other addons to provide character data. * * Unicode Base relies on externally provided Unicode character data. Official addons are - * available to provide data for Unicode categories, scripts, blocks, and properties. + * available to provide data for Unicode categories, scripts, and properties. 
* * @requires XRegExp */ diff --git a/src/addons/unicode-categories.js b/src/addons/unicode-categories.js index 1b496bc..938a06d 100644 --- a/src/addons/unicode-categories.js +++ b/src/addons/unicode-categories.js @@ -1,5 +1,5 @@ /*! - * XRegExp Unicode Categories 5.0.0 + * XRegExp Unicode Categories 5.0.1 * * Steven Levithan (c) 2010-present MIT License * Unicode data by Mathias Bynens diff --git a/src/addons/unicode-properties.js b/src/addons/unicode-properties.js index 852d3a1..4efa0b9 100644 --- a/src/addons/unicode-properties.js +++ b/src/addons/unicode-properties.js @@ -1,5 +1,5 @@ /*! - * XRegExp Unicode Properties 5.0.0 + * XRegExp Unicode Properties 5.0.1 * * Steven Levithan (c) 2012-present MIT License * Unicode data by Mathias Bynens diff --git a/src/addons/unicode-scripts.js b/src/addons/unicode-scripts.js index 6121c9f..08e446a 100644 --- a/src/addons/unicode-scripts.js +++ b/src/addons/unicode-scripts.js @@ -1,5 +1,5 @@ /*! - * XRegExp Unicode Scripts 5.0.0 + * XRegExp Unicode Scripts 5.0.1 * * Steven Levithan (c) 2010-present MIT License * Unicode data by Mathias Bynens diff --git a/src/xregexp.js b/src/xregexp.js index 1a0d83a..494e9a9 100644 --- a/src/xregexp.js +++ b/src/xregexp.js @@ -1,5 +1,5 @@ /*! - * XRegExp 5.0.0 + * XRegExp 5.0.1 * * Steven Levithan (c) 2007-present MIT License */ @@ -646,7 +646,7 @@ XRegExp.prototype = new RegExp(); * @memberOf XRegExp * @type String */ -XRegExp.version = '5.0.0'; +XRegExp.version = '5.0.1'; // ==--------------------------== // Public methods