XRegExp
XRegExp.addToken
XRegExp.build
XRegExp.cache
XRegExp.escape
XRegExp.exec
XRegExp.forEach
XRegExp.globalize
XRegExp.install
XRegExp.isInstalled
XRegExp.isRegExp
XRegExp.match
XRegExp.matchChain
XRegExp.matchRecursive
XRegExp.replace
XRegExp.replaceEach
XRegExp.split
XRegExp.tag
XRegExp.test
XRegExp.uninstall
XRegExp.union
XRegExp.version

API

`XRegExp(pattern, [flags])`

Creates an extended regular expression object for matching text with a pattern. Differs from a native regular expression in that additional syntax and flags are supported. The returned object is in fact a native `RegExp` and works with all native methods.

Parameters:

- `pattern` {`String`|`RegExp`} Regex pattern string, or an existing regex object to copy.
- [`flags`] {`String`} Any combination of flags.

  Native flags:
  - `g` - global
  - `i` - ignore case
  - `m` - multiline anchors
  - `u` - unicode (ES6)
  - `y` - sticky (Firefox 3+, ES6)

  Additional XRegExp flags:
  - `n` - explicit capture
  - `s` - dot matches all (aka singleline) - works even when not natively supported
  - `x` - free-spacing and line comments (aka extended)
  - `A` - astral (requires the Unicode Base addon)

  Flags cannot be provided when constructing one `RegExp` from another.

Returns: {`RegExp`} Extended regular expression object.

Example

// With named capture and flag x
XRegExp(`(?<year>  [0-9]{4} ) [-\\s]?  # year
         (?<month> [0-9]{2} ) [-\\s]?  # month
         (?<day>   [0-9]{2} )          # day`, 'x');

// Providing a regex object copies it. Native regexes are recompiled using native (not
// XRegExp) syntax. Copies maintain extended data, are augmented with `XRegExp.prototype`
// properties, and have fresh `lastIndex` properties (set to zero).
XRegExp(/regex/);

For details about the regular expression just shown, see Syntax: Named capture and Flags: Free-spacing.

Regexes, strings, and backslashes

JavaScript string literals (as opposed to, e.g., user input or text extracted from the DOM) use a backslash as an escape character. The string literal '\\' therefore contains a single backslash, and its length property's value is 1. However, a backslash is also an escape character in regular expression syntax, where the pattern \\ matches a single backslash. When providing string literals to the RegExp or XRegExp constructor functions, four backslashes are therefore needed to match a single backslash—e.g., XRegExp('\\\\'). Only two of those backslashes are actually passed into the constructor function. The other two are used to escape the backslashes in the string before the function ever sees the string. The exception is when using ES6 raw strings via String.raw or XRegExp.tag.

The same issue is at play with the \\s sequences in the example code just shown. XRegExp is provided with the two characters \s, which it in turn recognizes as the metasequence used to match a whitespace character.
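
The two layers of escaping can be checked with native JavaScript alone. The following is a minimal sketch that uses the built-in `RegExp` constructor instead of XRegExp, on the assumption that string-literal escaping happens before either constructor sees the pattern; the variable names are illustrative only.

```javascript
// A string literal written with two backslashes holds ONE backslash character.
const oneBackslash = '\\';
console.log(oneBackslash.length); // 1

// Four backslashes in the literal become two characters in the string,
// which the regex engine reads as the pattern \\ (match one backslash).
const fromString = new RegExp('\\\\');
console.log(fromString.source.length); // 2
console.log(fromString.test(oneBackslash)); // true

// A raw string skips string-literal escaping, so two backslashes are enough.
const fromRaw = new RegExp(String.raw`\\`);
console.log(fromRaw.test(oneBackslash)); // true

// '\\s' reaches the constructor as the two characters \s (whitespace).
const whitespace = new RegExp('\\s');
console.log(whitespace.test('a b')); // true
console.log(whitespace.test('ab')); // false
```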

`XRegExp.addToken(regex, handler, [options])`

Extends XRegExp syntax and allows custom flags. This is used internally and can be used to create XRegExp addons. If more than one token can match the same string, the last added wins.

Parameters:

- `regex` {`RegExp`} Regex object that matches the new token.
- `handler` {`Function`} Function that returns a new pattern string (using native regex syntax) to replace the matched token within all future XRegExp regexes. Has access to persistent properties of the regex being built, through `this`. Invoked with three arguments:
  1. The match array, with named backreference properties.
  2. The regex scope where the match was found: `'default'` or `'class'`.
  3. The flags used by the regex, including any flags in a leading mode modifier.

  The handler function becomes part of the XRegExp construction process, so be careful not to construct XRegExps within the function or you will trigger infinite recursion.
- [`options`] {`Object`} Options object with optional properties:
  - `scope` {`String`} Scopes where the token applies: `'default'`, `'class'`, or `'all'`.
  - `flag` {`String`} Single-character flag that triggers the token. This also registers the flag, which prevents XRegExp from throwing an 'unknown flag' error when the flag is used.
  - `optionalFlags` {`String`} Any custom flags checked for within the token `handler` that are not required to trigger the token. This registers the flags, to prevent XRegExp from throwing an 'unknown flag' error when any of the flags are used.
  - `reparse` {`Boolean`} Whether the `handler` function's output should not be treated as final, and instead be reparseable by other tokens (including the current token). Allows token chaining or deferring.
  - `leadChar` {`String`} Single character that occurs at the beginning of any successful match of the token (not always applicable). This doesn't change the behavior of the token unless you provide an erroneous value. However, providing it can increase the token's performance since the token can be skipped at any positions where this character doesn't appear.

Returns: {`undefined`} Does not return a value.

Example

// Basic usage: Add \a for the ALERT control code
XRegExp.addToken(
  /\\a/,
  () => '\\x07',
  {scope: 'all'}
);
XRegExp('\\a[\\a-\\n]+').test('\x07\n\x07'); // -> true

// Add escape sequences: \Q..\E and \Q..
XRegExp.addToken(
  /\\Q([\s\S]*?)(?:\\E|$)/,
  (match) => XRegExp.escape(match[1]),
  {scope: 'all'}
);
XRegExp('^\\Q(?*+)').test('(?*+)'); // -> true

// Add the U (ungreedy) flag from PCRE and RE2, which reverses greedy and lazy quantifiers.
// Since `scope` is not specified, it uses 'default' (i.e., transformations apply outside of
// character classes only)
XRegExp.addToken(
  /([?*+]|{\d+(?:,\d*)?})(\??)/,
  (match) => match[1] + (match[2] ? '' : '?'),
  {flag: 'U'}
);
XRegExp('a+', 'U').exec('aaa')[0]; // -> 'a'
XRegExp('a+?', 'U').exec('aaa')[0]; // -> 'aaa'

// Add \R for matching any line separator (CRLF, CR, LF, etc.)
XRegExp.addToken(
  /\\R/,
  () => '(?:\\r\\n|[\\n-\\r\\x85\\u2028\\u2029])'
);

// Add Ruby/Oniguruma's \h for hexadecimal digits, and \H for the inverse
XRegExp.addToken(
  /\\([hH])/,
  (match, scope) => {
    const inv = (match[1] === 'H'); // Uppercase for inverted
    if (scope === 'class') {
      return inv ? '\\0-/:-@G-`g-\\uffff' : '0-9A-Fa-f';
    }
    return '[' + (inv ? '^' : '') + '0-9A-Fa-f]';
  },
  {scope: 'all'}
);

// Add POSIX character classes like [[:alpha:]] (ASCII-only)
XRegExp.addToken(
  /\[:([a-z\d]+):]/i,
  (function() {
    const posix = {
      alnum : 'A-Za-z0-9',
      alpha : 'A-Za-z',
      ascii : '\\0-\\x7F',
      blank : ' \\t',
      cntrl : '\\0-\\x1F\\x7F',
      digit : '0-9',
      graph : '\\x21-\\x7E',
      lower : 'a-z',
      print : '\\x20-\\x7E',
      punct : '!"#$%&\'()*+,\\-./:;<=>?@[\\\\\\]^_`{|}~',
      space : ' \\t\\r\\n\\v\\f',
      upper : 'A-Z',
      word  : 'A-Za-z0-9_',
      xdigit: 'A-Fa-f0-9'
    };
    return (match) => {
      if (!posix[match[1]]) {
        throw new SyntaxError(match[1] + ' is not a valid POSIX character class');
      }
      return posix[match[1]];
    };
  }()),
  {scope: 'class'}
);
XRegExp('^[[:xdigit:][:space:]]+$').test('00A9 1B7F'); // -> true

`XRegExp.build(pattern, subs, [flags])`

Requires the XRegExp.build addon, which is bundled in xregexp-all.js.

Builds regexes using named subpatterns, for readability and pattern reuse. Backreferences in the outer pattern and provided subpatterns are automatically renumbered to work correctly. Native flags used by provided subpatterns are ignored in favor of the `flags` argument.

Parameters:

- `pattern` {`String`} XRegExp pattern using `{{name}}` for embedded subpatterns. Allows `({{name}})` as shorthand for `(?<name>{{name}})`. Patterns cannot be embedded within character classes.
- `subs` {`Object`} Lookup object for named subpatterns. Values can be strings or regexes. A leading `^` and trailing unescaped `$` are stripped from subpatterns, if both are present.
- [`flags`] {`String`} Any combination of XRegExp flags.

Returns: {`RegExp`} Regex with interpolated subpatterns.

Example

const time = XRegExp.build('(?x)^ {{hours}} ({{minutes}}) $', {
  hours: XRegExp.build('{{h12}} : | {{h24}}', {
    h12: /1[0-2]|0?[1-9]/,
    h24: /2[0-3]|[01][0-9]/
  }, 'x'),
  minutes: /^[0-5][0-9]$/
});

time.test('10:59'); // -> true
XRegExp.exec('10:59', time).groups.minutes; // -> '59'

`XRegExp.cache(pattern, [flags])`

Caches and returns the result of calling `XRegExp(pattern, flags)`. On any subsequent call with the same pattern and flag combination, the cached copy of the regex is returned.

Parameters:

- `pattern` {`String`} Regex pattern string.
- [`flags`] {`String`} Any combination of XRegExp flags.

Returns: {`RegExp`} Cached XRegExp object.

Example

let match;
while (match = XRegExp.cache('.', 'gs').exec('abc')) {
  // The regex is compiled once only
}

const regex1 = XRegExp.cache('.', 's');
const regex2 = XRegExp.cache('.', 's');
// regex1 and regex2 are references to the same regex object
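
The caching idea itself can be sketched with native regexes. The helper below, `cachedRegExp`, is hypothetical and is not XRegExp's implementation; it only assumes that the cache is keyed on the pattern and flag combination, as the description above states.

```javascript
// Hypothetical memoizing wrapper around the native RegExp constructor.
// Outer map: pattern -> inner map; inner map: flags -> compiled regex.
const regexCache = new Map();

function cachedRegExp(pattern, flags = '') {
  let byFlags = regexCache.get(pattern);
  if (!byFlags) {
    byFlags = new Map();
    regexCache.set(pattern, byFlags);
  }
  let regex = byFlags.get(flags);
  if (!regex) {
    regex = new RegExp(pattern, flags); // compiled once per pattern/flags pair
    byFlags.set(flags, regex);
  }
  return regex;
}

const first = cachedRegExp('.', 's');
const second = cachedRegExp('.', 's');
console.log(first === second); // true
console.log(first === cachedRegExp('.', 'gs')); // false
```

Because every caller receives the same object, state such as `lastIndex` on a regex with flag `g` is shared between callers. That sharing is what lets a loop like the one in the example above advance through the string.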

`XRegExp.escape(str)`

Escapes any regular expression metacharacters, for use when matching literal strings. The result can safely be used at any position within a regex that uses any flags.

The escaped characters are `[`, `]`, `{`, `}`, `(`, `)`, `-`, `*`, `+`, `?`, `.`, `\`, `^`, `$`, `|`, `,`, `#`, and whitespace (see free-spacing for the list of whitespace characters).

Parameters:

- `str` {`String`} String to escape.

Returns: {`String`} String with regex metacharacters escaped.

Example

XRegExp.escape('Escaped? <.>');
// -> 'Escaped\?\u0020<\.>'

`XRegExp.exec(str, regex, [pos], [sticky])`

Executes a regex search in a specified string. Returns a match array or `null`. If the provided regex uses named capture, named capture properties are included on the match array's `groups` property. Optional `pos` and `sticky` arguments specify the search start position, and whether the match must start at the specified position only. The `lastIndex` property of the provided regex is not used, but is updated for compatibility. Also fixes browser bugs compared to the native `RegExp.prototype.exec` and can be used reliably cross-browser.

Parameters:

- `str` {`String`} String to search.
- `regex` {`RegExp`} Regex to search with.
- [`pos`=`0`] {`Number`} Zero-based index at which to start the search.
- [`sticky`=`false`] {`Boolean`|`String`} Whether the match must start at the specified position only. The string `'sticky'` is accepted as an alternative to `true`.

Returns: {`Array`} Match array with named capture properties on the `groups` object, or `null`. If the `namespacing` feature is off, named capture properties are directly on the match array.

Example

// Basic use, with named backreference
let match = XRegExp.exec('U+2620', XRegExp('U\\+(?<hex>[0-9A-F]{4})'));
match.groups.hex; // -> '2620'

// With pos and sticky, in a loop (index 3 is where '<2>' starts)
let pos = 3;
const result = [];
while (match = XRegExp.exec('<1><2><3><4>5<6>', /<(\d)>/, pos, 'sticky')) {
  result.push(match[1]);
  pos = match.index + match[0].length;
}
// result -> ['2', '3', '4']
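
To make the `pos` and `sticky` semantics concrete, here is a rough native-JavaScript sketch. The helper `execAt` is hypothetical and is not how XRegExp is implemented; it assumes an engine with native support for flag `y`, and it leaves the original regex's `lastIndex` untouched instead of updating it.

```javascript
// Hypothetical helper: search `str` from `pos`, optionally anchored at `pos`.
function execAt(str, regex, pos = 0, sticky = false) {
  // Work on a copy so the caller's regex and its lastIndex are not touched.
  const baseFlags = regex.flags.replace(/[gy]/g, '');
  const copy = new RegExp(regex.source, baseFlags + (sticky ? 'y' : 'g'));
  copy.lastIndex = pos; // both g and y start matching at lastIndex
  return copy.exec(str);
}

const input = '<1><2><3><4>5<6>';

// Non-sticky: scans forward from index 2 and finds '<2>' at index 3.
console.log(execAt(input, /<(\d)>/, 2).index); // 3

// Sticky: the match must begin exactly at the given position.
console.log(execAt(input, /<(\d)>/, 2, true)); // null
console.log(execAt(input, /<(\d)>/, 3, true)[1]); // '2'
```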

`XRegExp.forEach(str, regex, callback)`

Executes a provided function once per regex match. Searches always start at the beginning of the string and continue until the end, regardless of the state of the regex's global property and initial lastIndex.

Parameters:

- `str` {`String`} String to search.
- `regex` {`RegExp`} Regex to search with.
- `callback` {`Function`} Function to execute for each match. Invoked with four arguments:
  1. The match array, with named backreference properties.
  2. The zero-based match index.
  3. The string being traversed.
  4. The regex object being used to traverse the string.

Returns: {`undefined`} Does not return a value.

Example

// Extracts every other digit from a string
const evens = [];
XRegExp.forEach('1a2345', /\d/, function (match, i) {
  if (i % 2) evens.push(+match[0]);
});
// evens -> [2, 4]
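
The "always start at the beginning, regardless of `global` and `lastIndex`" behavior can be approximated natively by iterating over a fresh global copy of the regex. This is only a sketch; `forEachMatch` is a hypothetical name, and the match arrays it passes are plain native ones.

```javascript
// Hypothetical helper: call `callback` once per match, from the start of `str`.
function forEachMatch(str, regex, callback) {
  // A fresh copy with flag g ignores the caller's flags g/y and lastIndex.
  const flags = regex.flags.replace(/[gy]/g, '') + 'g';
  const globalCopy = new RegExp(regex.source, flags);
  let matchIndex = 0;
  for (const match of str.matchAll(globalCopy)) {
    callback(match, matchIndex, str, regex);
    matchIndex += 1;
  }
}

const digit = /\d/g;
digit.lastIndex = 4; // deliberately "used" state; it has no effect below

const everyOther = [];
forEachMatch('1a2345', digit, (match, i) => {
  if (i % 2) everyOther.push(Number(match[0]));
});
console.log(everyOther); // [ 2, 4 ]
```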

`XRegExp.globalize(regex)`

Copies a regex object and adds flag `g`. The copy maintains extended data, is augmented with `XRegExp.prototype` properties, and has a fresh `lastIndex` property (set to zero). Native regexes are not recompiled using XRegExp syntax.

Parameters:

- `regex` {`RegExp`} Regex to globalize.

Returns: {`RegExp`} Copy of the provided regex with flag `g` added.

Example

const globalCopy = XRegExp.globalize(/regex/);
globalCopy.global; // -> true

function parse(str, regex) {
  regex = XRegExp.globalize(regex);
  let match;
  while (match = regex.exec(str)) {
    // ...
  }
}

`XRegExp.install(options)`

Installs optional features according to the specified options. Can be undone using XRegExp.uninstall.

Parameters:

- `options` {`Object`|`String`} Options object or string.

Returns: {`undefined`} Does not return a value.

Example

// With an options object
XRegExp.install({
  // Enables support for astral code points in Unicode addons (implicitly sets flag A)
  astral: true,

  // Adds named capture groups to the `groups` property of matches
  // On by default in XRegExp 5
  namespacing: true
});

// With an options string
XRegExp.install('astral namespacing');

`XRegExp.isInstalled(feature)`

Checks whether an individual optional feature is installed.

Parameters:

- `feature` {`String`} Name of the feature to check. One of:
  - `astral`
  - `namespacing`

Returns: {`Boolean`} Whether the feature is installed.

Example

XRegExp.isInstalled('astral');

`XRegExp.isRegExp(value)`

Returns `true` if an object is a regex; `false` if it isn't. This works correctly for regexes created in another frame, when `instanceof` and `constructor` checks would fail.

Parameters:

- `value` {`*`} Object to check.

Returns: {`Boolean`} Whether the object is a `RegExp` object.

Example

XRegExp.isRegExp('string'); // -> false
XRegExp.isRegExp(/regex/i); // -> true
XRegExp.isRegExp(RegExp('^', 'm')); // -> true
XRegExp.isRegExp(XRegExp('(?s).')); // -> true
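
The cross-frame problem can be reproduced outside a browser. The sketch below assumes Node.js and uses its built-in `vm` module to stand in for a second frame; the `Object.prototype.toString` check shown is one common cross-realm technique and is not claimed here to be XRegExp's exact implementation.

```javascript
const vm = require('vm');

// A regex created in a separate context has a different RegExp constructor,
// much like a regex created inside another frame in a browser.
const foreignRegex = vm.runInNewContext('/regex/i');

console.log(foreignRegex instanceof RegExp); // false
console.log(foreignRegex.constructor === RegExp); // false

// The internal class tag does not depend on which realm built the object.
const looksLikeRegExp = (value) =>
  Object.prototype.toString.call(value) === '[object RegExp]';

console.log(looksLikeRegExp(foreignRegex)); // true
console.log(looksLikeRegExp('string')); // false
```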

`XRegExp.match(str, regex, [scope])`

Returns the first matched string, or in global mode, an array containing all matched strings. This is essentially a more convenient re-implementation of `String.prototype.match` that gives the result types you actually want (string instead of `exec`-style array in match-first mode, and an empty array instead of `null` when no matches are found in match-all mode). It also lets you override flag `g` and ignore `lastIndex`, and fixes browser bugs.

Parameters:

- `str` {`String`} String to search.
- `regex` {`RegExp`} Regex to search with.
- [`scope`=`'one'`] {`String`} Use `'one'` to return the first match as a string. Use `'all'` to return an array of all matched strings. If not explicitly specified and `regex` uses flag `g`, `scope` is `'all'`.

Returns: {`String`|`Array`} In match-first mode: First match as a string, or `null`. In match-all mode: Array of all matched strings, or an empty array.

Example

// Match first
XRegExp.match('abc', /\w/); // -> 'a'
XRegExp.match('abc', /\w/g, 'one'); // -> 'a'
XRegExp.match('abc', /x/g, 'one'); // -> null

// Match all
XRegExp.match('abc', /\w/g); // -> ['a', 'b', 'c']
XRegExp.match('abc', /\w/, 'all'); // -> ['a', 'b', 'c']
XRegExp.match('abc', /x/, 'all'); // -> []
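
For comparison, the result-type rules described above can be layered over native `String.prototype.match`. The helper name `matchByScope` is an assumption made for illustration; this sketch does not reproduce XRegExp's browser-bug fixes or named-capture handling.

```javascript
// Hypothetical helper mirroring the 'one' / 'all' result types.
function matchByScope(str, regex, scope) {
  // An explicit scope wins; otherwise fall back to the regex's own flag g.
  const wantAll = scope ? scope === 'all' : regex.global;
  const flags = regex.flags.replace(/[gy]/g, '') + (wantAll ? 'g' : '');
  const result = str.match(new RegExp(regex.source, flags));
  if (wantAll) {
    return result || []; // empty array instead of null
  }
  return result ? result[0] : null; // plain string instead of exec-style array
}

console.log(matchByScope('abc', /\w/)); // 'a'
console.log(matchByScope('abc', /\w/g, 'one')); // 'a'
console.log(matchByScope('abc', /\w/, 'all')); // [ 'a', 'b', 'c' ]
console.log(matchByScope('abc', /x/, 'all')); // []
```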

`XRegExp.matchChain(str, chain)`

Retrieves the matches from searching a string using a chain of regexes that successively search within previous matches. The provided `chain` array can contain regexes and/or objects with `regex` and `backref` properties. When a backreference is specified, the named or numbered backreference is passed forward to the next regex or returned.

Parameters:

- `str` {`String`} String to search.
- `chain` {`Array`} Regexes that each search for matches within preceding results.

Returns: {`Array`} Matches by the last regex in the chain, or an empty array.

Example

// Basic usage; matches numbers within <b> tags
XRegExp.matchChain('1 <b>2</b> 3 <b>4 a 56</b>', [
  XRegExp('(?is)<b>.*?</b>'),
  /\d+/
]);
// -> ['2', '4', '56']

// Passing forward and returning specific backreferences
const html = `<a href="http://xregexp.com/api/">XRegExp</a>
              <a href="http://www.google.com/">Google</a>`;
XRegExp.matchChain(html, [
  {regex: /<a href="([^"]+)">/i, backref: 1},
  {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
]);
// -> ['xregexp.com', 'www.google.com']
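
The basic chaining step (each regex searches only inside what the previous one matched) can be sketched in native JavaScript as a reduce over the chain. `chainMatches` is a hypothetical helper that handles plain regexes only; the `backref` option described above is deliberately left out.

```javascript
// Hypothetical helper: successive searches within previous matches.
function chainMatches(str, regexes) {
  return regexes.reduce(
    (inputs, regex) => {
      // Search every surviving input for all matches of the current regex.
      const flags = regex.flags.replace(/[gy]/g, '') + 'g';
      const globalCopy = new RegExp(regex.source, flags);
      return inputs.flatMap((input) => input.match(globalCopy) || []);
    },
    [str]
  );
}

const numbersInBold = chainMatches('1 <b>2</b> 3 <b>4 a 56</b>', [
  /<b>.*?<\/b>/is,
  /\d+/
]);
console.log(numbersInBold); // [ '2', '4', '56' ]
```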

`XRegExp.matchRecursive(str, left, right, [flags], [options])`

Requires the XRegExp.matchRecursive addon, which is bundled in xregexp-all.js.

Returns an array of match strings between outermost left and right delimiters, or an array of objects with detailed match parts and position data. An error is thrown if delimiters are unbalanced within the data.

Parameters:

- `str` {`String`} String to search.
- `left` {`String`} Left delimiter as an XRegExp pattern.
- `right` {`String`} Right delimiter as an XRegExp pattern.
- [`flags`] {`String`} Any combination of XRegExp flags, used for the left and right delimiters.
- [`options`] {`Object`} Lets you specify `valueNames` and `escapeChar` options.

Returns: {`Array`} Array of matches, or an empty array.

Example

// Basic usage
let str = '(t((e))s)t()(ing)';
XRegExp.matchRecursive(str, '\\(', '\\)', 'g');
// -> ['t((e))s', '', 'ing']

// Extended information mode with valueNames
str = 'Here is <div> <div>an</div></div> example';
XRegExp.matchRecursive(str, '<div\\s*>', '</div>', 'gi', {
  valueNames: ['between', 'left', 'match', 'right']
});
/* -> [
{name: 'between', value: 'Here is ',       start: 0,  end: 8},
{name: 'left',    value: '<div>',          start: 8,  end: 13},
{name: 'match',   value: ' <div>an</div>', start: 13, end: 27},
{name: 'right',   value: '</div>',         start: 27, end: 33},
{name: 'between', value: ' example',       start: 33, end: 41}
] */

// Omitting unneeded parts with null valueNames, and using escapeChar
str = '...{1}.\\{{function(x,y){return {y:x}}}';
XRegExp.matchRecursive(str, '{', '}', 'g', {
  valueNames: ['literal', null, 'value', null],
  escapeChar: '\\'
});
/* -> [
{name: 'literal', value: '...',  start: 0, end: 3},
{name: 'value',   value: '1',    start: 4, end: 5},
{name: 'literal', value: '.\\{', start: 6, end: 9},
{name: 'value',   value: 'function(x,y){return {y:x}}', start: 10, end: 37}
] */

// Sticky mode via flag y
str = '<1><<<2>>><3>4<5>';
XRegExp.matchRecursive(str, '<', '>', 'gy');
// -> ['1', '<<2>>', '3']

`XRegExp.replace(str, search, replacement, [scope])`

Returns a new string with one or all matches of a pattern replaced. The pattern can be a string or regex, and the replacement can be a string or a function to be called for each match. To perform a global search and replace, use the optional `scope` argument or include flag `g` if using a regex. Replacement strings can use `$<n>` or `${n}` for named and numbered backreferences. Replacement functions can use named backreferences via the last argument. Also fixes browser bugs compared to the native `String.prototype.replace` and can be used reliably cross-browser.

For the full details of XRegExp's replacement text syntax, see Syntax: Replacement text.

Parameters:

- `str` {`String`} String to search.
- `search` {`RegExp`|`String`} Search pattern to be replaced.
- `replacement` {`String`|`Function`} Replacement string or a function invoked to create it.

  Replacement strings can include special replacement syntax:
  - `$$` - Inserts a literal `$` character.
  - `$&`, `$0` - Inserts the matched substring.
  - `` $` `` - Inserts the string that precedes the matched substring (left context).
  - `$'` - Inserts the string that follows the matched substring (right context).
  - `$n`, `$nn` - Where n/nn are digits referencing an existing capturing group, inserts backreference n/nn.
  - `$<n>`, `${n}` - Where n is a name or any number of digits that reference an existing capturing group, inserts backreference n.

  Replacement functions are invoked with three or more arguments:
  - `args[0]` - The matched substring (corresponds to `$&` above). If the `namespacing` feature is off, named backreferences are accessible as properties of this argument.
  - `args[1..n]` - One argument for each backreference (corresponding to `$1`, `$2`, etc. above). If the regex has no capturing groups, no arguments appear in this position.
  - `args[n+1]` - The zero-based index of the match within the entire search string.
  - `args[n+2]` - The total string being searched.
  - `args[n+3]` - If the search pattern is a regex with named capturing groups, the last argument is the groups object. Its keys are the backreference names and its values are the backreference values. If the `namespacing` feature is off, this argument is not present.
- [`scope`] {`String`} Use `'one'` to replace the first match only, or `'all'` to replace every match. Defaults to `'one'`, or to `'all'` if using a regex with flag `g`.

Returns: {`String`} New string with one or all matches replaced.

Example

// Regex search, using named backreferences in replacement string
const name = XRegExp('(?<first>\\w+) (?<last>\\w+)');
XRegExp.replace('John Smith', name, '$<last>, $<first>');
// -> 'Smith, John'

// Regex search, using named backreferences in replacement function
XRegExp.replace('John Smith', name, (...args) => {
  const groups = args[args.length - 1];
  return `${groups.last}, ${groups.first}`;
});
// -> 'Smith, John'

// String search, with replace-all
XRegExp.replace('RegExp builds RegExps', 'RegExp', 'XRegExp', 'all');
// -> 'XRegExp builds XRegExps'
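
The argument layout for replacement functions can be inspected directly. The sketch below uses native `String.prototype.replace` with a native named-group regex (ES2018 or later), which passes arguments in the same order as the list above describes for the case where `namespacing` is on; it is an illustration of that order, not a test of XRegExp itself.

```javascript
const fullName = /(?<first>\w+) (?<last>\w+)/;
let received; // captures the arguments of the single replacement call

const swapped = 'John Smith'.replace(fullName, (...args) => {
  received = args;
  const groups = args[args.length - 1]; // groups object is always last
  return `${groups.last}, ${groups.first}`;
});

console.log(swapped); // 'Smith, John'
console.log(received[0]); // 'John Smith'  (matched substring)
console.log(received[1], received[2]); // 'John' 'Smith'  (one per capturing group)
console.log(received[3]); // 0  (index of the match)
console.log(received[4]); // 'John Smith'  (string being searched)
console.log(received[5].first, received[5].last); // 'John' 'Smith'
```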

`XRegExp.replaceEach(str, replacements)`

Performs batch processing of string replacements. Used like `XRegExp.replace`, but accepts an array of replacement details. Later replacements operate on the output of earlier replacements. Replacement details are accepted as an array with a regex or string to search for, the replacement string or function, and an optional scope of `'one'` or `'all'`. Uses the XRegExp replacement text syntax, which supports named backreference properties via `$<name>` or `${name}`.

Parameters:

- `str` {`String`} String to search.
- `replacements` {`Array`} Array of replacement detail arrays.

Returns: {`String`} New string with all replacements.

Example

str = XRegExp.replaceEach(str, [
  [XRegExp('(?<name>a)'), 'z$<name>'],
  [/b/gi, 'y'],
  [/c/g, 'x', 'one'], // scope 'one' overrides /g
  [/d/, 'w', 'all'],  // scope 'all' overrides lack of /g
  ['e', 'v', 'all'],  // scope 'all' allows replace-all for strings
  [/f/g, (match) => match.toUpperCase()]
]);

`XRegExp.split(str, separator, [limit])`

Splits a string into an array of strings using a regex or string separator. Matches of the separator are not included in the result array. However, if `separator` is a regex that contains capturing groups, backreferences are spliced into the result each time `separator` is matched. Fixes browser bugs compared to the native `String.prototype.split` and can be used reliably cross-browser.

Parameters:

- `str` {`String`} String to split.
- `separator` {`RegExp`|`String`} Regex or string to use for separating the string.
- [`limit`] {`Number`} Maximum number of items to include in the result array.

Returns: {`Array`} Array of substrings.

Example

// Basic use
XRegExp.split('a b c', ' ');
// -> ['a', 'b', 'c']

// With limit
XRegExp.split('a b c', ' ', 2);
// -> ['a', 'b']

// Backreferences in result array
XRegExp.split('..word1..', /([a-z]+)(\d+)/i);
// -> ['..', 'word', '1', '..']

`` XRegExp.tag([flags])`pattern` ``

Requires the XRegExp.build addon, which is bundled in xregexp-all.js.

Provides tagged template literals that create regexes with XRegExp syntax and flags. The provided pattern is handled as a raw string, so backslashes don't need to be escaped.

Interpolation of strings and regexes shares the features of `XRegExp.build`. Interpolated patterns are treated as atomic units when quantified, interpolated strings have their special characters escaped, a leading `^` and trailing unescaped `$` are stripped from interpolated regexes if both are present, and any backreferences within an interpolated regex are rewritten to work within the overall pattern.

Parameters:

- [`flags`] {`String`} Any combination of XRegExp flags.
- `pattern` {`String`} Regex pattern as a raw string, optionally with interpolation.

Returns: {`RegExp`} Extended regular expression object.

Example

XRegExp.tag()`\b\w+\b`.test('word'); // -> true

const hours = /1[0-2]|0?[1-9]/;
const minutes = /(?<minutes>[0-5][0-9])/;
const time = XRegExp.tag('x')`\b ${hours} : ${minutes} \b`;
time.test('10:59'); // -> true
XRegExp.exec('10:59', time).groups.minutes; // -> '59'

const backref1 = /(a)\1/;
const backref2 = /(b)\1/;
XRegExp.tag()`${backref1}${backref2}`.test('aabb'); // -> true

`XRegExp.test(str, regex, [pos], [sticky])`

Executes a regex search in a specified string. Returns `true` or `false`. Optional `pos` and `sticky` arguments specify the search start position, and whether the match must start at the specified position only. The `lastIndex` property of the provided regex is not used, but is updated for compatibility. Also fixes browser bugs compared to the native `RegExp.prototype.test` and can be used reliably cross-browser.

Parameters:

- `str` {`String`} String to search.
- `regex` {`RegExp`} Regex to search with.
- [`pos`=`0`] {`Number`} Zero-based index at which to start the search.
- [`sticky`=`false`] {`Boolean`|`String`} Whether the match must start at the specified position only. The string `'sticky'` is accepted as an alternative to `true`.

Returns: {`Boolean`} Whether the regex matched the provided value.

Example

// Basic use
+XRegExp.test('abc', /c/); // -> true
+
+// With pos and sticky
+XRegExp.test('abc', /c/, 0, 'sticky'); // -> false
+XRegExp.test('abc', /c/, 2, 'sticky'); // -> true
+
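For comparison, the pos and sticky arguments can be approximated with native features alone. The following is a minimal sketch, not XRegExp's implementation: `testAt` is a hypothetical helper, and unlike XRegExp.test it works on a copy and does not update the lastIndex of the regex it is given.

```javascript
// Hypothetical helper (not part of XRegExp): approximates the pos and sticky
// arguments of XRegExp.test using only native RegExp features.
function testAt(str, regex, pos = 0, sticky = false) {
  // Work on a copy so the caller's regex and its lastIndex are left untouched.
  // Native lastIndex is honored only when the g or y flag is set.
  const flags = regex.flags.replace(/[gy]/g, '') + (sticky ? 'y' : 'g');
  const copy = new RegExp(regex.source, flags);
  copy.lastIndex = pos;
  return copy.test(str);
}

const anywhere = testAt('abc', /c/);           // -> true
const stickyAt0 = testAt('abc', /c/, 0, true); // -> false
const stickyAt2 = testAt('abc', /c/, 2, true); // -> true
```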

+ + +

`XRegExp.uninstall(options)`

+ +

Uninstalls optional features according to the specified options. Used to undo the actions of XRegExp.install.

+ + + + + + + + + + + + +

Parameters:
- `options` {`Object`|`String`}: Options object or string.

Returns: {`undefined`} Does not return a value.

+ +

Example

// With an options object
+XRegExp.uninstall({
+  // Disables support for astral code points in Unicode addons (unless enabled per regex)
+  astral: true,
+
+  // Don't add named capture groups to the `groups` property of matches
+  namespacing: true
+});
+
+// With an options string
+XRegExp.uninstall('astral namespacing');
+

+ + +

`XRegExp.union(patterns, [flags], [options])`

+ +

Returns an XRegExp object that is the union of the given patterns. Patterns can be provided as regex objects or strings. Metacharacters are escaped in patterns provided as strings. Backreferences in provided regex objects are automatically renumbered to work correctly within the larger combined pattern. Native flags used by provided regexes are ignored in favor of the flags argument.

+ + + + + + + + + + + + +

Parameters:
- `patterns` {`Array`}: Regexes and strings to combine.
- [`flags`] {`String`}: Any combination of XRegExp flags.
- [`options`] {`Object`}: Options object with optional properties:
  - `conjunction` {`String`}: Type of conjunction to use: `'or'` (default) or `'none'`.

Returns: {`RegExp`} Union of the provided regexes and strings.

+ +

Example

XRegExp.union(['a+b*c', /(dogs)\1/, /(cats)\1/], 'i');
+// -> /a\+b\*c|(dogs)\1|(cats)\2/i
+
+XRegExp.union([/man/, /bear/, /pig/], 'i', {conjunction: 'none'});
+// -> /manbearpig/i
+
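To illustrate only the string-escaping part of this behavior, here is a rough native sketch. `escapeLiteral` and `simpleUnion` are hypothetical helpers written for this example; they accept strings only and do none of the backreference renumbering that XRegExp.union performs for regex objects.

```javascript
// Hypothetical helpers for illustration only.
// Escapes regex metacharacters so a string is matched literally.
function escapeLiteral(str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Joins escaped strings with | (the 'or' conjunction) into one native regex.
function simpleUnion(strings, flags) {
  return new RegExp(strings.map(escapeLiteral).join('|'), flags);
}

const combined = simpleUnion(['a+b*c', 'dogs'], 'i');
const combinedSource = combined.source;      // -> 'a\\+b\\*c|dogs'
const matchesLiteral = combined.test('A+B*C'); // -> true
const matchesAab = combined.test('aab');       // -> false ('+' and '*' are literal)
```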

+ + +

`XRegExp.version`

+ +

The XRegExp version number as a string containing three dot-separated parts. For example, '2.0.0-beta-3'.

+ + +

`<regexp>.xregexp.source`

+ +

The original pattern provided to the XRegExp constructor. Note that this differs from the <regexp>.source property which holds the transpiled source in native RegExp syntax and therefore can't be used to reconstruct the regex (e.g. <regexp>.source holds no knowledge of capture names). This property is available only for regexes originally constructed by XRegExp. It is null for native regexes copied using the XRegExp constructor or XRegExp.globalize.

+ + +

`<regexp>.xregexp.flags`

+ +

The original flags provided to the XRegExp constructor. Differs from the ES6 <regexp>.flags property in that it includes XRegExp's non-native flags and is accessible even in pre-ES6 browsers. This property is available only for regexes originally constructed by XRegExp. It is null for native regexes copied using the XRegExp constructor or XRegExp.globalize. When regexes originally constructed by XRegExp are copied using XRegExp.globalize, the value of this property is augmented with 'g' if not already present. Flags are listed in alphabetical order.

+ + + + + +

+ +

+ + + + + +

About flags
Explicit capture (n)
Dot matches all (s)
Free-spacing and line comments (x)
Astral (A)

+ +

New flags

+ +

About flags

+ +

XRegExp provides four new flags (n, s, x, A), which can be combined with native flags and arranged in any order. Unlike native flags, non-native flags do not show up as properties on regular expression objects.

+ +

New flags +
- n — Explicit capture
- s — Dot matches all (aka singleline mode) — Added as a native flag in ES2018
- x — Free-spacing and line comments (aka extended mode)
- A — Astral (requires the Unicode Base addon)
+
Native flags +
- g — All matches, or advance lastIndex after matches (global)
- i — Case insensitive (ignoreCase)
- m — ^ and $ match at newlines (multiline)
- u — Handle surrogate pairs as code points and enable \u{…} (unicode) — Requires native ES6 support
- y — Matches must start at lastIndex (sticky) — Requires Firefox 3+ or native ES6 support
+

+ + +

Explicit capture (`n`)

+ +

Specifies that the only valid captures are explicitly named groups of the form (?<name>…). This allows unnamed (…) parentheses to act as noncapturing groups without the syntactic clumsiness of the expression (?:…).

+ +

Annotations

Rationale: Backreference capturing adds performance overhead and is needed far less often than simple grouping. The n flag frees the (…) syntax from its often-undesired capturing side effect, while still allowing explicitly-named capturing groups.
Compatibility: No known problems; the n flag is illegal in native JavaScript regular expressions.
Prior art: The n flag comes from .NET.
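As a point of reference, the effect of flag n can be written out by hand in native syntax. This sketch assumes an ES2018+ engine (for named groups); the pattern and variable names are made up for the example.

```javascript
// With flag n, XRegExp('(\\d{4})-(?<month>\\d{2})', 'n') treats the unnamed
// group as noncapturing. Roughly the same regex written natively:
const yearMonth = /(?:\d{4})-(?<month>\d{2})/;

const result = yearMonth.exec('2021-02');
const captureCount = result.length - 1; // -> 1 (only the named group captures)
const month = result.groups.month;      // -> '02'
const firstCapture = result[1];         // -> '02' (named groups are also numbered)
```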

+ + +

Dot matches all (`s`)

+ + + + + +

Usually, a dot does not match newlines. However, a mode in which dots match any code unit (including newlines) can be as useful as one where dots don't. The s flag allows the mode to be selected on a per-regex basis. Escaped dots (\.) and dots within character classes ([.]) are always equivalent to literal dots. The newline code points are as follows:

+ +

U+000a — Line feed — \n
U+000d — Carriage return — \r
U+2028 — Line separator
U+2029 — Paragraph separator

+ +

Annotations

Rationale: All popular Perl-style regular expression flavors except JavaScript include a flag that allows dots to match newlines. Without this mode, matching any single code unit requires, e.g., [\s\S], [\0-\uFFFF], [^] (JavaScript only; doesn't work in some browsers without XRegExp), or god forbid (.|\s).
Compatibility: No known problems; the s flag is illegal in pre-ES2018 native JavaScript regular expressions, and ES2018 added it natively.
Prior art: The s flag comes from Perl.
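The native behavior described above can be checked directly. This is a small sketch using only native regexes; the last two lines assume an ES2018+ engine, where s is available as a native flag.

```javascript
const text = 'a\nb';

// Without dot-matches-all, a dot does not match a line feed.
const plainDot = /a.b/.test(text);             // -> false

// The [\s\S] workaround matches any code unit, including newlines.
const classWorkaround = /a[\s\S]b/.test(text); // -> true

// U+2028 (line separator) is also treated as a newline by the dot.
const lineSeparator = /a.b/.test('a\u2028b');  // -> false

// In ES2018+ engines the s flag is native.
const nativeS = new RegExp('a.b', 's').test(text); // -> true
```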

+ +

When using XRegExp's Unicode Properties addon, you can match any code point without using the s flag via \p{Any}.

+ + +

Free-spacing and line comments (`x`)

+ +

This flag has two complementary effects. First, it causes most whitespace to be ignored, so you can free-format the regex pattern for readability. Second, it allows comments with a leading #. Specifically, it turns most whitespace into an "ignore me" metacharacter, and # into an "ignore me, and everything else up to the next newline" metacharacter. They aren't taken as metacharacters within character classes (which means that classes are not free-format, even with x), and as with other metacharacters, you can escape whitespace and # that you want to be taken literally. Of course, you can always use \s to match whitespace.

+ +

It might be better to think of whitespace and comments as do-nothing (rather than ignore-me) metacharacters. This distinction is important with something like \12 3, which with the x flag is taken as \12 followed by 3, and not \123. However, quantifiers following whitespace or comments apply to the preceding token, so x + is equivalent to x+.

+ +

The ignored whitespace characters are those matched natively by \s. ES3 whitespace is based on Unicode 2.1.0 or later. ES5 whitespace is based on Unicode 3.0.0 or later, plus U+FEFF. Following are the code points that should be matched by \s according to ES5 and Unicode 4.0.1–6.1.0 (not yet updated for later versions):

+ +

U+0009 — Tab — \t
U+000A — Line feed — \n
U+000B — Vertical tab — \v
U+000C — Form feed — \f
U+000D — Carriage return — \r
U+0020 — Space
U+00A0 — No-break space
U+1680 — Ogham space mark
U+180E — Mongolian vowel separator
U+2000 — En quad
U+2001 — Em quad
U+2002 — En space
U+2003 — Em space
U+2004 — Three-per-em space
U+2005 — Four-per-em space
U+2006 — Six-per-em space
U+2007 — Figure space
U+2008 — Punctuation space
U+2009 — Thin space
U+200A — Hair space
U+2028 — Line separator
U+2029 — Paragraph separator
U+202F — Narrow no-break space
U+205F — Medium mathematical space
U+3000 — Ideographic space
U+FEFF — Zero width no-break space

+ +

Annotations

Rationale: Regular expressions are notoriously hard to read; adding whitespace and comments makes regular expressions easier to read.
Compatibility: No known problems; the x flag is illegal in native JavaScript regular expressions.
Prior art: The x flag comes from Perl, and was originally inspired by Jeffrey Friedl's pretty-printing of complex regexes.

+ +

Unicode 1.1.5–4.0.0 assigned code point U+200B (ZWSP) to the Zs (Space separator) category, which means that some browsers or regex engines might include this additional code point in those matched by \s, etc. Unicode 4.0.1 moved ZWSP to the Cf (Format) category.

+ +

Unicode 1.1.5 assigned code point U+FEFF (ZWNBSP) to the Zs category. Unicode 2.0.14 moved ZWNBSP to the Cf category. ES5 explicitly includes ZWNBSP in its list of whitespace characters, even though this does not match any version of the Unicode standard since 1996.

+ +

U+180E (Mongolian vowel separator) was introduced in Unicode 3.0.0, which assigned it the Cf category. Unicode 4.0.0 moved it into the Zs category, and Unicode 6.3.0 moved it back to the Cf category.
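Because these assignments have shifted between Unicode versions, it can be worth checking what the engine you actually run matches with \s. The following sketch builds a small report for a handful of the code points discussed here; the result for U+180E is deliberately left without an expected value, since it depends on the Unicode version the engine follows.

```javascript
// A few of the code points discussed above, plus U+200B (ZWSP) for contrast.
const codePoints = [0x0009, 0x0020, 0x00A0, 0x180E, 0x200B, 0x2028, 0x3000, 0xFEFF];

// Map 'U+XXXX' labels to whether the native \s matches that code point.
const whitespaceReport = {};
for (const cp of codePoints) {
  const label = 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
  whitespaceReport[label] = /\s/.test(String.fromCodePoint(cp));
}

console.log(whitespaceReport);
// In an ES5+ engine, U+0009, U+0020, U+00A0, U+2028, U+3000, and U+FEFF
// report true and U+200B reports false; U+180E varies by engine.
```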

+ +

JavaScript's \s is similar but not equivalent to \p{Z} (the Separator category) from regex libraries that support Unicode categories, including XRegExp's own Unicode Categories addon. The difference is that \s includes code points U+0009–U+000D and U+FEFF, which are not assigned the Separator category in the Unicode character database.

+ +

JavaScript's \s is nearly equivalent to \p{White_Space} from the Unicode Properties addon. The differences are: 1. \p{White_Space} does not include U+FEFF (ZWNBSP). 2. \p{White_Space} includes U+0085 (NEL), which is not assigned the Separator category in the Unicode character database.

+ +

Aside: Not all JavaScript regex syntax is Unicode-aware. According to JavaScript specs, \s, \S, ., ^, and $ use Unicode-based interpretations of whitespace and newline, while \d, \D, \w, \W, \b, and \B use ASCII-only interpretations of digit, word character, and word boundary. Many browsers get some of these details wrong.
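A quick native check of the split described in this aside, as a sketch without the u or v flags and with sample characters chosen only for illustration:

```javascript
// \s is Unicode-aware: it matches U+00A0 (no-break space).
const nbspIsSpace = /\s/.test('\u00A0');      // -> true

// \d is ASCII-only: it does not match U+0663 (Arabic-Indic digit three).
const arabicDigitIsD = /\d/.test('\u0663');   // -> false

// \w is ASCII-only: it does not match U+00E9 (e with acute accent).
const accentedIsW = /\w/.test('\u00E9');      // -> false

// Consequently \b sees no word boundary between two non-ASCII letters and a space.
const boundaryInAscii = /\bz/.test(' z');     // -> true
```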

+ +

For more details, see JavaScript, Regex, and Unicode.

+ + +

Astral (`A`)

+ +

Requires the Unicode Base addon.

+ +

By default, \p{…} and \P{…} support the Basic Multilingual Plane (i.e. code points up to U+FFFF). You can opt-in to full 21-bit Unicode support (with code points up to U+10FFFF) on a per-regex basis by using flag A. In XRegExp, this is called astral mode. You can automatically add flag A for all new regexes by running XRegExp.install('astral'). When in astral mode, \p{…} and \P{…} always match a full code point rather than a code unit, using surrogate pairs for code points above U+FFFF.

+ +

// Using flag A to match astral code points
+XRegExp('^\\pS$').test('💩'); // -> false
+XRegExp('^\\pS$', 'A').test('💩'); // -> true
+XRegExp('(?A)^\\pS$').test('💩'); // -> true
+// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
+XRegExp('(?A)^\\pS$').test('\uD83D\uDCA9'); // -> true
+
+// Implicit flag A
+XRegExp.install('astral');
+XRegExp('^\\pS$').test('💩'); // -> true
+
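The code unit versus code point distinction behind astral mode can be seen without XRegExp. This sketch uses only native string and regex features (the u flag assumes an ES6+ engine):

```javascript
const poo = '\u{1F4A9}'; // U+1F4A9, stored as the surrogate pair U+D83D U+DCA9

const codeUnits = poo.length;                     // -> 2
const highSurrogate = poo.charCodeAt(0).toString(16); // -> 'd83d'
const lowSurrogate = poo.charCodeAt(1).toString(16);  // -> 'dca9'
const codePoint = poo.codePointAt(0).toString(16);    // -> '1f4a9'

// Without the u flag, a dot matches one code unit, so it cannot span the pair.
const dotByCodeUnit = /^.$/.test(poo);   // -> false
// With the u flag, a dot matches one full code point.
const dotByCodePoint = /^.$/u.test(poo); // -> true
```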

+ +

Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use e.g. (\pL|[0-9_])+ instead of [\pL0-9_]+.

+ +

Annotations

Rationale: Astral code point matching uses surrogate pairs and is somewhat slower than BMP-only matching. Enabling astral code point matching on a per-regex basis can therefore be useful.
Compatibility: No known problems; the A flag is illegal in native JavaScript regular expressions.
Prior art: None.

+ + + + + +

+ +

+ + + + + +

What is it?
Features
Performance
Installation and usage
v5 breaking change

+ +

What is it?

+ +

XRegExp provides augmented (and extensible) JavaScript regular expressions. You get modern syntax and flags beyond what browsers support natively. XRegExp is also a regex utility belt with tools to make your grepping and parsing easier, while freeing you from regex cross-browser inconsistencies and other annoyances.

+ +

XRegExp supports all native ES6 regular expression syntax. It supports ES5+ browsers (including Internet Explorer 9+), and you can use it with Node.js or as a RequireJS module. Over the years, many of XRegExp's features have been adopted by new JavaScript standards (named capturing, Unicode properties/scripts/categories, flag s, sticky matching, etc.), so using XRegExp can be a way to extend these features into older browsers. It's released under the MIT License.

+ +

XRegExp lets you write regexes like this:

+ +

// Using named capture and flag x (free-spacing and line comments)
+const date = XRegExp(`(?<year>  [0-9]{4} ) -?  # year
+                      (?<month> [0-9]{2} ) -?  # month
+                      (?<day>   [0-9]{2} )     # day`, 'x');
+

+ +

And do cool stuff like this:

+ +

// Using named backreferences...
+XRegExp.exec('2021-02-23', date).groups.year;
+// -> '2021'
+XRegExp.replace('2021-02-23', date, '$<month>/$<day>/$<year>');
+// -> '02/23/2021'
+
+// Finding matches within matches, while passing forward and returning specific backreferences
+const html = `<a href="http://xregexp.com/api/">XRegExp</a>
+              <a href="http://www.google.com/">Google</a>`;
+XRegExp.matchChain(html, [
+  {regex: /<a href="([^"]+)">/i, backref: 1},
+  {regex: XRegExp('(?i)^https?://(?<domain>[^/?#]+)'), backref: 'domain'}
+]);
+// -> ['xregexp.com', 'www.google.com']
+

+ +

Check out more usage examples on GitHub ⇨.

+ +

Features

+ +

Adds new regex and replacement text syntax, including comprehensive support for named capture.
Adds new regex flags: s, to make dot match all characters; x, for free-spacing and line comments; n, for explicit capture mode; and A, for astral mode (full 21-bit Unicode matching).
Provides a suite of functions that make complex regex processing easier.
Supports addons that add even more new regex syntax, flags, and methods. Official addons support Unicode, recursive matching, and grammatical patterns.

+ +

Performance

+ +

XRegExp compiles to native RegExp objects. Therefore regexes built with XRegExp perform just as fast as native regular expressions. There is a tiny extra cost when compiling a pattern for the first time.

+ +

Installation and usage

+ +

In browsers (bundle XRegExp with all of its addons):

+ +

<script src="https://unpkg.com/xregexp/xregexp-all.js"></script>
+

+ +

Using npm:

+ +

npm install xregexp
+

+ +

In Node.js:

+ +

const XRegExp = require('xregexp');
+

+ +

Named Capture Breaking Change in XRegExp 5

+ +

XRegExp 5 introduced a breaking change where named backreference properties now appear on the result's groups object (following ES2018), rather than directly on the result. To restore the old handling so you don't need to update old code, run the following line after importing XRegExp:

+ +

XRegExp.uninstall('namespacing');
+

+ +

XRegExp 4.1.0 and later allow introducing the new behavior without upgrading to XRegExp 5 by running XRegExp.install('namespacing').

+ +

Following is the most commonly needed change to update code for the new behavior:

+ +

// Change this
+const name = XRegExp.exec(str, regexWithNamedCapture).name;
+
+// To this
+const name = XRegExp.exec(str, regexWithNamedCapture).groups.name;
+
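The `groups` shape that XRegExp 5 follows is the one native ES2018 regexes produce. A small native sketch of that shape, with a pattern invented for the example:

```javascript
// Native ES2018 named capture: names live on result.groups, not on the result.
const nativeMatch = /(?<year>\d{4})-(?<month>\d{2})/.exec('2021-02');

const yearFromGroups = nativeMatch.groups.year; // -> '2021'
const yearOnResult = nativeMatch.year;          // -> undefined
```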

+ +

See the README on GitHub ⇨ for more examples of using named capture with XRegExp.exec and XRegExp.replace.

+ + + + + +

+ +

+ + + + + +

Named capture
Inline comments
Leading mode modifier
Stricter error handling
Unicode
Replacement text

+ +

New syntax

+ +

Named capture

+ +

XRegExp includes comprehensive support for named capture. Following are the details of XRegExp's named capture syntax:

+ +

Capture: (?<name>…)
Backreference in regex: \k<name>
Backreference in replacement text: $<name>
Backreference stored at: result.groups.name
Backreference numbering: Sequential (i.e., left to right for both named and unnamed capturing groups)
Multiple groups with same name: SyntaxError
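Native ES2018 named capture follows the same numbering and duplicate-name rules listed here, so they can be sketched with native regexes alone. The patterns below are illustrative and are not taken from XRegExp's tests.

```javascript
// Sequential numbering: named and unnamed groups are counted left to right.
const mixed = /(a)(?<second>b)(c)/.exec('abc');
const byNumber = mixed[2];           // -> 'b'
const byName = mixed.groups.second;  // -> 'b'
const third = mixed[3];              // -> 'c'

// Two groups with the same name in the same alternative are a SyntaxError.
let duplicateNameError = null;
try {
  new RegExp('(?<n>a)(?<n>b)');
} catch (err) {
  duplicateNameError = err;
}
const isSyntaxError = duplicateNameError instanceof SyntaxError; // -> true
```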

+ +

Notes

See additional details and compare to named capture in other regex flavors here: Named capture comparison.
JavaScript added native support for named capture in ES2018. XRegExp support predates this, and it extends this support into pre-ES2018 browsers.
Capture names can use a wide range of Unicode characters (see the definition of RegExpIdentifierName).

+ +

Example

const repeatedWords = XRegExp.tag('gi')`\b(?<word>[a-z]+)\s+\k<word>\b`;
+// Alternatively: XRegExp('\\b(?<word>[a-z]+)\\s+\\k<word>\\b', 'gi');
+
+// Check for repeated words
+repeatedWords.test('The the test data');
+// -> true
+
+// Remove any repeated words
+const withoutRepeated = XRegExp.replace('The the test data', repeatedWords, '${word}');
+// -> 'The test data'
+
+const url = XRegExp(`^(?<scheme> [^:/?]+ ) ://   # aka protocol
+                      (?<host>   [^/?]+  )       # domain name/IP
+                      (?<path>   [^?]*   ) \\??  # optional path
+                      (?<query>  .*      )       # optional query`, 'x');
+
+// Get the URL parts
+const parts = XRegExp.exec('http://google.com/path/to/file?q=1', url);
+// parts -> ['http://google.com/path/to/file?q=1', 'http', 'google.com', '/path/to/file', 'q=1']
+// parts.groups.scheme -> 'http'
+// parts.groups.host   -> 'google.com'
+// parts.groups.path   -> '/path/to/file'
+// parts.groups.query  -> 'q=1'
+
+// Named backreferences are available in replacement functions as properties of the last argument
+XRegExp.replace('http://google.com/path/to/file?q=1', url, (match, ...args) => {
+  const groups = args.pop();
+  return match.replace(groups.host, 'xregexp.com');
+});
+// -> 'http://xregexp.com/path/to/file?q=1'
+

+ +

Regexes that use named capture work with all native methods. However, you need to use XRegExp.exec and XRegExp.replace for access to named backreferences, otherwise only numbered backreferences are available.

+ +

Annotations

Rationale: Named capture can help make regular expressions and related code self-documenting, and thereby easier to read and use.
Compatibility: The named capture syntax is illegal in pre-ES2018 native JavaScript regular expressions and hence does not cause problems. Backreferences to undefined named groups throw a SyntaxError.
Compatibility with deprecated features: XRegExp's named capture functionality does not support the lastMatch property of the global RegExp object or the RegExp.prototype.compile method, since those features were deprecated in JavaScript 1.5.
Prior art: Comes from Python (feature) and .NET (syntax).

+ + +

Inline comments

+ +

Inline comments use the syntax (?#comment). They are an alternative to the line comments allowed in free-spacing mode.

+ +

Comments are a do-nothing (rather than ignore-me) metasequence. This distinction is important with something like \1(?#comment)2, which is taken as \1 followed by 2, and not \12. However, quantifiers following comments apply to the preceding token, so x(?#comment)+ is equivalent to x+.

+ +

Example

const regex = XRegExp('^(?#month)\\d{1,2}/(?#day)\\d{1,2}/(?#year)(\\d{2}){1,2}', 'n');
+const isDate = regex.test('04/20/2008'); // -> true
+
+// Can still be useful when combined with free-spacing, because inline comments
+// don't need to end with \n
+const freeSpacingRegex = XRegExp('^ \\d{1,2}      (?#month)' +
+                                 '/ \\d{1,2}      (?#day  )' +
+                                 '/ (\\d{2}){1,2} (?#year )', 'nx');
+

+ +

Annotations

Rationale: Comments make regular expressions more readable.
Compatibility: No known problems with this syntax; it is illegal in native JavaScript regular expressions.
Prior art: The syntax comes from Perl. It is also available in .NET, PCRE, Python, Ruby, and Tcl, among other regular expression flavors.

+ + +

Leading mode modifier

+ +

A mode modifier uses the syntax (?imnsuxA), where imnsuxA is any combination of XRegExp flags except g or y. Mode modifiers provide an alternate way to enable the specified flags. XRegExp allows the use of a single mode modifier at the very beginning of a pattern only.

+ +

Example

const regex = XRegExp('(?im)^[a-z]+$');
+regex.ignoreCase; // -> true
+regex.multiline; // -> true
+

+ +

When creating a regex, it's okay to include flags in a mode modifier that are also provided via the separate flags argument. For instance, XRegExp('(?s).+', 's') is valid.

+ +

Flags g and y cannot be included in a mode modifier, or an error is thrown. This is because g and y, unlike all other flags, have no impact on the meaning of a regex. Rather, they change how particular methods choose to apply the regex. In fact, XRegExp methods provide e.g. scope, sticky, and pos arguments that allow you to use and change such functionality on a per-run rather than per-regex basis. Also consider that it makes sense to apply all other flags to a particular subsection of a regex, whereas flags g and y only make sense when applied to the regex as a whole. Allowing g and y in a mode modifier might therefore create future compatibility problems.
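The point that g and y change how methods apply a regex, rather than what the regex means, can be seen natively. In this sketch the pattern \d is the same throughout and only the flags differ:

```javascript
const input = 'a1b2';

// Without g, match returns the first match along with its index.
const first = input.match(/\d/);
const firstIndex = first.index;      // -> 1

// With g, the same pattern makes match return every match.
const every = input.match(/\d/g);    // -> ['1', '2']

// With y, the same pattern must match exactly at lastIndex.
const stickyDigit = /\d/y;
stickyDigit.lastIndex = 0;
const atZero = stickyDigit.test(input); // -> false ('a' is at index 0)
stickyDigit.lastIndex = 1;
const atOne = stickyDigit.test(input);  // -> true ('1' is at index 1)
```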

+ +

The use of unknown flags in a mode modifier causes an error to be thrown. However, XRegExp addons can add new flags that are then automatically valid within mode modifiers.

+ +

Annotations

Rationale: Mode modifiers allow you to enable flags in situations where a regex pattern can be provided as a string only. They can also improve readability, since flags are read first rather than after the pattern.
Compatibility: No known problems with this syntax; it is illegal in native JavaScript regular expressions.
Compatibility with other regex flavors: Some regex flavors support the use of multiple mode modifiers anywhere in a pattern, and allow extended syntax for unsetting flags via (?-i), simultaneously setting and unsetting flags via (?i-m), and enabling flags for subpatterns only via (?i:…). XRegExp does not support these extended options.
Prior art: The syntax comes from Perl. It is also available in .NET, Java, PCRE, Python, Ruby, and Tcl, among other regular expression flavors.

+ + +

Stricter error handling

+ +

XRegExp makes any escaped letters or numbers a SyntaxError unless they form a valid and complete metasequence or backreference. This helps to catch errors early, and makes it safe for future versions of ES or XRegExp to introduce new escape sequences. It also means that octal escapes are always an error in XRegExp. ES3/5 do not allow octal escapes, but browsers support them anyway for backward compatibility, which often leads to unintended behavior.
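For contrast, this is how lenient native regexes are about such escapes when the u flag is not used. The sketch below relies on the legacy (Annex B) behavior that browsers and Node.js implement; `compiles` is a hypothetical helper defined only for this example.

```javascript
// Hypothetical helper: reports whether a pattern compiles natively.
function compiles(pattern, flags) {
  try {
    new RegExp(pattern, flags);
    return true;
  } catch (err) {
    return false;
  }
}

// Without the u flag, \101 is accepted as an octal escape for U+0041 ('A').
const octalMatchesA = /^\101$/.test('A'); // -> true

// Without the u flag, an unknown escaped letter silently matches itself.
const unknownEscape = /^\y$/.test('y');   // -> true

// With the u flag, native regexes reject both.
const octalWithU = compiles('\\101', 'u'); // -> false
const unknownWithU = compiles('\\y', 'u'); // -> false
```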

+ +

XRegExp requires all backreferences, whether written as \n, \k<n>, or \k<name>, to appear to the right of the opening parenthesis of the group they reference.

+ +

XRegExp never allows \n-style backreferences to be followed by literal numbers. To match backreference 1 followed by a literal 2 character, you can use, e.g., (a)\k<1>2, (?x)(a)\1 2, or (a)\1(?#)2.
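The ambiguity this rule removes is visible in native regexes. In the sketch below (legacy, non-u mode), \12 after a single group is read as an octal escape rather than as backreference 1 followed by 2; the empty group `(?:)` is one native way to separate them, comparable to the XRegExp options listed above.

```javascript
// Only one capturing group exists, so \12 is not backreference 12.
// In legacy mode it is read as the octal escape for U+000A (line feed).
const ambiguous = /^(a)\12$/;
const matchesAa2 = ambiguous.test('aa2');        // -> false
const matchesLineFeed = ambiguous.test('a\n');   // -> true

// An empty noncapturing group separates the backreference from the digit.
const separated = /^(a)\1(?:)2$/;
const separatedMatches = separated.test('aa2');  // -> true
```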

+ + +

Unicode

+ +

XRegExp supports matching Unicode categories, scripts, and other properties via addon scripts. Such tokens are matched using \p{…}, \P{…}, and \p{^…}. See XRegExp Unicode addons for more details.

+ +

XRegExp additionally supports the \u{N…} syntax for matching individual code points. In ES6 this is supported natively, but only when using the u flag. XRegExp supports this syntax for code points 0–FFFF even when not using the u flag, and it supports the complete Unicode range 0–10FFFF when using u.
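The native behavior that motivates this is easy to trip over: without the u flag, \u{…} is not a code point escape at all. A short native sketch (ES6+ engine assumed):

```javascript
// With the u flag, \u{1F4A9} matches the code point U+1F4A9.
const withU = /^\u{1F4A9}$/u.test('\u{1F4A9}'); // -> true

// Without the u flag, \u is read as a literal 'u' and {3} as a quantifier,
// so the pattern matches three 'u' characters.
const withoutU = /^\u{3}$/.test('uuu');         // -> true
const notCodePoint = /^\u{3}$/.test('\u0003');  // -> false
```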

+ + +

Replacement text

+ +

XRegExp's replacement text syntax is used by the XRegExp.replace function. It adds $0 as a synonym of $& (to refer to the entire match), and adds $<n> and ${n} for backreferences to named and numbered capturing groups (in addition to $1, etc.). When the braces syntax is used for numbered backreferences, it allows numbers with three or more digits (not possible natively) and allows separating a backreference from an immediately-following digit (not always possible natively). XRegExp uses stricter replacement text error handling than native JavaScript, to help you catch errors earlier (e.g., the use of a $ character that isn't part of a valid metasequence causes an error to be thrown).

+ +

Following are the special tokens that can be used in XRegExp replacement strings:

+ +

$$ - Inserts a literal $ character.
$&, $0 - Inserts the matched substring.
$` - Inserts the string that precedes the matched substring (left context).
$' - Inserts the string that follows the matched substring (right context).
$n, $nn - Where n/nn are digits referencing an existing capturing group, inserts backreference n/nn.
$<n>, ${n} - Where n is a name or any number of digits that reference an existing capturing group, inserts backreference n.

+ +

XRegExp behavior for $<n> and ${n}:

+ +

Backreference to numbered capture, if n is an integer. Use 0 for the entire match. Any number of leading zeros may be used.
Backreference to named capture n, if it exists. Does not overlap with numbered capture since XRegExp does not allow named capture to use a bare integer as the name.
If the name or number does not refer to an existing capturing group, it's an error.

+ +

XRegExp behavior for $n and $nn:

+ +

Backreferences without curly braces end after 1 or 2 digits. Use ${…} for more digits.
$1 is an error if there are no capturing groups.
$10 is an error if there are fewer than 10 capturing groups. Use ${1}0 instead.
$01 is equivalent to $1 if a capturing group exists, otherwise it's an error.
$0 (not followed by 1-9) and $00 are the entire match.

+ +

For comparison, following is JavaScript's native behavior for $n and $nn:

+ +

Backreferences end after 1 or 2 digits. Cannot use backreference to capturing group 100+.
$1 is a literal $1 if there are no capturing groups.
$10 is $1 followed by a literal 0 if there are fewer than 10 capturing groups.
$01 is equivalent to $1 if a capturing group exists, otherwise it's a literal $01.
$0 is a literal $0.
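The native rules listed above can be confirmed with String.prototype.replace. This sketch uses only native code; square brackets are added around each token just to make the result easier to read.

```javascript
// $1 is a literal '$1' when the regex has no capturing groups.
const noGroups = 'abc'.replace(/b/, '[$1]');       // -> 'a[$1]c'

// $10 is $1 followed by a literal 0 when there are fewer than 10 groups.
const oneGroupTen = 'abc'.replace(/(b)/, '[$10]'); // -> 'a[b0]c'

// $01 is equivalent to $1 when group 1 exists.
const leadingZero = 'abc'.replace(/(b)/, '[$01]'); // -> 'a[b]c'

// $0 is a literal '$0' natively; $& inserts the matched substring.
const dollarZero = 'abc'.replace(/b/, '[$0]');     // -> 'a[$0]c'
const wholeMatch = 'abc'.replace(/b/, '[$&]');     // -> 'a[b]c'
```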

+ + + + + +

+ +

+ + + + + +

New syntax » Named capture comparison

+ +

There are several different syntaxes used for named capture. Although Python was the first to implement the feature, most libraries have adopted .NET's alternative syntax.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

Library	Capture	Backref in regex	Backref in replacement	Stored at	Backref numbering	Multiple groups with same name
XRegExp	`(?<name>…)`, `(?P<name>…)`¹	`\k<name>`	`$<name>`², `${name}`	`result.groups.name`³	Sequential	Error⁴
ECMAScript 2018	`(?<name>…)`	`\k<name>`	`$<name>`	`result.groups.name`	Sequential	Error
.NET	`(?<name>…)`, `(?'name'…)`	`\k<name>`, `\k'name'`	`${name}`	`matcher.Groups('name')`	Unnamed first, then named	Backref to last executed participating group
Perl 5.10	`(?<name>…)`, `(?'name'…)`, `(?P<name>…)`	`\k<name>`, `\k'name'`, `\k{name}`, `\g{name}`, `(?P=name)`	`$+{name}`	`$+{name}`	Sequential	Backref to leftmost participating group
PCRE 7	`(?<name>…)`, `(?'name'…)`, `(?P<name>…)`	`\k<name>`, `\k'name'`, `\k{name}`⁵, `\g{name}`⁵, `(?P=name)`	N/A		Sequential	Error
PCRE 4	`(?P<name>…)`	`(?P=name)`	N/A		Sequential	Error
Python	`(?P<name>…)`	`(?P=name)`	`\g<name>`	`result.group('name')`	Sequential	Error
Oniguruma	`(?<name>…)`, `(?'name'…)`	`\k<name>`, `\k'name'`	`\k<name>`, `\k'name'`	N/A	Unnamed groups default to noncapturing when mixed with named groups	Backref to rightmost participating group. Backrefs within a regex work as alternation of matches of all preceding groups with the same name, in reverse order.
Java 7	`(?<name>…)`	`\k<name>`	`${name}`	`matcher.group('name')`	Sequential	Error
JGsoft	`(?<name>…)`, `(?'name'…)`, `(?P<name>…)`	`\k<name>`, `\k'name'`, `(?P=name)`	`${name}`, `\g<name>`	N/A	.NET and Python styles, depending on capture syntax	Same as .NET
Boost.Regex	`(?<name>…)`, `(?'name'…)`	`\k<name>`, `\g{name}`	?	?	?	?
RE2	`(?P<name>…)`	N/A	?	?	?	?
JRegex	`({name}…)`	`{\name}`	`${name}`	`matcher.group('name')`	?	?

+ +

¹ As of XRegExp 2. Not recommended for use, because support for the (?P<name>…) syntax may be removed in future versions of XRegExp. It is currently supported only to avoid an octal escape versus backreference issue in old Opera. Opera supported the Python named capture syntax natively, but did not provide full named capture functionality.

+ +

² As of XRegExp 4.

+ +

³ As of XRegExp 4.1, when the namespacing option is on (it's on by default in XRegExp 5). Stored at result.name when namespacing is off.
Note: Within string.replace callbacks, stored at: arguments[arguments.length - 1].name (with namespacing on) or arguments[0].name (with namespacing off).

+ +

⁴ As of XRegExp 3.

+ +

⁵ As of PCRE 7.2.

+ +

TODO: Add a column comparing the use of capture names in regex conditionals (not supported by XRegExp).

+ + + + + +

+ +

+ + + + + +

Unicode

+ +

Requires the Unicode addons, which are bundled in xregexp-all.js. Alternatively, you can download the individual addon scripts from GitHub. XRegExp's npm package uses xregexp-all.js.

+ +

The Unicode Base script adds base support for Unicode matching via the \p{…} syntax. À la carte token addon packages add support for Unicode categories, scripts, and other properties. All Unicode tokens can be inverted using \P{…} or \p{^…}. Token names are case insensitive, and any spaces, hyphens, and underscores are ignored. You can omit the braces for token names that are a single letter.

+ +

Example

// Categories
+XRegExp('\\p{Sc}\\pN+'); // Sc = currency symbol, N = number
+// Can also use the full names \p{Currency_Symbol} and \p{Number}
+
+// Scripts
+XRegExp('\\p{Cyrillic}');
+XRegExp('[\\p{Latin}\\p{Common}]');
+// Can also use the Script= prefix to match ES2018: \p{Script=Cyrillic}
+
+// Properties
+XRegExp('\\p{ASCII}');
+XRegExp('\\p{Assigned}');
+
+// In action...
+
+const unicodeWord = XRegExp("^\\pL+$"); // L = letter
+unicodeWord.test("Русский"); // true
+unicodeWord.test("日本語"); // true
+unicodeWord.test("العربية"); // true
+
+XRegExp("^\\p{Katakana}+$").test("カタカナ"); // true
+
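For comparison, ES2018 engines support a similar \p{…} syntax natively, but only together with the u flag and with stricter naming (for example, scripts need the Script= prefix). A native sketch, independent of the XRegExp addons:

```javascript
// Categories: L = letter, Sc = currency symbol, N = number.
const letters = /^\p{L}+$/u;
const cyrillicWord = letters.test('Русский');            // -> true
const japaneseWord = letters.test('日本語');               // -> true
const price = /^\p{Sc}\p{N}+$/u.test('$100');            // -> true

// Scripts require the Script= (or sc=) prefix natively.
const katakana = /^\p{Script=Katakana}+$/u.test('カタカナ'); // -> true

// Without the u flag, \p is just a literal 'p', so this does not match 'a'.
const withoutU = /\p{L}/.test('a');                      // -> false
```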

+ +

// Using flag A to match astral code points
+XRegExp('^\\pS$').test('💩'); // -> false
+XRegExp('^\\pS$', 'A').test('💩'); // -> true
+// Using surrogate pair U+D83D U+DCA9 to represent U+1F4A9 (pile of poo)
+XRegExp('^\\pS$', 'A').test('\uD83D\uDCA9'); // -> true
+
+// Implicit flag A
+XRegExp.install('astral');
+XRegExp('^\\pS$').test('💩'); // -> true
+

+ +

Opting in to astral mode disables the use of \p{…} and \P{…} within character classes. In astral mode, use e.g. (\pL|[0-9_])+ instead of [\pL0-9_]+.

+ + + + + +

XRegExp

The one of a kind JavaScript regular expression library
