Guides/Regular Expressions/Verbs

From J Wiki

The regex definitions are loaded in with the standard library from script ~system/main/regex.ijs, which may be abbreviated as load'regex' or require'regex'. (You might also browse the code under jqt using open'regex'.)

The main verbs are rxmatch, which finds the first occurrence of a match, and rxmatches, which finds all matches. Most of the remaining definitions use these two verbs.

rxmatch

 match=. pattern rxmatch string

Find first match. The result of rxmatch is a table, each row being an index/length pair. The first row has the entire match, and the following rows have matches for each subexpression. If there is no match, _1 0 is returned. For example:

   '(x+)([[:digit:]]+)' rxmatch 'test xxx1234 match'
5 7
5 3
8 4
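
Because a failed match is reported as _1 0, a quick existence test only needs to look at the first index. The following is a minimal sketch; hasmatch is a hypothetical helper name chosen for illustration and is not defined in regex.ijs.

```j
require 'regex'

NB. hasmatch is a hypothetical helper, not part of regex.ijs.
NB. It ravels the rxmatch result and checks that the first index is not _1.
hasmatch=: 4 : '_1 ~: {. , x rxmatch y'

'z+' hasmatch 'abc'                                   NB. 0 (no match)
'(x+)([[:digit:]]+)' hasmatch 'test xxx1234 match'    NB. 1 (match found)
```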

rxmatches

 matches =. pattern rxmatches string

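Find all matches. rxmatches returns a list of tables, with one item per match in the string. The shape of the result is (number of matches) by (1 + number of subexpressions) by 2, since each table has one row for the entire match plus one row per subexpression. For example: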

   'abc(x|y)' rxmatches 'one abcx two abcy'
 4 4
 7 1

13 4
16 1
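
The shape of this result can be checked directly. The sketch below applies $ to the same expression: two matches, each with a row for the entire match and a row for the single subexpression, each row being an index/length pair.

```j
require 'regex'

NB. 2 matches, 2 rows per match (entire match + 1 subexpression), 2 columns
$ 'abc(x|y)' rxmatches 'one abcx two abcy'    NB. 2 2 2
```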

rxall

 subs=. pattern rxall string

All substring matches. The result of rxall is a boxed list of all substrings in the right argument which match the pattern.

rxapply

 newstr=. pattern f rxapply string

Apply f to each match. rxapply applies its verb argument to each of the substrings in the right argument which match the pattern in the left argument.

rxcut

 subs=. matches rxcut string

Cut into alternating non-match/match. rxcut returns a boxed list which will match the original string if razed. The items alternate between non-matches and matches, always starting with a non-match.
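
The raze property can be checked with a short sketch. The names str, m and pieces are chosen here for illustration; the left argument is built with rxmatches, keeping only the row for the entire match.

```j
require 'regex'

str=: 'one abcx two abcy end'
m=: ('abc(x|y)';,0) rxmatches str   NB. entire-match rows only
pieces=: m rxcut str                NB. boxed non-match/match pieces
str -: ; pieces                     NB. 1: razing the pieces restores str
```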

rxeq

 ismatch=. pattern rxeq string

rxeq returns 1 if the pattern matches the entire string, and 0 otherwise. (Similar to the -: verb.)

For example:

   'abc(x|y)' rxeq 'abcx'
1

rxE

 mask=. pattern rxE string

rxE returns a boolean mask of length #string, with 1's to mark the start of a match. (Similar to the E. verb.)

rxfirst

 sub=. pattern rxfirst string

First substring match. rxfirst returns the substring in the right argument which matches the pattern.

rxfrom

 subs=. matches rxfrom string

Select substrings matched. rxfrom returns a boxed array containing the substring described by each index/length pair on the left.

rxindex

 index=. pattern rxindex string

Index of match. The result of rxindex is the index of the first match, or #string if none. (Similar to the i. verb.)

rxrplc

 newstr=. (pat;rplcstr) rxrplc string

Replace pat with rplcstr. rxrplc replaces each substring of the right argument that matches the pattern. The left argument is a boxed list of the pattern and the replacement text.
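
As a minimal sketch of a replacement, the following substitutes every run of digits with a fixed string. The sample text and the replacement NUM are made up for illustration.

```j
require 'regex'

NB. left argument: boxed pair of pattern and replacement text
('[[:digit:]]+';'NUM') rxrplc 'a1 b22 c333'    NB. aNUM bNUM cNUM
```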

rxmerge

 newstr=. strs matches rxmerge string

Merge strs into string. rxmerge takes a table of matches as an argument, and returns a verb which merges the boxed strings in the left argument into those positions on the right. (Similar to the } adverb.)

Notes

  1. The rxmatch and rxmatches verbs return a single match or a list of matches respectively, with each match being a table of index/length pairs for the entire match and each subexpression. Other verbs which use the result of rxmatch or rxmatches typically use only the first row of each match, which represents the entire match.
  2. A pattern is usually a simple character string. However, where the result of a function is a numeric array of matches (e.g. for rxmatch and rxmatches), the pattern can be given as a pair: character string;indices. In this case, only the rows of each match table selected by the indices are used.

For example, the pattern '(x+)([[:digit:]]+)' matches one or more letters 'x', followed by a string of digits, with both the 'x's and the digits being subexpressions of the pattern. Each match will be returned as a three-row table, describing the entire match, just the 'x's, and just the digits.

   pat=. rxcomp '(x+)([[:digit:]]+)'
   str=. 'just one xxx1234 match here'
   pat rxmatches str
 9 7
 9 3
12 4

   (pat;1 2) rxmatches str   NB. just the 'x's and digits
 9 3
12 4

   pat |. rxapply str        NB. reverse the whole match
just one 4321xxx match here

   (pat;,2) |. rxapply str   NB. reverse just the digits
just one xxx4321 match here

Examples

   pat=. '[[:alpha:]][[:alnum:]_]*'  NB. pattern for J name
   str=. '3,foo3=.23,j42=.123,123'   NB. a sample string
   pat rxmatch str                   NB. find at index 2, length 4
2 4

   pat=. '([[:alpha:]][[:alnum:]_]*) *=[.;]'   NB. subexp is name in assign
   pat rxmatch str                             NB. pattern at 2/6; name at 2/4
2 6
2 4

   pat rxmatches str       NB. find all matches
 2 6
 2 4

11 5
11 3

   pat rxfirst str         NB. first matching substring
foo3=.

   pat rxall str           NB. all matching substrings
┌──────┬─────┐
│foo3=.│j42=.│
└──────┴─────┘
   pat&rxindex&> '  foo=.10';'nothing at all'   NB. index of match
2 14

   pat rxE str                 NB. mask over matches
0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0

   '[[:digit:]]*' rxeq '2342342'   NB. test for exact match
1
   '[[:digit:]]*' rxeq '2342 342'
0

   pat rxmatch str             NB. entire and subexpression match
2 6
2 4
   pat rxmatches str           NB. all matches
 2 6
 2 4

11 5
11 3

   (pat rxmatches str) rxfrom str  NB. rxfrom selects substrings
┌──────┬────┐
│foo3=.│foo3│
├──────┼────┤
│j42=. │j42 │
└──────┴────┘

   ]m=.(pat;,0) rxmatches str   NB. entire matches only
 2 6
11 5
   m rxcut str                   NB. return alternating non-match/match boxes
┌──┬──────┬───┬─────┬───────┐
│3,│foo3=.│23,│j42=.│123,123│
└──┴──────┴───┴─────┴───────┘

   ('first';'second') m rxmerge str  NB. replace matches
3,first23,second123,123

   pat |. rxapply str        NB. reverse each match
3,.=3oof23,.=24j123,123

   (pat;,1) |. rxapply str   NB. reverse just name part of match
3,3oof=.23,24j=.123,123

   rxcomp '[wrong'         NB. a bad pattern
|unbalanced [] at offset 6     : rxcomp_jregex_
|   (rxerror'')    13!:8[12