Guides/Regular Expressions/Verbs
The regex definitions are loaded in with the standard library from script ~system/main/regex.ijs, which may be abbreviated as load'regex' or require'regex'. (You might also browse the code under jqt using open'regex'.)
The main verbs are rxmatch, which finds the first occurrence of a match, and rxmatches, which finds all matches. Most of the remaining definitions use these two verbs.
rxmatch
match=. pattern rxmatch string
Find first match. The result of rxmatch is a table, each row being an index/length pair. The first row has the entire match, and the following rows have matches for each subexpression. If there is no match, _1 0 is returned. For example:
'(x+)([[:digit:]]+)' rxmatch 'test xxx1234 match'
5 7
5 3
8 4
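The no-match case described above can be tested by looking at the first index of the result. The following is a minimal sketch, not part of the regex script; the names m and found are illustrative only.

```j
require 'regex'
m=. 'z+' rxmatch 'abc'     NB. the string contains no 'z', so there is no match
found=. _1 ~: {. , m       NB. 0 when the first index is _1, i.e. nothing was found
```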
rxmatches
matches =. pattern rxmatches string
Find all matches. rxmatches returns a list of tables, with one item per match in the string. The shape of the result is #matches by 1+#subexpr by 2, since each table has one row for the entire match followed by one row per subexpression. For example:
'abc(x|y)' rxmatches 'one abcx two abcy'
4 4
7 1
13 4
16 1
rxall
subs=. pattern rxall string
All substring matches. The result of rxall is a boxed list of all substrings in the right argument which match the pattern.
rxapply
newstr=. pattern f rxapply string
Apply f to each match. rxapply applies its verb argument to each of the substrings in the right argument which match the pattern in the left argument.
rxcut
subs=. matches rxcut string
Cut into alternating non-match/match. rxcut returns a boxed list which will match the original string if razed. The items alternate between non-matches and matches, always starting with a non-match.
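The razing property can be checked directly. This is a small sketch under the assumption that the match table is built with the (pattern;,0) form so that only entire matches are kept; str, m and parts are illustrative names.

```j
require 'regex'
str=. 'one 22 three 4444 five'
m=. ('[[:digit:]]+';,0) rxmatches str   NB. entire matches only
parts=. m rxcut str                     NB. boxed non-match/match pieces
str -: ; parts                          NB. razing the pieces should give back str
```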
rxeq
ismatch=. pattern rxeq string
Returns 1 if the pattern matches the entire string, and 0 otherwise. (Similar to the -: verb.)
For example:
'abc(x|y)' rxeq 'abcx'
1
rxE
mask=. pattern rxE string
rxE returns a boolean mask of length #string, with 1's to mark the start of a match. (Similar to the E. verb.)
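Because the result is an ordinary boolean mask, it can be combined with other primitives. As an illustrative sketch (the sample string is made up), I. converts the mask into the start indices of the matches:

```j
require 'regex'
NB. digit runs start at index 1 ('12') and index 5 ('345')
I. '[[:digit:]]+' rxE 'a12 b345 c'
```

With this input the expression should give 1 5.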
rxfirst
sub=. pattern rxfirst string
First substring match. rxfirst returns the first substring in the right argument which matches the pattern.
rxfrom
subs=. matches rxfrom string
Select substrings matched. rxfrom returns the boxed substrings described by each index/length pair on the left.
rxindex
index=. pattern rxindex string
Index of match. The result of rxindex is the index of the first match, or #string if none. (Similar to the i. verb.)
rxrplc
newstr=. (pat;rplcstr) rxrplc string
Replace pat with rplcstr. rxrplc replaces substrings in the right argument. The left argument is a boxed list of the pattern and the replacement text.
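A short sketch of the calling convention, using a made-up sample string: the pattern and the replacement text are linked into a two-item boxed list on the left.

```j
require 'regex'
NB. replace every run of digits with a single '#'
('[[:digit:]]+';'#') rxrplc 'a12 b345 c'
```

Here each run of digits is expected to be replaced, giving a# b# c.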
rxmerge
newstr=. strs matches rxmerge string
Merge strs into string. rxmerge takes a table of matches as an argument, and returns a verb which merges the boxed strings in the left argument into those positions on the right. (Similar to the } adverb.)
Notes
- The rxmatch and rxmatches verbs return a single match or a list of matches, respectively, with each match being a table of index/length pairs for the match and each subexpression. Other verbs which use the result of rxmatch or rxmatches typically use only the first row for each match, which represents the entire match.
- A pattern is usually a simple character string. However, in cases where the result of a function is a numeric array of matches (e.g. for rxmatch and rxmatches), the pattern can be given as a pair: character string;indices. In this case, each match table is restricted to the rows selected by the indices.
For example, the pattern '(x+)([[:digit:]]+)' matches one or more letters 'x', followed by a string of digits, with both the 'x's and the digits being subexpressions of the pattern. Each match will be returned as a three-row table, describing the entire match, just the 'x's, and just the digits.
pat=. rxcomp '(x+)([[:digit:]]+)'
str=. 'just one xxx1234 match here'
pat rxmatches str
9 7
9 3
12 4
(pat;1 2) rxmatches str NB. just the 'x's and digits
9 3
12 4
pat |. rxapply str NB. reverse the whole match
just one 4321xxx match here
(pat;,2) |. rxapply str NB. reverse just the digits
just one xxx4321 match here
Examples
pat=. '[[:alpha:]][[:alnum:]_]*' NB. pattern for J name
str=. '3,foo3=.23,j42=.123,123' NB. a sample string
pat rxmatch str NB. find at index 2, length 4
2 4
pat=. '([[:alpha:]][[:alnum:]_]*) *=[.:]' NB. subexp is name in assign
pat rxmatch str NB. pattern at 2/6; name at 2/4
2 6
2 4
pat rxmatches str NB. find all matches
2 6
2 4
11 5
11 3
pat rxfirst str NB. first matching substring
foo3=.
pat rxall str NB. all matching substrings
┌──────┬─────┐
│foo3=.│j42=.│
└──────┴─────┘
pat&rxindex&> '  foo=.10';'nothing at all' NB. index of match
2 14
pat rxE str NB. mask over matches
0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
'[[:digit:]]*' rxeq '2342342' NB. test for exact match
1
'[[:digit:]]*' rxeq '2342 342'
0
pat rxmatch str NB. entire and subexpression match
2 6
2 4
pat rxmatches str NB. all matches
2 6
2 4
11 5
11 3
(pat rxmatches str) rxfrom str NB. rxfrom selects substrings
┌──────┬────┐
│foo3=.│foo3│
├──────┼────┤
│j42=. │j42 │
└──────┴────┘
]m=.(pat;,0) rxmatches str NB. entire matches only
2 6
11 5
m rxcut str NB. return alternating non-match/match boxes
┌──┬──────┬───┬─────┬───────┐
│3,│foo3=.│23,│j42=.│123,123│
└──┴──────┴───┴─────┴───────┘
('first';'second') m rxmerge str NB. replace matches
3,first23,second123,123
pat |. rxapply str NB. reverse each match
3,.=3oof23,.=24j123,123
(pat;,1) |. rxapply str NB. reverse just name part of match
3,3oof=.23,24j=.123,123
rxcomp '[wrong' NB. a bad pattern
|unbalanced [] at offset 6 : rxcomp_jregex_
| (rxerror'') 13!:8[12