Scripts/Regular Expressions Substitution
Originally published at http://olegykj.sourceforge.net/ as regexs.ijs
Regular expressions extended for Perl/awk/sed-like substitution. Features an option to process executable replacements.
In the Unix tools sed and perl, a single operation describes both the search pattern and its substitution. Often both parts take advantage of sub-patterns to manipulate string fragments. This brings convenience to frequently used text transformations such as reordering words or removing subparts.
Although the same result could be achieved programmatically with the existing regex operations, that would involve additional low-level logic and familiarity with the many verbs of the J regex API. The proposed rxs tool provides high-level operations without going into J implementation details.
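To make the comparison concrete, here is a minimal sketch of the low-level route for a single word swap, using only the standard regex verbs that the script itself relies on (rxmatch, rxfrom, rxmerge). The names str, mat, subs, new and result are illustrative and not part of the script.

```j
require 'regex'

str=: 'Henry Rich xxx'

NB. rxmatch returns one (index,length) row per group; row 0 is the whole match
mat=: '(\w+) (\w+)' rxmatch str

NB. rxfrom cuts out the matched fragments as a boxed list: whole match, \1, \2
subs=: mat rxfrom str

NB. build the replacement by hand: second word, comma, first word
new=: (> 2 { subs) , ', ' , > 1 { subs

NB. rxmerge splices the replacement over the whole-match row only
result=: (< new) (,: {. mat) rxmerge str
echo result
```

With rxs the same intent is a single pattern such as '/(\w+) (\w+)/\2, \1', which is the bookkeeping the verb is meant to hide.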
The rxs verb also features a powerful e (execute) option that applies the specified J expression to each match and merges the results.
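The idea behind the execute option can be sketched with the standard regex verbs alone: find every match, apply a J expression to the text of each one, and merge the results back. This is an illustration of the technique under that assumption, not the script's own code path; the names str, mat, subs, new and result are hypothetical.

```j
require 'regex'

str=: 'a 3 b 10'

NB. rxmatches returns every match; with no groups each match is a 1-row table
mat=: '[0-9]+' rxmatches str

NB. boxed text of each match, flattened to a list: '3' and '10'
subs=: , mat rxfrom str

NB. the per-match expression: read the number, double it, format it back
new=: (": @ +: @ ".) each subs

NB. merge the computed strings over the matched spans
result=: new mat rxmerge str
echo result
```

With rxs the expression goes into the replacement part instead, where & stands for the matched text as a quoted J string, so a pattern along the lines of '/[0-9]+/2*".&/ge' should express roughly the same transformation.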
With appropriate use, the rxs verb can cover most regex use cases and remove the need for the low-level verbs.
NB. Regular expressions extended for Perl-like substitution
NB. Version 3 for j601+.
NB. Author Oleg Kobchenko. Originally http://olegykj.sourceforge.net/
NB. to do: \xHH

require'strings regex'

coclass 'jregex'

NB. =========================================================
NB.*rxmain v return ()-less mat from ()-ful pattern
rxmain=: ,:"1@:({."2)

NB. =========================================================
NB.*rxs v make Perl-like s/PAT/REPL/OPT substitution
NB. use:
NB.   '/PAT/REPL/OPT' rxs str
NB.     PAT  - the usual POSIX pattern used in J regex
NB.     REPL - the POSIX sed-like replacement string
NB.       \1-\9    corresponding parens content
NB.       \0 or &  whole match
NB.       \_       whole match in string representation (for 'e')
NB.       \t TAB   \n LF
NB.       \r CR    \f FF
NB.       \other   other
NB.     OPT  - any of 'ige' for ignore case, global, execute
NB. see: examples

RBEGE=: <;._1' \n LF \r CR \t TAB \f FF'
RBEGX=: '\n';LF;'\r';CR;'\t';TAB;'\f';FF

rxs=: 4 : 0
  esc=. {.x
  'pat rpl opt'=. 3{. <;._1 x
  str=. tolower^:('i'e. opt) y
  pat=. tolower^:('i'e. opt) pat
  mat=. pat rxmatch`rxmatches@.('g'e. opt) str
  if. (0=#mat) +. _1=1{.,mat do. y return. end.
  subs=. ,:^:(2: > #@$) mat rxfrom y
  mat=. rxmain mat
  newr=. ''
  if. 'e' e. opt do.
    r=. rpl rplc '\\';esc;RBEGE
    for_i. i.#mat do.
      pairs=. '&';5!:5<'t' [ t=. >(<i,0){subs
      pairs=. pairs,'\_';'('&,@(,&')')@(5!:5) <'t' [ t=. i{subs
      for_j. i.{:$subs do.
        pairs=. pairs, ('\',":j);5!:5<'t' [ t=. >(<i,j){subs
      end.
      pairs=. pairs,'\';'';esc;'\'
      re=. r rplc pairs
      for_j. i.+/'e'E.opt do.
        re=. (,@":@:".) :: ('__'"_) re
      end.
      newr=. newr,<re
    end.
  else.
    r=. rpl rplc '\\';esc;RBEGX
    for_i. i.#mat do.
      pairs=. '&';>(<i,0){subs
      for_j. i.{:$subs do.
        pairs=. pairs, ('\',":j);>(<i,j){subs
      end.
      pairs=. pairs,'\';'';esc;'\'
      newr=. newr,<r rplc pairs
    end.
  end.
  newr mat rxmerge y
)

rxs_z_=: rxs_jregex_

Note 'Examples'
NB. run indented lines and compare results
«examples»
)
   str=. 'hello Mr John Dow hi miz Sarah Bernard hi mr none'

   '/(mr|miz) ([a-z]+) ([a-z]+) */\3, \2 (\1) -- was: \0\n' rxs str
hello Mr John Dow hi miz Sarah Bernard hi mr none

   '/(mr|miz) ([a-z]+) ([a-z]+) */\3, \2 (\1) -- was: \0\n/i' rxs str
hello Dow, John (Mr) -- was: Mr John Dow
hi miz Sarah Bernard hi mr none

   '/(mr|miz) ([a-z]+) ([a-z]+) */\3, \2 (\1) -- was: \0\n/ig' rxs str
hello Dow, John (Mr) -- was: Mr John Dow
hi Bernard, Sarah (miz) -- was: miz Sarah Bernard
hi mr none

   p1=. '!(mr|miz) (([a-z]+) )?([a-z]+) *'
   r1=. '!\4,s,(":#\4),s, \3, s,\1,s,'' used: '',(":+/a:~:\_),\n [ s=.''/'''
   o1=. '!gie'
   (p1,r1,o1) rxs str
hello Dow/3/John/Mr/ used: 5
hi Bernard/7/Sarah/miz/ used: 5
hi none/4//mr/ used: 3

   '/([^ ]+) ([^ ]+)/\2,''-'',\1/e' rxs 'q''123 z456'
z456-q'123

   '/([^ ]+) ([^ ]+)/\2,''-'',\1/ee' rxs '123 456'  NB. multiple /e
333

   '/(\w+) (\w+) (.*)/\2, \1 \3' rxs 'Henry Rich xxx'
Rich, Henry xxx
See Also
- Regex links in Parsing Guide
- perl s/// operator Google search
- Regexp Quote-Like Operators in perlop
- Swap first/last names user feedback
Contributed by Oleg Kobchenko