Phrases/Strings
String operations are collected here.
strjoin
Joins the boxed strings in y, placing the string x between adjacent items. Non-character items must first be formatted with ":.
NB. join boxed list y with x; see strsplit
strjoin=: #@[ }. <@[ ;@,. ]
Examples
   '/' strjoin ;:'one two three'
one/two/three
   LF strjoin ',' <@strjoin"1 ":each i.2 4
0,1,2,3
4,5,6,7
   ]a=. ']}>',~'<{[',(']}',LF,'{[') strjoin '][' <@strjoin"1 ": each i.2 4
<{[0][1][2][3]}
{[4][5][6][7]}>
   require'strings'
   a rplc cut'[ <td> ] </td> { <tr> } </tr> < <table> > </table> '
<table><tr><td>0</td><td>1</td><td>2</td><td>3</td></tr>
<tr><td>4</td><td>5</td><td>6</td><td>7</td></tr></table>
strsplit
A simpler form of split that cuts at every occurrence of the separator, including overlapping ones; compare nossplit.
NB. strsplit y by substring x; see strjoin
strsplit=: #@[ }.each [ (E. <;.1 ]) ,
Examples
   ',' strjoin ' ' strsplit '1 2 3 one two three'
1,2,3,one,two,three
   '<' strjoin ' of ' strsplit 'a of b of c'
a<b<c
   '[' }.@strsplit ']' (strjoin ,&a:) '[a';'[b';'[c'
+--+--+--+
|a]|b]|c]|
+--+--+--+
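Since strsplit and strjoin are described as counterparts, a quick round trip can serve as a sanity check. The sketch below assumes a single-character separator, where splitting and rejoining should recover the original string; roundtrip is a hypothetical helper name introduced only for this illustration.

```j
NB. Definitions repeated from this page so the block stands alone.
strjoin=: #@[ }. <@[ ;@,. ]
strsplit=: #@[ }.each [ (E. <;.1 ]) ,

NB. roundtrip (hypothetical helper): split y on x, then join the pieces with x.
roundtrip=: 4 : 'x strjoin x strsplit y'

s=: 'a,b,,c'
',' strsplit s               NB. four boxes; the third one is empty
(',' roundtrip s) -: s       NB. 1 when the original string is recovered
```

With a multi-character separator that can overlap itself, such as ||, the round trip is not guaranteed to hold, which is the case nossplit addresses.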
nossplit
Splits on non-overlapping occurrences of the separator. Useful when the separator is a repeated character, such as ||.
NB. Non-overlapping variant of E.
nos=: i.@#@] e. #@[ ({~^:a:&0@(,&_1)@(]I.+) { _1,~]) I.@E.
NB. split y by non-overlapping substrings x
nossplit=: #@[ }.each [ (nos <;.1 ]) ,
Examples
   '||' nos 'abc||def||cd'
0 0 0 1 0 0 0 0 1 0 0 0
   '||' nossplit 'abc||def|||cd'
+---+---+---+
|abc|def||cd|
+---+---+---+
   '||' strsplit 'abc||def|||cd'
+---+---++--+
|abc|def||cd|
+---+---++--+
See Essays/Non-Overlapping Substrings for the details of nos.
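The tacit definition of nos is compact but hard to read. As a rough aid, the following explicit verb sketches the same idea as a left-to-right greedy scan: accept a match only if it starts at or after the end of the previously accepted match. The name nosx is hypothetical and the verb is meant for illustration, not as a replacement for nos.

```j
NB. nos as defined on this page, repeated so the block stands alone.
nos=: i.@#@] e. #@[ ({~^:a:&0@(,&_1)@(]I.+) { _1,~]) I.@E.

NB. nosx (hypothetical): explicit greedy counterpart, written for readability.
nosx=: 4 : 0
hits=. I. x E. y        NB. start indices of all matches, overlapping or not
keep=. i. 0             NB. accepted start indices
next=. 0                NB. earliest index at which the next match may start
for_h. hits do.
  if. h >: next do.
    keep=. keep , h
    next=. h + #x
  end.
end.
(i. #y) e. keep         NB. boolean mask with one item per character of y
)

'aa' nosx 'aaaa'        NB. 1 0 1 0, whereas 'aa' E. 'aaaa' gives 1 1 1 0
('||' nosx 'abc||def|||cd') -: '||' nos 'abc||def|||cd'
```

The last line compares the two verbs on the example used above; a result of 1 means they agree on that input.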
cut and dltb
Splitting text on a delimiter is a common task, and it is often useful to delete the leading and trailing blanks of each piece. When the delimiter is a single character, you can use cut and dltb from the strings library.
   a=. 'Ken Iverson, Roger Hui, Eric Iverson, Clifford Reiter, Henry Rich'
   ,. /:~ dltb each ',' cut a
+---------------+
|Clifford Reiter|
+---------------+
|Eric Iverson   |
+---------------+
|Henry Rich     |
+---------------+
|Ken Iverson    |
+---------------+
|Roger Hui      |
+---------------+
Of course, you may simply use ;. directly.
   ,. <@dltb;._1 ',',a
+---------------+
|Ken Iverson    |
+---------------+
|Roger Hui      |
+---------------+
|Eric Iverson   |
+---------------+
|Clifford Reiter|
+---------------+
|Henry Rich     |
+---------------+
Alphabets
LATIN_UC=:'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LATIN_LC=:'abcdefghijklmnopqrstuvwxyz'
RUSSIAN_UC=:'АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ'
RUSSIAN_LC=:'абвгдеёжзийклмнопрстуфхцчшщъыьэюя'
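One possible use of these constants is case conversion by table lookup. The sketch below covers only the Latin alphabets, which are single-byte characters; the Russian strings are multi-byte in a UTF-8 literal and would need a Unicode conversion first, which is not shown here. The verb name uc is hypothetical.

```j
NB. Latin alphabets repeated from this page so the block stands alone.
LATIN_UC=: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LATIN_LC=: 'abcdefghijklmnopqrstuvwxyz'

NB. uc (hypothetical): look each character up in LATIN_LC,a. and take the
NB. item at the same index of LATIN_UC,a. Characters that are not lowercase
NB. Latin letters fall through to a. and map to themselves.
uc=: (LATIN_UC,a.) {~ (LATIN_LC,a.) i. ]

uc 'Hello, World'       NB. HELLO, WORLD
```

Appending a. to both tables keeps the lookup total, so no index error can occur for characters outside the alphabet.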
Slicing with Regex
Using regular expressions to define tokens is very convenient and powerful.
Start by loading the regex library and defining an additional function.
load 'regex'
rxgroups=: }.@rxmatch rxfrom ]   NB. like rxall but for match groups
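To see what rxgroups does, it may help to evaluate its parts separately. The sketch below assumes the usual behaviour of the regex library, where rxmatch returns one (start, length) row for the whole match followed by one row per parenthesized group, and rxfrom extracts the corresponding substrings. The names pat and str are introduced only for this illustration.

```j
load 'regex'
rxgroups=: }.@rxmatch rxfrom ]

pat=: '(\S+)\s+(.+)'
str=: '12 3456 789'

pat rxmatch str                    NB. rows 0 11 (whole match), 0 2 and 3 8 (groups)
}. pat rxmatch str                 NB. drop the whole-match row, keep the groups
(}. pat rxmatch str) rxfrom str    NB. the same boxes as pat rxgroups str
```

Dropping the first row is what distinguishes rxgroups from a plain match: only the parenthesized parts are returned.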
A set of different position-specific tokens.
A leading run of non-space characters, followed by whitespace, and then the rest of the string.
   '(\S+)\s+(.+)' rxgroups '12 3456 789'
+--+--------+
|12|3456 789|
+--+--------+
A set of tokens of the same type.
Space-separated tokens.
   '\S+' rxall '12 3456 789'
+--+----+---+
|12|3456|789|
+--+----+---+
See Also
- Guides/Strings: string and text manipulation resources
- Guides/Parsing: analyzing characters