Essays/Substring Replacement
Substring Replacement
x replace p;q replaces all occurrences of p by q in x .
replace=: 4 : 0 'p q'=. y j=. p nosindx x if. ''-:j do. x return. end. d=. p-&#q if. d do. b=. 1$~#x if. d>0 do. b=. 0 (,j+/i.d)} b else. b=. (1 j. -d) j} b end. x=. b # x j=. j-d*i.#j end. k=. <^:2"_1 j+/i.#q x=. q k} x ) nosindx=: 4 : 0 s=. x I.@E. y i=. s I. s+#x (i.&_1 {. ]) (s,_1) {~ (i,_1) {~^:a: 0 )
replace uses nosindex , adapted from Non-Overlapping Substrings to find the indices of non-overlapping substring occurrences. nosindex is equivalent to I.@E. if substring occurrences do not overlap (i.e. if no non-trivial prefix of p is a suffix of p ).
Examples of replace in use:
x=: 'of the people, by the people, for the people' x replace 'the';'a' of a people, by a people, for a people x replace 'the';'one' of one people, by one people, for one people x replace 'the';'them there' of them there people, by them there people, for them there people
Program Logic
Let d=. p-&#q be the difference between the length of p (the target string) and the length of q (the replacement string). The processing depends on whether d is positive, zero, or negative:
- If d is positive ( p is longer than q ), x is first shrinked to drop excessive elements and then amended by q .
- If d is zero ( p has the same length as q ), x is simply amended by q .
- If d is negative ( p is shorter than q ), x is first expanded to make room and then amended by q .
Let j=. x nosindex p be the indices of non-overlapping occurrences of p in x . If p and q differ in length, then j should be adjusted: j=. j-d*i.#j . The indices to be amended are k=. j+/i.#q . An additional post-processing to k is required to avoid scatter-modify, so the final expression will be k=. <^:2"_1 j+/i.#q .
The following examples illustrate the three cases:
x=: 'of the people, by the people, for the people' p=: 'the' ] j=: p nosindx x 3 18 34
p is longer than q (d>0)
q=: 'a' ] d=: p-&#q 2 j 3 18 34 ,j+/i.d 3 4 18 19 34 35 x ,: '.1' {~ 0 (,j+/i.d)} b=. 1$~#x of the people, by the people, for the people 111..1111111111111..11111111111111..11111111 ] x=. (0 (,j+/i.d)} b) # x of e people, by e people, for e people ] j=. j-d*i.#j 3 16 30 ] k=. <^:2"_1 j +/ i. # q +---+----+----+ |+-+|+--+|+--+| ||3|||16|||30|| |+-+|+--+|+--+| +---+----+----+ ] x=. q k} x of a people, by a people, for a people
p has the same length as q (d=0)
q=: 'one' ] d=: p-&#q 0 j 3 18 34 ] k=. <^:2"_1 j +/ i. # q +-------+----------+----------+ |+-----+|+--------+|+--------+| ||3 4 5|||18 19 20|||34 35 36|| |+-----+|+--------+|+--------+| +-------+----------+----------+ ] x=. q k} x of one people, by one people, for one people
p is shorter than q (d<0)
q=: 'them there' ] d=: p-&#q _7 j 3 18 34 1 j. -d 1j7 (t # x) ,: '.1' {~ 1 #~ t=. (1 j. - d) j} b=. 1$~#x of t he people, by t he people, for t he people 1111.......111111111111111.......1111111111111111.......111111111 ] x=. t # x of t he people, by t he people, for t he people ] j=. j - d * i. # j 3 25 48 ] k=. <^:2"_1 j +/ i. # q +------------------------+-------------------------------+-------------------------------+ |+----------------------+|+-----------------------------+|+-----------------------------+| ||3 4 5 6 7 8 9 10 11 12|||25 26 27 28 29 30 31 32 33 34|||48 49 50 51 52 53 54 55 56 57|| |+----------------------+|+-----------------------------+|+-----------------------------+| +------------------------+-------------------------------+-------------------------------+ ] x=. q k} x of them there people, by them there people, for them there people
Non-Overlapping Occurrences
As stated above, nosindx finds the indices of non-overlapping substring occurrences. It is the same as I.@E. if substring occurrences do not overlap. What if they do overlap?
] x=: 40$'abcabcabc_' abcabcabc_abcabcabc_abcabcabc_abcabcabc_ x replace 'abcabc';'syzygy' syzygyabc_syzygyabc_syzygyabc_syzygyabc_ ] j0=: 'abcabc' I.@E. x 0 3 10 13 20 23 30 33 ] j1=: 'abcabc' nosindx x 0 10 20 30 x abcabcabc_abcabcabc_abcabcabc_abcabcabc_ '1' j0} '.'$~#x 1..1......1..1......1..1......1..1...... '1' j1} '.'$~#x 1.........1.........1.........1......... 'syzygy' (j0+/i.6)} x syzsyzygy_syzsyzygy_syzsyzygy_syzsyzygy_ 'syzygy' (j1+/i.6)} x syzygyabc_syzygyabc_syzygyabc_syzygyabc_ x replace 'abcabc';'syzygy' syzygyabc_syzygyabc_syzygyabc_syzygyabc_
Contributed by Roger Hui, with further contribution by Igor Zhuravlov.