Essays/Non-Overlapping Substrings
x E. y finds all occurrences of x as a substring in y . For example:
x=: 'our' y=: 'fourscore and ten years ago, our fathers' x E. y 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 y ,: ' 1' {~ x E. y fourscore and ten years ago, our fathers 1 1
Substrings can overlap if a non-trivial prefix of x occurs as a suffix of x . (A trivial prefix is the empty string or the entire of x .) Suppose only the non-overlapping substrings (NOS) are to be found? In the following example, the only 1s we want are those at indices 2 11 24 33 46 .
x=: 'abcabcabc' y=: '--',60$(20$x),'--' (|:":,.i.#y) , y ,: ' 1' {~ x E. y 1111111111222222222233333333334444444444555555555566 01234567890123456789012345678901234567890123456789012345678901 --abcabcabcabcabcabcab--abcabcabcabcabcabcab--abcabcabcabcabca 1 1 1 1 1 1 1 1 1 1 1
Let s=: x I.@E. y be the indices of the (possibly overlapping) occurrences of x in y . If j in s is the index of an NOS, the index of the next NOS can be found through the use of the dyad I. (interval index).
] s=: x I.@E. y 2 5 8 11 24 27 30 33 46 49 52 ] i=: s I. s+#x 3 4 4 4 7 8 8 8 11 11 11 s ,: i{s,_1 2 5 8 11 24 27 30 33 46 49 52 11 24 24 24 33 46 46 46 _1 _1 _1
If 2 is (the index of) an NOS, then the next NOS is 11; if 5 is an NOS, the next NOS is 24; if 8 is an NOS, the next NOS is 24; etc. Since the first element of s is the index of an NOS, the indices of all the NOSes can be found by following the path that starts there; that is, by computing the transitive closure of that first element.
2 5 8 11 24 27 30 33 46 49 52 _1 11 24 24 24 33 46 46 46 _1 _1 _1 _1
nos=: 4 : 0 s=. x I.@E. y i=. s I. s+#x (i.#y) e. (s,_1) {~ (i,_1) {~^:a: 0 ) y ,: ' 1'{~ x nos y --abcabcabcabcabcabcab--abcabcabcabcabcabcab--abcabcabcabcabca 1 1 1 1 1 t ,: ' 1'{~ 'aaa' nos t=: 59$'a' aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 t ,: ' 1'{~ 'no overlaps' nos t=: 59$'_no overlaps ' _no overlaps _no overlaps _no overlaps _no overlaps _no ove 1 1 1 1
The function can be written tacitly as follows:
tc =: ,&_1 {~^:a: 0: nos1=: i.@#@] e. #@[ (tc@(]I.+) { _1,~]) I.@E.
See also
Contributed by Roger Hui.