Puzzles/Word Frequencies
Given a list of words, find the top m most frequent words and the corresponding frequencies.
Solution
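The dyad x u/. y (key) is useful for such problems. It groups the items of y by their keys, given by the corresponding items of x, and applies u to each group; the results come in the order of ~. x. With ~ , u/.~ y uses y as its own keys, so each group holds the copies of one distinct word. For example: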
   #/.~ y           NB. the word frequencies, corresponding to ~.y
   {./.~ y          NB. the unique words, i.e. ~.y
   ({. , <@#)/.~ y  NB. the unique words and the corresponding frequencies
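A minimal sketch of the three phrases on input small enough to check by hand. The eight-word list w is a made-up example, not from the original page; the NB. comments state the results.

```j
NB. Hypothetical example list: 8 words, 5 of them distinct
w=: ;: 'the cat and the dog and the bird'

#/.~ w            NB. frequencies 3 1 2 1 1, for the cat and dog bird
{./.~ w           NB. the unique words, in order of first occurrence (same as ~. w)
({. , <@#)/.~ w   NB. 5-by-2 boxed table: each unique word beside its count
```
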
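For the actual problem, we will use y (#,{.)/. i.#y , which gives a 2-column table with one row per unique word: its frequency and the index of its first occurrence in y. The table is numeric, so wordfreq can sort it directly with \:~ and look up the words by index only for the top x rows.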
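As a sketch on a made-up eight-word list w (hypothetical, not from the original page), the phrase yields one (count, first index) row per distinct word, and the second column indexes back into w to recover ~. w:

```j
NB. Hypothetical example list: 8 words, 5 distinct
w=: ;: 'the cat and the dog and the bird'

NB. For each distinct word: its count and the index of its first occurrence
t=: w (#,{.)/. i.#w
NB. t is the 5-by-2 integer table
NB.   3 0    the
NB.   1 1    cat
NB.   2 2    and
NB.   1 4    dog
NB.   1 7    bird

({:"1 t) { w      NB. the first-occurrence indices recover the unique words
```
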
wordfreq=: 4 : 0
 'c i'=. |: x {. \:~ y (#,{.)/. i.#y   NB. counts c and first indices i of the x most frequent words
 (i{y) ,. <"0 c                         NB. 2-column table: word, frequency
)
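The body of wordfreq can be traced one step at a time. This sketch uses a made-up eight-word list w and takes the top 2; the names t and s are hypothetical, introduced only to hold intermediate results. Because \:~ compares whole rows, words with equal counts are ordered by descending first-occurrence index (1 7 before 1 4).

```j
NB. Hypothetical example list: 8 words, 5 distinct
w=: ;: 'the cat and the dog and the bird'

t=: w (#,{.)/. i.#w     NB. rows (count, first index): 3 0, 1 1, 2 2, 1 4, 1 7
s=: \:~ t               NB. rows sorted in descending order: 3 0, 2 2, 1 7, 1 4, 1 1
'c i'=: |: 2 {. s       NB. keep the top 2 rows and split the columns: c is 3 2, i is 0 2
(i{w) ,. <"0 c          NB. 2-by-2 boxed table with rows: the 3, and 2
```
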
For example:
sample=: 3 : 0
 a=. 'abcdefghijklmnopqrstuvwxyz'
 c=. 3 5 7 9      NB. word lengths
 n=. 10^>.-:c     NB. how many random words of each length: 100 1000 10000 100000
 x=. ; <"1&.> (>.1e4%n)#&.> (n,&.>c) (a {~ ?@$)&.> #a   NB. pool of boxed words; lengths 3 5 7 9 repeated 100 10 1 1 times
 x {~ y ?@$ #x    NB. y words drawn at random (with replacement) from the pool
)
   x=: sample 1e6
   $ x
1000000
   8 {. x
┌───────┬─────────┬─────────┬─────────┬─────────┬─────────┬───┬─────┐
│wghgnkv│xaubfuowg│vlqwuvaji│viajpaaih│qcbamjdfh│dftavyazm│sjj│qjtws│
└───────┴─────────┴─────────┴─────────┴─────────┴─────────┴───┴─────┘
   10 wordfreq x
┌───┬───┐
│sfn│832│
├───┼───┤
│bgp│819│
├───┼───┤
│yhg│818│
├───┼───┤
│abd│815│
├───┼───┤
│ctz│814│
├───┼───┤
│wkt│813│
├───┼───┤
│eim│810│
├───┼───┤
│ovd│808│
├───┼───┤
│rix│807│
├───┼───┤
│yrc│806│
└───┴───┘
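Random samples make results hard to check, so here is a sketch on a made-up eight-word list w (hypothetical, not from the original page). It restates wordfreq so the block is self-contained, and it shows the tie order: among equal counts, the word whose first occurrence is later is listed first.

```j
NB. wordfreq restated; x is the number of words to report
wordfreq=: 4 : 0
 'c i'=. |: x {. \:~ y (#,{.)/. i.#y
 (i{y) ,. <"0 c
)

NB. Hypothetical example list: 8 words, 5 distinct
w=: ;: 'the cat and the dog and the bird'

2 wordfreq w   NB. rows: the 3, and 2
4 wordfreq w   NB. rows: the 3, and 2, bird 1, dog 1 (bird occurs first at index 7, dog at 4)
6 wordfreq w   NB. more rows than distinct words: the last row is padded (the 0)
```

The last line shows a caveat that follows from x {. overtaking: when x exceeds the number of distinct words, the take pads with rows of zeros, so each extra row reports the first word of y with a count of 0. Clamping the left argument, for example with x <. # ~. y , is one possible guard; it is an assumption here, not part of the original solution.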
Contributed by Roger Hui.