ShareMyScreen/AdventOfCode/2022/07/NoSpaceLeftOnDevice
Now a problem of substance. I will have to write a program for this one, so I use File>New temp in the menu bar to open a script. I start with the commentary:
NB. AOC 2022 Problem 7 proclog =: {{ NB. x is the threshold, y is the boxed lines from the log NB. Process lines one by one, producing list of path;filename;length NB. Remove duplicate files NB. Calculate size of each directory NB. Return the total of sizes below threshold }}
To begin with I want info on the individual files. I can't hope to execute the lines from the log, because they might not even be valid J words. I will have to process them as text. The only lines that matter are the cd lines and the lines with size and name. I will store the path as a list of boxes, updated by cd.
First I'll read the lines in and box them:
] lines =. < onaoclines +------+----+-----+--------------+-------------+-- |$ cd /|$ ls|dir a|14848514 b.txt|8504156 c.dat|di... +------+----+-----+--------------+-------------+--
I'll write a loop to process the lines. How dirty is the input? I will remove extra blanks. Suppose there is no cd to start with? I'll make that an error. Could a directory be listed twice? Just in case I'll keep all the info so I can delete duplicates.
require 'strings' NB. AOC 2022 Problem 7 proclog =: {{ NB. x is the threshold, y is the boxed lines from the log NB. Process lines one by one, producing list of path;filename;length flist =. 0 3$a: NB. will be path;filename;length for_line. y do. ltext =. deb > line NB. text of line with surplus blanks removed NB. Handle the forms we recognize: $ cd dir and number filename if. '$ cd' -: 4 {. ltext do. dir =. 5 }. ltext NB. the directory string if. dir -: ,'/' do. currdir =. 0$a: NB. reset to top level elseif. dir -: '..' do. currdir =. }: currdir NB. back up one level else. currdir =. currdir , <dir NB. new dir: down one level end. elseif. '0123456789' e.~ {. ltext do. NB. size filename 'num name' =. (ltext i. ' ') ({. ; }.) ltext NB. size and filename (with leading space) flist =. flist , currdir ; name ; 0 ". num NB. add entry for file end. NB. other forms ignored end. NB. Remove duplicate files NB. Calculate size of each directory NB. Return the total of sizes below threshold flist }}
I'll test that.
] f =. proclog lines +-----+------+--------+ | | b.txt|14848514| +-----+------+--------+ | | c.dat|8504156 | +-----+------+--------+ |+-+ | f |29116 | ||a| | | | |+-+ | | | +-----+------+--------+ |+-+ | g |2557 | ||a| | | | |+-+ | | | +-----+------+--------+ |+-+ | h.lst|62596 | ||a| | | | |+-+ | | | +-----+------+--------+ |+-+-+| i |584 | ||a|e|| | | |+-+-+| | | +-----+------+--------+ |+-+ | j |4060174 | ||d| | | | |+-+ | | | +-----+------+--------+ |+-+ | d.log|8033020 | ||d| | | | |+-+ | | | +-----+------+--------+ |+-+ | d.ext|5626152 | ||d| | | | |+-+ | | | +-----+------+--------+ |+-+ | k |7214296 | ||d| | | | |+-+ | | | +-----+------+--------+
That matches the problem's result, so I will move on to calculating the size of each directory.
I need to combine the sizes of all the files in each directory, then move up one level and combine those into the higher-level directory, and so on. I'll have to be careful not to apply a file more than once. Maybe I'll remove files after they are processed, or perhaps loop, processing only the files at the bottom of the directory tree...
No, no, there's a better way. Every file counts in each of its containing directories. I'll simply create a record for each containing directory and combine all the files. Each file will create many records, but on this size of problem that's OK.
The list of directories for a file is all the prefixes of the file directories: that's a job for prefix: (u/. y). Let me refresh myself about that:
<\ ;:'a b c' +---+-----+-------+ |+-+|+-+-+|+-+-+-+| ||a|||a|b|||a|b|c|| |+-+|+-+-+|+-+-+-+| +---+-----+-------+
[Good that I checked: prefix is u\, not u/. as I had misremembered.] I will need to add on an empty list too, to include the top directory. I'll bench-test the lines that add up the directories:
f =. 0 2 {"1 ~. f NB. discard duplicates and lose the filename ] fdirs =. (a: , <\)&.> {."1 f NB. all directories for each name +--+--+------+------+------+------------+------+------+------+------+ |++|++|++---+|++---+|++---+|++---+-----+|++---+|++---+|++---+|++---+| |||||||||+-+||||+-+||||+-+||||+-+|+-+-+||||+-+||||+-+||||+-+||||+-+|| |++|++||||a||||||a||||||a||||||a|||a|e||||||d||||||d||||||d||||||d||| | | |||+-+||||+-+||||+-+||||+-+|+-+-+||||+-+||||+-+||||+-+||||+-+|| | | |++---+|++---+|++---+|++---+-----+|++---+|++---+|++---+|++---+| +--+--+------+------+------+------------+------+------+------+------+
There they are, the multiple paths for each file. They will be easy to combine: I'll use the workhorse key x u/. y where x is the directories (i. e. ; fdirs) and y is the list of file lengths, repeated for each directory. The verb u will be +/ y, which will add up all the lengths for each directory separately.
(; fdirs) +//. (#@> fdirs) # > {:"1 f 48381165 94853 584 24933642
That's the right answer and I can finish the script:
require 'strings' NB. AOC 2022 Problem 7 proclog =: {{ NB. x is the threshold, y is the boxed lines from the log NB. Process lines one by one, producing list of path;filename;length flist =. 0 3$a: NB. will be path;filename;length for_line. y do. ltext =. deb > line NB. text of line with surplus blanks removed NB. Handle the forms we recognize: $ cd dir and number filename if. '$ cd' -: 4 {. ltext do. dir =. 5 }. ltext NB. the directory string if. dir -: ,'/' do. currdir =. 0$a: NB. reset to top level elseif. dir -: '..' do. currdir =. }: currdir NB. back up one level else. currdir =. currdir , <dir NB. new dir: down one level end. elseif. '0123456789' e.~ {. ltext do. NB. size filename 'num name' =. (ltext i. ' ') ({. ; }.) ltext NB. size and filename (with leading space) flist =. flist , currdir ; name ; 0 ". num NB. add entry for file end. NB. other forms ignored end. NB. Remove duplicate files flist =. 0 2 {"1 ~. flist NB. Calculate size of each directory fdirs =. (a: , <\)&.> {."1 flist NB. all directories for each name dsizes =. (; fdirs) +//. (#@> fdirs) # > {:"1 flist NB. total filesize for each directory NB. Return the total of sizes below threshold +/ (x >: dsizes) # dsizes NB. Result: total size of small directories }}
For Part 2 I will need to modify the script to return the sizes of all directories. That's trivial and I won't write it out. Once I get the value, I need to find the smallest directory with large-enough size. I'll just check that against the example data:
sz =. 48381165 94853 584 24933642 sz =. /:~ sz NB. sort into order need =. 30000000 - 70000000 - {: sz NB. number of additional bytes needed index =. sz I. need NB. index of first file big enough index { sz NB. size of first file big enough 24933642
It took me a time or two to get need right.