NYCJUG/2010-01-12/PerlIdiomKeysMapEG
Example of good explanation - from http://www.perlmonks.org/?node_id=280658 - of a Perl idiom. This is good because it gives examples and is followed by comments from various people elaborating on usage and and efficiencies in relation to the construct.
Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by broquaint (Abbot) on Aug 04, 2003 at 12:55 UTC ( #280658=perlmeditation: print w/ replies, xml )
Perl Idioms Explained - keys %{{map{$_=>1}@list}} =
my @list = qw/a b c d d a e b a b d e f/; my @uniq = keys %{{ map {$_ => 1} @list }}; print "@list\n@uniq\n";
__output__
a b c d d a e b a b d e f e f a b c d
Explanation
The above idiom is a simple way of creating a list of unique values from another list, as the output of the code aptly demonstrates. However, with all those curly braces it may not be immediately obvious what's going on, so let's break it down.
map { $_ => 1 } @list
This is pretty straight-forward - create a list of key-value pairs where the keys are the values from @list.
{ map { $_ => 1 } @list }
The surrounding pair of curly braces creates an anonymous hash which is populated with they key-value pairs from the map statement. So we now have a hash reference to an anonymous hash who's keys are the elements from @list, and because hash keys are unique, the keys of the anonymous hash represent a unique set of values.
keys %{ { map { $_ => 1 } @list } }
Finally, with the last pair of curly braces the hash reference to the anonymous hash is dereferenced and we get its list of keys.
Caveats
Because this idiom creates a list, and then a hash, both of which are immediately disposed of, it's not suited for large lists. Also, because the keys of a hash aren't ordered, the list of unique values returned will be in a random order (although this can often be remedied with a simple sort).
Summary
A short (memory hungry) way to get a list of unique values from a given list.
broquaint
Comment on Perl Idioms Explained - keys {{{ %{{map{$_=>1}@list}} }}}
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Juerd (Abbot) on Aug 04, 2003 at 13:02 UTC
In the same category, to see if $bar is in @foo:
my %foo = map { $_ => 1 } @foo; if ($foo{$bar}) { ... }
or faster and using less memory:
my %foo; undef @foo{@foo}; if (exists $foo{$bar}) { ... }
Of course, for a single test, grep is easier and often faster:
if (grep $_ eq $bar, @foo) { ... }
Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by diotalevi (Canon) on Aug 04, 2003 at 13:23 UTC
Instead of using => 1, it is valid and cheaper to use => undef.
Re: Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by RMGir (Parson) on Aug 04, 2003 at 15:34 UTC
I'm not disagreeing, but I am curious. Why is it cheaper? One less node allocated? (Oh, I think I see... PL_sv_undef gets reused instead of a new SV?) This sounds like a job for those perl internals diagrams someone did at some point... Wish my memory was better. -- Mike
Re: Re: Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by diotalevi (Canon) on Aug 04, 2003 at 15:41 UTC
Yes. Instead of one scalar of the value '1' for every key, you get to share the same undef value for all the keys and thus don't have to allocate tons of memory you aren't going to use anyway. So
keys @{{map { $_, undef } @ary}}
has the potential to be a lot cheaper than
keys @{{map { $_, 1 } @ary}}
The "a lot" part is just related to how many values are in your array. You burn one extra value for every entry in @ary and I'm just suggesting that you don't have to do that in this case.
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by dragonchild (Archbishop) on Aug 04, 2003 at 14:06 UTC
Along similar lines, I will often create a hash of keys which are needed for lookup. For example, I will do:
my %acceptable_server = map { $_ => 1 } qw( DEV QA PROD );
It's a nice way of working with a changing list of static acceptable things.
We are the carpenters and bricklayers of the Information Age. The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6 Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by thinker (Parson) on Aug 04, 2003 at 15:46 UTC
Hi broquaint, or, to keep the order that the items are inserted (ie. the order in which the first instance of a value is encountered) my %seen;
@uniq = grep ! $seen{$_}++, @list;
cheers thinker
Re: Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by LanceDeeply (Chaplain) on Aug 04, 2003 at 18:58 UTC
This is the way I've usually seen it done. I was curious, so I ran a benchmark against the two. use Benchmark;
my @list; for ( 0..9999 ) { push @list, sprintf "%d", 100 * rand ; } timethese( 1000, { 'keys_map' => sub { my @uniq = keys %{{ map {$_ => 1} @list }} +; }, 'grep_seen' => sub { my %seen; my @uniq = grep ! $seen{$_}+ ++, @list; }, } );
Yields the following output. Benchmark: timing 1000 iterations of grep_seen, keys_map... grep_seen: 13 wallclock secs (11.17 usr + 0.00 sys = 11.17 CPU) @ 89 +.52/s (n=1000) keys_map: 30 wallclock secs (29.28 usr + 0.00 sys = 29.28 CPU) @ 34 +.15/s (n=1000)
Re: Re: Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by RMGir (Parson) on Aug 04, 2003 at 20:28 UTC
Adding
'keys_map_undef' => sub { my @uniq = keys %{{ map {$_ => undef} @list +}}; },
to test the undef suggestion, it turns out to be 15-20% faster than using 1. grep still wins, though. -- Mike
Re^2: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Aristotle (Chancellor) on Aug 04, 2003 at 20:12 UTC
Note that this is slightly broken as is. To be entirely correct, you have to say
my (%seen, $seen_undef); my @uniq = grep defined() ? !$seen{$_}++ : !$seen_undef++, @list;
Of course, if you're fiddling with objects which cannot be compared for equity by stringification, it is still broken. Makeshifts last the longest.
Re3: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by dragonchild (Archbishop) on Aug 04, 2003 at 20:14 UTC
Of course, if you're fiddling with objects which cannot be compared for equity by stringification, it is still broken. Well, so is every uniquification based on the keys of a hash! The point still stands that grep is faster than the original idiom presented, by orders of magnitude, if still as memory-hungry.
We are the carpenters and bricklayers of the Information Age. The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6 Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.
Re^4: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Aristotle (Chancellor) on Aug 04, 2003 at 20:19 UTC
Re: Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}} by Jasper (Chaplain) on Aug 04, 2003 at 22:21 UTC
You can also grep lists for certain 'numbers' of entries (so if you wanted only the items that were in a list twice)
@doubles = grep ++$seen{$_} == 2, @list;
Jasper
Re: Re: Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by cees (Curate) on Aug 05, 2003 at 03:32 UTC
That would be 2 or more times right? Once ++$seen{$_} == 2 is true, you are immediatelly copying $_ to @doubles. So if it appeared a third time, you couldn't magically remove it again. Your code would be clearer if you used >= to emphasize that. The following would probably do if you only wanted entries with exactly 2 occurences: @doubles = grep $seen{$_} == 2, grep !$seen{$_}++, @list; But it is not pretty, or obvious. The first grep counts all occurences and only passes the first found entry to the next grep which will check to see how many were actually found. There must be a better way! - Cees
Re^3: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Aristotle (Chancellor) on Aug 05, 2003 at 17:51 UTC
No, that won't fly. You need
$seen{$_}++ for @list; my @doubles = grep $seen{$_} == 2, @list;
Makeshifts last the longest.
Re: Perl Idioms Explained - keys %{{map{$_=>1}@list}}
by Aristotle (Chancellor) on Aug 04, 2003 at 20:18 UTC
Just as memory hungry, still not order preserving, and still broken with regards to undefined values, but faster approach:
my %seen; @seen{@list}=(); my @uniq = keys %seen;
And if you want to wrap it up without "leaking" lexicals:
my @uniq = do { my %seen; @seen{@list}=(); keys %seen; };
This approach does not (explicitly) iterate - more of the data structure setup work is done implicitly by perl. Makeshifts last the longest.