Wednesday, April 16, 2008

Google engEdu talk on Bloom filters

Ely Porat gave a Google engEdu talk, "The Bloom filter", with a good survey of Bloom filters and variants on Bloom filters.

Bloom filters let you quickly determine if an item is a member of a set with no false negatives but some false positives. They work well for very large data sets since they require about 10 bits per item for a 1% false positive rate and offer constant time inserts and lookups.

Ely's talk provides some useful motivation at the beginning, then has a nice, light introduction to Bloom filters starting at 8:40. The discussion of variants that support deletion or compression is interesting but, if you find yourself dragging, make sure to at least jump to 30:40 for the cute discussion of Cukoo Hashing and the Cukoo Bloom Filter.

Please see also the "Bloom filter" Wikipedia page which contains a nice summary of Bloom filters and many of the variants described in Ely's talk.

1 comment:

burtonator said...

I'm transcoding this now to my scalecast podcast:

http://scalecast.wordpress.com/

.... so you can watch it on ipods.