Using RedisBloom allows you to efficiently keep track of presence, heavy hitters, and counts on large streams of data. To use RedisBloom in .NET, you should use the StackExchange.Redis library. To get started with that package, follow our getting started guide. Once you have a reference to an
IDatabase object, you will need to use the
db.ExecuteAsync methods to run the custom commands you want against redis bloom.
Bloom Filters are a powerful data structure that can tell if an item is in a set, think a username on a sign-up form. They're incredibly compact, requiring only 10-20 bits per item you want to add, and extremely quick to add items to, and equally fast to determine if an item is in a set or not.
You don't need to create a Bloom Filter explicitly as any call of
BF.ADD to a non-existent key will automatically create a Bloom Filter for you. However, if you want to tell Redis ahead of time how much data the Bloom Filter can expect and the error rate that you want for that data (the number of false positives it will report), You can use the
The above command will reserve a Bloom Filter on the key
bf:username that expects 10000 records and will have an error rate of 1%.
To add to a Bloom Filter, all you need is to use the
The preceding code will add the username
Kermit to the
To check if an item has been added to a Bloom Filter yet, you will use the
After running that command, if the Bloom Filter reports that it contains the item,
exists will be true; otherwise,
exists will be false.
You can use Count-Min Sketches to count the number of times an item has been added to a set quickly and compactly. Although, of course, like other probabilistic data structures, it has some margin of error. In this case, it can over count the number of occurrences. The dimensions of the sketch determine the likelihood of this.
There are two ways to create a Count-Min Sketch, by probability and by dimension. Creating a Count-Min Sketch by probability will automatically generate a Count-Min Sketch based on the amount of overestimation you want to allow and the likelihood of overestimating a given element. If you want to initialize by dimensions, a Count-Min Sketch will initialize with the provided width and depth.
This code will initialize a Count-Min Sketch. The sketch will have an acceptable overcount of 10% and a probability of overcounting of 1%.
To add an item to a Count-Min Sketch, you call the
CMS.INCRBY command, passing in the quantity of the given item you want to add to the sketch.
The above will add three views of Gangnam Style to the sketch and one view of Baby Shark.
To query the number of occurrences of an element in the sketch, you need to use the
Cuckoo Filters solve a similar problem to Bloom Filters; they allow you to determine if an item has been added to a set yet. However, Cuckoo Filters have slightly different characteristics than Bloom Filters. For example, you may add the same item to a Cuckoo Filter more than once, and they do support delete operations (which introduces the possibility of false negatives in addition to false positives).
Similar to a Bloom Filter, a Cuckoo Filter is automatically created by adding an item to a Cuckoo Filter that does not exist. However, you may want to reserve a Cuckoo Filter ahead of time explicitly, so it knows precisely how many items you expect and how to expand. To do this, just run the
To add an item to a Cuckoo Filter, use the
The above will add
James.Bond@mi6.com to the Cuckoo Filter.
To check if an item has been added to a Cuckoo Filter yet, use the
The Top-K data structure allows you to keep a compact leader board of heavy-hitters. This data structure can be extremely useful when keeping track of the most popular items in an enormous stream of data as it makes it so you don't have to keep track of all of the counts of all of your records.
To initialize a Top-K, use the
TOPK.RESERVE command. This command will reserve a Top-K that will keep track of the highest
The above, for example, will keep track of the five most viewed videos sent to the Top-K.
Adding Items to a Top-K requires the use of the
TOPK.ADD command, this command can take however many items you want to insert into it, so if you get a batch of items to send at once, it may make sense to send them all across at the same time. For example, let's say we wanted to send 10,000 updates to the Top-K at the same time from a random set of videos:
This code will send them all across in one shot. You can, of course, chunk the items and send them in batches as well. Regardless, this will add items to your Top-K.
To list the items in your Top-K, you need to query the Top-K using the
This code will get all the items back for you and print them out.
To see if a given item is present in the Top-K, you would use
TOPK.QUERY, passing in the item you want to check membership of:
The above code will check if Baby Shark is in the Top 5 for video views from our above example.
- The Code for this Demo can be found in GitHub