Skip to main content

Using Probabilistic Data Structures with .NET

Using probabilistic data structures in Redis Stack allows you to efficiently keep track of presence, heavy hitters, and counts on large streams of data. To use probabilistic data structures in .NET, you should use the StackExchange.Redis library. To get started with that package, follow our getting started guide. Once you have a reference to an IDatabase object, you will need to use the db.Execute and db.ExecuteAsync methods to run the custom commands you want.

Bloom Filters

Bloom Filters are a powerful data structure that can tell if an item is in a set, think a username on a sign-up form. They're incredibly compact, requiring only 10-20 bits per item you want to add, and extremely quick to add items to, and equally fast to determine if an item is in a set or not.

Create a Filter

You don't need to create a Bloom Filter explicitly as any call of BF.ADD to a non-existent key will automatically create a Bloom Filter for you. However, if you want to tell Redis ahead of time how much data the Bloom Filter can expect and the error rate that you want for that data (the number of false positives it will report), You can use the BF.RESERVE command:

await db.ExecuteAsync("BF.RESERVE", "bf:username", .01, 10000);

The above command will reserve a Bloom Filter on the key bf:username that expects 10000 records and will have an error rate of 1%.

Adding to a Filter

To add to a Bloom Filter, all you need is to use the BF.ADD command:

await db.ExecuteAsync("BF.ADD", "bf:username", "Kermit");

The preceding code will add the username Kermit to the bf:username filter.

Check if an Item is in a Filter

To check if an item has been added to a Bloom Filter yet, you will use the BF.EXISTS command:

var exists = await db.ExecuteAsync("BF.EXISTS", "bf:username", "Kermit") == 1;

After running that command, if the Bloom Filter reports that it contains the item, exists will be true; otherwise, exists will be false.

Count-Min Sketch

You can use Count-Min Sketches to count the number of times an item has been added to a set quickly and compactly. Although, of course, like other probabilistic data structures, it has some margin of error. In this case, it can over count the number of occurrences. The dimensions of the sketch determine the likelihood of this.

Creating a Count-Min Sketch

There are two ways to create a Count-Min Sketch, by probability and by dimension. Creating a Count-Min Sketch by probability will automatically generate a Count-Min Sketch based on the amount of overestimation you want to allow and the likelihood of overestimating a given element. If you want to initialize by dimensions, a Count-Min Sketch will initialize with the provided width and depth.

await db.ExecuteAsync("CMS.INITBYPROB", "cms:views", .1, .01);

This code will initialize a Count-Min Sketch. The sketch will have an acceptable overcount of 10% and a probability of overcounting of 1%.

Adding Items to a Count-Min Sketch

To add an item to a Count-Min Sketch, you call the CMS.INCRBY command, passing in the quantity of the given item you want to add to the sketch.

await db.ExecuteAsync("CMS.INCRBY", "cms:views", "Gangnam Style", 1);
await db.ExecuteAsync("CMS.INCRBY", "cms:views", "Baby Shark", 1);
await db.ExecuteAsync("CMS.INCRBY", "cms:views", "Gangnam Style", 2);

The above will add three views of Gangnam Style to the sketch and one view of Baby Shark.

Querying the Sketch

To query the number of occurrences of an element in the sketch, you need to use the CMS.QUERY command:

var numViewsGangnamStyle = (long)await db.ExecuteAsync("CMS.QUERY", "cms:views", "Gangnam Style");
var numViewsBabyShark = (long)await db.ExecuteAsync("CMS.QUERY", "cms:views", "Baby Shark");
Console.WriteLine($"Gangnam Style Views: {numViewsGangnamStyle}");
Console.WriteLine($"Baby Shark Views: {numViewsBabyShark}");

Cuckoo Filters

Cuckoo Filters solve a similar problem to Bloom Filters; they allow you to determine if an item has been added to a set yet. However, Cuckoo Filters have slightly different characteristics than Bloom Filters. For example, you may add the same item to a Cuckoo Filter more than once, and they do support delete operations (which introduces the possibility of false negatives in addition to false positives).

Creating a Cuckoo Filter

Similar to a Bloom Filter, a Cuckoo Filter is automatically created by adding an item to a Cuckoo Filter that does not exist. However, you may want to reserve a Cuckoo Filter ahead of time explicitly, so it knows precisely how many items you expect and how to expand. To do this, just run the CF.RESERVE command:

await db.ExecuteAsync("CF.RESERVE", "cf:emails", 10000);

Adding to a Cuckoo Filter

To add an item to a Cuckoo Filter, use the CF.ADD command:

await db.ExecuteAsync("CF.ADD", "cf:emails", "foo@bar.com");
await db.ExecuteAsync("CF.ADD", "cf:emails", "James.Bond@mi6.com");

The above will add foo@bar.com and James.Bond@mi6.com to the Cuckoo Filter.

Checking Item Presence in a Cuckoo Filter

To check if an item has been added to a Cuckoo Filter yet, use the CF.EXISTS command:

var jamesEmailExists = (int) await db.ExecuteAsync("CF.EXISTS", "cf:emails", "James.Bond@mi6.com") == 1;
var str = jamesEmailExists
? "James.Bond@mi6.com has already been added"
: "James.Bond@mi6.com has not been added";
Console.WriteLine(str);

Top-K

The Top-K data structure allows you to keep a compact leader board of heavy-hitters. This data structure can be extremely useful when keeping track of the most popular items in an enormous stream of data as it makes it so you don't have to keep track of all of the counts of all of your records.

Initializing a Top-K

To initialize a Top-K, use the TOPK.RESERVE command. This command will reserve a Top-K that will keep track of the highest k items:

await db.ExecuteAsync("TOPK.RESERVE", "topk:views", 5);

The above, for example, will keep track of the five most viewed videos sent to the Top-K.

Add Items to the Top-K

Adding Items to a Top-K requires the use of the TOPK.ADD command, this command can take however many items you want to insert into it, so if you get a batch of items to send at once, it may make sense to send them all across at the same time. For example, let's say we wanted to send 10,000 updates to the Top-K at the same time from a random set of videos:

var videos = new[] {"Gangnam Style", "Baby Shark", "Despacito", "Uptown Funk", "See You Again", "Hello", "Roar", "Sorry"};
var rand = new Random();
var args = new List<string>(10001){"topk:views"};
for (var i = 0; i < 10000; i++)
{
args.Add(videos[rand.Next(videos.Length)]);
}

await db.ExecuteAsync("TOPK.ADD", args.ToArray());

This code will send them all across in one shot. You can, of course, chunk the items and send them in batches as well. Regardless, this will add items to your Top-K.

List the Top K Items

To list the items in your Top-K, you need to query the Top-K using the TOPK.LIST command:

var topK = (RedisResult[]) await db.ExecuteAsync("TOPK.LIST", "topk:views");
foreach (var item in topK)
{
Console.WriteLine(item);
}

This code will get all the items back for you and print them out.

Query if an Item is in the Top-K

To see if a given item is present in the Top-K, you would use TOPK.QUERY, passing in the item you want to check membership of:

var BabySharkInTopK = (int) await db.ExecuteAsync("TOPK.QUERY", "topk:views", "Baby Shark") == 1;
Console.WriteLine(BabySharkInTopK ? "Baby Shark is in the Top 5" : "Baby Shark is Not in the Top 5" );

The above code will check if Baby Shark is in the Top 5 for video views from our above example.

Resources

  • The Code for this Demo can be found in GitHub