Skip to main content

How to Build a Slack Bot to Retrieve Lost Files Using AWS S3 and RediSearch


Profile picture for Ajeet Raina
Author:
Ajeet Raina, Former Developer Growth Manager at Redis

alt_text

If you work remotely then you’re likely to have come across Slack at some point. And if you use Slack on a daily basis, then you’ll be all too aware of how easy it can be to lose files. Being pinged every day by different employees across different channels makes it difficult to keep track of files.

Eventually, you may be forced to rummage through a cluttered library of documents in search of that one _crucial _document that’s needed for a task, report or even a meeting. We’ve all been there and the frustration is just as real as it is irritating...which is why this Launchpad App has created an application to remove this impediment.

It was a tricky application to make, requiring a lot of attention to detail across a number of different components for it to come to fruition. However, the success of this application was possible due to Redis’ ability to extract and process data proficiently.

Thanks to Redis, all of the components functioned harmoniously with one another, creating a fully optimal application that interacts with users in real-time.

Let’s take a look at how this was done. Before we dive in, we’d also like to point out that we have a variety of exciting applications for you to check out in our Redis Launchpad.

alt_text

  1. What will you build?
  2. What will you need?
  3. Architecture overview
  4. Getting started
  5. How it works

Step 1. What will you build?

You’ll build a special search engine that’s designed to retrieve lost files in Slack. This is especially handy for people who work from home, where documents between channels are easily lost between employees.

Below we’ll go through each step chronologically and highlight what components are required to create this application.

Ready to get started? OK, let’s dive straight in.

Step 2. What will you need?

  • Slack:: used as an instant messaging app that connects employees with one another.
  • Slack Block Kit: used as a UI framework for Slack apps that offers a balance of control and flexibility when building experiences.
  • Python:: the preferred programming language to connect Redis in the application.
  • RediSearch: Provides querying, secondary indexing, and full-text search for Redis.
  • S3 bucket: Used as a public cloud storage resource in Amazon Web Services (AWS).
  • AWS Textract: Used as a machine learning service that automatically extracts text.
  • Nodejs: Responsible for image generation.

Step 3. Architecture

Let’s look at each of the components that creates the Reeko-Slack bot:

alt_text

1. file_shared

  • When a new file is shared in any public slack channel the file_share event is sent to the Slack Bot app.
  • The file name is added as a suggestion using the FT.SUGADD command in RediSearch.
  • All file data is added using the JSON.SET command.
  • The file is then stored on the S3 bucket as an object with the key as the filename.

2. S3-get

  • The JSON.GET command checks whether the desired file exists.
  • The file will then be retrieved from the S3 bucket if found.
  • The FT.SEARCH command uses RediSearch to look for documents in the S3 bucket- Users are presented will be prompted with different file name suggestions based on what they’ve typed in the search bar.
  • Once the user chooses one of the file suggestions, it is then downloaded and sent back to Slack.

4. S3-delete

  • User types the file name from the command["text'] parameter
  • The file data is deleted from RedisJson using the JSON.DEL command and is also removed from RediSearch's suggestions using the FT.SUGDEL command.

5. Summarise-document

  • The file name is identified from the command['text'] parameter.
  • It is then retrieved from the S3 bucket through the JSON.GET command.
  • Users can either download the pdf or png file locally from the S3 bucket.
  • The text is extracted using AWS Textract .
  • The extracted text is then summarised using Hugging face transformers summarization pipeline. The text summary is also added back to the JSON document using JSON.SET command.
  • A post request is then sent to the /create-image on the NodeJS backend with the file name and summary text.
  • An image is generated using a base template.
  • The image that is returned is saved to the S3 bucket and sent back to Slack.
  • The image URL is also added to the JSON document using JSON.SET command.

What is the S3 bucket?

The S3 bucket is a simple storage service from Amazon. It allows users to store objects through a web service interface. The product’s value comes from its ability to store, protect and retrieve data from ‘buckets’ at any time from anywhere and on any device.

Step 4. Getting started

Prerequisites

  • Python 3.6+
  • ngrok
  • AWS Account
  • Slack
  • Docker

1. Run Redis Docker container

This simple container image bundles together the latest stable releases of Redis and select Redis modules from Redis Labs. This image is based on the official image of Redis from Docker. By default, the container starts with Redis' default configuration and all included modules loaded.

 docker run -d -p 6379:6379 redis/redis-stack

2. Setup a Python environment

To test the integration, Python needs to be installed on your computer. You can get a suitable release from here. To check your Python version, follow the command below.

 # Python 3.6+ required
git clone https://github.com/redis-developer/Reeko-Slack-Bot
cd Reeko-Slack-Bot
python3 -m venv env
source env/bin/activate
cd python-backend
pip3 install -r requirements.txt

3. Using ngrok as a local proxy

To develop locally we'll be using ngrok. This will allow you to expose a public endpoint that Slack can use to send your app events. If you haven't already, install ngrok from their website .

The ngrok exposes local networked services behind NATs and firewalls to the public internet over a secure tunnel. Share local websites, build/test webhook consumers and self-host personal services.

4. Setting up an AWS Account

For testing you need a verified aws account. You can get your credentials file at ~/.aws/credentials (C:\Users\USER_NAME.aws\credentials for Windows users) and copy the following lines in the .env file.

Also make sure to add your S3 bucket name in the .env file.

 AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY_ID"
AWS_SECRET_ACCESS_KEY="YOUR_SECRET_ACCESS_KEY"
BUCKET_NAME="YOUR_BUCKET_NAME"

5. Install Slack on your local system

Slack needs to be installed on your computer. If it hasn’t been installed, you can get it from here for Windows or Mac. If you don’t already have an account you can make one here.

To get started, you'll need to create a new Slack app by clicking on the following link - https://api.slack.com/apps The foundational framework we’ll be using is Bolt. This will make it easier to build Slack apps with the platform’s latest features.

  1. Click on the Create an App button
  2. Name the application ‘Reeko’ and choose the development workspace. \

alt_text

alt_text

  1. Requesting scopes - Scopes give your app permission to do things (for example, post messages) in your development workspace. You can select the scopes to add to your app by navigating over to the OAuth & Permissions sidebar.
  2. Add the Bot Token Scopes by clicking on the Add an OAuth Scope button.
OAuth ScopeDescription
channels:historyView messages and other content in public channels that reeko has been added to
channels:joinJoin public channels in a workspace
channels:readView basic information about public channels in a workspace
channels:joinJoin public channels in a workspace
chat:writeSend messages as @reeko
chat:write.customizeSend messages as @reeko with a customized username and avatar
chat:write.publicSend messages to channels @reeko isn't a member of
files:readView files shared in channels and conversations that reeko has been added to
files:writeUpload, edit, and delete files as reeko
  1. Add the User Token Scopes by clicking on the Add an OAuth Scope button
OAuth ScopeDescription
channels:historyView messages and other content in public channels that Reeko has been added to.
files:readView files shared in channels and conversations that Reeko has been added to.
  1. Install your own app by selecting the Install App button at the top of the OAuth & Permissions page, or from the sidebar.
  2. After clicking through one more green Install App To Workspace button, you'll be sent through the Slack OAuth UI.
  3. After installation, you'll land back in the OAuth & Permissions page and find a Bot User OAuth Access Token. and a User OAuth Token. Click on the copy button for each of them. These tokens need to be added to the .env file. (The bot token starts with xoxb whereas the user token is longer and starts with xoxp).
SLACK_USER_TOKEN=xoxp-your-user-token
SLACK_BOT_TOKEN=xoxb-your-bot-token

alt_text

  1. As well as the access token, you'll need a signing secret. Your app's signing secret verifies that incoming requests are coming from Slack. Navigate to the Basic Information page from your app management page. Under App Credentials, copy the value for Signing Secret and add it to the .env file.
SLACK_SIGNING_SECRET=your-signing-secret

alt_text

  1. Make sure you have followed the steps in Cloning the repo to start the bolt app. The HTTP server is using a built-in development adapter, which is responsible for handling and parsing incoming events from Slack on port 3000.
python3 app.py

alt_text

Open a new terminal and ensure that you've installed ngrok. Make sure to tell ngrok to use port 3000 (which Bolt for Python uses by default):

ngrok http 3000

alt_text

For local slack development, we'll use your ngrok URL from above, so copy it to your clipboard.

For example: https://your-own-url.ngrok.io (copy to clipboard)

  1. Now we’re going to subscribe to events. Your app can listen to all sorts of events that are happening around your workspace - messages being posted, files being shared and more. On your app configuration page, select the Event Subscriptions sidebar. You'll be presented with an input box to enter a Request URL, which is where Slack sends the events your app is subscribed to. Hit the save button.

By default Bolt for Python listens for all incoming requests at the /slack/events route, so for the Request URL you can enter your ngrok URL appended with /slack/events.

Request URL: https://your-own-url.ngrok.io/slack/events

If the challenge was successful, you’ll get “verified” right next to the Request URL.

alt_text

On the same page click on the Subscribe to bot events menu that sits at the bottom of the page. Click on the Add Bot User Event.

Similarly, click on Subscribe to events on behalf of the user but then click on Add Workspace Event.

Add the following scopes

EventNameDescriptionRequired Scope
file_shareA file was sharedfiles:read
message.channelsA message was posted to a channelchannels: history

alt_text

  1. Select the Interactivity & Shortcuts sidebar and toggle the switch as on. Again, for the Request URL, enter your ngrok URL appended with /slack/events.
Request URL: https://your-own-url.ngrok.io/slack/events

alt_text

  1. Scroll down to the Select Menus section in the Options Load URL and enter your ngork URL appended with /slack/events.
Options Load URL: https://your-own-url.ngrok.io/slack/events

alt_text

  1. Finally we come to the slash commands. Slack's custom slash commands perform a very simple task. First they take whatever text you enter after the command itself (along with some other predefined values). Next, they’ll send it to a URL and accept whatever the script returns. After this, Slack will post it as a Slackbot message to the person who issued the command. We have 5 slash commands to add in the workspace.

Visit the Slash Commands sidebar and click on the Create New Command button to head over the Create New Command page. Add the Command, Request URL, Short Description and Usage hint, according to the table provided below.

Click on Save to return to the Slash Commands.

CommandRequest URLShort DescriptionUsage Hint
/s3-gethttps://your-own-url.ngrok.io/slack/eventsGet a file from s3 bucketfilename
/s3-searchhttps://your-own-url.ngrok.io/slack/eventsSearch for a file in S3
/s3-deletehttps://your-own-url.ngrok.io/slack/eventsDeletes the given file from the s3 bucketfilename
/summarise-documenthttps://your-own-url.ngrok.io/slack/eventsSummarise a documentfilename

alt_text

alt_text

alt_text

alt_text

alt_text

  1. Open the Slack channel and upload a file in any channel. Make sure to note the file name.

6. Setting up a NodeJS backend

Requirements

Getting Started:

alt_text

GraphicsMagick

GraphicsMagick is a highly versatile piece of software used for image processing. To generate images, you need to have GraphicsMagick installed on your machine.

You can find the suitable release from http://www.graphicsmagick.org/download.html#download-sites.

alt_text

Nodejs

note

Please follow all the steps in python-backend/README.md first.

Copy the AWS credentials from the python-backend/.env to the config.json file.

{
"accessKeyId": "",
"secretAccessKey": "",
"region": ""
}

Install all the packages and run the server.

npm install
npm run start

alt_text

7: Connecting Slack to S3

This step involves bringing your AWS S3 bucket to your Slack account. Doing this will allow you to upload, download or delete files from your workspace without writing a single line of code.

/S3-get filename

The purpose of this command is to retrieve a specified file from the S3 bucket. Once you type in the name of the file in the search bar, Reeko will check whether this document exists. If the document doesn’t exist then it will return as false and nothing will be done.

If the file is found, then the JSON.GET command will capture its name and download it from the S3 bucket. The downloaded file is sent back as a direct message in Slack.

JSON.GET amazonshareholderletterpdf

alt_text

/s3-delete filename

This command involves deleting files from the S3 bucket. To achieve this you simply need to type in the file name in the search bar and Reeko will pull up the file as demonstrated below.

You’ll have the option to permanently delete the file from the S3 bucket. The file data is deleted from RedisJson using the JSON.DEL command and is removed from RediSearch's suggestions using the FT.SUGDEL command. You’ll be informed when the file is deleted.

FT.SUGDEL file-index "amazon-shareholder-letter.pdf"

JSON.DEL amazonshareholderletterpdf

Step 8: File searching

Have you ever searched for a file without being entirely sure what it is you’re looking for? You may remember snippets of the content but not enough to manually track down its location. Well due to RediSearch’s autocomplete functionality this will no longer be a problem.

This command first opens up a modal inside of Slack with a search bar. You’ll then be suggested different file names will then be suggested depending on whatever the text you’ve written is. The way this works is simple:

Let’s assume the bucket has documents called abcd.csv, abcd.csv and abcdef.sv. If you type abc into the search bar, you’ll get these three results as a list from the FT.SEARCH command. From here you’ll be able to select the file you’re looking for. Once selected, the file is downloaded and sent back to Slack.

FT.SEARCH file-index "ama"

Step 9: Document summarization

In this step, Reeko will extract all of the text from the documents and summarize the content of each one with an image. This will prevent you from having to tediously open each document to get access to important information. Here’s how to do it:

  1. Get the file name from the command['text'] parameter.
  2. If the file is found, you can get the file's name by using the JSON.GET command.
 JSON.GET amazonshareholderletterpdf

  1. Download the pdf or png file locally from S3 bucket
  2. Extract the text using AWS Textract.
  3. The extracted text is summarised using the Hugging face transformers summarisation pipeline. The text summary is also added back to the JSON document using JSON.SET command.
 JSON.SET amazonshareholderletterpdf .summary ' Amazon has grown from having 158 employees to 614. We had just gone public at a split-adjusted stock price of $1. 50 per share.  In 1997, we hadnâ\x80\x99t invented prime, marketplace, alexa, or aws. If you want to be successful in business, you have to create more than you consume.  Your goal should be to create value for everyone you interact with. Stock prices are not about the past.  They are a prediction of future cash flows discounted back to the present.'
  1. A post request is then sent to the /create-image on the nodejs backend with the file name and summary text.
  2. An image is generated using a base template.
  3. The image that is returned is saved to the S3 bucket and sent back to Slack.
  4. The image URL is also added to the JSON document using JSON.SET command.
 JSON.SET amazonshareholderletterpdf .file_path 'https://bucket-1234.s3.amazonaws.com/b8bac45f-7f69-4c28-a26e-9888d9771bed-image.png'

Below we’ve used the Amazon 2020 shareholder letter as an example of how a document can be summarized using Reeko.

5. How it works

The Slack app is built using Bolt for Python framework. To connect the AWS S3 bucket and AWS Textract, use their respective boto3 clients.

Slack is receptive to all events around your workspace such as messages being posted, files being shared, users joining the team, and more. To listen to events, Slack uses the Events API. And to enable custom interactivity, you can use the Block Kit.

Slash commands work in the following way. First they consider the text you enter after the command itself and then send it to a URL. They then accept whatever the script returns and post it as a Slackbot message to the person who issued the command. There’s a set of 4 slash commands that make the slackbot.

In the application there are two Redis Modules:

  • RedisJSON - store file information like filename, summary and image url.
  • RediSearch - searches for files in the S3 bucket

The code below is used to initialize RediSearch in redisearch_connector.py. This is done by creating an index with the name file_index.

from redisearch import Client, TextField, AutoCompleter, Suggestion

class RedisSearchConnector():
def __init__(self):
self.index_name = 'file_index'
self.client = Client(self.index_name)
self.ac = AutoCompleter(self.index_name)

Use the following code to initialise RedisJSON in redisjson_connector.py.

from rejson import Client, Path

class RedisJsonConnector():
def __init__(self):
self.rj = Client(decode_responses=True)

And the code below is used to create an index on RediSearch

FT.CREATE file-index ON HASH SCHEMA file_name TEXT SORTABLE file_id TEXT created TEXT timestamp TEXT mimetype TEXT filetype TEXT user_id TEXT size

Conclusion: preventing lost files with Redis

The advanced capabilities of Redis allowed this Launchpad App to create an invaluable asset to remote workers - to never lose a file again on Slack. RediSearch offered a simple yet effective way of transmitting data to and from the S3 bucket with no lags, no pauses and no delays whatsoever. You can discover more about the ins and outs of how this app was made by simply clicking here.

Reeko is an innovative application that joins our exciting collection of apps that we currently have on the Redis Launchpad. By using Redis, programmers from all over the world are creating breakthrough applications that are having an impact on daily lives… and you can too.

So how can you change the world using Redis? For more inspiration, make sure to check out the other applications we have on our Launchpad.

alt_text

Who built this application?

Sarthak Arora

Being only 20 years old, Sarthak is a young yet highly-advanced programmer who’s already a 2x international Hacong winner.

To discover more about his work and his activity on GitHub, you can check out his profile here.

References