Scraping Discord channels

hc
Jul 8, 2021

Introduction

The standard way of consuming an API is to use an API key with the official SDK of a particular service. However, there are times when we want access to private API endpoints that are not publicly available. This is the motivation for reverse engineering an official client, which could be an app or a website, to find the API endpoints it calls and to replay its requests as if we were that client.

Discord

Recently I subscribed to a Discord channel and wanted to be notified in Telegram whenever a new message is posted. I proxy from Discord to Telegram because I check Telegram more often. Also, the Discord channel is subscriber-only, and I wanted to automatically share the messages I received with my friends in a Telegram group.

I started by reading the documentation at https://discord.com/developers/docs/resources/channel, but it was written solely for Discord bots. The issue was that I couldn't add my bot to the server hosting the channel I wanted to scrape. That left me no choice but to use my own account, even though there's a risk of getting the account terminated (https://support.discord.com/hc/en-us/articles/115002192352-Automated-user-accounts-self-bots-).

Where to start: web or desktop app?

I remembered that Discord has a web app, and in Google Chrome it is easy to monitor network traffic with the Network tab of the built-in developer tools. After logging in, I opened the channel I wanted to scrape. The Network tab lists the requests the page made, and one of them, messages?limit=50, looks interesting. Clicking it shows the JSON response sent by the server.

Response from the Discord API

That seems to be what we are looking for! If we can replay the request to this endpoint and get back the same kind of response, we can be fairly confident that the scraping will work.

To do so, we have to look at what is sent to Discord's server with the request. The endpoint /api/v8/channels/<channelID>/messages?limit=50 is queried with some extra headers. The Authorization: mfa._Xf1ZjwM31... header looks like what web servers usually use to authenticate the user.

Request headers sent to the API

To test this, we can replay the request with a simple curl command. And indeed, we get back a JSON array containing the latest messages!

curl "https://discord.com/api/v8/channels/<channelID>/messages?limit=50" -H "Authorization: mfa._Xf1ZjwM3l..."
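For a script rather than a one-off test, the same request can be replayed from code. The following is a minimal Python sketch using only the standard library's urllib. The function names build_messages_request and fetch_messages are hypothetical, and the browser-like User-Agent header is an assumption (the article only mentions the Authorization header), added because servers sometimes reject urllib's default agent string.

```python
import json
import urllib.request

# API version taken from the requests the web app makes.
API_BASE = "https://discord.com/api/v8"


def build_messages_request(channel_id: str, token: str, limit: int = 50) -> urllib.request.Request:
    """Build the same GET request the web client sends for a channel's messages."""
    url = f"{API_BASE}/channels/{channel_id}/messages?limit={limit}"
    headers = {
        # The value copied from the Authorization header in the Network tab.
        "Authorization": token,
        # Assumption: a browser-like agent, since urllib's default
        # "Python-urllib/x.y" string may be rejected.
        "User-Agent": "Mozilla/5.0",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_messages(channel_id: str, token: str, limit: int = 50) -> list:
    """Send the request and decode the JSON array of message objects."""
    req = build_messages_request(channel_id, token, limit)
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling fetch_messages("<channelID>", token), with the token copied from the Authorization header, should return the same JSON array the curl command prints, newest message first.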

The next step is to set up a scheduler that runs every minute or so to poll the endpoint. The logic to detect a new message is straightforward. We store the ID of the latest message we have seen in a local variable, latestId. On each poll, we compare it with the ID of the first element of the JSON response, response[0].id; if they differ, we know the message is new. We then update latestId to response[0].id and continue polling.
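A minimal Python sketch of this polling logic follows. check_for_new and poll are hypothetical names; fetch stands for any zero-argument function returning the parsed JSON array (for example, a wrapper around the curl request above), and forward is a placeholder callback, such as one that posts to the Telegram group. The rounds parameter exists only so the loop can be stopped in tests.

```python
import time
from typing import Callable, List, Optional, Tuple

Message = dict  # one message object from the JSON array


def check_for_new(
    messages: List[Message], latest_id: Optional[str]
) -> Tuple[Optional[Message], Optional[str]]:
    """Compare response[0].id with latest_id.

    Returns (new message or None, updated latest_id). Assumes the array is
    ordered newest first, which relying on response[0] implies.
    """
    if not messages:
        return None, latest_id
    newest = messages[0]
    if newest["id"] != latest_id:
        return newest, newest["id"]
    return None, latest_id


def poll(
    fetch: Callable[[], List[Message]],
    forward: Callable[[Message], None],
    interval: float = 60.0,
    rounds: Optional[int] = None,
) -> None:
    """Poll every `interval` seconds and forward each newly detected message."""
    # Seed latest_id from the current state so old messages are not forwarded.
    _, latest_id = check_for_new(fetch(), None)
    done = 0
    while rounds is None or done < rounds:
        time.sleep(interval)
        new, latest_id = check_for_new(fetch(), latest_id)
        if new is not None:
            forward(new)
        done += 1
```

As in the approach described, only response[0] is compared, so if several messages arrive between two polls only the newest one is forwarded; forwarding all of them would require collecting every message until latestId is reached.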

Caveats

Even though I haven't faced any issues with this method after a week, the token may expire after a certain amount of time. My guess is that Discord extends the token's lifetime while it is actively used, so that a user who is constantly on the web app isn't logged out.
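One way to notice that the token has expired is to watch for an authentication error. The sketch below, in Python with the standard library, assumes an expired or revoked token is answered with HTTP 401 Unauthorized; TokenRejected and fetch_json are hypothetical names, and opener is injectable only so the function can be tested without a network.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable


class TokenRejected(Exception):
    """Raised when the server no longer accepts the copied Authorization token."""


def fetch_json(
    req: urllib.request.Request,
    opener: Callable[..., Any] = urllib.request.urlopen,
) -> Any:
    """Send `req` and decode the JSON body, mapping HTTP 401 to TokenRejected.

    Assumption: an expired or revoked token is answered with 401 Unauthorized.
    Any other HTTP error is re-raised unchanged.
    """
    try:
        with opener(req, timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        if err.code == 401:
            raise TokenRejected("token rejected; copy a fresh one from the browser") from err
        raise
```

A poller could catch TokenRejected and send an alert instead of failing silently.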

The API endpoint may also change in the future but for now, I’m very satisfied with this simple approach.
