How to scrape live Crypto prices from WebSocket with Python?

Sasha Bouloudnine
March 13, 2024
17 min read

It is the bull run: cryptos are pumping to death and guys are making tons of money betting on…the incredible jeo boden.

tweet barkery making huge profit profit on jeo boden crypto meme coin - image32.png

But in this ocean of fanciful coins, how do you find the one that will propel you to the top?

new pairs on dexscreener snapshot - image17.png

In this tutorial, we will see how to scrape live prices of Solana meme coins from DexScreener, using Python and WebSocket.

And automate this financial data export to a CSV file.

Let's make the dumbest decisions ever... based on data.

wagmi

What is a WebSocket?

A WebSocket is a communication protocol which allows real-time two-way communication between two entities: a client (you) and a server (the site).

How does it differ from an HTTP connection?

  1. HTTP: request-response model, with one connection per exchange

  2. WebSocket: persistent two-way communication over a single connection

With the HTTP protocol, once the exchange is completed, the connection is broken: the client asks, the server answers, and that is it.

websocket protocol client server messages schema - image15.png

With WebSocket, on the contrary, the connection stays open until one of the two sides closes it.

This allows instantaneous data exchange, particularly interesting when real time is required.
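For the curious, a WebSocket connection actually begins life as an ordinary HTTP request that asks to be upgraded. The sketch below, using only the standard library, shows how a server derives its Sec-WebSocket-Accept reply from the client's Sec-WebSocket-Key, as described in RFC 6455. The function name is ours, and the sample key is the one used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept_key(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value matching a client key."""
    raw = (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    digest = hashlib.sha1(raw).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from RFC 6455
accept_key = compute_accept_key("dGhlIHNhbXBsZSBub25jZQ==")
print(accept_key)
```

In practice a WebSocket library takes care of this handshake for you; it is shown here only to make the "HTTP first, then upgrade" idea concrete.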

http protocol schema client server colors - image9.png

If you want to know more, you can take a look at the Websocket - Wikipedia article. It's not always easy to digest, but it's interesting.

websocket wikipedia page - image11.png

Websocket vs. HTTP requests

As said above, WebSocket allows rapid bidirectional exchange. It is therefore used wherever real-time data is needed:

  1. Chat applications
  2. Online (sport) games
  3. Real-time stock tickers

Chat applications

When it comes to message exchange, the most famous application to use the websocket protocol is Slack.

Go to your Slack channel, and filter the requests by websocket: a single request appears.

With 1 message every 10 seconds.

slack using websocket live chat - image23.png

Online (sport) games

Online betting has been booming for years.

On FanDuel SportsBooks, the leading online betting site in the US, there are between 5 and 7 million unique visitors per month, or 1 in 50 Americans.

fanduel monthly visits from similarweb - image25.png

Here too, if we open the Chrome inspection tool on bwin, for instance, go to the Network tab, and filter the requests by WS (that is to say, WebSocket), we find a single connection.

Every second, the odds of all bets are transmitted.

bwin using websocket live odds sport - image16.png

Real-time stock tickers

Finally, we find this rapid bidirectional exchange technology in finance.

Here, in order to create powerful high-frequency trading tools, it is necessary to rely on data that can be manipulated on the scale of seconds.

one second time slot on tradingview - image27.png

On TradingView, for example, the world's leading consumer financial visualization platform, with 200 million unique visitors each month, they use… WebSocket.

With messages exchanged between the client and the server every second.

tradingview using websocket financial data streams - image20.png

And we find the essential information for trading on the markets:

  1. ticker
  2. volume
  3. timestamp
  4. price
{ "m":"qsd", "p":[ "qs_HOdvVPMeHy0j", { "n":"NYSE:BABY", "s":"ok", "in":{ "volume":10295677, "lp_time":1710268799, "lp":75.97, "chp":1.48, "ch":1.11 } } ] }
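As a rough illustration, here is how such a message could be unpacked with the standard json module. The reading of the short keys (lp as last price, lp_time as its Unix timestamp) is our assumption based on the values, not something TradingView documents here, and the function name is hypothetical.

```python
import json
from datetime import datetime, timezone

# The sample message shown above, copied as-is
RAW = (
    '{ "m":"qsd", "p":[ "qs_HOdvVPMeHy0j", { "n":"NYSE:BABY", "s":"ok", '
    '"in":{ "volume":10295677, "lp_time":1710268799, "lp":75.97, '
    '"chp":1.48, "ch":1.11 } } ] }'
)


def parse_quote(raw: str) -> dict:
    """Extract ticker, volume, timestamp and price from one quote message."""
    message = json.loads(raw)
    _session_id, payload = message["p"]  # first item looks like a session id
    values = payload["in"]
    return {
        "ticker": payload["n"],
        "volume": values["volume"],
        # assumption: lp_time is a Unix timestamp in seconds
        "timestamp": datetime.fromtimestamp(values["lp_time"], tz=timezone.utc),
        # assumption: lp stands for "last price"
        "price": values["lp"],
    }


quote = parse_quote(RAW)
print(quote["ticker"], quote["price"], quote["timestamp"].isoformat())
```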

And it is therefore also this technology that we will find on the screener of the most beautiful meme coins of the moment: DexScreener.

What data will we recover from DexScreener?

DexScreener is full of financial data: liquidity, website, volume, etc.

As part of this tutorial, we will retrieve all the financial data available from the h6 trending token list page on Solana:

  1. Pair name
  2. Token price
  3. Number of transactions
  4. Volume
  5. Makers
  6. Growth over 5m
  7. Growth over 1 hour
  8. Growth over 6 hours
  9. Growth over 24 hours
  10. Total liquidity
  11. Market Cap
  12. Token creation date

And accessible from this URL:

https://dexscreener.com/solana?rankBy=trendingScoreH6&order=desc

list of trending h6 tokens solana dexscreener snapshot - image12.png

In addition, we will retrieve some social information:

  1. Presence of a Twitter
  2. Presence of a Telegram
  3. Presence of a token image
  4. Presence of a banner

image banner image and social media links meme coin dexscreener - image5.png

Be careful: we only retrieve whether or not the pair has a social link, not the link itself. Getting the link requires visiting the token page, which is not covered here.

And here, in JSON format, is what the data retrieved for each token looks like:

{ "chainId":"solana", "dexId":"raydium", "pairAddress":"3LktdenQLDMgUDCCYFa2HthfcfSZkSbH4HuS6GBGsUcy", "baseToken":{ "address":"JBkhsnrng7vSzh7H2LWA7FFEMjsqDNXuFfT3rUhsHgLb", "name":"RACE CAT", "symbol":"RCAT" }, "quoteToken":{ "address":"So11111111111111111111111111111111111111112", "name":"Wrapped SOL", "symbol":"SOL" }, "quoteTokenSymbol":"SOL", "price":"0.0006895", "priceUsd":"0.1082", "txns":{ "m5":{ "buys":69, "sells":30 }, "h1":{ "buys":1362, "sells":880 }, "h6":{ "buys":1362, "sells":880 }, "h24":{ "buys":1362, "sells":880 } }, "buyers":{ "m5":58, "h1":980, "h6":980, "h24":980 }, "sellers":{ "m5":24, "h1":606, "h6":606, "h24":606 }, "makers":{ "m5":82, "h1":1004, "h6":1004, "h24":1004 }, "volume":{ "m5":9872.42, "h1":276737.25, "h6":276737.25, "h24":276737.25 }, "volumeBuy":{ "m5":5170.53, "h1":143102.08, "h6":143102.08, "h24":143102.08 }, "volumeSell":{ "m5":4701.88, "h1":133635.17, "h6":133635.17, "h24":133635.17 }, "priceChange":{ "m5":9.09, "h1":2602, "h6":2602, "h24":2602 }, "liquidity":{ "usd":20808.89, "base":96169, "quote":66.2277 }, "marketCap":108258, "pairCreatedAt":1710345305000, "ear":true, "profile":{ "ear":true, "website":true, "twitter":true, "linkCount":3, "imgKey":"8181ca" }, "c":"a", "a":"solamm" }

It's well structured, it's complete, and with the WebSocket, we're going to scrape that with pretty high frequency.

Why scrape data from DexScreener with Python?

All this data… what for?

We identified 3 convincing use cases, specific to cryptocurrency and finance market:

  1. Build sell/buy alert
  2. Build a predictive tracker
  3. Build trading bots

Build sell/buy alert

Do you want to be alerted as soon as doland tremp crosses a certain threshold on the way up, so you can sell and take juicy profits?

Or, on the contrary, if you are the victim of a rug pull, to be able to sell before it's too late?

In both situations, DexScreener data will allow you to generate an alert from the token price.
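A minimal sketch of such an alert, assuming each pair is a dictionary shaped like the DexScreener JSON shown earlier. The function name, the thresholds and the message wording are hypothetical; plug in your own notification channel where the print is.

```python
def check_price_alert(pair, sell_above=None, sell_below=None):
    """Return an alert message when the price crosses a threshold, else None."""
    price = float(pair["priceUsd"])  # priceUsd arrives as a string
    name = pair["baseToken"]["name"]
    if sell_above is not None and price >= sell_above:
        return "%s at $%s: above %s, time to take profits" % (name, price, sell_above)
    if sell_below is not None and price <= sell_below:
        return "%s at $%s: below %s, time to cut losses" % (name, price, sell_below)
    return None


# Sample values: the thresholds below are made up for the example
pair = {"baseToken": {"name": "doland tremp"}, "priceUsd": "0.4682"}
alert = check_price_alert(pair, sell_above=0.45, sell_below=0.10)
if alert:
    print(alert)
```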

build a selling alert bot crypto - image13.png

Don't let yourself be surprised anymore.

Build a predictive tracker

If only it were possible, from the quantitative values of the first 3 candles, to predict the growth... the explosion of the token...

predict coin pump based on historical financial quantitative data - image6.png

With all this data, you will be able to generate a list of criteria yourself in order to designate the winning pair.

For example:

  1. Large volume of transactions in 15 minutes
  2. Number of holders
  3. Number of transactions
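Criteria like these can be turned into a simple filter. The sketch below is only an illustration: the thresholds are invented, and because the DexScreener feed exposes 5-minute and 1-hour windows and a makers count, it uses those as stand-ins for the 15-minute volume and the number of holders.

```python
def is_candidate(pair, min_volume_m5=50_000, min_makers_h1=500, min_txns_h1=1_000):
    """Return True when a pair passes all three (hypothetical) thresholds."""
    txns_h1 = pair["txns"]["h1"]["buys"] + pair["txns"]["h1"]["sells"]
    return (
        pair["volume"]["m5"] >= min_volume_m5
        and pair["makers"]["h1"] >= min_makers_h1
        and txns_h1 >= min_txns_h1
    )


# Figures copied from the two sample pairs shown in this article
boden_pair = {
    "txns": {"h1": {"buys": 1381, "sells": 1322}},
    "volume": {"m5": 146154.45},
    "makers": {"h1": 1085},
}
race_cat_pair = {
    "txns": {"h1": {"buys": 1362, "sells": 880}},
    "volume": {"m5": 9872.42},
    "makers": {"h1": 1004},
}

print(is_candidate(boden_pair), is_candidate(race_cat_pair))
```

With these made-up thresholds, the second pair is rejected only because its 5-minute volume is too low.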

ChatGPT is full of financial modeling suggestions, it can be an interesting starting point.

chatgpt suggests mathematic model to predict crytpo meme coin pump - image22.png

Build trading bots

Buy automatically when it goes up or a certain threshold of transactions has been crossed... and do the same thing on the downside to make profits effortlessly.

Too good to be true?

However, this is what this very recent guide from QuickNode, published on 02/09/2024, offers:

Create a Solana Trading Bot Using Jupiter API.

quicknode create a solana trading bot using jupiter api doc - image30.png

You will be able to offer your trading robot this precious data, and build a high frequency trading tool… homemade.

If you want to go further without reading tons of docs, we recommend this very good video from MoonDev:

i coded a solana sniper python trading bot for you

i coded a solana bot trader for you youtube video - image1.png

Or explore further with advanced machine learning predictive models. The Machine Learning Crash Course with TensorFlow APIs of Google is a good starting point.

Be careful: as they say in the USA, there is no free lunch.

There is little doubt, however, that with the bull run coming, you will be able to make profits. It’s (almost) promised.

DexScreener mentions data scraping neither in its documentation nor in its terms of use.

no mention of scraping in dexscreener terms of use - image21.png

And even if it did, all this data is public, accessible to everyone directly from the chain.

This is the whole principle of blockchain.

A publicly accessible and shared transactions database.

And in the United States in particular, it is completely legal to scrape public data.

webscraping is legal techcrunch article - image4.png

Complete Code

The complete code is accessible right here, and can be downloaded in full directly from the GitHub Gist here:

dexscreener_trending_solana_pairs_websocket_scraper.py.

# =============================================================================
# Title: DexScreener Crypto Live Prices Scraper
# Description: This script scrapes the first 200 h6 trending Solana pairs from DexScreener -- every 10 seconds
# Author: Sasha Bouloudnine
# Date: 2024-03-13
#
# Usage:
# - Install websockets using `pip install websockets`.
# - Launch the script.
#
# =============================================================================

import asyncio
import websockets
from datetime import datetime
import os
import base64
import json
import csv
import time


def generate_sec_websocket_key():
    # 16 random bytes, base64-encoded, as expected in the Sec-WebSocket-Key header
    random_bytes = os.urandom(16)
    key = base64.b64encode(random_bytes).decode('utf-8')
    return key


TYPES = ['pairs', 'latestBlock']
DATA = []
FIELDNAMES = [
    "chain_id",
    "dex_id",
    "pair_address",
    "token_address",
    "token_name",
    "token_symbol",
    "token_m5_buys",
    "token_m5_sells",
    "token_h1_buys",
    "token_h1_sells",
    "token_h1_to_m5_buys",
    "token_liquidity",
    "token_market_cap",
    "token_created_at",
    "token_created_since",
    "token_eti",
    "token_header",
    "token_website",
    "token_twitter",
    "token_links",
    "token_img_key",
    "token_price_usd",
    "token_price_change_h24",
    "token_price_change_h6",
    "token_price_change_h1",
    "token_price_change_m5"
]


async def dexscreener_scraper():
    headers = {
        "Host": "io.dexscreener.com",
        "Connection": "Upgrade",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Upgrade": "websocket",
        "Origin": "https://dexscreener.com",
        "Sec-WebSocket-Version": "13",
        "Accept-Encoding": "gzip, deflate, br, zstd",
        "Accept-Language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7",
        "Sec-WebSocket-Key": generate_sec_websocket_key()
    }
    uri = "wss://io.dexscreener.com/dex/screener/pairs/h24/1?rankBy[key]=trendingScoreH6&rankBy[order]=desc"
    # Note: recent releases of websockets may expect additional_headers instead of extra_headers
    async with websockets.connect(uri, extra_headers=headers) as websocket:
        while True:
            message_raw = await websocket.recv()
            message = json.loads(message_raw)
            _type = message["type"]
            assert _type in TYPES
            if _type == 'pairs':
                pairs = message["pairs"]
                assert pairs
                for pair in pairs:
                    # identifiers
                    chain_id = pair["chainId"]
                    dex_id = pair["dexId"]
                    pair_address = pair["pairAddress"]
                    assert pair_address
                    token_address = pair["baseToken"]["address"]
                    token_name = pair["baseToken"]["name"]
                    token_symbol = pair["baseToken"]["symbol"]

                    # transactions
                    token_txns = pair["txns"]
                    token_m5_buys = token_txns["m5"]["buys"]
                    token_m5_sells = token_txns["m5"]["sells"]
                    token_h1_buys = token_txns["h1"]["buys"]
                    token_h1_sells = token_txns["h1"]["sells"]
                    # last 5 minutes of buys extrapolated to one hour, relative to the real last hour
                    token_h1_to_m5_buys = round(token_m5_buys*12/token_h1_buys, 2) if token_m5_buys else None

                    token_liquidity = pair["liquidity"]["usd"]
                    token_market_cap = pair["marketCap"]

                    # creation date: Unix timestamp in milliseconds -> datetime, then age in minutes
                    token_created_at_raw = pair["pairCreatedAt"]
                    token_created_at = token_created_at_raw / 1000
                    token_created_at = datetime.utcfromtimestamp(token_created_at)
                    now_utc = datetime.utcnow()
                    token_created_since = round((now_utc - token_created_at).total_seconds() / 60, 2)

                    # social information (presence only, not the links themselves)
                    token_eti = pair.get("ear", False)
                    token_header = pair.get("profile", {}).get("header", False)
                    token_website = pair.get("profile", {}).get("website", False)
                    token_twitter = pair.get("profile", {}).get("twitter", False)
                    token_links = pair.get("profile", {}).get("linkCount", False)
                    token_img_key = pair.get("profile", {}).get("imgKey", False)

                    # price
                    token_price_usd = pair["priceUsd"]
                    token_price_change_h24 = pair["priceChange"]["h24"]
                    token_price_change_h6 = pair["priceChange"]["h6"]
                    token_price_change_h1 = pair["priceChange"]["h1"]
                    token_price_change_m5 = pair["priceChange"]["m5"]

                    VALUES = [
                        chain_id,
                        dex_id,
                        pair_address,
                        token_address,
                        token_name,
                        token_symbol,
                        token_m5_buys,
                        token_m5_sells,
                        token_h1_buys,
                        token_h1_sells,
                        token_h1_to_m5_buys,
                        token_liquidity,
                        token_market_cap,
                        token_created_at,
                        token_created_since,
                        token_eti,
                        token_header,
                        token_website,
                        token_twitter,
                        token_links,
                        token_img_key,
                        token_price_usd,
                        token_price_change_h24,
                        token_price_change_h6,
                        token_price_change_h1,
                        token_price_change_m5
                    ]

                    print(token_name, token_price_usd)
                    row = dict(zip(FIELDNAMES, VALUES))
                    DATA.append(row)

                # one timestamped CSV file per batch of pairs
                file_created_at = int(time.time())
                filename = 'dexscreener_%s.csv' % file_created_at
                with open(filename, 'w') as f:
                    writer = csv.DictWriter(f, fieldnames=FIELDNAMES, delimiter='\t')
                    writer.writeheader()
                    for row in DATA:
                        writer.writerow(row)
                print('done %s' % filename)

            print('pause 10s :°')
            # asyncio.sleep pauses without blocking the event loop (time.sleep would block it)
            await asyncio.sleep(10)


if __name__ == "__main__":
    asyncio.run(dexscreener_scraper())

Prerequisites

Before launching it, just install the Python library websockets with the pip library installation tool.

This library allows you to exchange messages via the WebSocket protocol with Python.

$ pip install websockets

How does it work?

First, go to GitHub and download the script, or copy and paste the contents of the script into a Python file.

download python file from github - image7.png

Then, open your console, and launch the script with the command below.

$ python3 dexscreener_trending_solana_pairs_websocket_scraper.py
PayPaw 0.001496
SolPets 0.02063
Lion 0.003831
jeo boden 0.1267
doland tremp 0.4682
Peng 0.7786
I CHOOSE POOR EVERYTIME! 0.01417
I CHOOSE RICH EVERYTIME! 0.03966
...
VANRY 0.3481
ZynCoin 0.1241
SolCard 0.04757
done dexscreener_1710028163000.csv
pause 10s :°

The script will perform the following actions:

  1. Open a connection to the DexScreener WebSocket server
  2. Retrieve data from 200 Solana trending h6 coins
  3. Save this as a CSV file

Every 10 seconds.

Powerful.

🦅

Step by step tutorial

The code is there but…how does it work?

This is what we will see in this complete tutorial, which we will carry out in 4 distinct stages:

  1. Identify the websocket endpoint
  2. Adding the while loop
  3. Parse the data
  4. Export to CSV

Identify the websocket endpoint

Internet browsing is based on the concept of a request: an exchange between a client (the browser) and a server (the site).

The exchange can be summarized as follows:

  1. The browser (client) arrives on the site
  2. A request is sent
  3. The site (server) returns a response
  4. The browser (client) displays web page

client server request response schema - image26.png

In our case, where is the WebSocket request located?

To find it, go to DexScreener, then:

  1. Open Chrome Inspection Tool
  2. Go to the tab Network
  3. Filter by WS, short for WebSocket
  4. Refresh
  5. Retrieve the valuable query URL

identify websocket request network chrome dev tools - image24.png

Note that in the Messages tab, we find the expected bidirectional messages exchanged between the site (server) and the browser (client).

Notably the absolutely intuitive: ping > pong.

webscokets network chrome screenshot ping pong - image2.png

But now how to reproduce these exchanges with Python?

We will start with Copy as cURL, to retrieve the value of the URL, as well as the headers of the request.

copy as curl request network chrome dev tools - image8.png

Then we can convert the cURL command into Python requests code, with the excellent Convert cURL commands to Python tool from our friends at ScrapingBee.

Finally, we will replace the requests part with the syntax of the websockets library.

import asyncio
import websockets
import json


async def dexscreener_scraper():
    headers = {
        "Host": "io.dexscreener.com",
        "Connection": "Upgrade",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Upgrade": "websocket",
        "Origin": "https://dexscreener.com",
        "Sec-WebSocket-Version": "13",
        "Accept-Encoding": "gzip, deflate, br, zstd",
        "Accept-Language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7"
    }
    uri = "wss://io.dexscreener.com/dex/screener/pairs/h24/1?rankBy[key]=trendingScoreH6&rankBy[order]=desc"
    # Note: recent releases of websockets may expect additional_headers instead of extra_headers
    async with websockets.connect(uri, extra_headers=headers) as websocket:
        # wait for the first message pushed by the server, then decode it
        message_raw = await websocket.recv()
        message = json.loads(message_raw)
        print(message)


if __name__ == '__main__':
    asyncio.run(dexscreener_scraper())

Note the use of the asynchronous library asyncio. We recommend this excellent video to go further: Live Crypto Prices with Websockets - Python Web Scraping for Beginners.

We start the machine and… eureka!

A long JSON appears, with the list of pairs, and for each pair the following metrics:

{ "schemaVersion":"1.3.0", "type":"pairs", "stats":{ "m5":{ "txn":36900, "volumeUsd":15302890.229999958 }, "h1":{ "txn":469943, "volumeUsd":240360254.3900005 }, "h6":{ "txn":2888366, "volumeUsd":1731720375.3599985 }, "h24":{ "txn":10680606, "volumeUsd":8217607027.990008 } }, "pairs":[ { "chainId":"solana", "dexId":"raydium", "pairAddress":"77JrcxAzPUEvn9o1YXmFm9zQid8etT4SCWVxVqE8VTTG", "baseToken":{ "address":"8wzYfqeqkjBwYBHMacBVen8tSuJqXiDtsCgmjnUJDSKM", "name":"PORTNOY", "symbol":"PORTNOY" }, "quoteToken":{ "address":"So11111111111111111111111111111111111111112", "name":"Wrapped SOL", "symbol":"SOL" }, "quoteTokenSymbol":"SOL", "price":"0.00003781", "priceUsd":"0.006096", "txns":{ "m5":{ "buys":790, "sells":496 }, "h1":{ "buys":5207, "sells":3570 ... }

But as we saw in the screenshot of the Messages, a WebSocket connection means dozens of messages exchanged, sometimes every second.

Not just a JSON.

How to ensure a continuous flow of messages?

Adding the while loop

To ensure that the connection does not close after the first message received, we will simply add a while loop.

To avoid saturating the target site, we will also add a 10 second pause between each message.

import asyncio
import websockets
import json


async def dexscreener_scraper():
    ...
    async with websockets.connect(uri, extra_headers=headers) as websocket:
        while True:
            message_raw = await websocket.recv()
            message = json.loads(message_raw)
            print(message)
            print('pause 10s :°')
            # asyncio.sleep pauses without blocking the event loop (time.sleep would block it)
            await asyncio.sleep(10)


if __name__ == '__main__':
    asyncio.run(dexscreener_scraper())

So the code will work as follows:

  1. Open connection with async
  2. Enter the while loop
  3. Receive messages
  4. Take a 10-second break
  5. Start again

Now, we're going to sort through all this gargantuan flood of information.

Data parsing

We end up with a big JSON, with 5 primary keys:

  1. schemaVersion the version of the schema used
  2. type the type of message received
  3. stats general market statistics
  4. pairs transaction information about our pairs
  5. pairsCount the total number of pairs listed on the dex
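Since not every message carries a list of pairs, it can help to check the type key before going any further. Here is a small sketch of that check; the helper name is ours, and the second sample message is a made-up stand-in for the latestBlock type that the complete script also expects, since its real payload is not shown in this article.

```python
import json


def extract_pairs(message_raw):
    """Return the list of pairs from a raw message, or [] for other message types."""
    message = json.loads(message_raw)
    if message.get("type") != "pairs":
        return []
    return message.get("pairs", [])


# Two simplified sample messages (real ones carry many more keys)
pairs_message = '{"schemaVersion": "1.3.0", "type": "pairs", "pairs": [{"chainId": "solana"}]}'
other_message = '{"type": "latestBlock"}'

print(len(extract_pairs(pairs_message)), len(extract_pairs(other_message)))
```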

dexscreener websocket json message keys - image14.png

And in the pairs section, a list with the 200 Solana trending h6 pairs on DexScreener.

For each pair, an exhaustive JSON, which looks like this:

{ "chainId":"solana", "dexId":"raydium", "pairAddress":"6UYbX1x8YUcFj8YstPYiZByG7uQzAq2s46ZWphUMkjg5", "baseToken":{ "address":"3psH1Mj1f7yUfaD5gh6Zj7epE8hhrMkMETgv5TshQA4o", "name":"jeo boden", "symbol":"boden" }, "quoteToken":{ "address":"So11111111111111111111111111111111111111112", "name":"Wrapped SOL", "symbol":"SOL" }, "quoteTokenSymbol":"SOL", "price":"0.0008552", "priceUsd":"0.1378", "txns":{ "m5":{ "buys":100, "sells":104 }, "h1":{ "buys":1381, "sells":1322 }, "h6":{ "buys":8599, "sells":8314 }, "h24":{ "buys":12408, "sells":12022 } }, "buyers":{ "m5":67, "h1":604, "h6":3190, "h24":4071 }, "sellers":{ "m5":70, "h1":599, "h6":2687, "h24":3525 }, "makers":{ "m5":125, "h1":1085, "h6":5134, "h24":6497 }, "volume":{ "m5":146154.45, "h1":1493348.38, "h6":11015417, "h24":13119735.64 }, "volumeBuy":{ "m5":66601.88, "h1":762583.78, "h6":5633787.49, "h24":6704894.52 }, "volumeSell":{ "m5":79552.56, "h1":730764.59, "h6":5381629.51, "h24":6414841.12 }, "priceChange":{ "m5":-3.45, "h1":10.76, "h6":175, "h24":255 }, "liquidity":{ "usd":1264268.15, "base":4577916, "quote":3926.1037 }, "marketCap":95193189, "pairCreatedAt":1709490601000, "ear":true, "profile":{ "ear":true, "website":true, "twitter":true, "linkCount":3, "imgKey":"d7e9ac" }, "c":"a", "a":"solamm" }

Long live jeo boden — the coin of conviction of this 2024 bull run. To buy it, click here: jeo boden | DexScreener.

We will now parse all the following attributes:

FIELDNAMES = [ "chain_id", "dex_id", "pair_address", "token_address", "token_name", "token_symbol", "token_m5_buys", "token_m5_sells", "token_h1_buys", "token_h1_sells", "token_h1_to_m5_buys", "token_liquidity", "token_market_cap", "token_created_at", "token_created_since", "token_eti", "token_header", "token_website", "token_twitter", "token_links", "token_img_key", "token_price_usd", "token_price_change_h24", "token_price_change_h6", "token_price_change_h1", "token_price_change_m5" ]

And create for each pair a properly structured dictionary, which we will save in a large DATA list.

Note that the time is in Unix timestamp format within the JSON, expressed in milliseconds. We convert it to a readable format with the method datetime.utcfromtimestamp.
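To make the conversion concrete, here is a small standalone sketch applied to the pairCreatedAt value of the sample pair above. It uses the timezone-aware datetime.fromtimestamp, an alternative we chose because datetime.utcfromtimestamp is deprecated as of Python 3.12; the helper name and the fixed "now" are ours, for the sake of a reproducible example.

```python
from datetime import datetime, timezone


def pair_age_minutes(pair_created_at_ms, now=None):
    """Convert a millisecond Unix timestamp and return (creation date, age in minutes)."""
    created_at = datetime.fromtimestamp(pair_created_at_ms / 1000, tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    age = round((now - created_at).total_seconds() / 60, 2)
    return created_at, age


# pairCreatedAt of the sample pair, measured against a fixed reference date
reference = datetime(2024, 3, 13, tzinfo=timezone.utc)
created_at, age = pair_age_minutes(1709490601000, now=reference)
print(created_at.isoformat())  # 2024-03-03T18:30:01+00:00
print(age)                     # 13289.98
```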

With the following code for this third step.

import asyncio
import websockets
import json
from datetime import datetime

DATA = []
FIELDNAMES = [
    "chain_id",
    "dex_id",
    "pair_address",
    "token_address",
    "token_name",
    "token_symbol",
    "token_m5_buys",
    "token_m5_sells",
    "token_h1_buys",
    "token_h1_sells",
    "token_h1_to_m5_buys",
    "token_liquidity",
    "token_market_cap",
    "token_created_at",
    "token_created_since",
    "token_eti",
    "token_header",
    "token_website",
    "token_twitter",
    "token_links",
    "token_img_key",
    "token_price_usd",
    "token_price_change_h24",
    "token_price_change_h6",
    "token_price_change_h1",
    "token_price_change_m5"
]


async def dexscreener_scraper():
    headers = {
        "Host": "io.dexscreener.com",
        "Connection": "Upgrade",
        "Pragma": "no-cache",
        "Cache-Control": "no-cache",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        "Upgrade": "websocket",
        "Origin": "https://dexscreener.com",
        "Sec-WebSocket-Version": "13",
        "Accept-Encoding": "gzip, deflate, br, zstd",
        "Accept-Language": "fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7"
    }
    uri = "wss://io.dexscreener.com/dex/screener/pairs/h24/1?rankBy[key]=trendingScoreH6&rankBy[order]=desc"
    # Note: recent releases of websockets may expect additional_headers instead of extra_headers
    async with websockets.connect(uri, extra_headers=headers) as websocket:
        while True:
            message_raw = await websocket.recv()
            message = json.loads(message_raw)
            pairs = message["pairs"]
            assert pairs
            for pair in pairs:
                # identifiers
                chain_id = pair["chainId"]
                dex_id = pair["dexId"]
                pair_address = pair["pairAddress"]
                assert pair_address
                token_address = pair["baseToken"]["address"]
                token_name = pair["baseToken"]["name"]
                token_symbol = pair["baseToken"]["symbol"]

                # transactions
                token_txns = pair["txns"]
                token_m5_buys = token_txns["m5"]["buys"]
                token_m5_sells = token_txns["m5"]["sells"]
                token_h1_buys = token_txns["h1"]["buys"]
                token_h1_sells = token_txns["h1"]["sells"]
                # last 5 minutes of buys extrapolated to one hour, relative to the real last hour
                token_h1_to_m5_buys = round(token_m5_buys*12/token_h1_buys, 2) if token_m5_buys else None

                token_liquidity = pair["liquidity"]["usd"]
                token_market_cap = pair["marketCap"]

                # creation date: Unix timestamp in milliseconds -> datetime, then age in minutes
                token_created_at_raw = pair["pairCreatedAt"]
                token_created_at = token_created_at_raw / 1000
                token_created_at = datetime.utcfromtimestamp(token_created_at)
                now_utc = datetime.utcnow()
                token_created_since = round((now_utc - token_created_at).total_seconds() / 60, 2)

                # social information (presence only, not the links themselves)
                token_eti = pair.get("ear", False)
                token_header = pair.get("profile", {}).get("header", False)
                token_website = pair.get("profile", {}).get("website", False)
                token_twitter = pair.get("profile", {}).get("twitter", False)
                token_links = pair.get("profile", {}).get("linkCount", False)
                token_img_key = pair.get("profile", {}).get("imgKey", False)

                # price
                token_price_usd = pair["priceUsd"]
                token_price_change_h24 = pair["priceChange"]["h24"]
                token_price_change_h6 = pair["priceChange"]["h6"]
                token_price_change_h1 = pair["priceChange"]["h1"]
                token_price_change_m5 = pair["priceChange"]["m5"]

                VALUES = [
                    chain_id,
                    dex_id,
                    pair_address,
                    token_address,
                    token_name,
                    token_symbol,
                    token_m5_buys,
                    token_m5_sells,
                    token_h1_buys,
                    token_h1_sells,
                    token_h1_to_m5_buys,
                    token_liquidity,
                    token_market_cap,
                    token_created_at,
                    token_created_since,
                    token_eti,
                    token_header,
                    token_website,
                    token_twitter,
                    token_links,
                    token_img_key,
                    token_price_usd,
                    token_price_change_h24,
                    token_price_change_h6,
                    token_price_change_h1,
                    token_price_change_m5
                ]

                print(token_name, token_price_usd)
                row = dict(zip(FIELDNAMES, VALUES))
                DATA.append(row)

            print('pause 10s :°')
            # asyncio.sleep pauses without blocking the event loop (time.sleep would block it)
            await asyncio.sleep(10)


if __name__ == '__main__':
    asyncio.run(dexscreener_scraper())

Everything is in order!

We will finish this tutorial by exporting this data in CSV format.

Export to CSV file

Last step: because it gives greater overall readability and is simpler to process, we will export everything to CSV format.

Since we have a list of dictionaries, we can use csv.DictWriter.

writer = csv.DictWriter(f, fieldnames=FIELDNAMES, delimiter='\t')

Furthermore, we will save 200 lines every 10 seconds. How do I know when this backup took place?

We will add the collection timestamp to the name of each file.

file_created_at = int(time.time())
filename = 'dexscreener_%s.csv' % file_created_at
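Put together, the export can be sketched as a small standalone function. This is an illustration rather than the exact code of the script: the function name, the directory parameter and the two sample rows are ours, and the demo writes into a temporary folder so that it leaves nothing behind.

```python
import csv
import os
import tempfile
import time


def export_snapshot(rows, fieldnames, directory="."):
    """Write rows to a tab-delimited file named after the collection timestamp."""
    file_created_at = int(time.time())
    filename = os.path.join(directory, "dexscreener_%s.csv" % file_created_at)
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
    return filename


# Two sample rows with a reduced set of columns
fieldnames = ["token_name", "token_price_usd"]
rows = [
    {"token_name": "jeo boden", "token_price_usd": "0.1267"},
    {"token_name": "doland tremp", "token_price_usd": "0.4682"},
]

with tempfile.TemporaryDirectory() as tmp:
    path = export_snapshot(rows, fieldnames, directory=tmp)
    # read the file back to check what was written
    with open(path, newline="", encoding="utf-8") as f:
        rows_back = list(csv.DictReader(f, delimiter="\t"))

print(rows_back)
```

Opening the file with newline="" follows the recommendation of the csv module documentation and avoids blank lines between rows on Windows.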

And the complete code… is available on the Gist, right there:

dexscreener_trending_solana_pairs_websocket_scraper.py

You can now launch the scraper, and… tada, all the data is instantly scraped, every 10 seconds, in an exhaustive, readable and structured file.

export dexscreener meme coins metrics to csv - image18.png

FAQ

Which programming language is most used for WebSocket scraping?

While browsing the web, we saw that 3 options emerged here and there:

  1. Python
  2. Go
  3. JavaScript

However, based on popularity, the answer is obvious.

python go and javascript google trends - image31.png

How to deal with so many DexScreener CSV files?

With 1 CSV file created every 10 seconds, you'll soon find yourself with a mountain of files to process.

How to prevent file inflation?

Export data to a large-scale structured SQL database.

With 3 simple advantages:

  1. Size under control
  2. Easy query processing
  3. Thread-safe

SQL Easy's How to Use SQL in Python: A Comprehensive Guide is the perfect place to start.
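To give an idea of what this could look like, here is a minimal sketch with sqlite3, which ships with Python. The table name, the column names and the sample rows are hypothetical, and the database lives in memory for the demo; pass a file path to sqlite3.connect to keep the data.

```python
import sqlite3


def save_rows(conn, rows):
    """Create the table if needed, then append one row per pair."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS prices ("
        "scraped_at INTEGER, pair_address TEXT, token_name TEXT, token_price_usd REAL)"
    )
    conn.executemany(
        "INSERT INTO prices VALUES (:scraped_at, :pair_address, :token_name, :token_price_usd)",
        rows,
    )
    conn.commit()


# In-memory database and made-up rows, for the example only
conn = sqlite3.connect(":memory:")
rows = [
    {"scraped_at": 1710028163, "pair_address": "pair_a", "token_name": "jeo boden", "token_price_usd": 0.1267},
    {"scraped_at": 1710028163, "pair_address": "pair_b", "token_name": "doland tremp", "token_price_usd": 0.4682},
]
save_rows(conn, rows)

# Every snapshot lands in the same table, so one query replaces opening many files
highest = conn.execute(
    "SELECT token_name, token_price_usd FROM prices ORDER BY token_price_usd DESC"
).fetchall()
print(highest)
```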

sql easy sql knowledge center snapshop - image10.png

Is there a DexScreener no-code web scraper?

No, not for the moment.

But if you are interested in the project, you can support us here and add an upvote to speed up the development of the scraping tool:

DexScreener Live Crypto Prices Scraper | Lobstr

dexscreener crypto live prices scraper lobstr no code scraper idea - image29.png

Is it possible to scrape meme coin information from Twitter?

A meme coin obviously involves quantitative elements: number of transactions, liquidity, market cap, etc.

But it also relies on a strong community: the hodlers, whose loyalty and size can also be measured quantitatively.

  1. Number of tweets
  2. Number of followers
  3. Number of views or likes per tweet …

This is what each coin highlights on DexScreener, in the info section.

jeo boden social media links dexscreener - image3.png

And the right influencer’s tweet can cause a token’s valuation to explode.

We still all remember this unifying tweet, from 02/4/2021, which set the price of dogecoin on fire:

elon musk tweet doge coin is people crypto - image19.png

Is it possible to also scrape this information?

Yes, absolutely!

If you need to scrape all of a person's tweets at regular intervals, and export them to a Google Sheet, I recommend this powerful tool in particular:

Twitter User Tweets Scraper | Lobstr

lobstr twitter user tweets scraper product page snapshot - image28.png


Sasha Bouloudnine

Co-founder @ lobstr.io since 2019. Genuine data avid and lowercase aesthetic observer. Ensure you get the hot data you need.
