We recently collected some stats to determine which web scraping API performs better. One key metric was latency, so I want to focus on the testing script here.
To keep things fair, all tests were run on the same machine and around the same time of day.
Table of Contents
- Step 1: Imports & Config
- Step 2: API-Specific Functions
- Step 3: Percentiles & Test Runner
- Step 4: Main Logic & API Keys
- Step 5: Results
Step 1: Imports & Config
Let’s start by installing the required third-party libraries (time and json ship with Python, so there’s nothing to install for them):
pip install requests pandas numpy
Here’s a quick summary of what each library does in this project:
Library | Purpose / Description
---|---
requests | For sending HTTP requests and interacting with APIs
time | Provides time-related functions (e.g. delays, timestamps)
pandas | Data manipulation and analysis using DataFrames
numpy | Numerical computing, array operations
json | Parsing and creating JSON-formatted data
Now, import them into your project:
import requests
import time
import pandas as pd
import numpy as np
import json
Step 2: API-Specific Functions
Make a list of the web scraping APIs you want to test. In our case, we’re comparing the top 3: HasData, Oxylabs, and ScrapingBee.
Let’s set a test URL and define how many times each API will be called:
TEST_URL = "https://httpbin.org/html"
N_REPEATS = 100
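The original script stops at these two settings. If you’re worried about a single hung request skewing a run, you could also define a timeout to pass into each requests call; the constant below is my own addition, not part of the original configuration:

REQUEST_TIMEOUT = 60  # seconds before a single call is abandoned (assumed value, not in the original script)

You’d then pass timeout=REQUEST_TIMEOUT to every requests.post() / requests.get() call shown below.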
Next, create function templates to measure response times:
def test_hasdata(api_key):
    pass

def test_oxylabs(username, password):
    pass

def test_scrapingbee(api_key):
    pass
For example, a function for HasData might look like this:
def test_hasdata(api_key):
    times = []
    for _ in range(N_REPEATS):
        url = "https://api.hasdata.com/scrape/web"
        payload = json.dumps({
            "url": TEST_URL,
            "proxyType": "datacenter",
            "proxyCountry": "US",
        })
        headers = {
            "Content-Type": "application/json",
            "x-api-key": api_key
        }
        start = time.time()
        resp = requests.post(url, headers=headers, data=payload)
        times.append(time.time() - start)
    return times
Now do the same for the other APIs you want to compare.
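For reference, here are rough sketches of the other two. I’m assuming Oxylabs’ realtime Web Scraper API endpoint (https://realtime.oxylabs.io/v1/queries, authenticated with HTTP basic auth) and ScrapingBee’s standard endpoint (https://app.scrapingbee.com/api/v1/, which takes the API key and target URL as query parameters); verify the endpoints and parameters against each provider’s documentation before relying on the numbers.

def test_oxylabs(username, password):
    times = []
    for _ in range(N_REPEATS):
        # Assumed Oxylabs realtime endpoint; "universal" is their generic source for arbitrary URLs
        payload = {"source": "universal", "url": TEST_URL}
        start = time.time()
        resp = requests.post(
            "https://realtime.oxylabs.io/v1/queries",
            auth=(username, password),
            json=payload,
        )
        times.append(time.time() - start)
    return times

def test_scrapingbee(api_key):
    times = []
    for _ in range(N_REPEATS):
        # Assumed ScrapingBee endpoint; key and target URL go in the query string
        params = {"api_key": api_key, "url": TEST_URL}
        start = time.time()
        resp = requests.get("https://app.scrapingbee.com/api/v1/", params=params)
        times.append(time.time() - start)
    return times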
Step 3: Percentiles & Test Runner
We’ll need a function to calculate percentiles:
def calc_percentiles(times):
    return {
        "p50": round(np.percentile(times, 50), 3),
        "p75": round(np.percentile(times, 75), 3),
        "p95": round(np.percentile(times, 95), 3)
    }
This tells us the response time that 50%, 75%, and 95% of requests stayed under: p50 is the median, while p95 captures the slow tail.
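As a quick sanity check, here’s what the function returns for a handful of made-up timings (these numbers are purely illustrative, not measurements):

sample = [0.8, 0.9, 1.0, 1.1, 2.5]  # invented timings in seconds
print(calc_percentiles(sample))     # -> p50: 1.0, p75: 1.1, p95: 2.22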
Then write a function to run the tests:
def run_all_tests(credentials):
    results = {}
    results["HasData"] = calc_percentiles(test_hasdata(credentials["HasData"]))
    results["Oxylabs"] = calc_percentiles(test_oxylabs(
        credentials["Oxylabs"]["username"],
        credentials["Oxylabs"]["password"]
    ))
    results["ScrapingBee"] = calc_percentiles(test_scrapingbee(credentials["ScrapingBee"]))
    return pd.DataFrame(results).T
Step 4: Main Logic & API Keys
Now, let’s put everything together in a main function and add credentials for the APIs you’re testing:
if __name__ == "__main__":
    credentials = {
        "HasData": "YOUR-API-KEY",
        "Oxylabs": {"username": "YOUR-USERNAME", "password": "YOUR-PASSWORD"},
        "ScrapingBee": "YOUR-API-KEY"
    }
    df = run_all_tests(credentials)
    print(df)
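Optionally (this isn’t in the original script), you can sort the table by median latency before printing it and save a copy of the raw numbers for later:

df = df.sort_values("p50")        # fastest median response time first
df.to_csv("latency_results.csv")  # keep the raw numbers for the follow-up comparison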
Step 5: Results
After running the script, you’ll get real performance data for your selected APIs: a table of p50, p75, and p95 latencies for each service.
Of course, judging web scraping APIs by speed alone isn't fair. In the next post, we’ll dig deeper into how these three services compare overall.
You can also check out the full article where we compared the 7 best web scraping APIs; this script was originally built for that comparison.
Extra Resources:
- Best Web Scraping APIs in 2025
- Google SERP APIs Ranked by Speed, Cost, and Pain Points
- Join our Discord
What do you look for when choosing a web scraping API?