Example 1
Let's now create a project that will scrape, with scrap mode 2, some URLs we will send through the provided endpoints.
To create the project, use the POST /projects endpoint like this:
Request on Linux:
curl --location 'https://api.scrapingpros.com/projects' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Your-Api-Key' \
--data '{"name": "Example 1", "priority":"1", "description":"project-description"}'
Request on Windows CMD:
curl --location "https://api.scrapingpros.com/projects" ^
--header "Content-Type: application/json" ^
--header "Authorization: Bearer Your-Api-Key" ^
--data "{\"name\": \"Example 1\", \"priority\":\"1\", \"description\":\"project-description\"}"
Response:
{
"message": "Project successfully created.",
"project": {
"id": 34,
"client_id": "my_client_id",
"name": "Example 1",
"description": "project-description",
"cost": null,
"priority": 1,
"status": "A",
"created_at": "2024-10-09T18:33:19.000Z",
"updated_at": "2024-10-09T18:33:19.000Z"
}
}
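If you prefer to call the API from code, here is a minimal Python sketch of the same request using the requests library. The endpoint and payload mirror the curl example above; API_KEY is a placeholder for your own key.

import requests

API_KEY = "Your-Api-Key"  # placeholder: replace with your real key
BASE = "https://api.scrapingpros.com"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

# Create the project and keep its id for the following steps.
resp = requests.post(
    f"{BASE}/projects",
    headers=HEADERS,
    json={"name": "Example 1", "priority": "1", "description": "project-description"},
)
resp.raise_for_status()
project_id = resp.json()["project"]["id"]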
Now that we have created a project, let's create a batch and add jobs to it. First, the batch:
Request
curl -X POST https://api.scrapingpros.com/batches \
-H 'Content-Type: application/json' \
--header 'Authorization: Bearer Your-Api-Key' \
-d '{
"project": 32,
"name" : "First batch",
"priority" : 5,
"max_requests" : 1000
}'
Response
{
"message": "ok",
"batch": {
"id": 105,
"client_id": 16,
"project_id": 32,
"name": "First batch",
"cost": 0,
"priority": 5,
"status": "A",
"max_requests": 1000,
"created_at": "2024-09-23T15:32:41.000Z",
"updated_at": "2024-09-23T15:32:41.000Z"
}
}
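Continuing the Python sketch, the same call with requests, reusing HEADERS and the project_id captured above:

# Create a batch inside the project; we keep its id for the next calls.
resp = requests.post(
    f"{BASE}/batches",
    headers=HEADERS,
    json={
        "project": project_id,
        "name": "First batch",
        "priority": 5,
        "max_requests": 1000,
    },
)
resp.raise_for_status()
batch_id = resp.json()["batch"]["id"]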
Now we should append some jobs to this batch, like this:
curl -X POST https://api.scrapingpros.com/batches/105/append-jobs \
-H 'Content-Type: application/json' \
--header 'Authorization: Bearer Your-Api-Key' \
-d '{
"jobs": [
{
"url": "http://example.com/page1",
"scrap_mode": 2,
"arguments": {}
}
]
}'
Response
{
"response": {
"total_jobs": 2,
"total_invalid_jobs": 0,
"invalid_jobs": []
}
}
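The same step in the Python sketch; "jobs" accepts a list, so several URLs can be queued in a single call:

# Append one job to the batch created above.
resp = requests.post(
    f"{BASE}/batches/{batch_id}/append-jobs",
    headers=HEADERS,
    json={
        "jobs": [
            {"url": "http://example.com/page1", "scrap_mode": 2, "arguments": {}}
        ]
    },
)
resp.raise_for_status()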
Please check the documentation for this endpoint if you want to know more about the different scrap modes available.
Now let's run this batch:
curl -X POST https://api.scrapingpros.com/batches/105/run \
-H 'Content-Type: application/json' \
--header 'Authorization: Bearer Your-Api-Key' \
-d '{}'
Response
{
"message": "batch id: 105 was set to run successfully. 45 jobs set to run"",
"batch": {
"id": 105,
"client_id": 16,
"project_id": 32,
"name": "First batch",
"cost": 0,
"priority": 5,
"status": "A",
"max_requests": 1000,
"created_at": "2024-09-23T15:32:41.000Z",
"updated_at": "2024-09-23T15:32:41.000Z"
}
}
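And the equivalent call in the Python sketch; the body is just an empty JSON object:

# Set every queued job in the batch to run.
resp = requests.post(f"{BASE}/batches/{batch_id}/run", headers=HEADERS, json={})
resp.raise_for_status()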
That's it! All jobs have been set to run. We can check the HTML and status of these jobs using:
curl -X GET "https://api.scrapingpros.com/get_data/batch/105?html_only=true" \
--header 'Authorization: Bearer Your-Api-Key' \
-H "Content-Type: application/json"
Response
"pending": 0,
"results_count": 43,
"batch_id": "105",
"results": [
{
"job_id": 416352,
"url": "Url",
"html": "html obtained"
}
]
}
Here we get a list of all the HTML obtained for this batch. We can also request the HTML for a specific job through the same endpoint, passing the job_id instead.
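To close the Python sketch, we can poll this endpoint until no jobs are pending and then read the HTML out of the results. The pending and results fields match the response shown above; the 10-second wait is an arbitrary choice, not something the API mandates.

import time

# Poll until the batch has no pending jobs, then print each job's HTML size.
while True:
    resp = requests.get(
        f"{BASE}/get_data/batch/{batch_id}",
        headers=HEADERS,
        params={"html_only": "true"},
    )
    resp.raise_for_status()
    data = resp.json()
    if data.get("pending", 0) == 0:
        break
    time.sleep(10)  # give the scraper some time before asking again

for result in data["results"]:
    print(result["job_id"], len(result["html"]))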