Example 1
Let's now create a project that will scrape, with scrap mode 2, some URLs we will send through the provided endpoints.
To create the project, use the POST /projects endpoint like this:
Request on Linux:
curl --location 'https://api.scrapingpros.com/projects' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer Your-Api-Key' \
--data '{"name": "Example 1", "priority":"1", "description":"project-description"}'
Request on Windows CMD:
curl --location "https://api.scrapingpros.com/projects" ^
--header "Content-Type: application/json" ^
--header "Authorization: Bearer Your-Api-Key" ^
--data "{\"name\": \"Example 1\", \"priority\":\"1\", \"description\":\"project-description\"}"
Response:
{
"message": "Project successfully created.",
"project": {
"id": 34,
"client_id": "my_client_id",
"name": "Example 1",
"description": "project-description",
"cost": null,
"priority": 1,
"status": "A",
"created_at": "2024-10-09T18:33:19.000Z",
"updated_at": "2024-10-09T18:33:19.000Z"
}
}
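If you prefer to call the API from code, here is a minimal Python sketch of the same request using the requests library. The endpoint and payload mirror the curl example above; API_KEY is a placeholder for your own key.

import requests

API_KEY = "Your-Api-Key"  # placeholder: replace with your real key
BASE = "https://api.scrapingpros.com"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}",
}

# Create the project and keep its id for the following steps.
resp = requests.post(
    f"{BASE}/projects",
    headers=HEADERS,
    json={"name": "Example 1", "priority": "1", "description": "project-description"},
)
resp.raise_for_status()
project_id = resp.json()["project"]["id"]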
Now that we have created a project, let's create a batch and add jobs to it. First, the batch:
Request
curl -X POST https://api.scrapingpros.com/batches \
-H 'Content-Type: application/json' \
--header 'Authorization: Bearer Your-Api-Key' \
-d '{
"project": 32,
"name" : "First batch",
"priority" : 5,
"max_requests" : 1000
}'
Response
{
"message": "ok",
"batch": {
"id": 105,
"client_id": 16,
"project_id": 32,
"name": "First batch",
"cost": 0,
"priority": 5,
"status": "A",
"max_requests": 1000,
"created_at": "2024-09-23T15:32:41.000Z",
"updated_at": "2024-09-23T15:32:41.000Z"
}
}
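Continuing the Python sketch, the same call with requests, reusing HEADERS and the project_id captured above:

# Create a batch inside the project; we keep its id for the next calls.
resp = requests.post(
    f"{BASE}/batches",
    headers=HEADERS,
    json={
        "project": project_id,
        "name": "First batch",
        "priority": 5,
        "max_requests": 1000,
    },
)
resp.raise_for_status()
batch_id = resp.json()["batch"]["id"]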
Now we should append some jobs to this batch, like this:
curl -X POST https://api.scrapingpros.com/batches/105/append-jobs \
-H 'Content-Type: application/json' \
--header 'Authorization: Bearer Your-Api-Key' \
-d '{
"jobs": [
{
"url": "http://example.com/page1",
"scrap_mode": 2,
"arguments": {}
}
]
}'
Response
{
"response": {
"total_jobs": 2,
"total_invalid_jobs": 0,
"invalid_jobs": []
}
}
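The same step in the Python sketch; "jobs" accepts a list, so several URLs can be queued in a single call:

# Append one job to the batch created above.
resp = requests.post(
    f"{BASE}/batches/{batch_id}/append-jobs",
    headers=HEADERS,
    json={
        "jobs": [
            {"url": "http://example.com/page1", "scrap_mode": 2, "arguments": {}}
        ]
    },
)
resp.raise_for_status()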
Please check the documentation for this endpoint if you want to know more about the different scrap modes available.
Now let's run this batch:
curl -X POST https://api.scrapingpros.com/batches/105/run \
-H 'Content-Type: application/json' \
--header 'Authorization: Bearer Your-Api-Key' \
-d '{}'
Response
{
"message": "batch id: 105 was set to run successfully. 45 jobs set to run"",
"batch": {
"id": 105,
"client_id": 16,
"project_id": 32,
"name": "First batch",
"cost": 0,
"priority": 5,
"status": "A",
"max_requests": 1000,
"created_at": "2024-09-23T15:32:41.000Z",
"updated_at": "2024-09-23T15:32:41.000Z"
}
}
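And the equivalent call in the Python sketch; the body is just an empty JSON object:

# Set every queued job in the batch to run.
resp = requests.post(f"{BASE}/batches/{batch_id}/run", headers=HEADERS, json={})
resp.raise_for_status()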
That's it! All jobs have been set to run. We can check the HTML and status of these jobs using:
curl -X GET "https://api.scrapingpros.com/get_data/batch/105?html_only=true" \
--header 'Authorization: Bearer Your-Api-Key' \
-H "Content-Type: application/json"
Response
"pending": 0,
"results_count": 43,
"batch_id": "105",
"results": [
{
"job_id": 416352,
"url": "Url",
"html": "html obtained"
}
]
}
Here we get a list of all the HTML obtained for this batch. We can also request the HTML for a specific job through the same endpoint, passing the job_id instead.
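To close the Python sketch, we can poll this endpoint until no jobs are pending and then read the HTML out of the results. The pending and results fields match the response shown above; the 10-second wait is an arbitrary choice, not something the API mandates.

import time

# Poll until the batch has no pending jobs, then print each job's HTML size.
while True:
    resp = requests.get(
        f"{BASE}/get_data/batch/{batch_id}",
        headers=HEADERS,
        params={"html_only": "true"},
    )
    resp.raise_for_status()
    data = resp.json()
    if data.get("pending", 0) == 0:
        break
    time.sleep(10)  # give the scraper some time before asking again

for result in data["results"]:
    print(result["job_id"], len(result["html"]))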