Invoice Extraction

This example extracts the information needed to process invoices.

Suppose we receive many invoices as PDFs, but from each invoice we only need the first and last name of the person being billed and the total amount to pay.

Now let's construct a Shape to extract those fields.

[
  {
    "name": "first_name",
    "type": "string",
    "description": "The first name of the one who receives the bill.",
    "isArray": false
  },
  {
    "name": "last_name",
    "type": "string",
    "description": "The last name of the one who receives the bill.",
    "isArray": false
  },
  {
    "name": "total",
    "type": "number",
    "description": "Total amount to pay",
    "isArray": false
  }
]
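For callers who prefer to build the shape in code rather than by hand, the following is a minimal Python sketch. The helper name `field` is hypothetical and not part of the API; it only reproduces the four keys used in the example above (`name`, `type`, `description`, `isArray`).

```python
import json


def field(name, type_, description, is_array=False):
    """Build one shape element with the four keys shown in the example shape."""
    return {
        "name": name,
        "type": type_,
        "description": description,
        "isArray": is_array,
    }


# The same three fields as in the shape above.
shape = [
    field("first_name", "string", "The first name of the one who receives the bill."),
    field("last_name", "string", "The last name of the one who receives the bill."),
    field("total", "number", "Total amount to pay"),
]

# Serialize to JSON so it can be dropped into the request payload.
shape_json = json.dumps(shape, indent=2)
print(shape_json)
```

Keeping the shape as a plain list of dictionaries means it can be passed directly as the `shape` value of the request payload.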

Once the shape is defined, we call the /extract-document endpoint with the shape and the base64-encoded invoice PDF in the payload to create an extraction job:

curl -X POST "https://waveline.ai/api/v1/extract-document" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -d '{
          "fileName": "invoice.pdf",
          "contentType": "application/pdf",
          "base64Content": "JVBERi0xLjMKMSAwIG9iago8PC9UeXBlL0NhdGF...",
          "shape": YOUR_SHAPE
        }'
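The same request can be assembled in Python. The sketch below uses only the standard library and mirrors the curl call: it base64-encodes the PDF bytes and builds the POST request without sending it. The function name `build_request` is an illustrative assumption, not part of any official client.

```python
import base64
import json
import urllib.request

API_URL = "https://waveline.ai/api/v1/extract-document"


def build_request(pdf_bytes, file_name, shape, api_key):
    """Build (but do not send) the POST request shown in the curl example."""
    payload = {
        "fileName": file_name,
        "contentType": "application/pdf",
        # The endpoint expects the file content as a base64 string.
        "base64Content": base64.b64encode(pdf_bytes).decode("ascii"),
        "shape": shape,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Example with placeholder bytes; a real call would read invoice.pdf from disk
# and pass the request to urllib.request.urlopen(...).
demo_request = build_request(b"%PDF-1.3\n", "invoice.pdf", [], "YOUR_API_KEY")
```

Separating request construction from sending makes it easy to inspect the payload before any network call is made.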

The API then processes the job, which typically takes between 10 seconds and 3 minutes. The request itself returns the following response:

{
    "id": "a5ecc735-c48e-43ea-a739-d42bfb19edb3",
    "status": "CREATED",
    "type": "extract",
    "result": null,
    "urls": {
        "get": "https://waveline.ai/api/v1/jobs/a5ecc735-c48e-43ea-a739-d42bfb19edb3"
    }
}

With urls["get"] we can now query the job. This URL points to the jobs endpoint with the job ID already filled in. If we call it 20 seconds later, once the job has finished, we get back the following:

{
    "id": "a5ecc735-c48e-43ea-a739-d42bfb19edb3",
    "status": "FINISHED",
    "type": "extract",
    "result": {
        "first_name": "Ben",
        "last_name": "Timond",
        "total": 330.75
    },
    "urls": {
        "get": "https://waveline.ai/api/v1/jobs/a5ecc735-c48e-43ea-a739-d42bfb19edb3"
    }
}

In this response, we see the job status has changed to FINISHED and the result field now contains our requested information.
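Because a job can take anywhere from seconds to minutes, a caller will usually poll urls["get"] rather than wait a fixed time. The following Python sketch shows one way to do that. Several details are assumptions: the names `fetch_job` and `wait_for_result` are hypothetical, the GET request is assumed to accept the same Bearer token as the POST, and only the two statuses shown on this page (CREATED and FINISHED) are handled, so any failure status the API may have is not covered.

```python
import json
import time
import urllib.request


def fetch_job(url, api_key):
    """GET the job URL and decode the JSON body (auth header is an assumption)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(job, fetch, interval=5.0, timeout=180.0, sleep=time.sleep):
    """Poll job["urls"]["get"] until the status is FINISHED, then return the result.

    `fetch` is any callable that takes the job URL and returns the job as a dict.
    The default timeout of 180 seconds follows the 3 minute upper bound above.
    """
    url = job["urls"]["get"]
    waited = 0.0
    while job["status"] != "FINISHED":
        if waited >= timeout:
            raise TimeoutError(f"job {job['id']} not finished after {timeout} seconds")
        sleep(interval)
        waited += interval
        job = fetch(url)
    return job["result"]
```

A typical call would be `wait_for_result(created_job, lambda url: fetch_job(url, "YOUR_API_KEY"))`, where `created_job` is the decoded response from the /extract-document request. Passing `fetch` and `sleep` as parameters keeps the polling loop testable without network access.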

