Invoice Extraction
We extract the information needed to process invoices.
Let's say we receive a large number of invoices as PDFs. From each invoice, we only want to extract the first and last name of the person being billed and the total amount to pay.

Now let's construct a Shape to extract those fields.
[
{
"name": "first_name",
"type": "string",
"description": "The first name of the one who receives the bill.",
"isArray": false
},
{
"name": "last_name",
"type": "string",
"description": "The last name of the one who receives the bill.",
"isArray": false
},
{
"name": "total",
"type": "number",
"description": "Total amount to pay",
"isArray": false
}
]
Once the shape is defined, we can call the /extract-document endpoint with the shape and the invoice PDF in the payload to create a job:
curl -X POST "https://waveline.ai/api/v1/extract-document" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"fileName": "invoice.pdf",
"contentType": "application/pdf",
"base64Content": "JVBERi0xLjMKMSAwIG9iago8PC9UeXBlL0NhdGF...",
"shape": YOUR_SHAPE
}'
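The same request can be assembled programmatically. The following is a minimal Python sketch, not an official client: the helper names `build_extract_payload` and `build_extract_request` are hypothetical, and only the URL, headers, and payload fields shown in the curl call above are taken from this page.

```python
import base64
import json
import urllib.request

# The Shape from the example above.
SHAPE = [
    {"name": "first_name", "type": "string",
     "description": "The first name of the one who receives the bill.",
     "isArray": False},
    {"name": "last_name", "type": "string",
     "description": "The last name of the one who receives the bill.",
     "isArray": False},
    {"name": "total", "type": "number",
     "description": "Total amount to pay", "isArray": False},
]


def build_extract_payload(pdf_bytes: bytes, file_name: str, shape: list) -> dict:
    """Hypothetical helper: build the JSON body used in the curl call."""
    return {
        "fileName": file_name,
        "contentType": "application/pdf",
        # The PDF is sent as a base64-encoded string.
        "base64Content": base64.b64encode(pdf_bytes).decode("ascii"),
        "shape": shape,
    }


def build_extract_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Hypothetical helper: prepare the POST request without sending it."""
    return urllib.request.Request(
        "https://waveline.ai/api/v1/extract-document",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```

To send the request, read the PDF with `open("invoice.pdf", "rb").read()`, pass the bytes to `build_extract_payload`, and hand the prepared request to `urllib.request.urlopen`.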
We then process the job, which typically takes between 10 seconds and 3 minutes. The request itself returns the following response:
{
"id": "a5ecc735-c48e-43ea-a739-d42bfb19edb3",
"status": "CREATED",
"type": "extract",
"result": null,
"urls": {
"get": "https://waveline.ai/api/v1/jobs/a5ecc735-c48e-43ea-a739-d42bfb19edb3"
}
}
With urls["get"] we can now query that job. The URL points to our job endpoint with the correct job_id already filled in.
If we call this URL 20 seconds later, once the job has finished, we get back the following:
{
"id": "a5ecc735-c48e-43ea-a739-d42bfb19edb3",
"status": "FINISHED",
"type": "extract",
"result": {
"first_name": "Ben",
"last_name": "Timond",
"total": 330.75
},
"urls": {
"get": "https://waveline.ai/api/v1/jobs/a5ecc735-c48e-43ea-a739-d42bfb19edb3"
}
}
In this response, we see the job status has changed to FINISHED and the result field now contains our requested information.
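Because the job takes a while to complete, a client usually polls urls["get"] until the status is FINISHED. The following Python sketch shows one way to do that, under stated assumptions: `fetch_job` and `wait_for_result` are hypothetical names, the job endpoint is assumed to accept the same Bearer token as the extract call, and only the CREATED and FINISHED statuses shown on this page are handled. Any other status simply keeps the loop waiting until the timeout.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_job(url: str, api_key: str) -> dict:
    """Hypothetical helper: GET the job URL and parse the JSON response.

    Assumption: the job endpoint takes the same Authorization header
    as the /extract-document call.
    """
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(
    job: dict,
    fetch: Callable[[str], dict],
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll job["urls"]["get"] until the status is FINISHED.

    Returns the "result" field of the finished job and raises
    TimeoutError if the job is not finished within timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while job["status"] != "FINISHED":
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job['id']} did not finish in time")
        sleep(interval_s)
        job = fetch(job["urls"]["get"])
    return job["result"]
```

A call might look like `wait_for_result(created_job, lambda url: fetch_job(url, "YOUR_API_KEY"))`, where `created_job` is the parsed response from the extract call. The default timeout of 300 seconds is an arbitrary choice that leaves some margin over the typical upper bound of 3 minutes.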