/extract-document

Create a new Job for extraction.

Create an Extraction Job

POST https://waveline.ai/api/v1/extract-document

Creates a new job that extracts information from a file using the specified shape.

Headers

| Name | Type | Description |
| --- | --- | --- |
| Content-Type | String | Should be application/json. |
| Authorization* | String | Bearer <YOUR_API_KEY> |

Request Body

| Name | Type | Description |
| --- | --- | --- |
| fileName* | String | The name of the file. The file extension may be used by the AI for smarter extraction. |
| contentType* | String | MIME type of the file, such as text/plain or application/pdf. |
| base64Content | String | ⚠️ Only provide one of the three content properties. A string containing a base64 representation of the document to process. Only accepts file sizes under 4.5MB; use contentUrl for larger files. |
| shape* | Shape | Object of type Shape that describes what you want to extract. |
| contentUrl | String | ⚠️ Only provide one of the three content properties. A URL pointing to your data (e.g. https://example.com/invoice.pdf). |
| textContent | String | ⚠️ Only provide one of the three content properties. A string containing the plaintext contents to process. |
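
The rules in the request body table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the official API: `build_payload` and `MAX_BASE64_FILE_BYTES` are hypothetical names, the 4.5MB limit is assumed to mean 4.5 * 1024 * 1024 bytes of raw file data, and it is assumed that exactly one content property must be present.

```python
import base64

# Assumption: "4.5MB" is interpreted as 4.5 * 1024 * 1024 bytes of raw file data.
MAX_BASE64_FILE_BYTES = int(4.5 * 1024 * 1024)


def build_payload(file_name, content_type, shape, *,
                  data=None, content_url=None, text_content=None):
    """Build a request body with exactly one of the three content properties."""
    provided = [v for v in (data, content_url, text_content) if v is not None]
    if len(provided) != 1:
        raise ValueError("provide exactly one of data, content_url, text_content")

    payload = {"fileName": file_name, "contentType": content_type, "shape": shape}
    if data is not None:
        # Raw bytes are sent as base64Content, subject to the size limit.
        if len(data) >= MAX_BASE64_FILE_BYTES:
            raise ValueError("file too large for base64Content; use contentUrl")
        payload["base64Content"] = base64.b64encode(data).decode("ascii")
    elif content_url is not None:
        payload["contentUrl"] = content_url
    else:
        payload["textContent"] = text_content
    return payload
```

Passing the raw bytes and encoding them inside the helper keeps the size check tied to the original file rather than to the longer base64 string.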

On success, the response body describes the newly created job:

{
    "id": string,
    "createdAt": string,
    "status": "CREATED",
    "type": "extract",
    "pages": number, // Number of billed pages in this job
    "fileName": string,
    "result": null, // Is null after creation
    "urls": {
        "get": string // Query this URL to get the status/result of your job
    }
}
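
Because `result` is null right after creation, a client has to query `urls.get` until the result is available. The sketch below shows one way to do that under several assumptions this page does not confirm: that the GET endpoint takes the same Authorization header, that it returns a job object of the same form, and that a non-null `result` means the job has finished. `fetch_job` and `wait_for_result` are hypothetical helper names.

```python
import json
import time
import urllib.request


def fetch_job(get_url, api_key):
    """GET the job URL and decode the JSON body (auth header is an assumption)."""
    req = urllib.request.Request(
        get_url, headers={"Authorization": f"Bearer {api_key}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_result(job, fetch, interval=2.0, max_attempts=30):
    """Poll job["urls"]["get"] until "result" is no longer null.

    `fetch` is any callable that takes the URL and returns the job as a dict,
    for example: lambda url: fetch_job(url, api_key).
    """
    get_url = job["urls"]["get"]
    for _ in range(max_attempts):
        current = fetch(get_url)
        if current.get("result") is not None:
            return current
        time.sleep(interval)
    raise TimeoutError(f"job {job.get('id')} had no result after {max_attempts} polls")
```

Taking `fetch` as a parameter keeps the polling loop independent of the HTTP client and lets it be exercised without network access.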

All error responses share the same body:

{
    "error": string
}
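
Since every error body carries a single `error` string, a client can turn it into an exception in one place. This is a small sketch with hypothetical names (`ExtractionApiError`, `raise_if_error`); it assumes that a successful response never contains a top-level `error` key, which matches the job object shown on this page but is not stated explicitly.

```python
import json


class ExtractionApiError(Exception):
    """Raised when the API answers with an {"error": ...} body."""


def raise_if_error(body):
    """Decode a JSON response body; raise if it has a top-level "error" field."""
    data = json.loads(body)
    if isinstance(data, dict) and "error" in data:
        raise ExtractionApiError(data["error"])
    return data
```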

Example Usage

Here's an example of a JSON payload for the /extract-document endpoint. In this example, we extract the name and the total amount to pay from an invoice PDF file.

{
  "fileName": "invoice.pdf",
  "contentType": "application/pdf",
  "base64Content": "JVBERi0xLjMKMSAwIG9iago8PC9UeXBlL0NhdGF...",
  "shape": [
    {
      "name": "name",
      "type": "string",
      "description": "The name of the person who needs to pay the invoice",
      "isArray": false
    },
    {
      "name": "total",
      "type": "number",
      "description": "Total amount to pay",
      "isArray": false
    }
  ]
}

To send this payload to the /extract-document endpoint, use the following curl command:

curl -X POST "https://waveline.ai/api/v1/extract-document" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -d '{ ... JSON payload ... }'
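
The same request can be made from Python using only the standard library. The sketch below mirrors the curl command above; `build_request` and `create_job` are hypothetical helper names, and `payload` is a dict with the JSON structure described in this section.

```python
import json
import urllib.request

API_URL = "https://waveline.ai/api/v1/extract-document"


def build_request(payload, api_key):
    """Prepare the POST request with the same headers as the curl command."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def create_job(payload, api_key):
    """Send the request and return the decoded job object."""
    with urllib.request.urlopen(build_request(payload, api_key)) as resp:
        return json.load(resp)
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx responses, so error bodies need to be read from the exception rather than from the return value.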

You can get an API key here if you already have an account.

Additional Notes

If our system can't find the answer or is unsure how to fill a field, it sets that field to null.
Use accurate, descriptive names and descriptions for your shape elements. This helps the extraction and improves result quality.
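
Because any field may come back as null, client code should check for missing values before using a result. The sketch below assumes the job's `result` is a JSON object keyed by shape element name; this page does not specify the result format, so treat that layout and the `missing_fields` helper as hypothetical.

```python
def missing_fields(shape, result):
    """Return the names of shape elements whose value is null or absent.

    Assumption: `result` maps each shape element's "name" to its extracted value.
    """
    return [el["name"] for el in shape if result.get(el["name"]) is None]
```

A caller might use the returned list to flag a document for manual review instead of treating null as zero or an empty string.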

