# /extract-document

## Create an Extraction Job

<mark style="color:green;">`POST`</mark> `https://waveline.ai/api/v1/extract-document`

Creates a new job that extracts information from a file using the specified shape.

#### Headers

| Name                                            | Type   | Description                   |
| ----------------------------------------------- | ------ | ----------------------------- |
| Content-Type                                    | String | Should be `application/json`. |
| Authorization<mark style="color:red;">\*</mark> | String | `Bearer <YOUR_API_KEY>`       |

#### Request Body

| Name                                          | Type   | Description                                                                                                                                                                                                                                                                                                                             |
| --------------------------------------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| fileName<mark style="color:red;">\*</mark>    | String | The name of the file. The suffix may be used by the AI for smarter extraction.                                                                                                                                                                                                                                                          |
| contentType<mark style="color:red;">\*</mark> | String | MIME type of the file, such as `text/plain` or `application/pdf`.                                                                                                                                                                                                                                                                       |
| base64Content                                 | String | <p><span data-gb-custom-inline data-tag="emoji" data-code="26a0">⚠️</span> <strong>Only provide one of the three content properties.</strong><br><br>A string containing a base64 representation of the document to process.</p><p></p><p>Only files under 4.5MB are accepted; use <code>contentUrl</code> for larger files.</p> |
| shape<mark style="color:red;">\*</mark>       | Shape  | Object of type [Shape](https://docs.waveline.ai/extract/types/shape) that describes what you want to extract. |
| contentUrl                                    | String | <p><span data-gb-custom-inline data-tag="emoji" data-code="26a0">⚠️</span> <strong>Only provide one of the three content properties.</strong><br><br>A URL pointing to your data. (e.g. <https://example.com/invoice.pdf>)</p>                                                                                                          |
| textContent                                   | String | <p><span data-gb-custom-inline data-tag="emoji" data-code="26a0">⚠️</span> <strong>Only provide one of the three content properties.</strong></p><p></p><p>A string containing the plaintext contents to process.</p> |

{% tabs %}
{% tab title="201: Created The job has been successfully created." %}

<pre class="language-typescript"><code class="lang-typescript"><strong>{
</strong>    "id": string,
    "createdAt": string,
    "status": "CREATED",
    "type": "extract",
    "pages": number, // Number of billed pages in this job
    "fileName": string,
    "result": null, // Is null after creation
    "urls": {
        "get": string // Query this URL to get the status/result of your job
    }
}
</code></pre>

{% endtab %}

{% tab title="500: Internal Server Error An internal server error occurred on our side. Please report it to <team@waveline.ai>." %}

```typescript
{
    "error": string
}
```

{% endtab %}

{% tab title="400: Bad Request Missing body, wrong structure, ..." %}

<pre class="language-typescript"><code class="lang-typescript">{
<strong>    "error": string
</strong>}
</code></pre>

{% endtab %}

{% tab title="401: Unauthorized Provided API key is not valid." %}

```typescript
{
    "error": string
}
```

{% endtab %}

{% tab title="402: Payment Required Your account is missing a billing method." %}

<pre class="language-typescript"><code class="lang-typescript">{
<strong>    "error": string
</strong>}
</code></pre>

{% endtab %}
{% endtabs %}
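
As a rough illustration of how a client might branch on the responses listed above, the following Python sketch maps each documented status code to its meaning. The names `parse_create_response` and `ExtractionError` are hypothetical and not part of any Waveline SDK, and the sample job values are made-up placeholders.

```python
# Status codes and meanings as listed in the response tabs above.
ERROR_MEANINGS = {
    400: "Bad Request: missing body or wrong structure",
    401: "Unauthorized: the provided API key is not valid",
    402: "Payment Required: the account is missing a billing method",
    500: "Internal Server Error: report to team@waveline.ai",
}


class ExtractionError(Exception):
    """Hypothetical exception type for any non-201 response."""


def parse_create_response(status_code, body):
    """Return (job_id, status_url) for a 201 response, else raise ExtractionError."""
    if status_code == 201:
        return body["id"], body["urls"]["get"]
    meaning = ERROR_MEANINGS.get(status_code, "Unexpected status")
    detail = body.get("error", "") if isinstance(body, dict) else ""
    raise ExtractionError(f"{status_code} {meaning}: {detail}")


# Placeholder 201 body; the id and URL are invented for illustration only.
sample = {
    "id": "job-123",
    "status": "CREATED",
    "result": None,
    "urls": {"get": "https://example.com/jobs/job-123"},
}
job_id, status_url = parse_create_response(201, sample)
print(job_id, status_url)
```

The returned `status_url` is the value of `urls.get`, which is the URL to query for the status and result of the job.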

## Example Usage

Here's an example of a JSON payload for the `/extract-document` endpoint.\
In this example, we extract the name and the total amount to pay from an invoice PDF file.

```json
{
  "fileName": "invoice.pdf",
  "contentType": "application/pdf",
  "base64Content": "JVBERi0xLjMKMSAwIG9iago8PC9UeXBlL0NhdGF...",
  "shape": [
    {
      "name": "name",
      "type": "string",
      "description": "The name of the person who needs to pay the invoice",
      "isArray": false
    },
    {
      "name": "total",
      "type": "number",
      "description": "Total amount to pay",
      "isArray": false
    }
  ]
}
```
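
A payload like the one above could also be assembled in code. The sketch below is one possible approach in Python, not an official client: `build_payload` is a hypothetical helper, it assumes that exactly one of the three content properties must be present, and it reads the 4.5MB limit as 4.5 × 1024 × 1024 bytes of raw file data, which the documentation does not state precisely.

```python
import base64
import json

# Assumption: "4.5MB" is interpreted as 4.5 * 1024 * 1024 bytes of raw file data.
MAX_BASE64_FILE_BYTES = int(4.5 * 1024 * 1024)


def build_payload(file_name, content_type, shape, *, file_bytes=None,
                  content_url=None, text_content=None):
    """Build the request body, enforcing exactly one content property."""
    provided = [v for v in (file_bytes, content_url, text_content) if v is not None]
    if len(provided) != 1:
        raise ValueError("Provide exactly one of file_bytes, content_url, text_content")

    payload = {"fileName": file_name, "contentType": content_type, "shape": shape}
    if file_bytes is not None:
        if len(file_bytes) >= MAX_BASE64_FILE_BYTES:
            raise ValueError("File too large for base64Content; use contentUrl instead")
        payload["base64Content"] = base64.b64encode(file_bytes).decode("ascii")
    elif content_url is not None:
        payload["contentUrl"] = content_url
    else:
        payload["textContent"] = text_content
    return payload


shape = [
    {"name": "name", "type": "string",
     "description": "The name of the person who needs to pay the invoice",
     "isArray": False},
    {"name": "total", "type": "number",
     "description": "Total amount to pay", "isArray": False},
]

# Uses contentUrl here; pass file_bytes=open(path, "rb").read() for a local file.
payload = build_payload("invoice.pdf", "application/pdf", shape,
                        content_url="https://example.com/invoice.pdf")
print(json.dumps(payload, indent=2))
```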

To send this payload to the `/extract-document` endpoint, use the following `curl` command:

```bash
curl -X POST "https://waveline.ai/api/v1/extract-document" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -d '{ ... JSON payload ... }'
```

You can [get an API key here](https://waveline.ai/extract/dashboard/api-keys) if you already have an account.
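
For readers who prefer Python over `curl`, the same request could be built with the standard library alone. This is a minimal sketch rather than an official client: `make_request` is a hypothetical helper, the `textContent` payload is an invented example, and the actual send is left commented out so the snippet runs without network access or a real API key.

```python
import json
import urllib.request

API_URL = "https://waveline.ai/api/v1/extract-document"


def make_request(payload, api_key):
    """Build (but do not send) the POST request for /extract-document."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Invented example payload that uses textContent instead of a file.
example_payload = {
    "fileName": "note.txt",
    "contentType": "text/plain",
    "textContent": "Total due: 42 EUR",
    "shape": [
        {"name": "total", "type": "number",
         "description": "Total amount to pay", "isArray": False},
    ],
}
request = make_request(example_payload, api_key="YOUR_API_KEY")

# To send it, uncomment the lines below. Note that urlopen raises
# urllib.error.HTTPError for 4xx and 5xx responses.
# with urllib.request.urlopen(request) as response:
#     job = json.load(response)
#     print(job["id"], job["urls"]["get"])
```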

## Additional Notes

* If our system can't find a value for a field, or is unsure how to fill it, the field is set to `null`.
* Use accurate, descriptive names and descriptions for your shape elements. They guide the extraction and improve the quality of the results.
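
Because fields may come back as `null`, client code will usually want to check for them before using a result. The sketch below assumes that a finished job's `result` is a mapping from each shape element's `name` to its extracted value; this page does not specify the result format, so treat that structure, the helper name `missing_fields`, and the sample values as hypothetical.

```python
def missing_fields(result, shape):
    """Return the shape names whose extracted value is None (JSON null) or absent."""
    return [field["name"] for field in shape if result.get(field["name"]) is None]


shape = [
    {"name": "name", "type": "string",
     "description": "The name of the person who needs to pay the invoice",
     "isArray": False},
    {"name": "total", "type": "number",
     "description": "Total amount to pay", "isArray": False},
]

# Hypothetical result in which the total could not be extracted.
result = {"name": "Jane Doe", "total": None}
print(missing_fields(result, shape))
```

Fields reported this way are candidates for a more precise `description` in the shape.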
