# /extract-document

## Create an Extraction Job

<mark style="color:green;">`POST`</mark> `https://waveline.ai/api/v1/extract-document`

Creates a new job that extracts information from a file using the specified shape.

#### Headers

| Name                                            | Type   | Description                   |
| ----------------------------------------------- | ------ | ----------------------------- |
| Content-Type                                    | String | Should be `application/json`. |
| Authorization<mark style="color:red;">\*</mark> | String | `Bearer <YOUR_API_KEY>`       |

#### Request Body

| Name                                          | Type   | Description                                                                                                                                                                                                                                                                                                                             |
| --------------------------------------------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| fileName<mark style="color:red;">\*</mark>    | String | The name of the file. The file extension may be used by the AI to guide extraction. |
| contentType<mark style="color:red;">\*</mark> | String | MIME type of the file, such as `text/plain` or `application/pdf`.                                                                                                                                                                                                                                                                       |
| base64Content                                 | String | <p><span data-gb-custom-inline data-tag="emoji" data-code="26a0">⚠️</span> <strong>Only provide one of the three content properties.</strong><br><br>A string containing a base64 representation of the document to process.</p><p></p><p>Only files under 4.5MB are accepted. Use <code>contentUrl</code> for larger files.</p> |
| shape<mark style="color:red;">\*</mark>       | Shape  | Object of type [Shape](/extract/types/shape.md) that describes what you want to extract. |
| contentUrl                                    | String | <p><span data-gb-custom-inline data-tag="emoji" data-code="26a0">⚠️</span> <strong>Only provide one of the three content properties.</strong><br><br>A URL pointing to your data. (e.g. <https://example.com/invoice.pdf>)</p>                                                                                                          |
| textContent                                   | String | <p><span data-gb-custom-inline data-tag="emoji" data-code="26a0">⚠️</span> <strong>Only provide one of the three content properties.</strong></p><p></p><p>A string containing the plaintext contents to process.</p> |
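
The rule that only one content property may be set, together with the size limit on `base64Content`, can be checked on the client before sending. The following is a minimal sketch, not an official client: `build_payload` and its parameter names are hypothetical, it assumes that at least one content property is needed, and it reads "4.5MB" as 4,500,000 bytes of raw file data.

```python
import base64

# Assumption: "4.5MB" means 4.5 * 1000 * 1000 bytes of raw (pre-encoding) file data.
MAX_BASE64_FILE_BYTES = 4_500_000


def build_payload(file_name, content_type, shape, *, file_bytes=None,
                  content_url=None, text_content=None):
    """Build a request body with exactly one content property set (hypothetical helper)."""
    provided = [v for v in (file_bytes, content_url, text_content) if v is not None]
    if len(provided) != 1:
        raise ValueError("Provide exactly one of file_bytes, content_url, text_content")

    payload = {"fileName": file_name, "contentType": content_type, "shape": shape}
    if file_bytes is not None:
        if len(file_bytes) >= MAX_BASE64_FILE_BYTES:
            raise ValueError("File too large for base64Content; use contentUrl instead")
        # The API expects the document as a base64 string.
        payload["base64Content"] = base64.b64encode(file_bytes).decode("ascii")
    elif content_url is not None:
        payload["contentUrl"] = content_url
    else:
        payload["textContent"] = text_content
    return payload
```

Rejecting an ambiguous or oversized payload locally avoids a round trip that would likely end in a `400` response.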

{% tabs %}
{% tab title="201: Created The job has been successfully created." %}

<pre class="language-typescript"><code class="lang-typescript"><strong>{
</strong>    "id": string,
    "createdAt": string,
    "status": "CREATED",
    "type": "extract",
    "pages": number, // Number of billed pages in this job
    "fileName": string,
    "result": null, // Is null after creation
    "urls": {
        "get": string // Query this URL to get the status/result of your job
    }
}
</code></pre>

{% endtab %}

{% tab title="500: Internal Server Error An error occurred on our side. Please report it to <team@waveline.ai>." %}

```typescript
{
    "error": string
}
```

{% endtab %}

{% tab title="400: Bad Request Missing body, wrong structure, ..." %}

<pre class="language-typescript"><code class="lang-typescript">{
<strong>    "error": string
</strong>}
</code></pre>

{% endtab %}

{% tab title="401: Unauthorized Provided API key is not valid." %}

```typescript
{
    "error": string
}
```

{% endtab %}

{% tab title="402: Payment Required Your account is missing a billing method." %}

<pre class="language-typescript"><code class="lang-typescript">{
<strong>    "error": string
</strong>}
</code></pre>

{% endtab %}
{% endtabs %}
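
The responses above can be handled in one place on the client side. This is a rough sketch under stated assumptions: `parse_create_response` and `ExtractApiError` are hypothetical names, and the function expects the HTTP status code plus the already-decoded JSON body.

```python
class ExtractApiError(Exception):
    """Raised for any non-201 response (hypothetical name, not part of the API)."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


# Meanings taken from the response tabs documented for this endpoint.
STATUS_HINTS = {
    400: "Bad request: missing body or wrong structure",
    401: "Unauthorized: the API key is not valid",
    402: "Payment required: the account is missing a billing method",
    500: "Internal server error: report to team@waveline.ai",
}


def parse_create_response(status, body):
    """Return the key job fields on 201, otherwise raise ExtractApiError."""
    if status == 201:
        # urls.get is the URL to query for the job's status/result.
        return {"id": body["id"], "status": body["status"], "poll_url": body["urls"]["get"]}
    hint = STATUS_HINTS.get(status, "Unexpected status")
    detail = body.get("error", "") if isinstance(body, dict) else ""
    raise ExtractApiError(status, f"{hint} ({detail})" if detail else hint)
```

Keeping `poll_url` alongside the job `id` means the caller does not need to construct the status URL itself.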

## Example Usage

Here's an example of a JSON payload for the `/extract-document` endpoint.\
In this example, we extract the payer's name and the total amount to pay from an invoice PDF file.

```json
{
  "fileName": "invoice.pdf",
  "contentType": "application/pdf",
  "base64Content": "JVBERi0xLjMKMSAwIG9iago8PC9UeXBlL0NhdGF...",
  "shape": [
    {
      "name": "name",
      "type": "string",
      "description": "The name of the person who needs to pay the invoice",
      "isArray": false
    },
    {
      "name": "total",
      "type": "number",
      "description": "Total amount to pay",
      "isArray": false
    }
  ]
}
```

To send this payload to the `/extract-document` endpoint, use the following `curl` command:

```bash
curl -X POST "https://waveline.ai/api/v1/extract-document" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -d '{ ... JSON payload ... }'
```
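
For callers who prefer Python over `curl`, the same request can be sketched with the standard library alone. `build_request` is a hypothetical helper; the URL and headers are the ones documented for this endpoint, and the network call itself is left commented out.

```python
import json
import urllib.request

API_URL = "https://waveline.ai/api/v1/extract-document"


def build_request(payload, api_key):
    """Build (but do not send) the POST request for /extract-document."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),  # JSON body as bytes
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Sending performs a real network call, so it is shown here only as a comment.
# Note that urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
# with urllib.request.urlopen(build_request(payload, "YOUR_API_KEY")) as resp:
#     job = json.load(resp)
```

Separating request construction from sending makes it possible to inspect the headers and body before any pages are billed.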

You can [get an API key here](https://waveline.ai/extract/dashboard/api-keys) if you already have an account.

## Additional Notes

* If our system can't find a value for a field, or is unsure how to fill it, the field is set to `null`.
* Use accurate and descriptive names and descriptions for your shape elements. Clear names and descriptions help our system and improve extraction results.
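
Because any field may come back as `null`, it can be useful to list the unfilled fields before using a result. The layout of `result` is not documented on this page, so this sketch assumes it is a JSON object keyed by shape field name; `missing_fields` is a hypothetical helper.

```python
def missing_fields(result, shape):
    """Return the names of shape fields whose value is null or absent.

    Assumption: `result` is a dict keyed by each shape element's "name".
    """
    return [field["name"] for field in shape if result.get(field["name"]) is None]
```

A caller might retry with more descriptive shape descriptions when this list is not empty.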

