Waveline Extract

Order Table Extraction

Last updated 1 year ago

In this example, we are a company that sells many different products. Our customer sends us a small table of all the items they want to buy. However, since the customer creates the table, it can look slightly different every time. For example, they write Product ID instead of product_id, Qty instead of Quantity, and the shipping times have no clear structure.

| Product ID | Qty | Unit price | Manufacturar | Shipping Time |
| --- | --- | --- | --- | --- |
| TZX22-EHZ2 | 100 | 2.76$ | Tusp | 1-2 days |
| ZUI23-772L6 | 250 | 5.00$ | - | 1 week |
| UIUU-13BMW | 340'000 | 0.001$ | puma | About a month |
| QUE2-AIME2 | 56 | 45.56$ | lebra | Tomorrow |

Waveline Extract makes it easy to unify these fields into one format we define.

Let's construct a Shape to extract product_id, unit_price, quantity, and shipping_time for each product:

[
  {
    "name": "products",
    "type": "object",
    "description": "All products from the table",
    "isArray": true,
    "elements": [
      {
        "name": "product_id",
        "type": "string",
        "description": "The id of that product. aka product number",
        "isArray": false
      },
      {
        "name": "quantity",
        "type": "number",
        "description": "Quantity of how many units. Aka Qty",
        "isArray": false
      },
      {
        "name": "unit_price",
        "type": "number",
        "description": "Unit price of that product in dollars",
        "isArray": false
      },
      {
        "name": "shipping_time",
        "type": "string",
        "description": "Time it takes to ship this product. (In days)",
        "isArray": false
      }
    ]
  }
]
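
For shapes with many fields, generating the JSON can be less error-prone than typing it by hand. The following is a minimal Python sketch that produces the same shape as above. The helpers `element` and `build_order_shape` are hypothetical names chosen for this illustration; they are not part of any Waveline client library.

```python
import json


def element(name, type_, description, is_array=False, elements=None):
    """Build one shape element using the keys shown in the JSON shape above."""
    node = {
        "name": name,
        "type": type_,
        "description": description,
        "isArray": is_array,
    }
    if elements is not None:  # only object elements carry nested elements
        node["elements"] = elements
    return node


def build_order_shape():
    """Return the order-table shape: a list with a single 'products' element."""
    return [
        element(
            "products", "object", "All products from the table",
            is_array=True,
            elements=[
                element("product_id", "string",
                        "The id of that product. aka product number"),
                element("quantity", "number",
                        "Quantity of how many units. Aka Qty"),
                element("unit_price", "number",
                        "Unit price of that product in dollars"),
                element("shipping_time", "string",
                        "Time it takes to ship this product. (In days)"),
            ],
        )
    ]


if __name__ == "__main__":
    # Prints the shape as JSON, ready to be used as the "shape" value.
    print(json.dumps(build_order_shape(), indent=2))
```

Keeping the field definitions in one function makes it easy to add or rename a field without touching the rest of the request code.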

We can now call the /extract-document endpoint with this shape and the table as the payload to create the job:

curl -X POST "https://waveline.ai/api/v1/extract-document" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -d '{
          "fileName": "OrderTable.pdf",
          "contentType": "application/pdf",
          "base64Content": "JVBERi0xLjMKMSAwIG9iago8PC9UeXBlL0NhdGF...",
          "shape": YOUR_SHAPE
        }'

After some time, we query for the result of this job with the job endpoint and get the following in the result field:

{
  "products": [
    {
      "product_id": "TZX22-EHZ2",
      "quantity": "100",
      "unit_price": 2.76,
      "shipping_time": "1-2 days"
    },
    {
      "product_id": "ZUI23-772L6",
      "quantity": "250",
      "unit_price": 5,
      "shipping_time": "1 week"
    },
    {
      "product_id": "UIUU-13BMW",
      "quantity": "340000",
      "unit_price": 0.001,
      "shipping_time": "About a month"
    },
    {
      "product_id": "QUE2-AIME2",
      "quantity": "56",
      "unit_price": 45.56,
      "shipping_time": "Tomorrow"
    }
  ]
}

As we can see above, all the fields have successfully been unified into the format we defined!
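
The curl call above can also be prepared from code. The sketch below uses only the Python standard library and mirrors the URL, headers, and body keys of the curl example. The function name `build_extract_request` is an assumption made for this illustration, and the request is only constructed, not sent. Note that a PDF file starts with the bytes `%PDF-`, which is why the base64 content in the curl example begins with `JVBERi0`.

```python
import base64
import json
import urllib.request

# URL taken from the curl example above.
API_URL = "https://waveline.ai/api/v1/extract-document"


def build_extract_request(file_name, content_type, file_bytes, shape, api_key):
    """Prepare (but do not send) the POST request for /extract-document."""
    payload = {
        "fileName": file_name,
        "contentType": content_type,
        # The file is sent base64-encoded inside the JSON body.
        "base64Content": base64.b64encode(file_bytes).decode("ascii"),
        "shape": shape,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder bytes and a trimmed shape, for illustration only.
    demo_shape = [{"name": "products", "type": "object",
                   "description": "All products from the table",
                   "isArray": True, "elements": []}]
    req = build_extract_request("OrderTable.pdf", "application/pdf",
                                b"%PDF-1.3 placeholder", demo_shape,
                                "YOUR_API_KEY")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would perform the actual call.
```

Separating request construction from sending keeps the payload easy to inspect and test before any API credits are spent.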

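In the sample result, quantity is returned as a string (for example "100") while unit_price is a number, so it may be worth coercing the values before doing arithmetic on them. The following is a minimal sketch, assuming the result field has already been parsed into a Python dict. `normalize_products` and `order_total` are hypothetical helpers, not part of the API.

```python
from decimal import Decimal


def normalize_products(result):
    """Coerce quantity to int and unit_price to Decimal for every product."""
    products = []
    for item in result["products"]:
        products.append({
            "product_id": item["product_id"],
            "quantity": int(item["quantity"]),
            # Going through str() avoids binary floating-point noise.
            "unit_price": Decimal(str(item["unit_price"])),
            "shipping_time": item["shipping_time"],
        })
    return products


def order_total(products):
    """Sum quantity * unit_price over all products."""
    return sum((p["quantity"] * p["unit_price"] for p in products),
               Decimal("0"))


if __name__ == "__main__":
    # First two rows of the sample result above.
    sample = {"products": [
        {"product_id": "TZX22-EHZ2", "quantity": "100",
         "unit_price": 2.76, "shipping_time": "1-2 days"},
        {"product_id": "ZUI23-772L6", "quantity": "250",
         "unit_price": 5, "shipping_time": "1 week"},
    ]}
    print(order_total(normalize_products(sample)))  # prints 1526.00
```

The sketch assumes quantities arrive as plain digit strings, as in the sample; values with separators such as 340'000 would need extra cleaning first.
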