Order Table Extraction
In this example, we are a company that sells many different products. Our customer sends us a small table of all the items they want to buy. However, since the customer creates the table, it can look slightly different every time. For example, they write Product ID instead of product_id, Qty instead of Quantity, and the shipping times have no clear structure.
TZX22-EHZ2    100       2.76$    Tusp    1-2 days
ZUI23-772L6   250       5.00$    -       1 week
UIUU-13BMW    340'000   0.001$   puma    About a month
QUE2-AIME2    56        45.56$   lebra   Tomorrow
Waveline Extract makes it easy to unify these fields into one format we define.
Let's construct a Shape to extract product_id, unit_price, quantity, and shipping_time for each product:
[
  {
    "name": "products",
    "type": "object",
    "description": "All products from the table",
    "isArray": true,
    "elements": [
      {
        "name": "product_id",
        "type": "string",
        "description": "The id of that product. aka product number",
        "isArray": false
      },
      {
        "name": "quantity",
        "type": "number",
        "description": "Quantity of how many units. Aka Qty",
        "isArray": false
      },
      {
        "name": "unit_price",
        "type": "number",
        "description": "Unit price of that product in dollars",
        "isArray": false
      },
      {
        "name": "shipping_time",
        "type": "string",
        "description": "Time it takes to ship this product. (In days)",
        "isArray": false
      }
    ]
  }
]
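If the Shape is assembled in code rather than written by hand, a small helper can keep the entries consistent. The following is a minimal Python sketch, assuming only the keys visible in the JSON above (name, type, description, isArray, elements); the field helper is hypothetical and not part of Waveline's API.

```python
import json


def field(name, type_, description, is_array=False, elements=None):
    """Build one Shape entry using the keys shown in the JSON above (hypothetical helper)."""
    entry = {
        "name": name,
        "type": type_,
        "description": description,
        "isArray": is_array,
    }
    # Only add nested elements when they are given, as for the "products" object.
    if elements is not None:
        entry["elements"] = elements
    return entry


shape = [
    field(
        "products",
        "object",
        "All products from the table",
        is_array=True,
        elements=[
            field("product_id", "string", "The id of that product. aka product number"),
            field("quantity", "number", "Quantity of how many units. Aka Qty"),
            field("unit_price", "number", "Unit price of that product in dollars"),
            field("shipping_time", "string", "Time it takes to ship this product. (In days)"),
        ],
    )
]

# Serialize to the JSON text that would be placed in the request.
shape_json = json.dumps(shape, indent=2)
```

Printing shape_json should give the same structure as the hand-written Shape, which makes it easy to compare the two.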
We can now call the /extract-document endpoint with this shape and the table as the payload to create the job:
curl -X POST "https://waveline.ai/api/v1/extract-document" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "fileName": "OrderTable.pdf",
    "contentType": "application/pdf",
    "base64Content": "JVBERi0xLjMKMSAwIG9iago8PC9UeXBlL0NhdGF...",
    "shape": YOUR_SHAPE
  }'
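The same request can be prepared from Python. This is a minimal sketch using only the standard library, assuming the endpoint URL, headers, and body keys shown in the curl command; build_extract_request is a hypothetical helper, and the request object is only constructed here, not sent.

```python
import base64
import json
import urllib.request

# URL taken from the curl command above.
EXTRACT_URL = "https://waveline.ai/api/v1/extract-document"


def build_extract_request(file_bytes, file_name, content_type, shape, api_key):
    """Return an unsent POST request mirroring the curl command (hypothetical helper)."""
    body = {
        "fileName": file_name,
        "contentType": content_type,
        # The curl payload carries the file as base64 text, so encode the raw bytes.
        "base64Content": base64.b64encode(file_bytes).decode("ascii"),
        "shape": shape,
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# Placeholder bytes and an empty shape, for illustration only.
request = build_extract_request(
    b"%PDF-1.3\n", "OrderTable.pdf", "application/pdf", [], "YOUR_API_KEY"
)
```

Passing the request to urllib.request.urlopen would submit it; in real use, file_bytes would come from reading the order document in binary mode and shape would be the Shape defined earlier.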
After some time, we query for the result of this job with the job endpoint and get the following in the result field:
{
  "products": [
    {
      "product_id": "TZX22-EHZ2",
      "quantity": "100",
      "unit_price": 2.76,
      "shipping_time": "1-2 days"
    },
    {
      "product_id": "ZUI23-772L6",
      "quantity": "250",
      "unit_price": 5,
      "shipping_time": "1 week"
    },
    {
      "product_id": "UIUU-13BMW",
      "quantity": "340000",
      "unit_price": 0.001,
      "shipping_time": "About a month"
    },
    {
      "product_id": "QUE2-AIME2",
      "quantity": "56",
      "unit_price": 45.56,
      "shipping_time": "Tomorrow"
    }
  ]
}
As we can see above, all the fields have successfully been unified into the format we defined!
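One detail worth handling in client code: in the sample result, the quantity values arrive as strings (for example "100"), while unit_price arrives as a number. If downstream code does arithmetic on these fields, a small normalization pass can coerce them first. The following is a minimal Python sketch, assuming the result has exactly the structure shown above; normalize_products and order_total are hypothetical helpers, not part of Waveline's API.

```python
from decimal import Decimal


def normalize_products(result):
    """Return the products with quantity as int and unit_price as Decimal."""
    normalized = []
    for product in result["products"]:
        normalized.append({
            **product,
            # int() accepts both "100" and 100, so either representation works.
            "quantity": int(product["quantity"]),
            # Go through str() so 2.76 becomes Decimal("2.76") rather than a float expansion.
            "unit_price": Decimal(str(product["unit_price"])),
        })
    return normalized


def order_total(products):
    """Sum quantity * unit_price over all normalized products."""
    return sum((p["quantity"] * p["unit_price"] for p in products), Decimal("0"))


# First two rows of the sample result, copied as returned.
result = {
    "products": [
        {"product_id": "TZX22-EHZ2", "quantity": "100", "unit_price": 2.76, "shipping_time": "1-2 days"},
        {"product_id": "ZUI23-772L6", "quantity": "250", "unit_price": 5, "shipping_time": "1 week"},
    ]
}
products = normalize_products(result)
total = order_total(products)
```

Decimal is used for prices here so that totals are not affected by binary floating-point rounding; plain floats would also work if exact cents are not required.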