Shape
Shape specifies what you want to extract from a document. It is an array in which each element is a DataShapeElement.
type Shape = DataShapeElement[];
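DataShapeElement is not spelled out on this page. Judging only from the examples below, each element carries name, type, description and isArray, plus a nested elements array when type is "object". The TypeScript sketch below encodes that reading. It is an assumption: the real definition may allow more type values or extra fields.

```typescript
// Hypothetical reconstruction of DataShapeElement, inferred from the examples on this page.
// The real type definition may include further type values or optional fields.
interface DataShapeElement {
  name: string; // field name, e.g. "title"
  type: "string" | "number" | "object"; // only the values used in these examples
  description: string; // natural-language hint describing what to extract
  isArray: boolean; // true if the field can have multiple values
  elements?: DataShapeElement[]; // nested fields, used when type is "object"
}

type Shape = DataShapeElement[];

// The minimal example below, written as a typed constant.
const titleShape: Shape = [
  {
    name: "title",
    type: "string",
    description: "The title of this newspaper article",
    isArray: false,
  },
];

console.log(JSON.stringify(titleShape));
```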
Create a Shape
There are three main ways to build a Shape for your use case:
Use our intuitive shape tools in the dashboard.
Let us guess the shape you want to extract with our guess-shape endpoint.
Define your own shape based on our type definition.
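When you write a shape by hand, a quick structural check can catch mistakes such as a missing isArray before the shape is used. The checkShape function below is hypothetical and only tests what the examples on this page show; in particular, requiring a non-empty elements array for "object" elements is an assumption, not documented behavior.

```typescript
// Hypothetical structural check for a hand-written Shape. It only tests the fields
// seen in the examples on this page; it is not the service's own validation.
function checkShape(shape: unknown, path = "shape"): string[] {
  if (!Array.isArray(shape)) return [`${path} must be an array`];
  const problems: string[] = [];
  shape.forEach((item: unknown, i: number) => {
    const at = `${path}[${i}]`;
    if (typeof item !== "object" || item === null) {
      problems.push(`${at} must be an object`);
      return;
    }
    const el = item as Record<string, unknown>;
    // name, type and description appear as strings in every example.
    for (const key of ["name", "type", "description"]) {
      if (typeof el[key] !== "string" || el[key] === "") {
        problems.push(`${at}.${key} must be a non-empty string`);
      }
    }
    if (typeof el["isArray"] !== "boolean") {
      problems.push(`${at}.isArray must be a boolean`);
    }
    // Assumption: an "object" element needs a non-empty elements array, checked recursively.
    if (el["type"] === "object") {
      const elements = el["elements"];
      if (!Array.isArray(elements) || elements.length === 0) {
        problems.push(`${at}.elements must list the object's fields`);
      } else {
        problems.push(...checkShape(elements, `${at}.elements`));
      }
    }
  });
  return problems;
}

// A well-formed element produces no problems.
console.log(checkShape([{ name: "total", type: "number", description: "Total amount to pay", isArray: false }]).length);
// An object element without elements is reported.
console.log(checkShape([{ name: "part", type: "object", description: "A part", isArray: false }]));
```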
Examples
Minimal Example
We extract the title of a provided newspaper article. The title should be a string.
[
{
"name": "title",
"type": "string",
"description": "The title of this newspaper article",
"isArray": false
}
]
Multiple fields
Here we extract multiple fields. Let's assume we process an invoice: we extract the last name of the recipient and the total amount to pay.
[
{
"name": "last_name",
"type": "string",
"description": "The last name of the recipient",
"isArray": false
},
{
"name": "total",
"type": "number",
"description": "Total amount to pay",
"isArray": false
}
]
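Most elements in these examples differ only in name, type and description, so a small helper can cut the repetition when shapes are written in code. The field helper below is hypothetical and not part of any documented API; it simply produces the same JSON as the invoice example above.

```typescript
// Hypothetical helper: `field` is not part of the service's API; it only saves typing
// when writing a Shape by hand.
type ScalarType = "string" | "number";

interface DataShapeElement {
  name: string;
  type: ScalarType | "object";
  description: string;
  isArray: boolean;
  elements?: DataShapeElement[];
}

// Builds one scalar element; isArray defaults to false, as in most examples.
function field(name: string, type: ScalarType, description: string, isArray = false): DataShapeElement {
  return { name, type, description, isArray };
}

// The invoice shape from the "Multiple fields" example.
const invoiceShape: DataShapeElement[] = [
  field("last_name", "string", "The last name of the recipient"),
  field("total", "number", "Total amount to pay"),
];

// Serializes with the same property order as the example: name, type, description, isArray.
console.log(JSON.stringify(invoiceShape, null, 2));
```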
Array
If a field has multiple answers, we can set isArray to true. In this example, we process a conference paper and want to get back all the authors of that paper.
[
{
"name": "authors",
"type": "string",
"description": "All authors that are part of this paper",
"isArray": true
}
]
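This page does not show what an extraction returns. Assuming the result is an object keyed by element name (an assumption, not documented here), isArray decides whether a field's value is a single item or a list. The type-level sketch below spells out that mapping; ResultOf, FieldValue and BaseValue are hypothetical names.

```typescript
// Hypothetical: assumes results are objects keyed by element name, which this page does not document.
interface ShapeField {
  readonly name: string;
  readonly type: "string" | "number" | "object";
  readonly description: string;
  readonly isArray: boolean;
  readonly elements?: readonly ShapeField[];
}

// Value type of a single element, ignoring isArray.
type BaseValue<F> = F extends { type: "string" }
  ? string
  : F extends { type: "number" }
    ? number
    : F extends { type: "object"; elements: infer E }
      ? E extends readonly ShapeField[]
        ? ResultOf<E>
        : never
      : never;

// isArray: true wraps the value in an array.
type FieldValue<F> = F extends { isArray: true } ? BaseValue<F>[] : BaseValue<F>;

// Maps a whole shape to an object type keyed by element name.
type ResultOf<S extends readonly ShapeField[]> = {
  [F in S[number] as F["name"]]: FieldValue<F>;
};

// The "Array" example, kept as literal types with `as const`.
const authorsShape = [
  {
    name: "authors",
    type: "string",
    description: "All authors that are part of this paper",
    isArray: true,
  },
] as const;

// ResultOf<typeof authorsShape> resolves to { authors: string[] }.
const example: ResultOf<typeof authorsShape> = { authors: ["A. Author", "B. Author"] };
console.log(example.authors.length);
```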
Object
In this example, we want to process mechanical parts that are mentioned in our document. For these parts, we specify an object with two properties: part_id and shipping_price. Both properties could be defined as separate top-level fields, as in the examples above, but here we group them as properties of a single object:
[
{
"name": "part",
"type": "object",
"description": "A mechanical part to be shipped.",
"isArray": false,
"elements": [
{
"name": "part_id",
"type": "number",
"description": "The id of that part",
"isArray": false
},
{
"name": "shipping_price",
"type": "string",
"description": "Shipping price of that part",
"isArray": false
}
]
}
]
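With type "object", the nested values are grouped under the parent's name instead of appearing as separate top-level fields. Assuming again that results are JSON objects keyed by element name (this page does not document the result format), the hypothetical matchesShape function below checks a value against the part shape recursively. The sample values are made up for illustration.

```typescript
// Hypothetical check: assumes an extraction result is a JSON object keyed by element
// name, with arrays for isArray fields and nested objects for type "object".
interface DataShapeElement {
  name: string;
  type: "string" | "number" | "object";
  description: string;
  isArray: boolean;
  elements?: DataShapeElement[];
}

// Checks one value against a single element, ignoring isArray.
function matchesBase(value: unknown, el: DataShapeElement): boolean {
  if (el.type === "object") {
    return typeof value === "object" && value !== null && !Array.isArray(value)
      && matchesShape(value, el.elements ?? []);
  }
  return typeof value === el.type;
}

// True if every element of the shape is present with a value of the expected kind.
function matchesShape(result: unknown, shape: DataShapeElement[]): boolean {
  if (typeof result !== "object" || result === null) return false;
  const record = result as Record<string, unknown>;
  return shape.every((el) => {
    const value = record[el.name];
    return el.isArray
      ? Array.isArray(value) && value.every((item) => matchesBase(item, el))
      : matchesBase(value, el);
  });
}

// The "part" shape from the Object example.
const partShape: DataShapeElement[] = [
  {
    name: "part", type: "object", description: "A mechanical part to be shipped.", isArray: false,
    elements: [
      { name: "part_id", type: "number", description: "The id of that part", isArray: false },
      { name: "shipping_price", type: "string", description: "Shipping price of that part", isArray: false },
    ],
  },
];

console.log(matchesShape({ part: { part_id: 4711, shipping_price: "12.50 EUR" } }, partShape)); // true
console.log(matchesShape({ part_id: 4711, shipping_price: "12.50 EUR" }, partShape)); // false: not nested under "part"
```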
Advanced Shape
This example is a more complex variation of the previous one. Since our document can contain multiple parts to ship, we set the isArray property of part to true, and our results will contain a list of parts. Each part now has a part_id and a sizes field, which also sets isArray to true because a part can come in several sizes. Additionally, we create another field called date that holds the current date.
[
{
"name": "part",
"type": "object",
"description": "Mechanical parts to be shipped",
"isArray": true,
"elements": [
{
"name": "part_id",
"type": "number",
"description": "The id of that part",
"isArray": false
},
{
"name": "sizes",
"type": "string",
"description": "Different sizes in which this part can be produced",
"isArray": true
}
]
},
{
"name": "date",
"type": "string",
"description": "Current date in the format MM-DD-YYYY",
"isArray": false
}
]
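In the advanced shape, isArray applies at two levels: the list of parts and the list of sizes inside each part. The hypothetical fieldPaths helper below lists every leaf field as a path, with [] marking each array level, which makes that nesting easy to read at a glance.

```typescript
// Hypothetical helper: lists every leaf field of a Shape as a dotted path,
// appending "[]" to each element whose isArray is true.
interface DataShapeElement {
  name: string;
  type: "string" | "number" | "object";
  description: string;
  isArray: boolean;
  elements?: DataShapeElement[];
}

function fieldPaths(shape: DataShapeElement[], prefix = ""): string[] {
  return shape.flatMap((el) => {
    const path = prefix + el.name + (el.isArray ? "[]" : "");
    // Recurse into object elements; scalar elements are leaves.
    return el.type === "object" && el.elements ? fieldPaths(el.elements, path + ".") : [path];
  });
}

// The shape from the Advanced Shape example.
const advancedShape: DataShapeElement[] = [
  {
    name: "part", type: "object", description: "Mechanical parts to be shipped", isArray: true,
    elements: [
      { name: "part_id", type: "number", description: "The id of that part", isArray: false },
      { name: "sizes", type: "string", description: "Different sizes in which this part can be produced", isArray: true },
    ],
  },
  { name: "date", type: "string", description: "Current date in the format MM-DD-YYYY", isArray: false },
];

console.log(fieldPaths(advancedShape).join(", ")); // part[].part_id, part[].sizes[], date
```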