Shape

A Shape specifies what you want to extract from a document. It is an array in which each element is a DataShapeElement.

type Shape = DataShapeElement[];
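This page does not reproduce the DataShapeElement definition itself. The interface below is a minimal sketch inferred only from the fields the examples on this page use: name, type, description, isArray, and elements for objects. The set of allowed type values and any further optional fields are assumptions. Check the official type definition before relying on it.

```typescript
// Sketch of DataShapeElement, inferred from the examples on this page.
// Assumption: this is NOT the official definition; only the "type" values
// that appear in the examples are listed here.
type FieldType = "string" | "number" | "object";

interface DataShapeElement {
  name: string;                  // identifier of the field
  type: FieldType;               // data type of the extracted value
  description: string;           // natural-language hint about what to extract
  isArray: boolean;              // true if the field can have multiple answers
  elements?: DataShapeElement[]; // nested properties, used when type is "object"
}

type Shape = DataShapeElement[];

// The "Minimal Example" shape, typed against the sketch above.
const minimalShape: Shape = [
  {
    name: "title",
    type: "string",
    description: "The title of this newspaper article",
    isArray: false,
  },
];

console.log(minimalShape[0].name); // title
```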

Create a Shape

There are three main ways to build a Shape for your use case:

  • Use our intuitive shape tools in the dashboard.

  • Let us guess the shape you want to extract with our guess-shape endpoint.

  • Define your own shape based on our type definition.

Examples

Minimal Example

We extract the title of a provided newspaper article. The title should be a string.

[
  {
    "name": "title",
    "type": "string",
    "description": "The title of this newspaper article",
    "isArray": false
  }
]

Multiple fields

This example extracts multiple fields. Let's assume we process an invoice and want to extract the recipient's last name and the total amount to pay.

[
  {
    "name": "last_name",
    "type": "string",
    "description": "The last name of the recipient",
    "isArray": false
  },
  {
    "name": "total",
    "type": "number",
    "description": "Total amount to pay",
    "isArray": false
  }
]
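A shape written by hand is easy to break with a small syntax slip, such as a missing opening brace between two elements. The helper below is a hypothetical local sanity check, not part of the service's API. It parses a Shape from JSON text and verifies that every top-level element carries the four keys used in the examples on this page. Treating those four keys as required is an assumption.

```typescript
// Hypothetical local check, not the service's own validation.
interface DataShapeElement {
  name: string;
  type: string;
  description: string;
  isArray: boolean;
  elements?: DataShapeElement[];
}

// Keys every element in the examples carries, with their expected JS types.
const REQUIRED_KEYS: ReadonlyArray<[string, string]> = [
  ["name", "string"],
  ["type", "string"],
  ["description", "string"],
  ["isArray", "boolean"],
];

function parseShape(json: string): DataShapeElement[] {
  // JSON.parse throws a SyntaxError on malformed input, e.g. a missing "{".
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("A Shape must be a JSON array");
  }
  parsed.forEach((el: unknown, i: number) => {
    if (typeof el !== "object" || el === null) {
      throw new Error(`Element ${i} is not an object`);
    }
    const rec = el as Record<string, unknown>;
    for (const [key, expected] of REQUIRED_KEYS) {
      if (typeof rec[key] !== expected) {
        throw new Error(`Element ${i}: "${key}" must be a ${expected}`);
      }
    }
  });
  return parsed as DataShapeElement[];
}

const invoiceShape = parseShape(`[
  { "name": "last_name", "type": "string", "description": "The last name of the recipient", "isArray": false },
  { "name": "total", "type": "number", "description": "Total amount to pay", "isArray": false }
]`);
console.log(invoiceShape.map((el) => el.name).join(", ")); // last_name, total
```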

Array

If we have a field with multiple answers, we can set isArray to true. In this example, we process a conference paper and want to get back all the authors of that paper.

[
  {
    "name": "authors",
    "type": "string",
    "description": "All authors that are part of this paper",
    "isArray": true
  }
]
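To make the effect of isArray concrete, the hypothetical helper below describes the kind of value a flat field yields, written as a TypeScript-style type. It is an illustration of the flag's meaning only. The exact format of the returned results is not specified on this page.

```typescript
// Illustrative helper (hypothetical): isArray: true turns one value into a
// list of values of the same type. Flat fields only in this sketch.
interface DataShapeElement {
  name: string;
  type: "string" | "number";
  description: string;
  isArray: boolean;
}

function describeValue(el: DataShapeElement): string {
  return el.isArray ? `${el.type}[]` : el.type;
}

const authors: DataShapeElement = {
  name: "authors",
  type: "string",
  description: "All authors that are part of this paper",
  isArray: true,
};

console.log(describeValue(authors));                        // string[]
console.log(describeValue({ ...authors, isArray: false })); // string
```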

Object

In this example, we extract a mechanical part mentioned in our document. We specify it as an object with two properties: part_id and shipping_price. Both values could be defined as separate top-level fields, as in the Multiple fields example. Grouping them as properties of one object instead keeps the values that belong to the same part together:

[
  {
    "name": "part",
    "type": "object",
    "description": "A mechanical part to be shipped.",
    "isArray": false,
    "elements": [
      {
        "name": "part_id",
        "type": "number",
        "description": "The id of that part",
        "isArray": false
      },
      {
        "name": "shipping_price",
        "type": "string",
        "description": "Shipping price of that part",
        "isArray": false
      }
    ]
  }
]
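Because object fields nest further DataShapeElements under elements, a shape can be checked recursively. The sketch below assumes two nesting rules that this page implies but does not state:

- a field of type "object" needs at least one entry in elements;
- a non-object field should not define elements.

The function name and both rules are hypothetical.

```typescript
interface DataShapeElement {
  name: string;
  type: string;
  description: string;
  isArray: boolean;
  elements?: DataShapeElement[];
}

// Walks the shape recursively and returns the problems found, each prefixed
// with a dotted path. An empty list means the assumed rules are satisfied.
function checkNesting(shape: DataShapeElement[], path = ""): string[] {
  const problems: string[] = [];
  for (const el of shape) {
    const here = path ? `${path}.${el.name}` : el.name;
    if (el.type === "object") {
      if (!el.elements || el.elements.length === 0) {
        problems.push(`${here}: an object field needs at least one entry in "elements"`);
      } else {
        problems.push(...checkNesting(el.elements, here));
      }
    } else if (el.elements !== undefined) {
      problems.push(`${here}: only object fields should define "elements"`);
    }
  }
  return problems;
}

// The "Object" example from this page.
const partShape: DataShapeElement[] = [
  {
    name: "part",
    type: "object",
    description: "A mechanical part to be shipped.",
    isArray: false,
    elements: [
      { name: "part_id", type: "number", description: "The id of that part", isArray: false },
      { name: "shipping_price", type: "string", description: "Shipping price of that part", isArray: false },
    ],
  },
];

console.log(checkNesting(partShape).length); // 0
```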

Advanced Shape

This example is a more complex variation of the previous one. Since our document can contain multiple parts to ship, we set isArray to true on the part object, so the results contain a list of parts. Each part now has a part_id and a sizes field. sizes is itself an array because a part can be produced in several sizes. Additionally, we add a top-level field called date that holds the current date in MM-DD-YYYY format.

[
  {
    "name": "part",
    "type": "object",
    "description": "Mechanical parts to be shipped",
    "isArray": true,
    "elements": [
      {
        "name": "part_id",
        "type": "number",
        "description": "The id of that part",
        "isArray": false
      },
      {
        "name": "sizes",
        "type": "string",
        "description": "Different sizes in which this part can be produced",
        "isArray": true
      }
    ]
  },
  {
    "name": "date",
    "type": "string",
    "description": "Current date in the format MM-DD-YYYY",
    "isArray": false
  }
]
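To see how isArray and nested elements combine, the hypothetical helper below renders a Shape as a TypeScript-like type string. It assumes, without confirmation from this page, that results are keyed by each element's name and mirror the shape's nesting. The result is an illustration of the structure, not the actual response format.

```typescript
// Illustrative sketch (hypothetical helpers, not part of the API).
interface DataShapeElement {
  name: string;
  type: string;
  description: string;
  isArray: boolean;
  elements?: DataShapeElement[];
}

function describeValue(el: DataShapeElement): string {
  // Objects are described by their nested elements; other types by name.
  const base = el.type === "object" ? describeShape(el.elements ?? []) : el.type;
  // isArray wraps whatever the base type is, including whole objects.
  return el.isArray ? `${base}[]` : base;
}

function describeShape(shape: DataShapeElement[]): string {
  return `{ ${shape.map((el) => `${el.name}: ${describeValue(el)}`).join("; ")} }`;
}

// The "Advanced Shape" example from this page.
const advancedShape: DataShapeElement[] = [
  {
    name: "part",
    type: "object",
    description: "Mechanical parts to be shipped",
    isArray: true,
    elements: [
      { name: "part_id", type: "number", description: "The id of that part", isArray: false },
      { name: "sizes", type: "string", description: "Different sizes in which this part can be produced", isArray: true },
    ],
  },
  { name: "date", type: "string", description: "Current date in the format MM-DD-YYYY", isArray: false },
];

console.log(describeShape(advancedShape));
// { part: { part_id: number; sizes: string[] }[]; date: string }
```

The array marker attaches to the whole object for part but only to the string type for sizes. This mirrors where isArray is set in the shape.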
