Email Extraction

In this example, we imagine we are a hotel that receives emails from new customers who want to book a room. We want to extract the following information from each email:

  • Name of the guest (first and last name)

  • Room preferences (the guest's preferred room location, room size, bed type, etc.)

  • Reservation number (Unique number that identifies this reservation)

  • Stay details (Start date and end date of the stay)

An email from a guest might look like this:

Dear Resort Aroma,

I would like to book a room from the 25th of April until the 30th of April.
When I looked through the website, I saw that you have rooms with a balcony. 
It would be wonderful if I could get such a room.  

Richard Fulmar

We first construct a corresponding Shape to extract those fields:

    [
      {
        "name": "name",
        "type": "string",
        "description": "First and last name of the guest",
        "isArray": false
      },
      {
        "name": "room_preference",
        "type": "string",
        "description": "Guest's preferred room location, room size, bed type,...",
        "isArray": false
      },
      {
        "name": "reservation_number",
        "type": "number",
        "description": "Unique number that identifies this reservation",
        "isArray": false
      },
      {
        "name": "stay_details",
        "type": "string",
        "description": "Start date and end date of the stay. (MM-DD-YYYY). It's 2023",
        "isArray": false
      }
    ]
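The shape can also be assembled in code before it is sent. The following is a minimal sketch, assuming the shape is serialized as a JSON list of field objects; the `field` helper is hypothetical and not part of any API.

```python
import json


def field(name, type_, description, is_array=False):
    """Build one shape field (hypothetical helper, not part of the API)."""
    return {
        "name": name,
        "type": type_,
        "description": description,
        "isArray": is_array,
    }


# Assumption: the shape is a list of field objects with these four keys.
shape = [
    field("name", "string", "First and last name of the guest"),
    field("room_preference", "string",
          "Guest's preferred room location, room size, bed type,..."),
    field("reservation_number", "number",
          "Unique number that identifies this reservation"),
    field("stay_details", "string",
          "Start date and end date of the stay. (MM-DD-YYYY). It's 2023"),
]

# Serialize the shape so it can be embedded in a request body.
shape_json = json.dumps(shape, indent=2)
```

Keeping the descriptions in one place makes it easier to adjust the hints given to the extractor, such as the date format in `stay_details`.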

With the shape defined, we call the /extract-document endpoint with the shape and the email as the payload:

curl -X POST "" \
     -H "Content-Type: application/json" \
     -H "Authorization: Bearer YOUR_API_KEY" \
     -d '{
          "fileName": "email134.txt",
          "contentType": "application/pdf",
          "base64Content": "JVBERi0xLjMKMSAwIG9iago8PC9UeXBlL0NhdGF...",
          "shape": YOUR_SHAPE
         }'
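The request body can be built in code rather than by hand. This is a sketch only: `build_payload` is a hypothetical helper, the field names are copied from the curl example above, and the bytes used here are a stand-in for a real file (in practice they would come from something like `pathlib.Path(...).read_bytes()`). No request is sent, since the endpoint URL is not given here.

```python
import base64
import json


def build_payload(file_name, content_type, raw_bytes, shape):
    """Assemble the request body (hypothetical helper, fields as in the curl example)."""
    return {
        "fileName": file_name,
        "contentType": content_type,
        # The file content travels as base64-encoded ASCII text.
        "base64Content": base64.b64encode(raw_bytes).decode("ascii"),
        "shape": shape,
    }


# Stand-in bytes: the first line of a PDF file, used only for illustration.
raw_bytes = b"%PDF-1.3\n"

# Placeholder shape with a single field; use the full shape in practice.
shape = [{
    "name": "name",
    "type": "string",
    "description": "First and last name of the guest",
    "isArray": False,
}]

payload = build_payload("email134.txt", "application/pdf", raw_bytes, shape)
body = json.dumps(payload)  # string to pass as the request body
```

The encoded stand-in bytes give `JVBERi0xLjMK`, which matches the start of the `base64Content` value in the curl example.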

After some time, we query for the result of this job with the job endpoint and find the following in the result field:

{
  "name": "Richard Fulmar",
  "room_preference": "room with a balcony",
  "reservation_number": "unsure",
  "stay_details": "04-25-2023 to 04-30-2023"
}
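The email contains no reservation number, so that field comes back as "unsure" rather than a number, and the stay is returned as a single string. Below is a minimal sketch of how such a result could be post-processed. Both helper functions are hypothetical, and the " to " separator is an assumption based only on this one sample result.

```python
from datetime import date, datetime


def parse_stay(stay_details):
    """Split 'MM-DD-YYYY to MM-DD-YYYY' into a (start, end) pair of dates."""
    # Assumption: the two dates are always joined by ' to ', as in the sample.
    start_text, end_text = stay_details.split(" to ")
    fmt = "%m-%d-%Y"
    return (
        datetime.strptime(start_text, fmt).date(),
        datetime.strptime(end_text, fmt).date(),
    )


def parse_reservation_number(value):
    """Return an int, or None when the value is not numeric (e.g. 'unsure')."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


result = {
    "name": "Richard Fulmar",
    "room_preference": "room with a balcony",
    "reservation_number": "unsure",
    "stay_details": "04-25-2023 to 04-30-2023",
}

start, end = parse_stay(result["stay_details"])
reservation = parse_reservation_number(result["reservation_number"])
nights = (end - start).days
```

Mapping non-numeric values to `None` lets downstream code distinguish a missing reservation number from a real one without comparing against specific strings.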
