
API:Data

Concept

aito currently takes table content in JSON format. Support tooling for other formats (e.g. CSV) will be added in the near future.

API: Data is used to populate table content

API endpoint

/api/v1/data

Structure

Available operations:

  1. Upload a table's rows using the batch API (currently restricted to payloads of 6 MB or less)
  2. Upload a table using stream API
  3. Upload a table using file upload API

NOTE: The stream API and file upload API should be considered to be in BETA phase. Please report any issues to us.

aito's tables:

A table's columns are based on the defined schema.

Example:

The table customers was defined in the schema as:

    "customers": {
      "type":"table",
      "columns": {
        "id"   : {"type": "Int" },
        "name" : {"type": "String" },
        "tags" : {"type": "Text", "analyzer": "Whitespace" }
      }
    }

The content of the customers table should be a JSON array that contains the records of the table. For example:

    [
      {"id":0, "name":"anne",     "tags":"london 20s"},
      {"id":1, "name":"bob",      "tags":"nyc 20s"},
      {"id":2, "name":"cecilia",  "tags":"nyc 30s"},
      {"id":3, "name":"david",    "tags":"london 30s"},
      {"id":4, "name":"elisabeth","tags":"london 40s"},
      {"id":5, "name":"felix",    "tags":"nyc 40s"},
      {"id":6, "name":"grace",    "tags":"nyc 50s"},
      {"id":7, "name":"harald",   "tags":"london 50s"},
      {"id":8, "name":"iris",     "tags":"nyc 60s"},
      {"id":9, "name":"john",     "tags":"london 60s"},
      {"id":10, "name":"jane",    "tags":"nyc 60s"}
    ]

1. Upload a table's rows using batch

The batch upload is used for payloads of 6 MB or less.

To upload a table's rows into aito, send a POST request to

    <_your_env_url_>/api/v1/data/<_table_name_>/batch

with the provided read-write API key, and with the body being the table content as a JSON array, as in the example above.
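
For example, a batch upload with curl could look like the following sketch. The x-api-key header and the file name customers.json are assumptions made for illustration; check your environment's API documentation for the exact way to pass the key.

    # customers.json contains the JSON array shown above
    curl -X POST \
      -H "x-api-key: <_your_rw_api_key_>" \
      -H "Content-Type: application/json" \
      -d @customers.json \
      <_your_env_url_>/api/v1/data/customers/batch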

The training environment has already been populated with the customers, products, and impressions tables.

2. Upload a table using stream API

The stream API:

    <_your_env_url_>/api/v1/data/<_table_name_>/stream

will allow you to upload data much like the batch API, but without having to format it into a JSON array. The data should instead be formatted as individual JSON elements separated by newlines. The lines are parsed in FIFO order, much the same as in the batch API, and stored into the database in a single operation.
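
As a sketch, the customers data could be streamed like this. The file customers.ndjson (one JSON object per line) and the x-api-key header are assumptions for illustration; --data-binary is used so that curl preserves the newlines.

    # customers.ndjson: one JSON object per line, e.g.
    #   {"id":0, "name":"anne", "tags":"london 20s"}
    #   {"id":1, "name":"bob",  "tags":"nyc 20s"}
    curl -X POST \
      -H "x-api-key: <_your_rw_api_key_>" \
      --data-binary @customers.ndjson \
      <_your_env_url_>/api/v1/data/customers/stream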

3. Upload a table using file upload API

The file API allows handling massive data sizes, limited by the database capacity and, to a lesser degree, by S3 size restrictions. The file API accepts data in the same format as the stream API, but stored in a file, which is uploaded and handled asynchronously by the database on a best-effort basis.

The format of the uploaded file should be newline-delimited JSON (ndjson), i.e. individual JSON elements separated by a newline character. Note that this is not the same as a JSON array, but rather individual elements. This allows upload sizes far exceeding what can be kept in memory at any instant. The file should additionally be gzipped to reduce the size of the transferred data. In summary, the file should:

  1. be in ndjson-format
  2. be gzipped before upload
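
For example, a file matching the customers data above could be prepared as follows. The file name data_as.ndjson matches the upload example below; the content shown is illustrative:

    # contents of data_as.ndjson: individual JSON elements, one per line
    {"id":0, "name":"anne", "tags":"london 20s"}
    {"id":1, "name":"bob",  "tags":"nyc 20s"}
    {"id":2, "name":"cecilia", "tags":"nyc 30s"}

    # compress before upload; produces data_as.ndjson.gz
    gzip data_as.ndjson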

The file API is not a single API, but requires at a minimum three calls (per table). The sequence is as follows:

  1. Initialise the upload process with a POST to /api/v1/data/<table_name>/file.

    curl -X POST https://environment.api.aito.ai/api/v1/data/<table_name>/file
    

    The response is equivalent to this example (see the Aito test/demo environment, or your own environment, at the path /api-docs/v3/#/data/post_api_data__table__file). The url in the response is the signed URL you will use to upload your data file:

     {
         "id": "9b38de74-0694-46ad-b239-1af5626f1fc9",
         "url": "https://some-s3-bucket.s3.eu-west-1.amazonaws.com/theenv/thetable/...",
         "method": "PUT",
         "expires": "2018-07-30T14:41:55"
     }
    
  2. Upload the file to S3, using the signed URL you received. The upload can be done with any client, but in curl the command would be

    curl -X PUT -T data_as.ndjson.gz "https://some-s3-bucket.s3.eu-west-1.amazonaws.com/theenv/thetable/..."
    

    The data is expected to be uploaded with PUT, with the file contents as the body of the request, not e.g. as a form upload.

  3. Trigger the database process by sending a POST request to the path /api/v1/data/<table_name>/file/<id>

    curl -X POST https://environment.api.aito.ai/api/v1/data/<table_name>/file/9b38de74-0694-46ad-b239-1af5626f1fc9
    
  4. Sending a GET request to the same URL will show the current progress of the operation. The status will signal when the process has finished.

    curl https://environment.api.aito.ai/api/v1/data/<table_name>/file/9b38de74-0694-46ad-b239-1af5626f1fc9
    

    The response shows the progress, as well as the last failing rows (max 50):

     {
         "status": {
             "finished": true,
             "completedCount": 20,
             "lastSuccessfulElement": {
                 "primaryName": "Ania Josse",
                 "birthYear": null,
                 "nconst": "nm123456",
                 "deathYear": null,
                 "primaryProfession": "actress,miscellaneous"
             },
             "startedAt": "20180730T172233.473+0300"
         },
         "errors": {
             "message": "Last 0 failing rows",
             "rows": null
         }
     }
    

Rows without errors are always populated. You can use the error messages from the status to fix the erroneous data and populate again.
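
For instance, the error section of the status response could be inspected with jq (assuming it is installed); .errors is the field shown in the example response above:

    curl https://environment.api.aito.ai/api/v1/data/<table_name>/file/9b38de74-0694-46ad-b239-1af5626f1fc9 | jq '.errors'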