Back to main page

How to import data to Aito?

Concept

Aito currently accepts table content in JSON format. A support tool for other formats (e.g. CSV) will be released in the near future.

The Data API is used to populate table content.

API endpoint

/api/v1/data

Structure

Available operations:

  1. Upload a table's rows using batch (currently restricted to payloads of 6 MB or less)
  2. Upload a table using the file upload API
  3. Delete rows using the _delete endpoint

NOTE: The Stream API and file upload API should be considered to be in beta. Report any issues to us via support@aito.ai.

Aito's tables:

A table's columns are based on the defined schema.

Example:

The table customers was defined in the schema as:

    "customers": {
      "type":"table",
      "columns": {
        "id"   : {"type": "Int" },
        "name" : {"type": "String" },
        "tags" : {"type": "Text", "analyzer": "Whitespace" }
      }
    }

The table customers should be a JSON array containing the records of the table. For example:

    [
      {"id":0, "name":"anne",     "tags":"london 20s"},
      {"id":1, "name":"bob",      "tags":"nyc 20s"},
      {"id":2, "name":"cecilia",  "tags":"nyc 30s"},
      {"id":3, "name":"david",    "tags":"london 30s"},
      {"id":4, "name":"elisabeth","tags":"london 40s"},
      {"id":5, "name":"felix",    "tags":"nyc 40s"},
      {"id":6, "name":"grace",    "tags":"nyc 50s"},
      {"id":7, "name":"harald",   "tags":"london 50s"},
      {"id":8, "name":"iris",     "tags":"nyc 60s"},
      {"id":9, "name":"john",     "tags":"london 60s"},
      {"id":10, "name":"jane",    "tags":"nyc 60s"},
      {"id":11, "name":"nancy",   "tags":"london 40s"}
    ]

1. Upload a table's rows using batch

The batch upload is used for payloads of 6 MB or less.

To upload a table's rows into Aito, send a POST request to

<_your_env_url_>/api/v1/data/<_table_name_>/batch

with the provided read-write api_key. The request body is the table content as a JSON array, as in the example above.
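
For illustration, a minimal Python sketch of a batch upload (the environment URL is a placeholder, and it is assumed here that the read-write key is sent in an x-api-key header):

    import requests

    ENV_URL = "https://your-env.api.aito.ai"  # placeholder environment URL
    API_KEY = "YOUR_READ_WRITE_KEY"           # the provided read-write api_key

    # The body is the table content as a JSON array, as in the customers example
    rows = [
        {"id": 0, "name": "anne", "tags": "london 20s"},
        {"id": 1, "name": "bob",  "tags": "nyc 20s"}
    ]

    response = requests.post(
        f"{ENV_URL}/api/v1/data/customers/batch",
        headers={"x-api-key": API_KEY},  # assumption: key passed in the x-api-key header
        json=rows
    )
    response.raise_for_status()
    print(response.json())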

The training environment has already been populated with the customers, products, and impressions tables.

2. Upload a table using the file upload API

The file API circumvents the API payload size limit and allows uploading large data sets. It accepts data in ndjson format (see http://ndjson.org/ for more details), stored in a file. The data file is uploaded to AWS S3 and handled asynchronously by the database on a best-effort basis.

The format of the uploaded file should be newline-delimited JSON (ndjson), i.e. the individual JSON elements separated by newline characters. Note that this is not the same as a JSON array, but rather individual elements. The file should additionally be gzipped to reduce the size of the transferred data. In short, the file must:

  1. be in ndjson format
  2. be gzipped before upload
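
One way to produce such a file, sketched in Python (the input file customers.json is a hypothetical JSON array of rows):

    import gzip
    import json

    # Read a JSON array of rows...
    with open("customers.json") as f:
        rows = json.load(f)

    # ...and write them as gzipped ndjson: one JSON object per line, gzip-compressed
    with gzip.open("customers.ndjson.gz", "wt", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")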

The file API is not a single API, but requires a minimum of three calls (per table). The sequence is as follows:

  1. Initialise the upload process with a POST to /api/v1/data/<table_name>/file.

    curl -X POST https://environment.api.aito.ai/api/v1/data/<table_name>/file
    

    The response will be similar to the following example (see the Aito test/demo environment, or your own environment, at the path /api-docs/v3/#/data/post_api_data__table__file). The url in the response is where you will upload your data file:

     {
         "id": "9b38de74-0694-46ad-b239-1af5626f1fc9",
         "url": "https://some-s3-bucket.s3.eu-west-1.amazonaws.com/theenv/thetable/...",
         "method": "PUT",
         "expires": "2018-07-30T14:41:55"
     }
    
    • id is the job id, and is used to control the upload process. It is used later to trigger populating data into Aito and to see the status of the process.
    • url is the S3 URL to which to upload the data file.
    • method is the only allowed method for the upload, currently always PUT.
    • expires is the latest time when the upload can start. After that the upload URL expires and is no longer valid.
  2. Upload the file to S3, using the signed URL you received. The upload can be done with any client, but in curl the command would be

    curl -X PUT -T data_as.ndjson.gz "https://some-s3-bucket.s3.eu-west-1.amazonaws.com/theenv/thetable/..."
    

    The data is expected to be uploaded with PUT, with the data as the raw body of the message, not e.g. as a form upload.

  3. Trigger the database process by sending a POST request to the path /api/v1/data/<table_name>/file/<id>

    curl -X POST https://environment.api.aito.ai/api/v1/data/<table_name>/file/9b38de74-0694-46ad-b239-1af5626f1fc9
    
  4. Sending a GET request to the same URL will show the current progress of the operation. The status will signal when the process has finished.

    curl https://environment.api.aito.ai/api/v1/data/<table_name>/file/9b38de74-0694-46ad-b239-1af5626f1fc9
    

    The response shows the progress, as well as the last failing rows (max 50):

     {
       "status": {
         "finished": true,
         "completedCount": 20,
         "lastSuccessfulElement": {
           "primaryName": "Ania Josse",
           "birthYear": null,
           "nconst": "nm123456",
           "deathYear": null,
           "primaryProfession": "actress,miscellaneous"
         },
         "startedAt": "20180730T172233.473+0300"
       },
       "errors": {
         "message": "Last 0 failing rows",
         "rows": null
       }
     }

Non-error rows are always populated. You can use the error messages in the status to fix the erroneous data and populate it again.
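
Putting the calls together, a sketch of the whole sequence in Python (the environment URL, the x-api-key header and the polling interval are assumptions; the table name and file name are placeholders):

    import time
    import requests

    ENV_URL = "https://your-env.api.aito.ai"  # placeholder environment URL
    HEADERS = {"x-api-key": "YOUR_READ_WRITE_KEY"}
    TABLE = "customers"                       # placeholder table name

    # 1. Initialise the upload process to get the job id and the signed S3 URL
    job = requests.post(f"{ENV_URL}/api/v1/data/{TABLE}/file", headers=HEADERS).json()

    # 2. PUT the gzipped ndjson file to S3 as the raw body of the request
    with open("customers.ndjson.gz", "rb") as f:
        requests.put(job["url"], data=f).raise_for_status()

    # 3. Trigger the database process for this upload job
    requests.post(f"{ENV_URL}/api/v1/data/{TABLE}/file/{job['id']}", headers=HEADERS).raise_for_status()

    # 4. Poll the same URL until the status reports the process as finished
    while True:
        status = requests.get(f"{ENV_URL}/api/v1/data/{TABLE}/file/{job['id']}", headers=HEADERS).json()
        if status.get("status", {}).get("finished"):
            print(status)
            break
        time.sleep(5)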

3. Delete rows using the _delete endpoint

To delete rows, send a POST request to

<_your_env_url_>/api/v1/data/_delete

with the provided read-write api_key.

The POST request body must contain a from field describing the target table and a where-clause identifying the rows to delete. All rows matching the where-clause proposition are deleted.

Please note that an empty proposition will match, and therefore delete, everything.

API endpoint

/api/v1/data/_delete

Format:

    { "from" : From, "where" : Proposition }

Examples:

As an example, the following _delete request will delete user 4:

    POST /api/v1/data/_delete

    {
      "from" : "users",
      "where" : {
        "id" : 4
      }
    }

As another example, the following request deletes all impressions from the test users.

    POST /api/v1/data/_delete

    {
      "from" : "impressions",
      "where" : {
        "user.tags" : { "$match" : "test" }
      }
    }

As another example, the following request deletes all impressions with an old timestamp:

    POST /api/v1/data/_delete

    {
      "from" : "impressions",
      "where" : {
        "timestamp" : {
          "$lte": 1510444800
        }
      }
    }

As a warning: the following request will delete all users:

    POST /api/v1/data/_delete

    {
      "from" : "users",
      "where" : {}
    }

It is recommended to use the _search endpoint to check which rows the query matches before running the delete by hand.
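
As a sketch of that workflow in Python, the same from/where body can be sent to _search first as a dry run and to _delete only afterwards (the environment URL, the x-api-key header and the /api/v1/_search path are assumptions; the query mirrors the "test users" example above):

    import requests

    ENV_URL = "https://your-env.api.aito.ai"  # placeholder environment URL
    HEADERS = {"x-api-key": "YOUR_READ_WRITE_KEY"}

    # The same body works for both _search and _delete
    query = {
        "from": "impressions",
        "where": {"user.tags": {"$match": "test"}}
    }

    # Dry run: check which rows the proposition matches (path assumed to be /api/v1/_search)
    matches = requests.post(f"{ENV_URL}/api/v1/_search", headers=HEADERS, json=query)
    print(matches.json())

    # Only after reviewing the matches, run the actual delete:
    # requests.post(f"{ENV_URL}/api/v1/data/_delete", headers=HEADERS, json=query)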