
Evaluate

The Evaluate API can be used to evaluate statistical inference performance with a train/test split.

Description

NOTE: The Evaluate API is a work in progress and it will change!

NOTE: Evaluate relies on unoptimized selections, which come with a performance penalty.

API endpoint

/api/v1/_evaluate

Format:

    {
      "from" : From, 
      "train" : null | Proposition, 
      "test" : Proposition, 
      "select" : null | Selection, 
      "evaluate" : EvaluateOperation
    }
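
As a sketch of how a client might send such a request (the base URL and the Python helper below are assumptions for illustration, not part of the API):

    import requests  # third-party HTTP client

    # The host and port are assumptions; adjust them for your deployment.
    BASE_URL = "http://localhost:8080"

    def evaluate(body):
        """POST an evaluate request body and return the parsed JSON result."""
        response = requests.post(f"{BASE_URL}/api/v1/_evaluate", json=body)
        response.raise_for_status()
        return response.json()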

Evaluate predict

When a user sends a message, the chatbot needs to predict the correct response operation. This example evaluates the correctness of the predicted operation based on the user's message. The samples where the user is 10 are selected for testing; all samples where the user is not 10 are selected as training samples by default.

POST /api/v1/_evaluate

    {
      "from":"messages",
      "test" : { "user" : 10 },
      "evaluate": {
        "where":["message"],
        "predict" : "operation"
      },
      "select" : ["trainSamples", "testSamples", "error", "baseError"]
    }

Result

Error describes how often the prediction was incorrect for the test data. Base error is the error obtained by ignoring the input features and always predicting the most common item. For example, if there are two options, where the likelihood of A is 60% and the likelihood of B is 40%, the base error would be 40% (base accuracy 60%), based on the scenario where the prediction is always A.

    {
      "trainSamples" : 20,
      "testSamples" : 10,
      "error" : 0.0,
      "baseError" : 0.9
    }
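
As a minimal sketch of how error and baseError could be computed from predicted and actual labels (the labels below are hypothetical, and this is an interpretation of the metrics, not the API's implementation):

    def error_rate(predicted, actual):
        """Fraction of test samples where the prediction was wrong."""
        wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
        return wrong / len(actual)

    def base_error_rate(train_labels, test_labels):
        """Error from always predicting the most common training label."""
        majority = max(set(train_labels), key=train_labels.count)
        return error_rate([majority] * len(test_labels), test_labels)

    # Hypothetical labels: A occurs 60% of the time, B 40% of the time.
    train = ["A", "A", "A", "B", "B"]
    test = ["A", "B", "A", "B", "A"]
    print(base_error_rate(train, test))  # 0.4: always guessing "A" misses the Bs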

Evaluate match

Let's test how well the chatbot manages to recommend products. In this example, we use only the samples where the operation is "recommend". The training data and test data are split using a modulo on the row index.

POST /api/v1/_evaluate

    {
      "from":"messages",
      "train" : { "operation" : "recommend", "$index": {"$or":[{"$mod":[3, 0]}, {"$mod":[3, 1]}]}},
      "test" : { "operation" : "recommend", "$index": {"$mod":[3, 2]} },
      "evaluate": {
        "where":["message"],
        "match" : "product"
      },
      "select" : ["trainSamples", "testSamples", "meanRank", "baseMeanRank"]
    }
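
The $mod predicates above give a deterministic split on the row index: rows whose index modulo 3 is 0 or 1 go to training, and rows whose index modulo 3 is 2 go to testing. A minimal sketch of the same split (the six-row range is hypothetical):

    # Rows whose index % 3 is 0 or 1 train the model; the rest test it.
    rows = list(range(6))
    train = [i for i in rows if i % 3 in (0, 1)]  # [0, 1, 3, 4]
    test = [i for i in rows if i % 3 == 2]        # [2, 5]
    print(train, test)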

Result

In this case, we use meanRank and baseMeanRank to evaluate the results. Note that there are only 3 training samples, which partly explains the quality of the results.

    {
      "trainSamples" : 3,
      "testSamples" : 2,
      "meanRank" : 5.0,
      "baseMeanRank" : 3.5
    }
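
Assuming meanRank is the average 1-based position of the correct product in the ranked match results (an interpretation of the metric, not a definition taken from this API), a minimal sketch:

    def mean_rank(rankings, correct_items):
        """Average 1-based position of the correct item in each ranked list."""
        ranks = [ranking.index(correct) + 1
                 for ranking, correct in zip(rankings, correct_items)]
        return sum(ranks) / len(ranks)

    # Two hypothetical test messages with ranked product suggestions.
    rankings = [["socks", "shoes", "hat"], ["hat", "shoes", "socks"]]
    correct = ["shoes", "socks"]
    print(mean_rank(rankings, correct))  # (2 + 3) / 2 = 2.5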

Evaluate similar

Let's test how well the chatbot manages to recommend products with TF-IDF based full-text search. We can drop the training data, because it is not needed in searches.

POST /api/v1/_evaluate

    {
      "from"  : "messages",
      "train" : { "$index" : -1 },
      "test"  : { "operation" : "recommend" },
      "evaluate": {
        "query": "message",
        "get": "product",
        "similarity" : ["title", "description"]
      },
      "select" : ["trainSamples", "testSamples", "meanRank", "baseMeanRank"]
    }

Result

In this case, we use meanRank and baseMeanRank to evaluate the results.

    {
      "trainSamples" : 0,
      "testSamples" : 5,
      "meanRank" : 3.4,
      "baseMeanRank" : 5.0
    }
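
As a rough sketch of the idea behind ranking products by TF-IDF similarity over the title and description fields (scikit-learn and the catalog below are assumptions for illustration, not the API's internals):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Hypothetical product catalog with the two matched text fields.
    products = [
        {"id": 1, "title": "running shoes", "description": "light shoes for running"},
        {"id": 2, "title": "wool socks", "description": "warm socks for winter"},
    ]
    docs = [p["title"] + " " + p["description"] for p in products]

    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)

    # Rank products by cosine similarity between the message and each product.
    query_vector = vectorizer.transform(["I need new shoes for jogging"])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    for product, score in sorted(zip(products, scores), key=lambda t: -t[1]):
        print(product["id"], round(score, 3))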