Validation of External Models
This example shows how to validate models existing outside of MATLAB® using Modelscape™ software. An external model can be implemented in Python®, R, SAS®, or it can be a MATLAB model deployed on a web service.
This example covers the use of external models from the point of view of a model validator or other model end user, under the assumption that a model has been deployed to a microservice created using, for example, Flask (Python) or Plumber (R). The example also explains how to implement these microservices and alternative interfaces.
In the first part of this example, you call an external model from MATLAB and evaluate it with different inputs. In the second part, you implement the API that an external model must expose so that you can call it from MATLAB.
Call External Model from MATLAB
Set up an interface and call an externally deployed model. This example uses a Python model, but you can customize the example for a model in other programming languages.
Configure External Model
Use the Python code in the Flask Interface section to set up a mock credit scoring model. This model adds noise to the input data, scales it, and returns the value as the output credit score of the applicant. Run the script to make this model available in a test development server. The output shows the URL of the server, such as http://127.0.0.1:5000. Set this URL as the value of ModelURL below. In an actual model validation exercise, the model developer can provide this information as part of the validation request.
ModelURL = "http://127.0.0.1:5000/";
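The arithmetic of the mock model can be sketched in a few lines of plain Python. This is a minimal sketch that mirrors the Flask code at the end of this example (uniform noise in the range [-50000, 50000] and a scale factor of weight/1000). The function name mock_score and the sample income are hypothetical and used only for illustration.

```python
import random


def mock_score(income, weight, rng=random):
    """Return (score, noise) for one applicant, mirroring the mock model's arithmetic."""
    # Uniform noise in [-50000, 50000], as in the Flask example.
    noise = rng.uniform(-50000, 50000)
    # Add the noise to the income, then scale by weight / 1000.
    score = (income + noise) * weight / 1000
    return score, noise


rng = random.Random(0)  # fixed seed so the sketch is reproducible
score, noise = mock_score(60000, 1.1, rng)
print(f"noise={noise:.1f}, score={score:.3f}")
```

Because the noise can be as low as -50000, an applicant with a small income can receive a negative score, which is why negative values can appear in the model output.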
Set up a connection to this model.
extModel = externalModelClient("RootURL",ModelURL)
extModel =
  ExternalModel with properties:
        InputNames: "income"
    ParameterNames: "weight"
       OutputNames: "score"
The model expects a single input called income and a single parameter called weight. The function returns a single output called score. The model developer should have explained the types and sizes of these values in the model documentation. You can also find this information in the InputDefinition, ParameterDefinition, and OutputDefinition properties of extModel. The sizes property is empty, which indicates that the model expects a scalar.
extModel.InputDefinition
ans = struct with fields:
name: "income"
sizes: []
dataType: [1×1 struct]
extModel.InputDefinition.dataType
ans = struct with fields:
name: "double"
Evaluate Model
Use the evaluate function of ExternalModel on your model. This function expects two inputs:
The first input must be a table. Each row of the table must consist of the data for a single customer, or a single 'run'. The table is then a 'batch of runs'. The variable names of the table must match the InputNames shown by ExternalModel.
The second input is a struct whose fields match the ParameterNames shown by ExternalModel. The values carried by this struct apply to all the runs in the batch. If the model has no parameters, omit this input.
The primary output is a table whose variable names match the OutputNames shown by ExternalModel. The rows correspond to the runs in the input batch. There may also be run-specific diagnostics consisting of one struct per run and a single batch diagnostic struct.
Use random numbers in the range [0, 100,000] as customer incomes for the input data. For parameters, use a weight of 1.1.
N = 1000;
income = 1e5*rand(N,1);
inputData = table(income,VariableNames="income");
parameters = struct(weight=1.1);
Call your model. The output is a table with the same size as the inputs.
[modelScores, diagnostics, batchDiagnostics] = evaluate(extModel, inputData, parameters);
head(modelScores)
              score
             _______

    Row_1      102.2
    Row_2      131.1
    Row_3     45.658
    Row_4     24.337
    Row_5    -1.0934
    Row_6     29.777
    Row_7     55.824
    Row_8      98.42
Create a mock response variable by thresholding the income. Validate the scores of the model against this response variable.
defaultIndicators = income < 20000;
aurocMetric = mrm.data.validation.pd.AUROC(defaultIndicators, modelScores.score);
formatResult(aurocMetric)
ans = "Area under ROC curve is 0.8462"
visualize(aurocMetric);
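To make the metric concrete, the area under the ROC curve can be read as the fraction of (defaulter, non-defaulter) pairs that the scores rank correctly. The following is a minimal pairwise sketch in plain Python, not the Modelscape implementation. It assumes that a lower score signals higher risk, and the function name auroc and the five sample values are hypothetical.

```python
def auroc(default_flags, scores):
    """AUROC as the fraction of (defaulter, non-defaulter) pairs ranked correctly.

    Assumed convention: a lower score signals higher risk, so a pair is
    concordant when the non-defaulter has the higher score. Ties count half.
    """
    bad = [s for s, d in zip(scores, default_flags) if d]
    good = [s for s, d in zip(scores, default_flags) if not d]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    concordant = 0.0
    for b in bad:
        for g in good:
            if g > b:
                concordant += 1.0
            elif g == b:
                concordant += 0.5
    return concordant / (len(bad) * len(good))


# Hypothetical toy batch: two defaulters and three non-defaulters.
flags = [True, True, False, False, False]
scores = [10.0, 40.0, 30.0, 55.0, 80.0]
print(auroc(flags, scores))  # 5 of the 6 pairs are ranked correctly
```

The double loop is quadratic in the batch size, which is acceptable for a batch of 1000 runs. A rank-based formula would be preferable for much larger batches.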
The model returns some diagnostics to illustrate their size and shape. The run-specific diagnostics are a single structure with a field for every run.
diagnostics.Row_125
ans = struct with fields:
noise: -4.9172e+04
In this case, each structure contains a noise term that the model uses to predict the credit score. Batch diagnostics consist of a single structure containing information shared across all runs. In this case, the structure contains the elapsed valuation time at the server side.
batchDiagnostics
batchDiagnostics = struct with fields:
valuationTime: 0.0020
Extra Arguments
Internally, ExternalModel talks to the model through a REST API. If necessary, modify the headers and HTTP options used in this exchange by passing extra Headers and Options arguments to externalModelClient. The Headers arguments must be matlab.net.http.HeaderField objects. The Options arguments must be matlab.net.http.HTTPOptions objects.
For example, extend the connection timeout to 20 seconds.
options = matlab.net.http.HTTPOptions(ConnectTimeout=20);
extModel = externalModelClient(RootURL=ModelURL,Options=options)
Implement External Model Interface
Implement an API for an external model to call it from MATLAB.
The externalModelClient function creates an mrm.validation.external.ExternalModel object. This object communicates with the external model through a REST API. The object works with any model that implements the API you implement in this example.
Endpoints
The API must implement two endpoints:
/signature must accept a GET request and return a JSON string carrying the information about inputs, parameters, and outputs.
/evaluate must accept a POST request with inputs and parameters in a JSON format and must return a payload containing outputs, diagnostics, and batch diagnostics as a JSON string.
The status code for a successful response must be 200 OK, which is the default in Flask, for example.
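When you develop a service, it can help to exercise both endpoints from a small script before you connect from MATLAB. The following is a minimal sketch that uses only the Python standard library. The helper name call_endpoint, its opener argument (included so that the function can be exercised without a running server), and the 20-second timeout are assumptions made for illustration.

```python
import json
import urllib.request


def call_endpoint(root_url, path, payload=None, timeout=20,
                  opener=urllib.request.urlopen):
    """Send GET (no payload) or POST (JSON payload) and return the decoded JSON body."""
    url = root_url.rstrip("/") + path
    if payload is None:
        request = urllib.request.Request(url, method="GET")
    else:
        request = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
    with opener(request, timeout=timeout) as response:
        # The API requires 200 OK for a successful response.
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status} from {url}")
        return json.loads(response.read().decode("utf-8"))


# Example calls, assuming the Flask development server is running locally:
# signature = call_endpoint("http://127.0.0.1:5000/", "/signature")
# result = call_endpoint("http://127.0.0.1:5000/", "/evaluate", payload={...})
```

With the default opener, urllib raises an HTTPError for 4xx and 5xx responses, so the explicit status check only guards against other non-200 success codes.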
Evaluation Inputs
The /evaluate endpoint accepts a JSON payload with two top-level fields, inputs and parameters.
Within inputs, columns lists the input names, index specifies the row names, and data contains the actual input data one row at a time. The parameters field records the parameters with their values. The values can be, for example, double or string data.
The inputs datum is compatible with the construction of Pandas DataFrames with split orientation. For more information, see the example implementation in the Flask Interface section.
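As an illustration of this layout, the following sketch builds a request body for a batch of two runs in plain Python. The field names follow the Flask example in this document. The helper name build_evaluate_payload and the sample incomes are hypothetical, and the body that MATLAB actually sends may differ in details not covered here.

```python
import json


def build_evaluate_payload(rows, parameters):
    """Serialize a batch of runs into a split-orientation request body.

    rows: dict mapping a row name to a dict of {input name: value}.
    parameters: dict of {parameter name: value}, shared by the whole batch.
    """
    index = list(rows)
    # Take the column order from the first row; all rows share the same inputs.
    columns = list(next(iter(rows.values()))) if rows else []
    data = [[rows[name][col] for col in columns] for name in index]
    return {
        "inputs": {"columns": columns, "index": index, "data": data},
        "parameters": parameters,
    }


payload = build_evaluate_payload(
    {"Row_1": {"income": 52000.0}, "Row_2": {"income": 18000.0}},
    {"weight": 1.1},
)
print(json.dumps(payload, indent=2))
```

On the server side, the three entries of inputs map directly onto the data, columns, and index arguments of the pandas DataFrame constructor, as the Flask example does.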
Response Formats
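The /signature endpoint returns a payload with inputs, parameters, and outputs fields. Each field is a list of entries carrying a name, a dataType, and sizes, as in the example implementation in the Flask Interface section.
The /evaluate endpoint returns a payload with outputs, diagnostics, and batchDiagnostics fields.
The outputs data are compatible with the JSON output of Pandas DataFrames with split orientation.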
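The following sketch decodes an /evaluate response shaped like the return value of the Flask example, in which outputs is itself a JSON string produced by DataFrame.to_json and therefore needs a second decoding step. The helper name parse_evaluate_response is hypothetical, and the sample response is written by hand with illustrative values.

```python
import json


def parse_evaluate_response(body):
    """Decode an /evaluate response into per-row outputs and diagnostics."""
    payload = json.loads(body)
    outputs = payload["outputs"]
    # In the Flask example, outputs is a JSON string nested inside the JSON
    # response, so decode it a second time when needed.
    if isinstance(outputs, str):
        outputs = json.loads(outputs)
    # Rebuild one {output name: value} record per row from the split layout.
    table = {
        row: dict(zip(outputs["columns"], values))
        for row, values in zip(outputs["index"], outputs["data"])
    }
    return table, payload.get("diagnostics", {}), payload.get("batchDiagnostics", {})


# Hand-written sample response with illustrative values.
sample = json.dumps({
    "outputs": json.dumps({"columns": ["score"],
                           "index": ["Row_1", "Row_2"],
                           "data": [[102.2], [131.1]]}),
    "diagnostics": {"Row_1": {"noise": 1234.5}, "Row_2": {"noise": -678.9}},
    "batchDiagnostics": {"valuationTime": 0.002},
})
table, diagnostics, batch = parse_evaluate_response(sample)
print(table["Row_1"]["score"], diagnostics["Row_2"]["noise"], batch["valuationTime"])
```

The run-specific diagnostics are keyed by row name, which matches the way the validator reads them in MATLAB, for example as diagnostics.Row_125.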
Work with Alternative APIs
You can make external models available to a model validator in MATLAB when the default API is impossible or inconvenient to implement, for example, when your organization already has a preferred REST API for evaluating models. In this case, implement an API class that inherits from mrm.validation.external.ExternalModelBase. Package this implementation in a +mrm/+validation/+external/ folder on the MATLAB path.
This custom API must populate the InputNames, ParameterNames, and OutputNames properties shown to the validator after an externalModelClient call. The API must also implement the evaluate function, which takes as inputs a table and a structure as in the default API ExternalModel. The custom API must serialize the inputs, manage the REST API calls, and deserialize the outputs into tables and structures, as shown in Evaluate Model.
When you implement a custom API as, say, mrm.validation.external.CustomAPI, the validator can initialize a connection to the model through this client by adding an APIType argument to the externalModelClient call. The call also passes any additional arguments on to CustomAPI.
extModelNew = externalModelClient("APIType", "CustomAPI", "RootURL", ModelURL)
Flask Interface
This Python code configures the external model you use in the first part of this example.
from flask import Flask, request, jsonify
import pandas as pd
import numpy as np
import time

toyModel = Flask(__name__)

@toyModel.route('/evaluate', methods=['POST'])
def calc():
    start = time.time()
    data = request.get_json()
    inputData = data['inputs']
    inputDF = pd.DataFrame(inputData['data'], columns=inputData['columns'], index=inputData['index'])
    parameters = data['parameters']

    noise = np.random.uniform(low = -50000, high=50000, size=inputDF.shape)
    outDF = inputDF.rename(columns={'income':'score'})
    outDF = outDF.add(noise)
    outDF = outDF.mul(parameters['weight']/1000)

    diagnostics = pd.DataFrame(noise, columns=["noise"], index=inputDF.index)
    end = time.time()
    batchDiagnostics = {'valuationTime' : end - start}

    output = {'outputs': outDF.to_json(orient='split'),
              'diagnostics' : diagnostics.to_dict(orient='index'),
              'batchDiagnostics' : batchDiagnostics}
    return output

@toyModel.route('/signature', methods=['GET'])
def getInputs():
    outData = {
        'inputs': [{"name": "income", "dataType": {"name": "double"},"sizes": []}],
        'parameters': [{"name": "weight", "dataType": {"name": "double"}, "sizes": []}],
        'outputs': [{"name": "score", "dataType": {"name": "double"}, "sizes": []}]
    }
    return(jsonify(outData))

if __name__ == '__main__':
    toyModel.run(debug=True, host='0.0.0.0')