Azure Machine Learning (a.k.a. AzureML): AzureML Machine Learning Models Summary

Two-Class (Binary) Classifiers

Binary classifiers learn to predict binary outcomes. Binary classification is always a supervised learning problem. The Scored Label is either 1 or 0. This is probably the most common type of Machine Learning algorithm.

In an AzureML binary classifier the Scored Probability is the probability that the Label should be 1. If the Scored Probability is less than 0.5, the Scored Label will be 0.
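The relationship between Scored Probability and Scored Label can be sketched in a few lines of Python. This is only an illustration, not AzureML code: the function name and the adjustable `threshold` parameter are hypothetical, and treating a probability of exactly 0.5 as a 1 is an assumption.

```python
def scored_label(scored_probability: float, threshold: float = 0.5) -> int:
    """Map a Scored Probability to a Scored Label.

    Probabilities below the threshold become 0. Treating a value exactly
    at the threshold as 1 is an assumption made for this sketch.
    """
    return 1 if scored_probability >= threshold else 0


# Hypothetical scored probabilities for four records.
labels = [scored_label(p) for p in (0.12, 0.49, 0.5, 0.93)]
print(labels)
```

Raising or lowering the threshold in a sketch like this trades false positives against false negatives.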

Two-Class Boosted Decision Tree

Visualized as a large number of small trees. A Microsoft support person told me that the last tree is the one it uses, but I'm not convinced of that. The first tree usually looks pretty good.

Two-Class Decision Forest

Visualized as a small number of large trees. The last tree is the one it uses.

Can run on live data with nulls (or NaNs) and scores records where some fields are null/NaN.

Two-Class Bayes Point Machine

Visualized as feature sets with weights.

Two-Class Logistic Regression

Visualized as feature sets with weights.

Can run on live data with nulls (or NaNs), but only scores records where all fields have values (i.e. are not null/NaN).
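To make the "features with weights" view concrete, here is a minimal sketch of how a logistic regression turns weights into a probability: a weighted sum of the features is passed through the sigmoid function. The feature names, weights and bias below are made up for illustration and do not come from any real trained model.

```python
import math


def logistic_probability(features, weights, bias):
    """Return P(label = 1) for one record under a logistic regression."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights, as they might be read off a trained model.
weights = {"x1": 2.0, "x2": -1.0}
bias = -1.0

# Weighted sum is -1.0 + 2.0 - 1.0 = 0.0, so the model is undecided.
p = logistic_probability({"x1": 1.0, "x2": 1.0}, weights, bias)
print(p)
```

A large positive weight pushes the probability toward 1 as that feature grows, and a large negative weight pushes it toward 0.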

Two-Class Neural Network

Can run on live data with nulls (or NaNs), but only scores records where all fields have values (i.e. are not null/NaN).

Two-Class Averaged Perceptron

Visualized as feature sets with weights.

Can run on live data with nulls (or NaNs), but only scores records where all fields have values (i.e. are not null/NaN).
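Several of the models above only score records where every field has a value. If you want to know in advance which records will be skipped, you can split the data yourself before scoring. The sketch below does that in plain Python; the helper names and sample records are hypothetical, and it says nothing about how AzureML does this internally.

```python
import math


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def split_complete(records):
    """Separate records with no missing fields from those with at least one."""
    complete, incomplete = [], []
    for record in records:
        has_gap = any(is_missing(v) for v in record.values())
        (incomplete if has_gap else complete).append(record)
    return complete, incomplete


# Hypothetical live data: one full record, one null, one NaN.
records = [
    {"age": 34, "income": 52000.0},
    {"age": None, "income": 61000.0},
    {"age": 45, "income": float("nan")},
]
complete, incomplete = split_complete(records)
print(len(complete), "scorable,", len(incomplete), "would be skipped")
```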

Multiclass Classifiers

Multiclass classifiers are always supervised learning problems.

Multiclass Decision Forest
Multiclass Decision Jungle
Multiclass Logistic Regression
Multiclass Neural Network
One-vs-All Multiclass
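The One-vs-All idea is easy to sketch: train one binary classifier per class, score the record with each of them, and pick the class whose classifier is most confident. In the example below the binary scorers are hypothetical stand-ins (simple lambdas), not trained models, and the function name is my own.

```python
def one_vs_all_predict(record, binary_scorers):
    """Score a record with one binary scorer per class and pick the best.

    binary_scorers maps a class label to a function that returns the
    probability that the record belongs to that class.
    """
    scores = {label: scorer(record) for label, scorer in binary_scorers.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical per-class scorers standing in for trained binary models.
scorers = {
    "cat": lambda r: 0.2 if r["weight_kg"] > 10 else 0.7,
    "dog": lambda r: 0.9 if r["weight_kg"] > 10 else 0.3,
    "bird": lambda r: 0.05,
}

label, scores = one_vs_all_predict({"weight_kg": 25}, scorers)
print(label, scores)
```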

Regression

Regression models predict where a record might appear along a continuum given the supplied features. For example, predicting a house price based on features of the house. Regression problems are by nature always Supervised.

The scored dataset has two label columns:

Scored Label Mean
Scored Label Standard Deviation

These are attempts to put the supplied labels on a continuum.

Evaluation of Regression models can be done using one or more of the following statistics:

Negative Log Likelihood
Mean Absolute Error
Root Mean Squared Error
Relative Absolute Error
Relative Squared Error
Coefficient of Determination
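Most of these statistics are simple to compute by hand, which helps when interpreting them. The sketch below uses the standard textbook definitions; I am assuming AzureML's figures follow the same formulas. Negative Log Likelihood is left out because it depends on the model's predicted distribution, not just on the predictions. The function name and the sample numbers are made up.

```python
import math


def regression_metrics(actual, predicted):
    """Compute common regression error statistics from paired values."""
    n = len(actual)
    mean_actual = sum(actual) / n
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    sq_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    # Baseline: the error of always predicting the mean of the actual values.
    baseline_abs = sum(abs(a - mean_actual) for a in actual)
    baseline_sq = sum((a - mean_actual) ** 2 for a in actual)
    relative_squared_error = sum(sq_errors) / baseline_sq
    return {
        "mean_absolute_error": sum(abs_errors) / n,
        "root_mean_squared_error": math.sqrt(sum(sq_errors) / n),
        "relative_absolute_error": sum(abs_errors) / baseline_abs,
        "relative_squared_error": relative_squared_error,
        "coefficient_of_determination": 1.0 - relative_squared_error,
    }


# Hypothetical house prices (in thousands) and model predictions.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 300.0])
for name, value in metrics.items():
    print(name, round(value, 4))
```

The two relative errors compare the model against a baseline that always predicts the mean, so values well below 1 indicate the model is adding something.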

AzureML currently supports the following Regression algorithms:

Bayesian Linear Regression

The trained model has no useful visualization.

The “Scored Label Mean” is the prediction, and “Scored Label Standard Deviation” is the uncertainty around that prediction.

Boosted Decision Tree Regression

The trained model is visualized as 100 decision trees.

Decision Forest Regression

The trained model is visualized as 8 huge decision trees.

The “Scored Label Mean” is the prediction, and “Scored Label Standard Deviation” is the uncertainty around that prediction.
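One way to use the two columns together is to turn them into a rough range around each prediction. The sketch below assumes the uncertainty is approximately normal, which is my assumption and not something AzureML guarantees; the function name and the numbers are hypothetical.

```python
def prediction_interval(scored_label_mean, scored_label_std, z=1.96):
    """Return an approximate (low, high) range around a prediction.

    Assumes roughly normal uncertainty, so z = 1.96 gives about 95%.
    """
    margin = z * scored_label_std
    return scored_label_mean - margin, scored_label_mean + margin


# Hypothetical scored record: predicted house price and its uncertainty.
low, high = prediction_interval(250000.0, 10000.0)
print(low, high)
```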

Fast Forest Quantile Regression

The trained model has no useful visualization.

Linear Regression

Neural Network Regression

The trained model has no useful visualization.

Ordinal Regression

Poisson Regression

The trained model is visualized as a series of features and weights.
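The features and weights of a Poisson regression can be read as follows: the weighted sum of the features is exponentiated to give the expected count. This is the standard log-link formulation; the feature names, weights and bias below are invented for illustration.

```python
import math


def poisson_expected_count(features, weights, bias=0.0):
    """Return the expected count for one record under a Poisson regression."""
    linear = bias + sum(weights[name] * value for name, value in features.items())
    return math.exp(linear)


# Hypothetical weights for predicting the number of store visits per hour.
weights = {"is_weekend": 0.5, "promotions": 0.25}
expected = poisson_expected_count({"is_weekend": 1, "promotions": 2}, weights, bias=1.0)
print(expected)
```

Because of the exponential, each weight acts multiplicatively: adding one unit of a feature multiplies the expected count by e raised to that feature's weight.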

Anomaly Detection

Anomaly Detection is normally unsupervised. We don't know in advance what an anomaly is; we can only train the algorithm on “normal” data. An email spam detector is a typical Anomaly Detection problem.

In AzureML Anomaly Detection the Scored Probability is the probability that the record is an anomaly. If the Scored Probability is high, the record is an anomaly; if it is low, the record closely matches the training data.

AzureML supports two algorithms suitable for Anomaly Detection:

One-Class SVM (Support Vector Machine)

The One-Class Support Vector Machine has no useful visualization.

A trained One-Class model will run on live data that contains null (or NaN) fields, but it will only score records where all fields are present (i.e. the record has no null or NaN fields).

PCA (Principal Component Analysis)-Based Anomaly Detection

The PCA Anomaly Detection model has no useful visualization.

A trained PCA-Based model will fail to run on live data if the data contains null (or NaN) fields.

Clustering

Clustering Models learn how to group records into n clusters. This can be done either Supervised or Unsupervised.

Supervised clustering is used for predicting which records should fall into which predefined categories. This is really just a multiclass classifier.

An unsupervised clustering algorithm will define its own categorizations, which may not correspond to anything intuitive to the human observer. In the real world these algorithms are used for Data Discovery problems such as discovering market segmentation.
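To show how an unsupervised algorithm arrives at its own groupings, here is a bare-bones k-means sketch in plain Python. It is a generic textbook version with a naive initialization (the first k points become the starting centroids), not AzureML's implementation, and the sample points are made up.

```python
def kmeans(points, k, iterations=10):
    """Group points into k clusters with a minimal k-means loop."""
    # Naive initialization: use the first k points as starting centroids.
    centroids = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignments, centroids


# Hypothetical customers described by two numeric features.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.5), (1.0, 0.5), (9.5, 8.0)]
assignments, centroids = kmeans(points, k=2)
print(assignments)
print(centroids)
```

The cluster numbers carry no meaning of their own. It is up to the analyst to look at the members of each cluster and decide what, if anything, the grouping represents.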

Tuesday, October 18, 2016