Thursday, October 20, 2016

Model Training


For Classifiers

  • Train Model: Trains a classification or regression model from a training set. Takes an Untrained Model and the Training Data and trains a Model.
  • Tune Hyperparameters: Many models have hyperparameters, so it's often better to replace the Train Model module with a Tune Hyperparameters module. It takes an Untrained Model, the Training Data, and some Validation Data, trains a Model, and tunes the hyperparameters.

For Regression

  • Train Model: Trains a classification or regression model from a training set. Takes an Untrained Model and the Training Data and trains a Model.
  • Tune Hyperparameters: Many models have hyperparameters, so it's often better to replace the Train Model module with a Tune Hyperparameters module. It takes an Untrained Model, the Training Data, and some Validation Data, trains a Model, and tunes the hyperparameters.
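What a hyperparameter sweep does can be sketched in plain Python, outside AzureML: try each candidate setting, score it on validation data, and keep the best. The toy "model" below (a single decision threshold) and all the data are made up for illustration.

```python
# A minimal sketch of a hyperparameter sweep. The "model" predicts 1
# when a feature exceeds a threshold; the threshold is the hyperparameter.
valid_x = [0.2, 0.5, 0.7, 0.8]   # made-up validation features
valid_y = [0, 0, 1, 1]           # made-up validation labels

def accuracy(threshold):
    """Score one candidate threshold on the validation data."""
    preds = [1 if x > threshold else 0 for x in valid_x]
    return sum(p == y for p, y in zip(preds, valid_y)) / len(valid_y)

candidates = [0.3, 0.5, 0.6]     # the parameter grid to sweep
best = max(candidates, key=accuracy)
print(best, accuracy(best))      # the winning threshold and its score
```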

For Anomaly Detection

  • Train Anomaly Detection Model: Trains an anomaly detector model and labels data from a training set. Takes an Untrained Model and the Training Data and trains a Model.

For Clustering

  • Train Clustering Model: Trains a clustering model and assigns data from the training set to clusters. Takes an Untrained Model and the Training Data and trains a Model.
  • Sweep Clustering: Performs a parameter sweep on a clustering model to determine the optimum parameter settings and trains the best model. Takes an Untrained Model and the Training Data and trains a Model.

Wednesday, October 19, 2016

Statistical Measures

Azure ML Evaluation results often include some statistical measures that need some explanation. 
Here is a brief summary:-

For Classifiers

  • True Positive (TP): A count of the actual positive outcomes that the algorithm predicted correctly.
  • True Negative (TN): A count of the actual negative outcomes that the algorithm predicted correctly.
  • False Positive (FP): A count of the actual negative outcomes that the algorithm incorrectly predicted as positive.
  • False Negative (FN): A count of the actual positive outcomes that the algorithm incorrectly predicted as negative.
  • Precision: The proportion of predicted positives that are classified correctly: TP/(TP+FP)
  • Recall: The proportion of actual positives that are classified correctly: TP/(TP+FN)
  • Accuracy: The proportion of all values classified correctly: (TP+TN)/(TP+TN+FP+FN). Accuracy on its own is not a reliable metric for the real performance of a classifier, particularly on imbalanced data.
  • F1 Score: The harmonic mean of precision and recall: F1 = 2 * (precision * recall) / (precision + recall). The F1 Score is a good metric for the real performance of a classifier since it balances precision and recall.
  • AUC: Area Under the Curve: the area under the Receiver Operating Characteristic (ROC) curve. This is the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance. The AUC is a good metric for the real performance of a classifier since it summarises performance across all classification thresholds.
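All of these measures (apart from AUC) follow directly from the four counts. The counts below are made up purely for illustration:

```python
# Made-up confusion-matrix counts for a classifier on 200 records.
tp, tn, fp, fn = 80, 90, 10, 20

precision = tp / (tp + fp)                    # predicted positives that were right
recall = tp / (tp + fn)                       # actual positives that were found
accuracy = (tp + tn) / (tp + tn + fp + fn)    # all records classified correctly
f1 = 2 * (precision * recall) / (precision + recall)  # harmonic mean

print(f"Precision: {precision:.3f}")  # 0.889
print(f"Recall:    {recall:.3f}")     # 0.800
print(f"Accuracy:  {accuracy:.3f}")   # 0.850
print(f"F1 Score:  {f1:.3f}")         # 0.842
```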

For Regression

  • Negative Log Likelihood: A measure of how far the actual data varies from the predicted values. A regression model attempts to minimise the Negative Log Likelihood, so a low value indicates a well trained model.
  • Mean Absolute Error: The average of the absolute differences between predicted and actual values. A low value indicates a well trained model.
  • Root Mean Squared Error: The square root of the average of the squared differences between predicted and actual values. A low value indicates a well trained model.
  • Relative Absolute Error: The total absolute error relative to the total absolute error of simply predicting the mean of the actual values. A low value indicates a well trained model.
  • Relative Squared Error: The total squared error relative to the total squared error of simply predicting the mean of the actual values. A low value indicates a well trained model.
  • Coefficient of Determination (R2): A statistical measure of how well the regression line approximates the real data points. The coefficient of determination normally ranges from 0 to 1. An R2 of 1 indicates that the regression line perfectly fits the data, but low values can be entirely normal.
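The error measures above (apart from Negative Log Likelihood) can be computed directly from a set of actual values and predictions. The numbers below are made up for illustration:

```python
import math

# Made-up actual values and model predictions.
actual    = [3.0, 5.0, 2.5, 7.0, 4.5]
predicted = [2.8, 5.3, 2.9, 6.5, 4.4]
n = len(actual)
mean_actual = sum(actual) / n

abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
sq_errors  = [(a - p) ** 2 for a, p in zip(actual, predicted)]

mae  = sum(abs_errors) / n             # Mean Absolute Error
rmse = math.sqrt(sum(sq_errors) / n)   # Root Mean Squared Error
# The relative errors compare the model against simply predicting the mean.
rae = sum(abs_errors) / sum(abs(a - mean_actual) for a in actual)
rse = sum(sq_errors) / sum((a - mean_actual) ** 2 for a in actual)
r2  = 1 - rse                          # Coefficient of Determination

print(f"MAE={mae:.3f} RMSE={rmse:.3f} RAE={rae:.3f} RSE={rse:.3f} R2={r2:.3f}")
```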

For Clustering

It's not clear to me yet what best indicates a well trained Clustering model.
  • Average Distance to Cluster Center: The average distance of all points in a cluster to the centroid of that cluster.
  • Average Distance to Other Center: The average distance of all points in a cluster to the centroids of the other clusters.
  • Number of Points: The number of points in that cluster.
  • Maximal Distance to Cluster Center: The maximum of the distances between each point and the centroid of that point's cluster.
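For a single cluster, the average and maximal distances to the cluster center can be computed like this (the points and centroid below are made up for illustration):

```python
import math

# Made-up 2-D points belonging to one cluster.
points   = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (3.0, 2.5)]
centroid = (1.875, 1.75)   # the mean of the points above

def distance(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

distances = [distance(p, centroid) for p in points]
avg_dist = sum(distances) / len(distances)  # Average Distance to Cluster Center
max_dist = max(distances)                   # Maximal Distance to Cluster Center
print(avg_dist, max_dist)
```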

Tuesday, October 18, 2016

AzureML Machine Learning Models Summary

Two-Class (Binary) Classifiers

Binary classifiers learn how to predict one of two outcomes. Binary classification is always a supervised learning problem. The Scored Label is either 1 or 0. This is probably the most common type of Machine Learning algorithm.
In an AzureML binary classifier the Scored Probability is the probability that the Label should be 1. If the Scored Probability is less than 0.5 the Scored Label will be 0.
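The mapping from Scored Probability to Scored Label can be sketched in a line of Python (assuming the default 0.5 threshold; the probabilities are made up):

```python
# Made-up Scored Probabilities; a probability >= 0.5 yields label 1, else 0.
scored_probabilities = [0.92, 0.48, 0.50, 0.13]
scored_labels = [1 if p >= 0.5 else 0 for p in scored_probabilities]
print(scored_labels)  # [1, 0, 1, 0]
```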

Two-Class Boosted Decision Tree

Visualized as a large number of small trees. A Microsoft support person told me that the last tree is the one it uses, but I'm not convinced of that. The first tree usually looks pretty good.

Two-Class Boosted Decision Forest

Visualized as a small number of large trees. The last tree is the one it uses.

Can run on Live data with nulls (or NaNs) and scores records even where some fields are null/NaN.

Two-Class Bayes Point Machine

Visualized as feature-sets with weights

Two-Class Logistic Regression

Visualized as feature-sets with weights

Can run on Live data with nulls (or NaNs) but only scores records where all fields have values (i.e. are not null/NaN).

Two-Class Neural Network

Can run on Live data with nulls (or NaNs) but only scores records where all fields have values (i.e. are not null/NaN).

Two-Class Averaged Perceptron

Visualized as feature-sets with weights

Can run on Live data with nulls (or NaNs) but only scores records where all fields have values (i.e. are not null/NaN).

Multiclass Classifiers

Multiclass classifiers are always supervised learning problems.
  • Multiclass Decision Forest
  • Multiclass Decision Jungle
  • Multiclass Logistic Regression
  • Multiclass Neural Network
  • One-vs-All Multiclass

Regression

Regression models predict where a record might appear along a continuum given the supplied features. For example, predicting a house price based on features of the house. Regression problems are by nature always Supervised.

For some algorithms the scored dataset has two output columns:
  1. Scored Label Mean
  2. Scored Label Standard Deviation
The mean places each prediction on the continuum, and the standard deviation expresses the uncertainty of that prediction.

Evaluation of Regression models can be done using one or more of the following statistics
  • Negative Log Likelihood
  • Mean Absolute Error
  • Root Mean Squared Error
  • Relative Absolute Error
  • Relative Squared Error
  • Coefficient of Determination
AzureML currently supports the following Regression algorithms:-

Bayesian Linear Regression

The trained model has no useful visualization.

The “Scored Label Mean” is the prediction, and “Scored Label Standard Deviation” is the uncertainty around that prediction.

Boosted Decision Tree Regression

The trained model is visualized as 100 decision trees.

Decision Forest Regression

The trained model is visualized as 8 huge decision trees.

The “Scored Label Mean” is the prediction, and “Scored Label Standard Deviation” is the uncertainty around that prediction.

Fast Forest Quantile Regression

The trained model has no useful visualization.

Linear Regression

Neural Network Regression

The trained model has no useful visualization.

Ordinal Regression

Poisson Regression

The trained model is visualized as a series of features and weights.

Anomaly Detection

Anomaly Detection is normally unsupervised. We don't know in advance what an anomaly is, we can only train the algorithm on “normal” data. An email spam detector is a typical Anomaly Detection problem.

In AzureML Anomaly Detection the Scored Probability is the probability that the record is an Anomaly. If the Scored Probability is high the record is an anomaly; if it's low, the record closely matches the training data.
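This is not AzureML's algorithm, but the train-only-on-normal-data idea can be sketched in plain Python with a simple z-score detector (all numbers made up):

```python
import statistics

# Training data contains only "normal" records.
normal_data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]
mu = statistics.mean(normal_data)
sigma = statistics.stdev(normal_data)

def anomaly_score(x):
    """Distance from the normal mean, in standard deviations.

    A high score means the record looks nothing like the training data."""
    return abs(x - mu) / sigma

print(anomaly_score(10.0))  # typical record: low score
print(anomaly_score(15.0))  # far-off record: high score -> anomaly
```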

AzureML supports two algorithms suitable for Anomaly Detection

One-Class SVM (Support Vector Machine)

The One Class Support Vector Machine has no useful Visualization.

A trained One-Class model will run on Live data that contains null (or NaN) fields, but it will only Score records where all fields are present (i.e. the record has no null or NaN fields)

PCA (Principal Component Analysis) -Based Anomaly Detection

The PCA Anomaly Detection Model has no useful Visualization.

A trained PCA-Based model will fail to run on Live data if the data contains null (or NaN) fields.

Clustering

Clustering Models learn how to group records into n-clusters. This can be done either Supervised or Unsupervised.

Supervised clustering is used for predicting which records should fall into which predefined categories. This is really just a Multi-Class classifier.

An unsupervised clustering algorithm will define its own categorizations, which may not correspond to anything intuitive to the human observer. In the real world these algorithms are used for Data Discovery problems such as discovering market segmentation.

K-Means Clustering
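The core K-Means loop can be sketched in plain Python on 1-D data (this is a toy illustration with made-up points, not AzureML's implementation): alternately assign each point to its nearest centroid, then move each centroid to the mean of its assigned points.

```python
# Made-up 1-D data with two obvious groups, and two starting centroids.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids = [0.0, 6.0]

for _ in range(10):  # a fixed number of iterations keeps the sketch simple
    # Assignment step: put each point in the cluster of its nearest centroid.
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    # Update step: move each centroid to the mean of its cluster.
    # (This sketch assumes no cluster ever ends up empty.)
    centroids = [sum(c) / len(c) for c in clusters]

print(centroids)  # the centroids settle near 1.0 and 5.0
```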