Binary classifiers learn how to predict binary outcomes. Binary classifiers are always supervised learning problems. The Scored Label is either 1 or 0. This is probably the most common type of Machine Learning algorithm.
In an AzureML binary classifier, the Scored Probability is the probability that the Label should be 1. If the Scored Probability is less than 0.5, the Scored Label will be 0.
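The relationship between Scored Probability and Scored Label can be sketched in a few lines of Python. This is only an illustration, not AzureML code: the function name `scored_label` and the `threshold` parameter are made up for the example, and treating a probability of exactly 0.5 as a 1 is an assumption.

```python
def scored_label(scored_probability, threshold=0.5):
    """Return 0 below the threshold, otherwise 1.

    Assumption: a probability exactly equal to the threshold maps to 1.
    """
    return 1 if scored_probability >= threshold else 0


# Hypothetical probabilities, as a binary classifier might emit them.
probabilities = [0.12, 0.49, 0.50, 0.93]
labels = [scored_label(p) for p in probabilities]
print(labels)
```

Raising or lowering `threshold` is the usual way to trade false positives against false negatives.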
Visualized as a large number of small trees. A Microsoft support person told me that the last tree is the one it uses, but I'm not convinced of that. The first tree usually looks pretty good.
Visualized as a small number of large trees. The last tree is the one it uses.
Can run on live data with nulls (or NaNs), and scores records where some fields are null/NaN.
Visualized as feature-sets with weights
Visualized as feature-sets with weights
Can run on live data with nulls (or NaNs), but only scores records where all fields have values (i.e. are not null/NaN).
Can run on live data with nulls (or NaNs), but only scores records where all fields have values (i.e. are not null/NaN).
Visualized as feature-sets with weights
Can run on live data with nulls (or NaNs), but only scores records where all fields have values (i.e. are not null/NaN).
Multiclass classifiers are always supervised learning problems.
Multiclass Decision Forest
Multiclass Decision Jungle
Multiclass Logistic Regression
Multiclass Neural Network
One-vs-All Multiclass
Regression models predict where a record might appear along a continuum given the supplied features. For example, predicting a house price based on features of the house. Regression problems are by nature always Supervised.
The scored dataset has two labels:
Scored Label Mean
Scored Label Standard Deviation
These are attempts to put the supplied labels on a continuum.
Evaluation of Regression models can be done using one or more of the following statistics
AzureML currently supports the following Regression algorithms:-
The trained model has no useful visualization.
The “Scored Label Mean” is the prediction, and “Scored Label Standard Deviation” is the uncertainty around that prediction.
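One way to use the two columns together is to turn them into a rough range around the prediction. The sketch below is an assumption-laden illustration: it assumes the uncertainty is roughly normal, so that about two standard deviations either side of the mean covers most of the plausible values. The function name `prediction_range` and the house-price figures are invented for the example.

```python
def prediction_range(scored_label_mean, scored_label_std_dev, width=2.0):
    """Return (low, high) as mean minus/plus `width` standard deviations.

    Assumption: the uncertainty is roughly normal, so width=2.0 gives an
    approximate 95% range. This is not something AzureML computes for you.
    """
    spread = width * scored_label_std_dev
    return (scored_label_mean - spread, scored_label_mean + spread)


# Hypothetical house-price prediction: mean 250000, std dev 10000.
low, high = prediction_range(250000.0, 10000.0)
print(low, high)  # 230000.0 270000.0
```

A wide range is a hint that the model has seen few similar records and the prediction deserves less trust.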
The trained model is visualized as 100 decision trees.
The trained model is visualized as 8 huge decision trees.
The “Scored Label Mean” is the prediction, and “Scored Label Standard Deviation” is the uncertainty around that prediction.
The trained model has no useful visualization.
The trained model has no useful visualization.
The trained model is visualized as a series of features and weights.
Anomaly Detection is normally unsupervised. We don't know in advance what an anomaly is; we can only train the algorithm on “normal” data. An email spam detector is a typical Anomaly Detection problem.
In AzureML Anomaly Detection, the Scored Probability is the probability that the record is an anomaly. If the Scored Probability is high, the record is likely an anomaly; if it's low, the record closely matches the “normal” data the model was trained on.
AzureML supports two algorithms suitable for Anomaly Detection
The One Class Support Vector Machine has no useful Visualization.
A trained One-Class model will run on live data that contains null (or NaN) fields, but it will only score records where all fields are present (i.e. the record has no null or NaN fields).
The PCA Anomaly Detection Model has no useful Visualization.
A trained PCA-Based model will fail to run on Live data if the data contains null (or NaN) fields.
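Because the algorithms differ in how they cope with missing values, it can help to separate complete records from incomplete ones before scoring. The following is a minimal sketch in plain Python, not an AzureML module; the function names and the sample records are hypothetical.

```python
import math


def has_missing(record):
    """True when any field in the record is None or a float NaN."""
    return any(
        value is None or (isinstance(value, float) and math.isnan(value))
        for value in record.values()
    )


def split_by_completeness(records):
    """Return (complete, incomplete) lists of records."""
    complete = [r for r in records if not has_missing(r)]
    incomplete = [r for r in records if has_missing(r)]
    return complete, incomplete


# Hypothetical live data with some null and NaN fields.
records = [
    {"amount": 12.5, "items": 3},
    {"amount": None, "items": 1},
    {"amount": float("nan"), "items": 2},
    {"amount": 7.0, "items": 4},
]
complete, incomplete = split_by_completeness(records)
print(len(complete), len(incomplete))  # 2 2
```

Only the `complete` list would then be sent to a model that cannot handle nulls, while the `incomplete` list can be logged, imputed or routed to a model that tolerates missing fields.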
Clustering Models learn how to group records into n clusters. This can be done either Supervised or Unsupervised.
Supervised clustering is used for predicting which records should fall into which predefined categories. This is really just a Multi-Class classifier.
An unsupervised clustering algorithm will define its own categorizations, which may not correspond to anything intuitive to the human observer. In the real world these algorithms are used for Data Discovery problems such as discovering market segmentation.
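To make the unsupervised case concrete, here is a small sketch of k-means, used only as a familiar example of a clustering algorithm and not as a description of what AzureML does internally. The points, the starting centroids and the function name `k_means` are all invented for the illustration.

```python
def k_means(points, centroids, iterations=10):
    """Plain k-means on 2-D points; returns (centroids, assignments)."""
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(
                range(len(centroids)),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        centroids = new_centroids
    return centroids, assignments


# Hypothetical records with two numeric features each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, [points[0], points[3]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

The cluster numbers 0 and 1 carry no meaning of their own; it is up to the observer to decide whether the groups correspond to anything useful, such as a market segment.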