Microsoft Linear Regression Algorithm Technical Reference

Indicates that the column cannot contain a null.



In other words, although linear regression is based on a decision tree, the tree contains only a single root and no branches. With the parameters set so that the tree cannot grow, the algorithm never creates a split and therefore performs a linear regression.
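The effect of such a restriction can be sketched in a few lines. This is only an illustration, assuming the restriction takes the form of a minimum number of cases per leaf; the function and the name `min_leaf_cases` are hypothetical and are not taken from the product.

```python
def possible_splits(n_cases, min_leaf_cases):
    """Return the (left, right) child sizes that satisfy the minimum leaf size.

    Hypothetical helper: a split is allowed only if both children would
    hold at least `min_leaf_cases` training cases.
    """
    return [
        (left, n_cases - left)
        for left in range(1, n_cases)
        if left >= min_leaf_cases and n_cases - left >= min_leaf_cases
    ]


n_cases = 10

# With a small minimum, several splits are admissible.
splits_small_minimum = possible_splits(n_cases, min_leaf_cases=3)

# With the minimum at or above the number of training cases,
# no split is admissible, so all data stays in the root node.
splits_large_minimum = possible_splits(n_cases, min_leaf_cases=n_cases)

print(splits_small_minimum)
print(splits_large_minimum)
```

When the list of admissible splits is empty, the tree stays a single root node and the model reduces to one regression over all of the training data.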

The regression formula takes the general form Y = aX + b, where Y represents the output variable, X represents the input variable, and a and b are adjustable coefficients. You can retrieve the coefficients, intercepts, and other information about the regression formula by querying the completed mining model.

All Analysis Services data mining algorithms automatically use feature selection to improve analysis and reduce processing load.
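The roles of a and b in a formula of the form Y = aX + b can be illustrated with a short least-squares fit. This is a generic sketch of ordinary least squares for a single input, not a description of how Analysis Services computes or stores its coefficients; the function name `fit_line` and the sample data are made up for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean; zero means every x is identical.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    # Co-variation of x and y around their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx            # slope
    b = y_bar - a * x_bar    # intercept
    return a, b


# Illustrative data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

Here a is the slope applied to the input variable and b is the intercept, which correspond to the coefficient and intercept values that a regression formula exposes.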

The method used for feature selection in linear regression is the interestingness score, because the model supports only continuous columns. For reference, the following table shows the difference in feature selection for the Linear Regression algorithm and the Decision Trees algorithm.

The Microsoft Linear Regression algorithm supports parameters that affect the behavior, performance, and accuracy of the resulting mining model.

You can also set modeling flags on the mining model columns or mining structure columns to control the way that data is processed. The following table lists the parameters that are provided for the Microsoft Linear Regression algorithm.

The Microsoft Linear Regression algorithm supports the following modeling flags. When you create the mining structure or mining model, you define modeling flags to specify how values in each column are handled during analysis.

For more information, see Modeling Flags (Data Mining).

Linear regression models are based on the Microsoft Decision Trees algorithm. However, even if you do not use the Microsoft Linear Regression algorithm, any decision tree model can contain a tree or nodes that represent a regression on a continuous attribute.

You do not need to specify that a continuous column represents a regressor. When a regression formula is fitted, the sum of the residuals is calculated, and if the deviation is too great, a split is forced in the tree.
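The residual check described above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses the sum of squared residuals as the deviation measure and a hypothetical threshold `max_rss`, neither of which is confirmed as the criterion the algorithm actually applies. All function names are made up for the example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, y_bar - a * x_bar


def residual_sum_of_squares(xs, ys, a, b):
    """Sum of squared differences between observed and fitted values."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def should_split(xs, ys, max_rss):
    """Hypothetical rule: force a split when a single line fits too poorly."""
    a, b = fit_line(xs, ys)
    return residual_sum_of_squares(xs, ys, a, b) > max_rss


# One linear trend: a single regression fits well.
linear_x = [1, 2, 3, 4, 5, 6]
linear_y = [3, 5, 7, 9, 11, 13]

# Two opposing trends: no single line fits, so a split would be forced.
v_x = [1, 2, 3, 4, 5, 6]
v_y = [5, 3, 1, 1, 3, 5]

split_linear = should_split(linear_x, linear_y, max_rss=1.0)
split_v = should_split(v_x, v_y, max_rss=1.0)
print(split_linear, split_v)
```

In the second data set a separate regression in each half would fit well, which is the situation in which splitting the tree and fitting a regressor per branch is useful.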

For example, if you are predicting customer purchasing behavior using income as an attribute, and you set the REGRESSOR modeling flag on the [Income] column, the algorithm first tries to fit the values by using a standard regression formula. If the deviation is too great, the regression formula is abandoned and the tree is split on some other attribute. The decision tree algorithm then tries to fit a regressor for income in each of the branches after the split.

A linear regression model must contain a key column, input columns, and at least one predictable column. The Microsoft Linear Regression algorithm supports the specific input columns and predictable columns that are listed in the following table.

For more information about what the content types mean when used in a mining model, see Content Types (Data Mining). Cyclical and Ordered content types are supported, but the algorithm treats them as discrete values and does not perform special processing.