Mastering Predictive Analytics with Python

The parabola is a convex function because the values of the function between x1 and x2 (the two points where the blue line intersects the parabola) always lie below the blue line representing αF(x1) + (1 - α)F(x2). As you can see, the parabola also has a global minimum between these two points.
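As a quick numerical illustration of this chord inequality (this check is not part of the book's example; the parabola F(x) = x², the two sample points, and the variable names below are chosen purely for demonstration), we can verify that F(αx1 + (1 - α)x2) ≤ αF(x1) + (1 - α)F(x2) for a range of α values between 0 and 1:

>>> import numpy as np
>>> F = lambda x: x ** 2                           # a parabola, as in the figure
>>> x1, x2 = -1.0, 2.0                             # two arbitrary points on the curve
>>> alphas = np.linspace(0, 1, 11)                 # mixing weights between 0 and 1
>>> curve = F(alphas * x1 + (1 - alphas) * x2)     # the parabola between x1 and x2
>>> chord = alphas * F(x1) + (1 - alphas) * F(x2)  # the straight line joining the two points
>>> np.all(curve <= chord)                         # convexity: the curve never rises above the chord
True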

When we are dealing with matrices such as the Hessian referenced previously, the corresponding condition is that the matrix be positive semidefinite, meaning that any vector multiplied by this matrix on either side (xᵀHx) yields a value ≥ 0. This means the function has a global minimum, and if our solution converges to a set of coefficients, we can be guaranteed that they represent the best parameters for the model, not a local minimum.

We noted previously that we could potentially offset an imbalanced distribution of classes in our data by reweighting individual points during training. In the formulas for either SGD or IRLS, we could apply a weight wi to each data point, increasing or decreasing its relative contribution to the value of the likelihood and to the updates made during each iteration of the optimization algorithm.
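The book's code does not show this reweighting step here, but as a minimal sketch of the idea, scikit-learn's SGDClassifier.fit accepts a sample_weight array, so a per-point weight wi could be supplied as follows (the factor of 2 for the minority class is an arbitrary choice, and the assumption that the positive class is labeled 1 in census_income_train is made purely for illustration):

>>> import numpy as np
>>> # hypothetical weights: upweight the (assumed) minority class labeled 1 by a factor of 2
>>> weights = np.where(census_income_train == 1, 2.0, 1.0)
>>> log_model_weighted = linear_model.SGDClassifier(alpha=10, loss='log',
...     penalty='l2', n_iter=1000, fit_intercept=False).fit(
...     census_features_train, census_income_train, sample_weight=weights)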

Now that we have described how to obtain the optimal parameters of the logistic regression model, let us return to our example and apply these methods to our data.

Fitting the model

We can use either SGD or second-order methods to fit the logistic regression model to our data. Let us compare the results; using SGD, we fit the model with the following command:

>>> log_model_sgd = linear_model.SGDClassifier(alpha=10, loss='log', penalty='l2', n_iter=1000, fit_intercept=False).fit(census_features_train, census_income_train)

Here the value log for the loss parameter specifies that it is a logistic regression we are training, n_iter specifies the number of times we iterate over the training data to perform SGD, and alpha represents the weight on the regularization term. We also specify that we do not want to fit the intercept, to make comparison with other methods more straightforward (since the method of fitting the intercept could differ between optimizers). The penalty argument specifies the regularization penalty, which we already saw for ridge regression in Chapter 4, Connecting the Dots with Models – Regression Methods. As l2 is the only penalty we can use with the second-order methods, we choose l2 here as well to allow comparison between the methods. We can examine the resulting model coefficients by referencing the coef_ property of the model object:

>>> log_model_sgd.coef_
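The alpha value of 10 used above is a fairly heavy penalty. As a quick illustration of its role (this comparison is not part of the book's example, and the alternative alpha value below is arbitrary), we could refit with a much weaker penalty and compare the total magnitude of the coefficients, which we would typically expect to be larger for the weakly regularized model:

>>> import numpy as np
>>> # hypothetical refit with a much weaker l2 penalty, for comparison only
>>> log_model_sgd_weak = linear_model.SGDClassifier(alpha=0.0001, loss='log',
...     penalty='l2', n_iter=1000, fit_intercept=False).fit(
...     census_features_train, census_income_train)
>>> # total absolute coefficient size under heavy versus weak regularization
>>> np.abs(log_model_sgd.coef_).sum(), np.abs(log_model_sgd_weak.coef_).sum()

The remainder of the comparison below uses the original log_model_sgd fit with alpha=10.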

Compare the coefficients of this SGD model to those of the second-order fit we obtain using the following command:

>>> log_model_newton = linear_model.LogisticRegression(penalty='l2', solver='lbfgs', fit_intercept=False).fit(census_features_train, census_income_train)

As with the SGD model, we remove the intercept fit to allow the most direct comparison of the coefficients produced by the two methods. We find that the coefficients are not identical, with the output of the SGD model containing several larger coefficients. Thus, we see in practice that even with similar models and a convex objective function, different optimization methods can give different parameter results. However, we can see that the results are highly correlated based on a pairwise scatterplot of the coefficients:

>>> plt.scatter(log_model_newton.coef_[0], log_model_sgd.coef_[0])
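To put a number on the correlation that the scatterplot suggests (this check is not in the book's text; np.corrcoef is used here simply as a numerical companion to the plot), we could compute the Pearson correlation between the two coefficient vectors:

>>> import numpy as np
>>> # correlation between the coefficient vectors of the two fits
>>> np.corrcoef(log_model_newton.coef_[0], log_model_sgd.coef_[0])[0, 1]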


