December 8, 2023

Linear Regression:

Model Evaluation on Training Data:

Scatter Plot of Predicted vs Actual Values:
• The first image is a scatter plot with the x-axis showing the actual values (y_train) and the y-axis showing the predicted values (y_pred).
• The closer the points are to the diagonal line y = x (not drawn on the plot, but implied), the better the model's predictions match the actual data.
• The points appear to align well along an increasing diagonal, suggesting a good fit between the model's predictions and the actual values, especially as the total number of jobs increases.
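The "closeness to the diagonal" judged visually above can also be summarized as a single number, the coefficient of determination R² = 1 − SS_res / SS_tot, which equals 1 only when every point lies on y = x. The sketch below is an illustration under stated assumptions, not the post's code: the post's features and data are not shown, so it fits ordinary least squares with NumPy on hypothetical synthetic data (one feature, a noisy linear target standing in for total jobs), and the helper names `fit_linear_regression`, `predict` and `r_squared` are hypothetical.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept column; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply fitted coefficients to new rows (intercept column added the same way)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return X1 @ coef

def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot: equals 1 only if all (actual, predicted) points lie on y = x."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: one feature and a noisy linear target (stand-in for "total jobs").
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(200, 1))
y_train = 3.0 * X_train[:, 0] + 50 + rng.normal(0, 20, size=200)

coef = fit_linear_regression(X_train, y_train)
y_pred = predict(coef, X_train)
print(f"Training R^2: {r_squared(y_train, y_pred):.3f}")
```

To make the implied diagonal visible on the scatter plot itself, a reference line from the minimum to the maximum of y_train can be drawn on the same axes (for example with matplotlib's `Axes.plot`).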
Distribution of Residuals:
• The second image is a histogram of the model's residuals (the differences between the actual and predicted values), overlaid with a kernel density estimate (KDE).
• Ideally, the residuals should be centered on zero, which indicates that the model's predictions are unbiased, and roughly normal, in line with the usual linear regression assumptions.
• The distribution looks approximately normal and centered around zero, which is a good sign, although there seems to be a slight right skew.
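Both properties read off the histogram, "centered around zero" and "slight right skew", can be checked numerically: the mean residual measures systematic bias, and a positive sample skewness indicates a longer right tail. The sketch below is a hypothetical illustration, not the post's analysis: it fits a straight line with `np.polyfit` to synthetic data whose noise is deliberately right-skewed (exponential), and the helper `residual_summary` is an assumed name.

```python
import numpy as np

def residual_summary(y_true, y_pred):
    """Return (mean, skewness) of residuals = actual - predicted.
    Skewness > 0 means a longer right tail, i.e. a right skew."""
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    centered = resid - resid.mean()
    std = centered.std()  # population std (ddof=0), matching the moment-based skewness
    skew = np.mean(centered ** 3) / std ** 3
    return resid.mean(), skew

# Hypothetical data with right-skewed (exponential) noise around a linear trend.
rng = np.random.default_rng(1)
x = rng.uniform(0, 100, size=500)
y = 2.0 * x + 10 + rng.exponential(scale=15, size=500)

slope, intercept = np.polyfit(x, y, deg=1)  # highest degree first: [slope, intercept]
y_hat = slope * x + intercept

mean_resid, skew = residual_summary(y, y_hat)
print(f"mean residual = {mean_resid:.2e}, skewness = {skew:.2f}")
```

On training data, a least-squares fit with an intercept forces the mean residual to (numerically) zero, so a centered histogram there is expected; the skewness is the more informative number. `scipy.stats.skew` computes the same biased moment estimator by default.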

Model Evaluation on Validation Data:

Scatter Plot of Predicted vs Actual Values:

This plot compares the actual values (y_val) on the x-axis with the predicted values on the y-axis. If the predictions were perfect, all points would lie on the diagonal line where predicted equals actual (y = x). The scatter shows that the model's predictions are reasonably close to the actual values, although there is some variance, especially in the middle range of the actual values.
Distribution of Residuals:
The residuals are the differences between the actual and predicted values. This histogram shows their distribution with a superimposed kernel density estimate (KDE). The residuals appear approximately normally distributed with a mean close to zero, which is a good sign: the model does not systematically overpredict or underpredict the total number of jobs. However, the spread is noticeable, meaning some predictions are substantially off from the actual values, which is also visible in the scatter plot.
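The two observations above, a mean residual near zero (no systematic over- or underprediction) and a noticeable spread (some predictions far off), can be quantified per data split. The sketch below is a minimal illustration under stated assumptions: the data are hypothetical and synthetic, the 75/25 shuffle split and the threshold of 2 × RMSE for a "large miss" are arbitrary choices, and the helper `evaluate_split` is not from the post.

```python
import numpy as np

def evaluate_split(y_true, y_pred, k=2.0):
    """Summarize residuals on one split: mean residual (bias), RMSE (spread),
    and the share of points whose absolute residual exceeds k * RMSE (large misses)."""
    resid = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean(resid ** 2))
    share_large = np.mean(np.abs(resid) > k * rmse)
    return {"mean_resid": resid.mean(), "rmse": rmse, "share_large": share_large}

# Hypothetical data: shuffle, fit on 75% (train), evaluate on the held-out 25% (validation).
rng = np.random.default_rng(2)
x = rng.uniform(0, 100, size=400)
y = 3.0 * x + 50 + rng.normal(0, 25, size=400)
idx = rng.permutation(len(x))
train, val = idx[:300], idx[300:]

slope, intercept = np.polyfit(x[train], y[train], deg=1)
for name, part in [("train", train), ("validation", val)]:
    summary = evaluate_split(y[part], slope * x[part] + intercept)
    print(name, {key: round(float(v), 3) for key, v in summary.items()})
```

Unlike on training data, the validation mean residual is not forced to zero, so a value that is small relative to the RMSE is the meaningful sign of unbiased predictions; a validation RMSE well above the training RMSE would point to overfitting rather than just noisy data.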
