Introduction
Regression analysis models the relationship between a dependent variable (Y) and one or more independent variables (X). It is one of the most widely used tools in business analytics, both for predicting outcomes and for understanding how variables relate to each other.
Simple Linear Regression
Y = β₀ + β₁X + ε
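β₀ = intercept (value of Y when X = 0), β₁ = slope (change in Y per one-unit change in X), ε = error term (the part of Y the line does not explain)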
Example
Sales = 100 + 5×Advertising
Base sales = ₹100 (predicted sales with no advertising); each additional ₹1 in advertising is associated with ₹5 more in sales
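The example line can be recovered from data with ordinary least squares (OLS). The sketch below is a minimal illustration that uses only the Python standard library. The data points are hypothetical and generated without noise from Sales = 100 + 5×Advertising, and the function name `fit_simple_ols` is an assumption made for this example.

```python
# Hypothetical data: advertising spend and sales (both in ₹), generated
# exactly from Sales = 100 + 5 × Advertising, with no noise added.
advertising = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [100.0 + 5.0 * x for x in advertising]


def fit_simple_ols(x, y):
    """Return (intercept, slope) that minimize the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # slope = sum of cross-deviations / sum of squared deviations of x
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


b0, b1 = fit_simple_ols(advertising, sales)
predicted = b0 + b1 * 25.0  # predicted sales for ₹25 of advertising
print(f"intercept={b0:.2f}, slope={b1:.2f}, prediction={predicted:.2f}")
```

With real, noisy data the estimates would only approximate the true intercept and slope, and the gap between each observed and predicted value is the residual that stands in for ε.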
Multiple Regression
Y = β₀ + β₁X₁ + β₂X₂ + ... + βₙXₙ + ε
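Multiple regression lets you control for other factors: each coefficient βᵢ is the expected change in Y per one-unit change in Xᵢ with the other variables held constant. Coefficients can be compared for relative importance only when the variables are on comparable scales (for example, after standardizing them).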
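A multiple regression can be fitted in the same way by solving the normal equations (XᵀX)β = XᵀY. The following is a rough sketch in plain Python, not a production implementation. The second predictor (number of stores), the data, and the helper names `solve` and `fit_ols` are all hypothetical. The data are generated without noise from Y = 100 + 5X₁ + 20X₂.

```python
def solve(a, b):
    """Solve the linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """rows: one list of predictor values per observation. Returns [b0, b1, ..., bn]."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical observations: (advertising spend, number of stores)
rows = [(10.0, 1.0), (20.0, 3.0), (30.0, 2.0), (40.0, 5.0), (50.0, 4.0), (60.0, 6.0)]
y = [100.0 + 5.0 * adv + 20.0 * stores for adv, stores in rows]

coefs = fit_ols(rows, y)
print([round(c, 4) for c in coefs])
```

In practice a numerical library would normally do this step; the hand-written solver is here only to keep the example self-contained.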
Key Assumptions
| Assumption | Description |
|---|---|
| Linearity | Relationship is linear |
| Independence | Errors are independent |
| Homoscedasticity | Constant error variance |
| Normality | Errors normally distributed |
| No multicollinearity | Independent variables are not highly correlated with each other |
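As one illustration of checking an assumption from the table, the sketch below screens for multicollinearity by computing the Pearson correlation between pairs of independent variables. The variable names and data are hypothetical, and the 0.8 cutoff is an assumed rule of thumb rather than a fixed standard. Pairwise correlation is only a first screen; it does not cover the other assumptions, which are usually checked on the residuals.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical independent variables
advertising = [10, 20, 30, 40, 50, 60]
tv_spend = [2 * a + 1 for a in advertising]  # exact linear function of advertising
stores = [3, 1, 4, 1, 5, 2]

THRESHOLD = 0.8  # assumed rule-of-thumb cutoff for "highly correlated"

r_tv = pearson(advertising, tv_spend)
r_stores = pearson(advertising, stores)

for name, r in [("tv_spend", r_tv), ("stores", r_stores)]:
    status = "possible multicollinearity" if abs(r) > THRESHOLD else "ok"
    print(f"advertising vs {name}: r = {r:.3f} ({status})")
```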
Interpreting Results
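- R²: Proportion of the variance in Y explained by the model (0 to 1). Higher means a closer fit, but R² never decreases when more variables are added, so a high value alone does not mean a good model.
- p-value: Probability of seeing an estimate at least this far from zero if the true coefficient were zero. A value below 0.05 is the conventional threshold for statistical significance.
- Coefficients: Expected change in Y per one-unit change in X, holding the other variables constant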
Warning: Correlation ≠ causation. High R² doesn't prove causality.
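R² can be computed directly from its definition, 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot is the total sum of squares around the mean of Y. The sketch below is a minimal illustration. The observed sales figures are hypothetical, and the predictions come from the example line Sales = 100 + 5×Advertising rather than from a refit.

```python
def r_squared(actual, predicted):
    """R² = 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((a - mean_y) ** 2 for a in actual)                # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical observed sales that deviate slightly from the example line
advertising = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [155.0, 195.0, 250.0, 305.0, 345.0]
predicted = [100.0 + 5.0 * x for x in advertising]

r2 = r_squared(sales, predicted)
print(f"R² = {r2:.4f}")
```

Here SS_res is 100 and SS_tot is 24100, so R² is about 0.996. That describes how closely the line tracks these particular points; it says nothing about whether advertising causes the sales.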
Conclusion
Key Takeaways
- Regression models the relationship between Y and one or more X variables
- Ordinary least squares (OLS) estimates the coefficients by minimizing the sum of squared errors
- Check assumptions before interpreting
- R² shows variance explained
- Correlation ≠ causation