#### When?
#### Why?
`lm()` fits linear models. A typical model has the form `response ~ terms`, where the terms are combined with formula operators:

- `first + second` — all the terms in `first` together with all the terms in `second`, with duplicates removed.
- `first:second` — the set of terms obtained by taking the interactions of all terms in `first` with all terms in `second`.
- `first * second` — the cross of `first` and `second`; the same as `first + second + first:second`.
- `y ~ x - 1` or `y ~ 0 + x` — fit the model without an intercept.

Example: `lm(y ~ 0 + x1 + x2 * x3)`
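The operators above can be sketched with a toy data frame (the variable names `y`, `x1`–`x3` are made up for illustration):

```r
# Toy data to illustrate the formula operators
set.seed(1)
d <- data.frame(y = rnorm(20), x1 = rnorm(20), x2 = rnorm(20), x3 = rnorm(20))

lm(y ~ x1 + x2, data = d)            # main effects of x1 and x2
lm(y ~ x1:x2, data = d)              # the interaction term only
lm(y ~ x1 * x2, data = d)            # same as y ~ x1 + x2 + x1:x2
lm(y ~ 0 + x1, data = d)             # no intercept; equivalent to y ~ x1 - 1
lm(y ~ 0 + x1 + x2 * x3, data = d)   # the example from the text
```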
The output of `lm()` is a list containing coefficients, residuals, etc.

::: {.cell output-location="column-fragment"}
List of 12
$ coefficients : Named num [1:2] 0.844 0.683
..- attr(*, "names")= chr [1:2] "(Intercept)" "hwy"
$ residuals : Named num [1:234] -2.658 0.342 -2.024 -0.341 -2.608 ...
..- attr(*, "names")= chr [1:234] "1" "2" "3" "4" ...
$ effects : Named num [1:234] -257.89 -62.1 -1.9 -0.21 -2.46 ...
..- attr(*, "names")= chr [1:234] "(Intercept)" "hwy" "" "" ...
$ rank : int 2
$ fitted.values: Named num [1:234] 20.7 20.7 22 21.3 18.6 ...
..- attr(*, "names")= chr [1:234] "1" "2" "3" "4" ...
$ assign : int [1:2] 0 1
$ qr :List of 5
..$ qr : num [1:234, 1:2] -15.2971 0.0654 0.0654 0.0654 0.0654 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:234] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:2] "(Intercept)" "hwy"
.. ..- attr(*, "assign")= int [1:2] 0 1
..$ qraux: num [1:2] 1.07 1.06
..$ pivot: int [1:2] 1 2
..$ tol : num 1e-07
..$ rank : int 2
..- attr(*, "class")= chr "qr"
$ df.residual : int 232
$ xlevels : Named list()
$ call : language lm(formula = cty ~ hwy, data = mpg)
$ terms :Classes 'terms', 'formula' language cty ~ hwy
.. ..- attr(*, "variables")= language list(cty, hwy)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "cty" "hwy"
.. .. .. ..$ : chr "hwy"
.. ..- attr(*, "term.labels")= chr "hwy"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(cty, hwy)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "cty" "hwy"
$ model :'data.frame': 234 obs. of 2 variables:
..$ cty: int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
..$ hwy: int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
..- attr(*, "terms")=Classes 'terms', 'formula' language cty ~ hwy
.. .. ..- attr(*, "variables")= language list(cty, hwy)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "cty" "hwy"
.. .. .. .. ..$ : chr "hwy"
.. .. ..- attr(*, "term.labels")= chr "hwy"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(cty, hwy)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "cty" "hwy"
- attr(*, "class")= chr "lm"
:::
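The structure above can be reproduced like this (the `call` element stored in the object shows the model was fitted on ggplot2's `mpg` data):

```r
library(ggplot2)                 # provides the mpg data set
mpg_model1 <- lm(cty ~ hwy, data = mpg)
str(mpg_model1)                  # the list shown above

# Prefer accessor functions over reaching into the list with $
coef(mpg_model1)
head(residuals(mpg_model1))
head(fitted(mpg_model1))
```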
Calling `summary()` on an `lm` object prints a fuller overview:
Call:
lm(formula = cty ~ hwy, data = mpg)
Residuals:
Min 1Q Median 3Q Max
-2.9247 -0.7757 -0.0428 0.6965 4.6096
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.84420 0.33319 2.534 0.0119 *
hwy 0.68322 0.01378 49.585 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.252 on 232 degrees of freedom
Multiple R-squared: 0.9138, Adjusted R-squared: 0.9134
F-statistic: 2459 on 1 and 232 DF, p-value: < 2.2e-16
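The printed overview above comes from `summary()`:

```r
# Refit the model from the slides (mpg from ggplot2)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)
summary(mpg_model1)   # coefficients with SEs and t values, R-squared, F-statistic
```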
easystats package(s): `report::report()` turns the model into a written summary:
We fitted a linear model (estimated using OLS) to predict cty with hwy
(formula: cty ~ hwy). The model explains a statistically significant and
substantial proportion of variance (R2 = 0.91, F(1, 232) = 2458.64, p < .001,
adj. R2 = 0.91). The model's intercept, corresponding to hwy = 0, is at 0.84
(95% CI [0.19, 1.50], t(232) = 2.53, p = 0.012). Within this model:
- The effect of hwy is statistically significant and positive (beta = 0.68, 95%
CI [0.66, 0.71], t(232) = 49.58, p < .001; Std. beta = 0.96, 95% CI [0.92,
0.99])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
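A sketch of the call that produces such a narrative (assuming the report package is installed):

```r
library(report)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)
report(mpg_model1)   # prints the narrative paragraph shown above
```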
`report::report()` on an extra example model:

::: {.cell}
:::
We fitted a linear model (estimated using OLS) to predict cty with hwy, drv,
displ and cyl (formula: cty ~ hwy + drv + displ + cyl). The model explains a
statistically significant and substantial proportion of variance (R2 = 0.93,
F(5, 228) = 637.44, p < .001, adj. R2 = 0.93). The model's intercept,
corresponding to hwy = 0, drv = 4, displ = 0 and cyl = 0, is at 4.40 (95% CI
[2.62, 6.18], t(228) = 4.86, p < .001). Within this model:
- The effect of hwy is statistically significant and positive (beta = 0.64, 95%
CI [0.60, 0.69], t(228) = 26.01, p < .001; Std. beta = 0.90, 95% CI [0.83,
0.97])
- The effect of drv [f] is statistically significant and negative (beta =
-0.75, 95% CI [-1.20, -0.29], t(228) = -3.25, p = 0.001; Std. beta = -0.18, 95%
CI [-0.28, -0.07])
- The effect of drv [r] is statistically significant and negative (beta =
-0.97, 95% CI [-1.55, -0.40], t(228) = -3.33, p = 0.001; Std. beta = -0.23, 95%
CI [-0.36, -0.09])
- The effect of displ is statistically non-significant and negative (beta =
-0.02, 95% CI [-0.36, 0.33], t(228) = -0.09, p = 0.930; Std. beta = -4.65e-03,
95% CI [-0.11, 0.10])
- The effect of cyl is statistically significant and negative (beta = -0.36,
95% CI [-0.62, -0.11], t(228) = -2.79, p = 0.006; Std. beta = -0.14, 95% CI
[-0.23, -0.04])
Standardized parameters were obtained by fitting the model on a standardized
version of the dataset. 95% Confidence Intervals (CIs) and p-values were
computed using a Wald t-distribution approximation.
`report()` actually works on many other objects than regression models. Example with `t.test()`:
Effect sizes were labelled following Cohen's (1988) recommendations.
The Welch Two Sample t-test testing the difference of height by gender (mean in
group feminine = 163.57, mean in group masculine = 182.39) suggests that the
effect is negative, statistically significant, and large (difference = -18.82,
95% CI [-29.92, -7.72], t(7.81) = -3.93, p = 0.005; Cohen's d = -2.81, 95% CI
[-4.73, -0.81])
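A sketch of the `t.test()` case; the `height`/`gender` data frame here is hypothetical, standing in for whatever data the slides used:

```r
library(report)

# Hypothetical data: numeric height, two-level gender factor
dat <- data.frame(
  height = c(160, 165, 162, 168, 163, 180, 185, 181, 183, 182),
  gender = rep(c("feminine", "masculine"), each = 5)
)
report(t.test(height ~ gender, data = dat))   # narrative description of the test
```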
easystats package(s): `parameters::model_parameters()`
Parameter | Coefficient | SE | 95% CI | t(232) | p
-----------------------------------------------------------------
(Intercept) | 0.84 | 0.33 | [0.19, 1.50] | 2.53 | 0.012
hwy | 0.68 | 0.01 | [0.66, 0.71] | 49.58 | < .001
Parameter | Coefficient | SE | 95% CI | t(228) | p
-------------------------------------------------------------------
(Intercept) | 4.40 | 0.91 | [ 2.62, 6.18] | 4.86 | < .001
hwy | 0.64 | 0.02 | [ 0.60, 0.69] | 26.01 | < .001
drv [f] | -0.75 | 0.23 | [-1.20, -0.29] | -3.25 | 0.001
drv [r] | -0.97 | 0.29 | [-1.55, -0.40] | -3.33 | 0.001
displ | -0.02 | 0.18 | [-0.36, 0.33] | -0.09 | 0.930
cyl | -0.36 | 0.13 | [-0.62, -0.11] | -2.79 | 0.006
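The two tables above can be produced like this (model objects named as in the slides):

```r
library(parameters)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = ggplot2::mpg)

model_parameters(mpg_model1)   # coefficient table for the simple model
model_parameters(mpg_model2)   # coefficient table for the larger model
```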
`model_parameters()` can also display the reference level of a factor:
Parameter | Coefficient | SE | 95% CI | t(228) | p
-------------------------------------------------------------------
(Intercept) | 4.40 | 0.91 | [ 2.62, 6.18] | 4.86 | < .001
hwy | 0.64 | 0.02 | [ 0.60, 0.69] | 26.01 | < .001
drv [4] | 0.00 | | | |
drv [f] | -0.75 | 0.23 | [-1.20, -0.29] | -3.25 | 0.001
drv [r] | -0.97 | 0.29 | [-1.55, -0.40] | -3.33 | 0.001
displ | -0.02 | 0.18 | [-0.36, 0.33] | -0.09 | 0.930
cyl | -0.36 | 0.13 | [-0.62, -0.11] | -2.79 | 0.006
easystats package(s): `parameters::compare_parameters()`
Parameter | mpg_model1 | mpg_model2
-------------------------------------------------------
(Intercept) | 0.84 (0.19, 1.50) | 4.40 ( 2.62, 6.18)
hwy | 0.68 (0.66, 0.71) | 0.64 ( 0.60, 0.69)
drv [f] | | -0.75 (-1.20, -0.29)
drv [r] | | -0.97 (-1.55, -0.40)
displ | | -0.02 (-0.36, 0.33)
cyl | | -0.36 (-0.62, -0.11)
-------------------------------------------------------
Observations | 234 | 234
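The side-by-side comparison above comes from:

```r
library(parameters)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = ggplot2::mpg)

compare_parameters(mpg_model1, mpg_model2)   # one column of estimates per model
```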
easystats package(s): `parameters::model_parameters()` with robust standard errors. The common heteroskedasticity-consistent (HC) estimators:

| Method | Adjustment Formula | Best Use Case |
|---|---|---|
| HC0 | No adjustment | Large samples |
| HC1 | Inflation by \(\frac{n}{n-k}\) (default in Stata) | Moderate samples |
| HC2 | Division by \((1 - h_i)\) | Small samples |
| HC3 | Division by \((1 - h_i)^2\) | Very small samples or high leverage |
| HC4 | Further adjustment for extreme leverage | Severe leverage issues |
Parameter | Coefficient | SE | 95% CI | t(232) | p
-----------------------------------------------------------------
(Intercept) | 0.84 | 0.38 | [0.10, 1.59] | 2.24 | 0.026
hwy | 0.68 | 0.02 | [0.65, 0.72] | 40.18 | < .001
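Robust SEs can be requested through the `vcov` argument of `model_parameters()` (backed by the sandwich package). The slide does not say which HC variant produced the table above, so `"HC3"` here is an assumption:

```r
library(parameters)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)

# vcov = "HC3" is an assumption; any of "HC0".."HC4" can be passed
model_parameters(mpg_model1, vcov = "HC3")
```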
easystats package(s): `performance::model_performance()`
# Indices of model performance
AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma
---------------------------------------------------------------
773.387 | 773.491 | 783.753 | 0.914 | 0.913 | 1.247 | 1.252
::: {.cell output-location="fragment"}
# Indices of model performance
AIC | AICc | BIC | R2 | R2 (adj.) | RMSE | Sigma
---------------------------------------------------------------
721.519 | 722.015 | 745.706 | 0.933 | 0.932 | 1.097 | 1.112
:::
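The fit indices above come from:

```r
library(performance)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = ggplot2::mpg)

model_performance(mpg_model1)   # AIC, AICc, BIC, R2, RMSE, Sigma
model_performance(mpg_model2)
```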
easystats package(s): `performance::compare_performance()`

Extra example model:

::: {.cell}
:::
# Comparison of Model Performance Indices
Name | Model | AIC (weights) | AICc (weights) | BIC (weights) | R2
---------------------------------------------------------------------------
mpg_model1 | lm | 773.4 (<.001) | 773.5 (<.001) | 783.8 (<.001) | 0.914
mpg_model2 | lm | 721.5 (<.001) | 722.0 (<.001) | 745.7 (>.999) | 0.933
mpg_model3 | lm | 609.0 (>.999) | 648.1 (>.999) | 809.4 (<.001) | 0.973
Name | R2 (adj.) | RMSE | Sigma
--------------------------------------
mpg_model1 | 0.913 | 1.247 | 1.252
mpg_model2 | 0.932 | 1.097 | 1.112
mpg_model3 | 0.965 | 0.694 | 0.798
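A sketch of the comparison call; the formula behind `mpg_model3` is not shown on the slides, so only the first two models are refitted here:

```r
library(performance)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = ggplot2::mpg)

# mpg_model3 would be added as a further argument once defined
compare_performance(mpg_model1, mpg_model2)
```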
easystats package(s): `performance::check_model()` draws a panel of visual diagnostics of the model assumptions (the plots themselves are not reproduced here).
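A minimal sketch; which model the slides checked is not stated, so the larger model is assumed:

```r
library(performance)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = ggplot2::mpg)

check_model(mpg_model2)   # residual, linearity, normality, collinearity checks, ...
```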
To sum up:

- `lm()` to fit linear regression models, with formulas of the form `dependent ~ independent`
- the `easystats` packages: `report()`, `model_parameters()`, `model_performance()`, `compare_performance()`, `check_model()`
- `glm(, family = binomial)` for logistic regression
- `glm(, family = poisson)` for Poisson regression
- `lmer()` for linear mixed-effects models
- the `estimatr` R package for calculating robust SEs, etc.
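Minimal sketches of the extensions listed above (the data sets here are illustrative, not from the slides):

```r
# Logistic regression: binary outcome (am is 0/1 in the built-in mtcars data)
glm(am ~ mpg, data = mtcars, family = binomial)

# Poisson regression: count outcome (carb, number of carburetors)
glm(carb ~ hp, data = mtcars, family = poisson)

# Linear mixed-effects model: random intercept per manufacturer
# (requires the lme4 package)
# lme4::lmer(cty ~ hwy + (1 | manufacturer), data = ggplot2::mpg)
```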
The marginaleffects
package for R and Python offers a single point of entry to easily interpret the results of over 100 classes of models, using a simple and consistent user interface.
Its benefits include:
gtsummary

YES! This package can help with regression tables as well, via `tbl_regression()`:
| Characteristic | Beta | 95% CI | p-value |
|---|---|---|---|
| hwy | 0.64 | 0.60, 0.69 | <0.001 |
| drv | | | |
| 4 | — | — | |
| f | -0.75 | -1.2, -0.29 | 0.001 |
| r | -0.97 | -1.6, -0.40 | 0.001 |
| displ | -0.02 | -0.36, 0.33 | >0.9 |
| cyl | -0.36 | -0.62, -0.11 | 0.006 |

Abbreviation: CI = Confidence Interval

R² = 0.933; Adjusted R² = 0.932; Sigma = 1.11; Statistic = 637; p-value = <0.001; df = 5; Log-likelihood = -354; AIC = 722; BIC = 746; Deviance = 282; Residual df = 228; No. Obs. = 234
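The footer statistics suggest the table was built roughly like this; `add_glance_source_note()` is my guess for how the fit statistics got into the footer:

```r
library(gtsummary)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = ggplot2::mpg)

tbl_regression(mpg_model2) |>
  add_glance_source_note()   # appends R2, AIC, BIC, ... as a source note
```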
`tbl_merge()` combines several regression tables side by side:
(Value columns under "Table 1" and "Table 2", respectively.)

| Characteristic | Beta | 95% CI | p-value | Beta | 95% CI | p-value |
|---|---|---|---|---|---|---|
| hwy | 0.68 | 0.66, 0.71 | <0.001 | 0.64 | 0.60, 0.69 | <0.001 |
| drv | | | | | | |
| 4 | | | | — | — | |
| f | | | | -0.75 | -1.2, -0.29 | 0.001 |
| r | | | | -0.97 | -1.6, -0.40 | 0.001 |
| displ | | | | -0.02 | -0.36, 0.33 | >0.9 |
| cyl | | | | -0.36 | -0.62, -0.11 | 0.006 |

Abbreviation: CI = Confidence Interval
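The merged table above comes from a call like:

```r
library(gtsummary)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = ggplot2::mpg)

tbl_merge(tbls = list(tbl_regression(mpg_model1), tbl_regression(mpg_model2)))
```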
`tbl_merge()` with titles on the models:

::: {.cell output-location="fragment"}
(Value columns under "Simple" and "WOW model!!!", respectively.)

| Characteristic | Beta | 95% CI | p-value | Beta | 95% CI | p-value |
|---|---|---|---|---|---|---|
| hwy | 0.68 | 0.66, 0.71 | <0.001 | 0.64 | 0.60, 0.69 | <0.001 |
| drv | | | | | | |
| 4 | | | | — | — | |
| f | | | | -0.75 | -1.2, -0.29 | 0.001 |
| r | | | | -0.97 | -1.6, -0.40 | 0.001 |
| displ | | | | -0.02 | -0.36, 0.33 | >0.9 |
| cyl | | | | -0.36 | -0.62, -0.11 | 0.006 |

Abbreviation: CI = Confidence Interval
:::
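The titles can be supplied through `tbl_merge()`'s `tab_spanner` argument; a sketch:

```r
library(gtsummary)
mpg_model1 <- lm(cty ~ hwy, data = ggplot2::mpg)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = ggplot2::mpg)

tbl_merge(
  tbls = list(tbl_regression(mpg_model1), tbl_regression(mpg_model2)),
  tab_spanner = c("Simple", "WOW model!!!")   # one title per merged table
)
```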
`tbl_merge()` titles are read as markdown:

::: {.cell output-location="fragment"}
(Value columns under "Simple" and "WOW model!!!", respectively.)

| Characteristic | Beta | 95% CI | p-value | Beta | 95% CI | p-value |
|---|---|---|---|---|---|---|
| hwy | 0.68 | 0.66, 0.71 | <0.001 | 0.64 | 0.60, 0.69 | <0.001 |
| drv | | | | | | |
| 4 | | | | — | — | |
| f | | | | -0.75 | -1.2, -0.29 | 0.001 |
| r | | | | -0.97 | -1.6, -0.40 | 0.001 |
| displ | | | | -0.02 | -0.36, 0.33 | >0.9 |
| cyl | | | | -0.36 | -0.62, -0.11 | 0.006 |

Abbreviation: CI = Confidence Interval
:::
If you use `plot()` on a `gtsummary` table you get a `ggplot` object. You can modify the `ggplot` object using functions you already know.
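A sketch of that workflow (the `plot()` method may additionally require the ggstats package to be installed):

```r
library(gtsummary)
library(ggplot2)
mpg_model2 <- lm(cty ~ hwy + drv + displ + cyl, data = mpg)

p <- plot(tbl_regression(mpg_model2))   # a ggplot coefficient plot
p + theme_minimal()                     # restyle with familiar ggplot functions
```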
Read more about regression and `gtsummary` in the package's documentation.