forcats package
Image Credits: Gauraw Tiwari
https://medium.com/@tiwarigaurav2512/r-data-types-847fffb01d5b
Image Credits: Hadley Wickham
https://adv-r.hadley.nz/vectors-chap.html
A vector that can contain only predefined values.
Used to store categorical data.
Built on top of an integer vector with two attributes:
a class, “factor”, which makes it behave differently from regular integer vectors.
levels, which defines the set of allowed values.
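A minimal base R sketch of that structure. The `sizes` vector and its values are made up for illustration; only `factor()`, `typeof()`, `attributes()` and `as.integer()` are used.

```r
# A factor with three allowed values, one of which ("medium") is unused
sizes <- factor(
  c("small", "large", "small"),
  levels = c("small", "medium", "large")
)

typeof(sizes)      # underlying storage type: "integer"
attributes(sizes)  # the two attributes: $levels and $class
as.integer(sizes)  # positions in the levels vector: 1 3 1
```

The integer codes are positions in the levels vector, which is why a value outside the levels cannot be stored in a factor.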
Image Credits: Hadley Wickham
https://adv-r.hadley.nz/vectors-chap.html
You know the set of possible values but they’re not all present in a given dataset.
# A tibble: 2 × 2
sex_factor n
<fct> <int>
1 Male 3
2 Female 0
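A hedged sketch of how a count like the one above can arise. The input vector `sex_char` is an assumption, and base R `table()` stands in for whatever counting function produced the tibble.

```r
# Assumed input: three observations, all "Male"
sex_char <- c("Male", "Male", "Male")

# Declaring both possible values as levels keeps "Female" visible
sex_factor <- factor(sex_char, levels = c("Male", "Female"))

table(sex_char)    # only Male is counted
table(sex_factor)  # Male 3, Female 0
```

The character vector has no way of knowing that "Female" is a possible value, so only the factor reports the zero count.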
You want to display character vectors in a non-alphabetical order.
[1] "Friday" "Monday" "Sunday" "Thursday" "Tuesday"
[1] Monday Tuesday Thursday Friday Sunday
Levels: Monday Tuesday Wednesday Thursday Friday Saturday Sunday
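A sketch of the comparison behind the two outputs above. The name `x1f` appears later in these slides; `x1` and `day_levels` are assumed names, and the input values are inferred from the printed output.

```r
# Levels in calendar order rather than alphabetical order
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

x1 <- c("Monday", "Sunday", "Thursday", "Tuesday", "Friday")

sort(x1)   # character vector: sorted alphabetically

x1f <- factor(x1, levels = day_levels)
sort(x1f)  # factor: sorted by position in the levels
```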
Ordered factors are a minor variation of factors.
In general, they behave like regular factors, but the order of the levels is meaningful (low, medium, high). This property is automatically leveraged by some modelling and visualization functions.
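A small base R illustration of an ordered factor, assuming a made-up `ratings` vector. Setting `ordered = TRUE` in `factor()` is what makes comparisons between values possible.

```r
ratings <- factor(
  c("low", "high", "medium", "low"),
  levels = c("low", "medium", "high"),
  ordered = TRUE
)

ratings           # prints Levels: low < medium < high
ratings < "high"  # TRUE FALSE TRUE TRUE
max(ratings)      # high
```

The same comparisons on a regular, unordered factor return NA with a warning, because its levels carry no ranking.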
forcats
The forcats package from the tidyverse contains many useful functions for working with factors.
We are going to learn:
factor()
fct_reorder()
fct_infreq()
fct_rev()
fct_recode()
fct_lump()
forcats cheatsheet.
fct_infreq() and fct_rev()
fct_infreq() orders levels by decreasing frequency, most frequent first.
fct_rev() reverses the order of the levels.
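A short sketch of both functions on a made-up `fruit` factor, assuming the forcats package is installed. Only the level order changes; the values stay the same.

```r
library(forcats)

# banana appears 3 times, cherry twice, apple once
fruit <- factor(c("banana", "apple", "cherry", "banana", "banana", "cherry"))

levels(fruit)                       # default, alphabetical: apple banana cherry
levels(fct_infreq(fruit))           # most frequent first: banana cherry apple
levels(fct_rev(fct_infreq(fruit)))  # reversed: apple cherry banana
```

Chaining the two is a common way to get the most frequent level at the top of a horizontal bar chart, where levels are drawn from the bottom up.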
fct_recode()
[1] Monday Sunday Thursday Tuesday Friday
Levels: Monday Tuesday Wednesday Thursday Friday Saturday Sunday
fct_recode() recodes, or changes, the value of each level.
::: {.cell output-location=‘column-fragment’}
x1f |>
  factor() |>
  fct_recode(
    "mon" = "Monday",
    "tue" = "Tuesday",
    "thu" = "Thursday",
    "fri" = "Friday",
    "sun" = "Sunday"
  )
[1] mon sun thu tue fri
Levels: mon tue thu fri sun
:::
fct_recode() will leave levels that aren’t explicitly mentioned as is (Tuesday in the example), and will warn you if you refer to a level that doesn’t exist.
::: {.cell output-location=‘column-fragment’}
x1f |>
  factor() |>
  fct_recode(
    "mon" = "Monday",
    # "tue" = "Tuesday",
    "wed" = "Wednesday", # doesn't exist in x1f
    "thu" = "Thursday",
    "fri" = "Friday",
    "sat" = "Saturday", # doesn't exist in x1f
    "sun" = "Sunday"
  )
Warning: Unknown levels in `f`: Wednesday, Saturday
[1] mon sun thu Tuesday fri
Levels: mon Tuesday thu fri sun
:::
fct_collapse()
fct_collapse() is a useful variant of fct_recode() that collapses several levels into a single new level.
[1] Work Weekend Work Work Work
Levels: Work Weekend
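A hedged sketch of a call that gives a result like the one above, assuming the forcats package is installed. The `days` factor and the grouping into `Work` and `Weekend` are reconstructed from the printed output, not taken from the original code.

```r
library(forcats)

day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
days <- factor(
  c("Monday", "Sunday", "Thursday", "Tuesday", "Friday"),
  levels = day_levels
)

# Each argument is new_level = c(old levels to merge into it)
collapsed <- fct_collapse(
  days,
  Work    = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"),
  Weekend = c("Saturday", "Sunday")
)

collapsed          # Work Weekend Work Work Work
levels(collapsed)  # "Work" "Weekend"
```

Compared with fct_recode(), each new level is named once and given a vector of old levels, which is shorter when many levels map to the same value.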
fct_lump()
fct_lump() is another useful variant of fct_recode() that lumps the least frequent levels together into a single “Other” level.
# A tibble: 37 × 2
species n
<chr> <int>
1 Aleena 1
2 Besalisk 1
3 Cerean 1
4 Chagrian 1
5 Clawdite 1
6 Droid 6
7 Dug 1
8 Ewok 1
9 Geonosian 1
10 Gungan 3
# ℹ 27 more rows
# A tibble: 2 × 2
species n
<fct> <int>
1 Human 35
2 Other 48
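A toy sketch of the same idea, assuming the forcats package is installed. The `species` vector is made up and much smaller than the data counted above; `n = 1` keeps only the single most frequent level.

```r
library(forcats)

# Human appears 3 times, Droid twice, Gungan and Ewok once each
species <- factor(c("Human", "Human", "Human", "Droid", "Droid", "Gungan", "Ewok"))

# Keep the most frequent level and lump everything else together
lumped <- fct_lump(species, n = 1)

levels(lumped)  # "Human" "Other"
table(lumped)   # Human 3, Other 4
```

Recent forcats versions also provide more specific helpers such as fct_lump_n() and fct_lump_min(), which make the lumping rule explicit in the function name.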
Let’s practice