Using `source()` to make your Quarto document manageable


Author

Søren O’Neill & Steen Harsted

Published

October 23, 2024


You could structure your manuscript as a single Quarto document, with everything included, but it does tend to get lengthy and unwieldy.

1 Sourcing R code from file

A better solution: If a part of your code can stand alone and solve a discrete problem, we suggest you store that R code in a separate file and then source it from the Quarto document.

For instance, the code used to load raw data and tidy it into clean data could be separated out as a stand-alone R script.

Look at the following manuscript and pay close attention to the R code:

````markdown
---
title: "This title definition is in the YAML of my Quarto document"
---

```{r}
# This is an R chunk

# Load raw data, clean it and save it as an RDS object
source("clean_data.R")
# Now load the clean data from RDS object file
data <- readRDS("clean_data.rds")
```

This markdown constitutes my main manuscript text.
````

The source() call works as if it were replaced by the R code in the “clean_data.R” file.
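For illustration, here is a minimal sketch of what clean_data.R might contain. It is not the authors' script: R's built-in airquality dataset stands in for the raw data, and the cleaning steps are arbitrary examples. In a real project the first step would typically read your own raw data file instead.

```r
# clean_data.R (hypothetical example)
# The built-in airquality dataset stands in for raw data; in a real project
# this line would typically read a file, e.g. raw <- read.csv("raw_data.csv")
raw <- datasets::airquality

# Keep only rows where the Ozone measurement is present
clean <- raw[!is.na(raw$Ozone), ]

# Recode the integer Month column (5 to 9) as a labelled factor
clean$Month <- factor(clean$Month, levels = 5:9,
                      labels = c("May", "Jun", "Jul", "Aug", "Sep"))

# Save the cleaned data; the Quarto chunk reads it back with readRDS()
saveRDS(clean, "clean_data.rds")
```

Because source() behaves as if this code had been pasted into the chunk, the intermediate objects raw and clean are also created when the script is sourced.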

Call source() only from R code, not from the Quarto text

The source() function is an R function and will only work inside a) an R script or b) an R chunk in a Quarto manuscript – it will not work if placed in the text body of a Quarto document.

The source() function includes R code only; it cannot include other file types, such as Quarto markdown files. For those, see Section 2 below.

What is an RDS object/file?

An RDS file (R Data Serialization) is a convenient, albeit not human-readable, data format that stores a single R object. You can read more about it here.
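As a quick illustration, the sketch below saves a small data frame with saveRDS() and reads it back with readRDS(), using temporary files. The RDS round trip returns an identical object, including column types and factor levels, whereas a CSV round trip loses the factor information. The data frame itself is made up for the example.

```r
# A small data frame with an integer column and a factor column
df <- data.frame(id = 1:3, group = factor(c("a", "b", "a")))

# RDS round trip: the object comes back exactly as it was saved
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
df_rds <- readRDS(rds_path)
print(identical(df, df_rds))  # TRUE

# CSV round trip: the factor is read back as plain character strings
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
df_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
print(class(df_csv$group))  # "character"
```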

Benefits to using source() to structure a Quarto manuscript include:

Potentially lengthy R code can be hidden from view, making the Quarto document more manageable.¹
R code stored in separate R scripts can be re-used in other contexts (e.g. another Quarto document).
Speeding up the rendering process – see below

Speeding up the rendering process…

Suppose the clean_data.R code takes a substantial time to complete – long enough to be a nuisance when working with (and repeatedly rendering) your Quarto file to output…

Once you know that the clean_data.R script does what it is intended to do, you could simply comment out² the source() command in the R chunk of the Quarto document. That would speed things up, as the clean_data.R script would not be sourced/executed at all. Instead, R would simply proceed to the next line of the R chunk and read the clean_data.rds object into memory.

However, this requires a bit of discipline! Consider what you would need to do if:

the raw data was changed
the code in the clean_data.R script was changed
you are ready to render the final output for submission

In all three cases you would need to execute the clean_data.R script again to ensure that any changes are reflected in the RDS file, i.e. you would have to un-comment the line again so that clean_data.R is actually sourced when re-rendering the Quarto manuscript.
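If commenting and un-commenting the source() line feels error-prone, a few lines of base R can cover the first two cases automatically by comparing file modification times. This is our own sketch rather than part of the workflow described above: needs_rebuild() is a hypothetical helper, and raw_data.csv is an assumed name for your raw data file.

```r
# Hypothetical helper: TRUE if `output` is missing or older than any of `inputs`
needs_rebuild <- function(output, inputs) {
  # No output file yet, so the script has to run
  if (!file.exists(output)) return(TRUE)
  # Stop with a clear message rather than silently comparing against NA times
  missing_inputs <- inputs[!file.exists(inputs)]
  if (length(missing_inputs) > 0) {
    stop("Missing input file(s): ", paste(missing_inputs, collapse = ", "))
  }
  # Rebuild if any input was modified after the output was last written
  any(file.mtime(inputs) > file.mtime(output))
}

# Usage inside the Quarto R chunk (file names are assumptions):
# if (needs_rebuild("clean_data.rds", c("clean_data.R", "raw_data.csv"))) {
#   source("clean_data.R")
# }
# data <- readRDS("clean_data.rds")
```

For the final render (the third case), deleting clean_data.rds forces clean_data.R to run again. Modification times only track edits to the listed files, so changes elsewhere, for example in an updated package, are not detected.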

Several R packages exist that handle such conditional execution of R code automatically, to avoid unnecessary and repetitive code execution; a Google search for R code caching should provide some hints. Personally, we think it is overkill.

Consider sourcing external R scripts for these discrete tasks:

Cleaning raw data
Structuring and filtering raw data for current project
Generating figures
Generating tables
Statistical modelling
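A sourced script does not have to write files; it can also just define functions that later chunks call. As a hypothetical example for the table task, a file named tables.R might contain the function below. The column names Ozone and Month come from R's built-in airquality dataset and stand in for your own variables.

```r
# tables.R (hypothetical): defines a function only, so sourcing it is fast
summary_table <- function(data) {
  # Mean Ozone per month; the formula interface drops rows with missing Ozone
  means <- aggregate(Ozone ~ Month, data = data, FUN = mean)
  # Number of non-missing Ozone readings per month
  counts <- aggregate(Ozone ~ Month, data = data, FUN = length)
  names(means)[2] <- "mean_ozone"
  names(counts)[2] <- "n"
  merge(means, counts, by = "Month")
}
```

In the Quarto document, an R chunk could then run source("tables.R") followed by summary_table(data); for example, summary_table(datasets::airquality) returns one row per month.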

2 Include other document types

In addition to R scripts, it is possible to include the content of other types of external files in a Quarto document: other Quarto documents (which may include R chunks), plain Markdown and simple text files, but not MS Word documents.

To include an external file in a Quarto document, use the {{< include filename >}} shortcode, e.g. {{< include included_text.qmd >}}.

This is useful for text that is commonly re-used, e.g. author names and affiliations, conflicts-of-interest statements, etc. Personally, we think it is overkill.

There are a few caveats to be aware of if the included files are anything but simple – please see this link.

Footnotes

1. Another way to hide code in R scripts and Quarto is described about 6 minutes into the video here.
2. Replace source("clean_data.R") with # source("clean_data.R").