Chapter 5 Questions and Answers

A list of questions that may arise during the use of R and their answers. When you have questions that remain unanswered, feel free to reach out!

What is the difference between a data.frame and a tibble?

Both are objects that hold data. The data.frame is the structure supported by basic R and the tibble is an extended variant that is supported by the tidyverse package. Dependig on the function that is used to load data, read.csv() from base R or read_csv() from the tidyverse, the data will be loaded in a data.frame or tibble, respectively. It is possible to convert a classic data.frame to a tibble (which requires the tidyverse pakcage):

df <- as_tibble(mtcars)
class(df)

[1] "tbl_df"     "tbl"        "data.frame"

And to convert a tibble to a data.frame:

df <- as.data.frame(df)
class(df)

[1] "data.frame"

What’s up with spaces in variable names and in files names?

Spaces are used to separate words (as in this line). When a space would be used between two words, these would be treated as separate entities. Avoiding spaces in filenames has a more technical background. Most filesystems currently accept spaces, but some systems do not. To increase compatibility across systems it is a good habit to use_underscores_instead.

What are the rules for naming dataframes or variables?

Do not use spaces, stick to characters and numbers. Whenever a variable consists of multiple words, e.g. room temperature there are two options:

Add an underscore as separator: room_temperature.
Use the ‘camelCase’ notation: roomTemperature.

Personally, I prefer an underscore and the abbreviation df for dataframe when multiple dataframes are generated, e.g. df_tidy or df_summary

What is the difference between `<-` and `=` for assigning variables?

R prefers the <- for the assignment of a value to a variable. In RStudio the shortcut is <alt>+<->. It really is a preference since both can be used:

x <- 1
y = 2
x + y

[1] 3

What does ‘NA’ mean in a dataframe?

When no data is available, this is known as a ‘missing value’. In R this is indicated with NA for ‘Not Available’. Empty cells in a CSV file will be converted to NA when the file is loaded. Other characters that can be present in a file (especially xls files) are: “.” or “NaN” or “#N/A” or “#VALUE!”. To convert these strings to NA use:

df <- read.csv('FPbase_Spectra.csv', na.strings=c(".", "NaN", "#N/A", "#VALUE!"))

What is the beste way to type a ‘%>%’ operator?

I prefer to literally type it. The shortcut that works in Rstudio is <shift>+<command>+<M> or <shift>+<control>+<M>.

Where do I find the example data?

The example data that is used in Chapter 2 and 3 is located on this github page: https://github.com/JoachimGoedhart/DataViz-protocols

The data that is used for the protocols in Chapter 4 is located in the subdirectory /Protocols.

Is there a way to re-use or adjust the protocols?

Instead of copy-pasting and running the code line-by-line, you can download the R Markdown file (.Rmd) from the protocols folder in the Github repository. The R Markdown file (together with the data) can be used to replicate the protocol and to modify it. For more info on R Markdown, see https://rmarkdown.rstudio.com

Which packages are included in the `{tidyverse}` package?

You can find this out by first loading the package and then run tidyverse_packages()

What is the difference between `require()` and `library()` for loading a package?

Both functions can be used to load a package. The difference is that require() returns FALSE when the package does not exist and can be used to check whether a package was loaded:

if (require("nonexistant") == FALSE) ("This packkage doesn't exist")

Loading required package: nonexistant

[1] "This packkage doesn't exist"

Is it possible to have interactive file selection?

In some cases it can be convenient to select a file by point-and-click, although this is not strictly reproducible. This example code shows how this can be achieved by using the function file.choose() inside a function for reading a csv file:

df <- read.csv(file.choose(), header = TRUE)