Chapter 5 Questions and Answers
A list of questions that may arise during the use of R and their answers. When you have questions that remain unanswered, feel free to reach out!
What is the difference between a data.frame and a tibble?
Both are objects that hold data. The data.frame is the structure supported by basic R and the tibble is an extended variant that is supported by the tidyverse package. Dependig on the function that is used to load data, read.csv()
from base R or read_csv()
from the tidyverse, the data will be loaded in a data.frame or tibble, respectively. It is possible to convert a classic data.frame to a tibble (which requires the tidyverse pakcage):
[1] "tbl_df" "tbl" "data.frame"
And to convert a tibble to a data.frame:
[1] "data.frame"
What’s up with spaces in variable names and in files names?
Spaces are used to separate words (as in this line). When a space would be used between two words, these would be treated as separate entities. Avoiding spaces in filenames has a more technical background. Most filesystems currently accept spaces, but some systems do not. To increase compatibility across systems it is a good habit to use_underscores_instead.
What are the rules for naming dataframes or variables?
Do not use spaces, stick to characters and numbers. Whenever a variable consists of multiple words, e.g. room temperature there are two options:
- Add an underscore as separator:
room_temperature.
- Use the ‘camelCase’ notation:
roomTemperature.
Personally, I prefer an underscore and the abbreviation df for dataframe when multiple dataframes are generated, e.g. df_tidy or df_summary
What is the difference between <-
and =
for assigning variables?
R prefers the <-
for the assignment of a value to a variable. In RStudio the shortcut is <alt>+<->
. It really is a preference since both can be used:
[1] 3
What does ‘NA’ mean in a dataframe?
When no data is available, this is known as a ‘missing value’. In R this is indicated with NA for ‘Not Available’. Empty cells in a CSV file will be converted to NA when the file is loaded. Other characters that can be present in a file (especially xls files) are: “.” or “NaN” or “#N/A” or “#VALUE!”. To convert these strings to NA use:
What is the beste way to type a ‘%>%’ operator?
I prefer to literally type it. The shortcut that works in Rstudio is <shift>+<command>+<M>
or <shift>+<control>+<M>
.
Where do I find the example data?
The example data that is used in Chapter 2 and 3 is located on this github page: https://github.com/JoachimGoedhart/DataViz-protocols
The data that is used for the protocols in Chapter 4 is located in the subdirectory /Protocols.
Is there a way to re-use or adjust the protocols?
Instead of copy-pasting and running the code line-by-line, you can download the R Markdown file (.Rmd) from the protocols folder in the Github repository. The R Markdown file (together with the data) can be used to replicate the protocol and to modify it. For more info on R Markdown, see https://rmarkdown.rstudio.com
Which packages are included in the {tidyverse}
package?
You can find this out by first loading the package and then run tidyverse_packages()
What is the difference between require()
and library()
for loading a package?
Both functions can be used to load a package. The difference is that require()
returns FALSE
when the package does not exist and can be used to check whether a package was loaded:
Loading required package: nonexistant
[1] "This packkage doesn't exist"
Is it possible to have interactive file selection?
In some cases it can be convenient to select a file by point-and-click, although this is not strictly reproducible. This example code shows how this can be achieved by using the function file.choose()
inside a function for reading a csv file: