Getting help with R
When you need help with an R problem, writing a reproducible example is one of the most useful things you can do. A reproducible example lets someone else recreate your problem simply by copying and pasting R code. The example should be minimal: simplify the problem as much as possible.
A minimal reproducible R example usually consists of the following items:
- A minimal dataset, necessary to demonstrate the problem.
- The minimal runnable code necessary to reproduce the error or problem, which can be run on the given dataset.
Writing a reproducible example can often help you find the solution yourself. If it does not, you can share the runnable script with others through whatever channel you use to ask for help.
Minimal dataset
Instead of supplying the full dataset you are working with, try creating a simplified version that contains just the structure needed to reproduce the problem. The reader should not need to download any external file: the code itself should provide the data. The goal is to make it as easy as possible for someone to reproduce your problem.
Two possible ways of providing a minimal dataset are creating fake data with R’s built-in functions or using one of R’s built-in datasets.
Vectors
Making a vector in R is easy. When you need some randomness, R has several functions for generating random values. sample() can shuffle a vector or draw a random vector from just a few values. letters is a built-in vector containing the lowercase alphabet, which is handy for making factors.
## From a normal distribution
x <- rnorm(10)
## From a uniform distribution
x <- runif(10)
## A permutation of some values
x <- sample(1:10)
## A random factor
x <- factor(sample(letters[1:4], size = 20, replace = TRUE))
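Random functions such as sample(), rnorm(), and runif() return different values on every run, so a helper who runs your script will not see exactly the numbers you saw. A common remedy, sketched below, is to call set.seed() before generating the data; the seed 2023 is an arbitrary choice. The values produced for a given seed can still differ between R versions (for example, the default sampling method used by sample() changed in R 3.6.0).

```r
# Fix the state of R's random number generator before creating random data.
# The seed (2023) is arbitrary; any integer works.
set.seed(2023)
x <- factor(sample(letters[1:4], size = 20, replace = TRUE))

# Resetting the same seed reproduces exactly the same "random" vector.
set.seed(2023)
y <- factor(sample(letters[1:4], size = 20, replace = TRUE))

identical(x, y)  # TRUE
```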
Data frames
You can create a simple data frame for your example by passing made-up vectors of the same length to data.frame() or tibble() (from the tibble package).
df <- data.frame(
  x = sample(1:10),
  y = sample(c("yes", "no"), 10, replace = TRUE)
)
Some problems depend on a specific column type. In those cases, convert the relevant columns with functions such as as.factor() or as.Date().
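For instance, a question involving dates and categories could use a data frame like the one below. The column names and values are made up for illustration; str() confirms that each column has the intended type.

```r
# Hypothetical example data: dates stored as Date, categories as a factor
visits <- data.frame(
  day = as.Date(c("2021-01-15", "2021-02-15", "2021-03-15")),
  group = as.factor(c("low", "high", "low")),
  count = c(3L, 5L, 4L)
)

# Check that each column has the type the problem depends on
str(visits)
```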
Alternatively, one of R’s built-in datasets may suit your problem. Run library(help = "datasets") to see a list of the datasets that ship with R.
airquality[1:5, ]
  Ozone Solar.R Wind Temp Month Day
1    41     190  7.4   67     5   1
2    36     118  8.0   72     5   2
3    12     149 12.6   74     5   3
4    18     313 11.5   62     5   4
5    NA      NA 14.3   56     5   5
Another option is to copy your data frame into your reproducible script using dput():
- Run dput(my_df) in R, where my_df is your data frame.
- Copy the output.
- In your reproducible script, type my_df <- and paste the output after it.
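The round trip described in these steps can be checked directly. In the sketch below, my_df is a small made-up data frame; capture.output() stands in for copying the printed dput() output by hand, and eval(parse()) stands in for pasting it into a new script.

```r
# A small, made-up data frame standing in for your real data
my_df <- data.frame(
  id = 1:3,
  group = factor(c("a", "b", "a")),
  value = c(2.5, 3.1, 4.7)
)

# dput() prints R code that recreates the object, including factor levels
dput(my_df)

# Simulate copying the printed output and pasting it after `my_df_copy <-`
pasted_code <- paste(capture.output(dput(my_df)), collapse = "\n")
my_df_copy <- eval(parse(text = pasted_code))

identical(my_df, my_df_copy)  # TRUE for this data frame
```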
Minimal code
- Packages should be loaded at the top of the script. 
- Load only the packages necessary for the example to work. 
- Do not include calls to install.packages().
- Spend some time making sure your code is easy for others to read:
  - Use simple, descriptive names for variables and functions.
- Use comments to indicate where your problem lies.
- Do your best to remove everything that is not related to the problem.
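Put together, the guidelines above suggest a script layout like the following sketch. It uses only base R so that it runs without extra packages; the data, the variable names, and the problem placeholder are all hypothetical.

```r
## Packages: load only what the example needs, at the top of the script.
## (This sketch needs nothing beyond base R and never calls install.packages().)

## Data: a small dataset created by the script itself
set.seed(42)
measurements <- data.frame(
  group = rep(c("control", "treatment"), each = 5),
  value = rnorm(10, mean = 10)
)

## Code: only the steps needed to reach the problem
group_means <- tapply(measurements$value, measurements$group, mean)
group_means

## PROBLEM: describe here what you expected and what you got instead.
```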
 
Example
Let’s look at an example. Here is the problem:
Asker: I need to take the average of several variables for every combination of two categorical variables. I have 20+ variables to average. Is there a way to apply the same function (i.e. mean()) over several columns in a data frame?
And here is a minimal reproducible example:
library(dplyr)
df <- tibble(
  year = sample(2018:2020, size = 15, replace = TRUE),
  scenario = sample(c("A", "B"), 15, replace = TRUE),
  x1 = rnorm(15, mean = 12),
  x2 = rnorm(15, 18),
  x3 = rnorm(15, 30)
)
## I can compute the average of x1, x2, and x3 manually.
df %>%
  group_by(year, scenario) %>%
  summarize(
    x1 = mean(x1),
    x2 = mean(x2),
    x3 = mean(x3)
  )
## How can I avoid having to repeat the mean() function calls for each variable?
It is straightforward for someone to copy and paste the code and provide a possible solution:
Helper: You can use the across() function from the dplyr package (added in version 1.0.0) inside summarize().
df %>%
  group_by(year, scenario) %>%
  summarize(across(.cols = everything(), .fns = mean))
# A tibble: 6 × 5
# Groups:   year [3]
   year scenario    x1    x2    x3
  <int> <chr>    <dbl> <dbl> <dbl>
1  2018 A         11.6  17.9  29.9
2  2018 B         12.5  18.3  30.4
3  2019 A         12.0  19.7  29.4
4  2019 B         11.4  17.8  30.2
5  2020 A         12.1  18.5  29.8
6  2020 B         12.2  17.9  29.1
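With 20+ real variables, the data frame often also holds columns that should not be averaged, and everything() would try to average those too. One possible refinement, sketched below, selects the columns by name pattern with starts_with() and drops the remaining grouping with .groups = "drop" (both available in dplyr 1.0.0 and later). The extra site column is hypothetical, and the years and scenarios are fixed rather than sampled so that every combination appears.

```r
library(dplyr)

set.seed(1)
# Same shape as the asker's data, plus a hypothetical character column `site`
# that should not be averaged.
df <- tibble(
  year = rep(2018:2020, each = 4),
  scenario = rep(c("A", "B"), times = 6),
  site = sample(c("north", "south"), 12, replace = TRUE),
  x1 = rnorm(12, mean = 12),
  x2 = rnorm(12, 18),
  x3 = rnorm(12, 30)
)

# Average only the columns whose names start with "x";
# .groups = "drop" returns an ungrouped tibble.
means <- df %>%
  group_by(year, scenario) %>%
  summarize(across(starts_with("x"), mean), .groups = "drop")

means
```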