Session 2

Review

Data Types

What kind of value is each of the following?

10L
1.2345
"three"

Functions and Arguments

What can you tell me about the value that will be returned when I run:

runif(1, min = -1)

Result

[1] 0.8741508
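A quick sketch of why the result above lands where it does: `max` was not supplied, so it keeps its default of 1, and every draw falls between -1 and 1. The seed and the number of draws here are arbitrary choices for illustration.

```r
set.seed(42)                     # arbitrary seed, only so the draws are repeatable
draws <- runif(1000, min = -1)   # max is not given, so it stays at its default of 1

range(draws)                     # smallest and largest draw
in_bounds <- all(draws >= -1 & draws <= 1)
in_bounds
```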

Variable Names

Which of these are valid variable names?

min_height
min.height
minHeight
MINHEIGHT
min-height
.min.height
_min_height
0height
min0height
`Minimum Height`

Results

These are all fine:

min_height <- 14
min.height <- 14
minHeight <- 14
MINHEIGHT <- 14

min0height <- 14
`Minimum Height` <- 14

This one works, but creates a hidden variable.

.min.height <- 14
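To see what "hidden" means, here is a small sketch. It uses a throwaway environment (the name `scratch` is made up for this example) so that your own workspace is left alone.

```r
scratch <- new.env()                          # hypothetical scratch environment
assign(".min.height", 14, envir = scratch)    # a name that starts with a dot

visible_names <- ls(scratch)                  # hidden names are skipped by default
all_names <- ls(scratch, all.names = TRUE)    # all.names = TRUE includes them

visible_names
all_names
```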

These don’t work and will cause an error.

# Can't start with an `_`
_min_height <- 14

# This is subtraction
min-height <- 14

# Can't start with a number
0height <- 14
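If you want R to check names for you, one possible approach is to compare each candidate with what `make.names()` would turn it into. This is only a sketch: a name that comes back unchanged is syntactically valid without backticks.

```r
candidates <- c("min_height", "min.height", "minHeight", "MINHEIGHT",
                "min-height", ".min.height", "_min_height", "0height",
                "min0height", "Minimum Height")

# make.names() rewrites anything that is not a valid name,
# so an unchanged name is valid as written
is_syntactic <- candidates == make.names(candidates)
names(is_syntactic) <- candidates
is_syntactic
```

Note that "Minimum Height" comes back FALSE here: it only works as a variable name when wrapped in backticks.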

Variables & Environment

What is the value of each variable after each of the following statements?

todays_temp <- 31.6666
offset <- 32L
coef <- 1.8
intermed_temp <- todays_temp * coef
todays_temp <- intermed_temp + offset
round(todays_temp, 2)

Result

[1] 89
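Here is the same code again with the value after each step written as a comment. Reading the numbers as a Celsius to Fahrenheit conversion is my assumption; the slide does not say so.

```r
todays_temp <- 31.6666                  # 31.6666 (assumed: degrees Celsius)
offset <- 32L                           # 32, stored as an integer
coef <- 1.8                             # 1.8
intermed_temp <- todays_temp * coef     # 56.99988
todays_temp <- intermed_temp + offset   # 88.99988, the old value is overwritten
rounded <- round(todays_temp, 2)        # 89, printed without trailing zeros

# round() returned a new value; todays_temp itself was not changed
todays_temp
typeof(todays_temp)                     # "double": double + integer gives a double
```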

Getting Help

What do the following functions do? Use ?, ?? or the Help pane to learn about each function. Come up with 1-3 examples of each function in action.

identical(___)

tolower(___)

rep(___)

seq(___)

identical

The safe and reliable way to test two objects for being exactly equal. It returns TRUE in this case, FALSE in every other case.

identical(1, 1L)

[1] FALSE

identical(-0, +0, num.eq = FALSE)

[1] FALSE

identical(1, NULL)

[1] FALSE
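Because `identical()` is so strict, it can surprise you with decimal numbers. A small sketch comparing it with `all.equal()`, which allows a tiny tolerance:

```r
sum_value <- 0.1 + 0.2

exact_match <- identical(sum_value, 0.3)           # FALSE: floating-point rounding
near_match <- isTRUE(all.equal(sum_value, 0.3))    # TRUE: equal within tolerance
same_type <- identical(as.integer(1), 1L)          # TRUE: same type and same value

c(exact_match, near_match, same_type)
```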

tolower

Translate characters in character vectors, in particular from upper to lower case or vice versa.

tolower("APPLE")

[1] "apple"

tolower("Help")

[1] "help"

toupper("banana")

[1] "BANANA"

rep

rep() replicates the values in x.

rep(1:4, 2)

[1] 1 2 3 4 1 2 3 4

rep(1:4, each = 2)       # not the same.

[1] 1 1 2 2 3 3 4 4

rep(1:4, c(2,2,2,2))     # same as second.

[1] 1 1 2 2 3 3 4 4

rep(1:4, c(2,1,2,1))

[1] 1 1 2 3 3 4

rep(1:4, each = 2, len = 4)    # first 4 only.

[1] 1 1 2 2

rep(1:4, each = 2, len = 10)   # 8 integers plus two recycled 1's.

 [1] 1 1 2 2 3 3 4 4 1 1

rep(1:4, each = 2, times = 3)  # length 24, 3 complete replications

 [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4

seq

Generate regular sequences. seq is a standard generic with a default method. seq.int is a primitive which can be much faster but has a few restrictions. seq_along and seq_len are very fast primitives for two common cases.

seq(1, 9)

[1] 1 2 3 4 5 6 7 8 9

seq(0, 1, by = 0.1)

 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

seq(0, 1, length.out = 11)

 [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

seq(1, 9, by = 2)     # matches 'end'

[1] 1 3 5 7 9

seq(1, 9, by = pi)    # stays below 'end'

[1] 1.000000 4.141593 7.283185

seq(1, 6, by = 3)

[1] 1 4

seq(1.575, 5.125, by = 0.05)

 [1] 1.575 1.625 1.675 1.725 1.775 1.825 1.875 1.925 1.975 2.025 2.075
[12] 2.125 2.175 2.225 2.275 2.325 2.375 2.425 2.475 2.525 2.575 2.625
[23] 2.675 2.725 2.775
 [ reached getOption("max.print") -- omitted 47 entries ]

seq(17) # same as 1:17, or even better seq_len(17)

 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
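Why would `seq_len()` be "even better" than `1:n`? A short sketch of the edge case where the length is zero, plus `seq_along()` for stepping over an existing vector. The variable names are just for illustration.

```r
n <- 0
colon_result <- 1:n         # counts down from 1 to 0, giving two values
safe_result <- seq_len(n)   # an empty vector, which is usually what you wanted

fruits <- c("apple", "banana", "cherimoya")
positions <- seq_along(fruits)   # one position per element: 1 2 3

colon_result
safe_result
positions
```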

Highlighting New Concepts

These examples gave us a chance to review the things that we talked about during the last session, and they also introduced us to several new concepts that we will cover today.

Vectors
What are the ...? (further arguments passed to or from other methods)
NULL and NA
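As a preview of `...`, here is a minimal sketch of a function that hands any extra arguments on to another function. `label_values()` is a made-up helper, not something from a package.

```r
# Hypothetical helper: whatever is supplied in ... is passed along to paste()
label_values <- function(values, ...) {
  paste("value:", values, ...)
}

default_sep <- label_values(1:2)             # paste() uses its default separator
custom_sep <- label_values(1:2, sep = "")    # sep travels through ... to paste()

default_sep
custom_sep
```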

Overview

Packages
Collections: vectors, lists, data frames
Data Types Continued
- Special data types: NA, NULL
- Working with data types: is, as, class
Workspaces & RStudio Projects
More Functions
Data Processing with dplyr

Packages

Installing Packages

install.packages("tidyverse")
install.packages("readxl")

Most packages are hosted on CRAN (cran.r-project.org; cran.rstudio.com is one of its mirrors).

How do you find packages? Besides Google, you can use MetaCRAN (r-pkg.org) to search for available packages. Or you can browse the CRAN Task Views.
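Before installing, you may want to check whether a package is already there. A possible sketch using `requireNamespace()`; the helper name `is_installed()` is made up for this example. Note that a successful check also loads the package's namespace, without attaching it.

```r
# Hypothetical helper: TRUE if the package can be loaded, FALSE otherwise
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

has_stats <- is_installed("stats")   # stats ships with R, so this is TRUE

if (!is_installed("readxl")) {
  message("readxl is not installed yet; run install.packages(\"readxl\")")
}
```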

Attaching (Loading) Packages

library(tidyverse)

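Installing tidyverse installs the following packages: broom, cli, crayon, dplyr, dbplyr, forcats, ggplot2, haven, hms, httr, jsonlite, lubridate, magrittr, modelr, purrr, readr, readxl, reprex, rlang, rstudioapi, rvest, stringr, tibble, tidyr, and xml2. Note that library(tidyverse) attaches only a core subset of these (dplyr, ggplot2, and readr, for example); others, such as readxl, need their own library() call.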

library(readr)
library(dplyr)
library(stringr, tibble) # Doesn't do what you think
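`library()` takes one package per call; its second argument is `help`, not another package. If you do want to attach several packages from a vector of names, one possible sketch is a loop with `character.only = TRUE`. Base packages are used here only so the example runs anywhere.

```r
pkgs <- c("stats", "utils")   # stand-ins; swap in the packages you need

for (pkg in pkgs) {
  # character.only = TRUE tells library() that pkg holds the name as a string
  library(pkg, character.only = TRUE)
}

attached <- paste0("package:", pkgs) %in% search()
attached
```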

@ijlyttle a package is a like a book, a library is like a library; you use library() to check a package out of the library #rsats
— Hadley Wickham (@hadleywickham) December 8, 2014

Using RStudio

You can also use the Packages pane to install and update packages, or choose Tools ▸ Install Packages… from the menu.

Behind the Scenes

A package contains:

Functions
Documentation
Vignettes
Data

library(babynames)

install.packages("babynames")

babynames

Error in eval(expr, envir, enclos): object 'babynames' not found

babynames::babynames

# A tibble: 1,858,689 x 5
    year sex   name          n   prop
   <dbl> <chr> <chr>     <int>  <dbl>
 1  1880 F     Mary       7065 0.0724
 2  1880 F     Anna       2604 0.0267
 3  1880 F     Emma       2003 0.0205
 4  1880 F     Elizabeth  1939 0.0199
 5  1880 F     Minnie     1746 0.0179
 6  1880 F     Margaret   1578 0.0162
 7  1880 F     Ida        1472 0.0151
 8  1880 F     Alice      1414 0.0145
 9  1880 F     Bertha     1320 0.0135
10  1880 F     Sarah      1288 0.0132
# ... with 1,858,679 more rows

library(babynames)
babynames

# A tibble: 1,858,689 x 5
    year sex   name          n   prop
   <dbl> <chr> <chr>     <int>  <dbl>
 1  1880 F     Mary       7065 0.0724
 2  1880 F     Anna       2604 0.0267
 3  1880 F     Emma       2003 0.0205
 4  1880 F     Elizabeth  1939 0.0199
 5  1880 F     Minnie     1746 0.0179
 6  1880 F     Margaret   1578 0.0162
 7  1880 F     Ida        1472 0.0151
 8  1880 F     Alice      1414 0.0145
 9  1880 F     Bertha     1320 0.0135
10  1880 F     Sarah      1288 0.0132
# ... with 1,858,679 more rows

births

# A tibble: 119 x 2
    year  births
   <int>   <int>
 1  1909 2718000
 2  1910 2777000
 3  1911 2809000
 4  1912 2840000
 5  1913 2869000
 6  1914 2966000
 7  1915 2965000
 8  1916 2964000
 9  1917 2944000
10  1918 2948000
# ... with 109 more rows

ls("package:babynames")

[1] "applicants" "babynames"  "births"     "lifetables"

detach("package:babynames", unload = TRUE)
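The `::` pattern works for any installed package, not only babynames. A small sketch using tools, a package that ships with R, so nothing extra needs to be installed:

```r
# pkg::name reaches into an installed package without calling library() first
ext <- tools::file_ext("data/gapminder.csv")
ext

# For a package that is attached, ls() lists what it makes available;
# stats is attached by default in a normal R session
has_median <- "median" %in% ls("package:stats")
has_median
```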

Collections

Vectors

1:3

[1] 1 2 3

c(1, 2, 3)

[1] 1 2 3

identical(1:3, c(1, 2, 3))

[1] FALSE

identical(1:3, c(1L, 2L, 3L))

[1] TRUE

c(1L, 2L, 3L, pi)

[1] 1.000000 2.000000 3.000000 3.141593
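A vector holds one type only, so `c()` converts everything to the most flexible type present. A quick sketch, with variable names chosen just for this example:

```r
all_int <- c(1L, 2L)       # stays integer
int_dbl <- c(1L, 2.5)      # integer + double becomes double
num_chr <- c(1, "a")       # anything + character becomes character
lgl_int <- c(TRUE, 2L)     # logical + integer becomes integer (TRUE turns into 1)

types <- c(typeof(all_int), typeof(int_dbl), typeof(num_chr), typeof(lgl_int))
types
num_chr
```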

fruits <- c("apple", "banana")
fruits

[1] "apple"  "banana"

fruits <- c(fruits, "cherimoya")

fruits

[1] "apple"     "banana"    "cherimoya"

month_days <- c("March" = 31, "April" = 30, May = 31)

month_days

March April   May 
   31    30    31

names(month_days) <- c("August", "September", "October")
month_days

   August September   October 
       31        30        31

month_days["October"]

October 
     31
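Picking by name is one of several ways to pull values out of a vector. A sketch of the common ones, recreating `month_days` so the example stands on its own:

```r
month_days <- c(August = 31, September = 30, October = 31)

by_position <- month_days[2]                         # second element
by_name <- month_days[c("August", "October")]        # several names at once
by_condition <- month_days[month_days > 30]          # only values above 30
bare_value <- month_days[["October"]]                # [[ ]] drops the name

by_position
by_name
by_condition
bare_value
```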

Lists

x <- list(
  fruits = fruits,
  months = names(month_days),
  month_days = month_days
)

x

$fruits
[1] "apple"     "banana"    "cherimoya"

$months
[1] "August"    "September" "October"  

$month_days
   August September   October 
       31        30        31
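Getting things back out of a list works a little differently than with vectors. A minimal sketch, rebuilding a smaller version of `x` so it runs by itself:

```r
x <- list(
  fruits = c("apple", "banana", "cherimoya"),
  months = c("August", "September", "October")
)

by_dollar <- x$fruits         # the element itself (a character vector)
by_double <- x[["fruits"]]    # same thing, using a string
by_single <- x["fruits"]      # single brackets return a smaller list
second_fruit <- x$fruits[2]   # then index into the vector as usual

class(by_single)
second_fruit
```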

Data Frames and Tibbles

y <- data.frame(
  fruits = fruits,
  months = names(month_days),
  month_days = month_days
)
y

             fruits    months month_days
August        apple    August         31
September    banana September         30
October   cherimoya   October         31

y2 <- data.frame(
  fruits = fruits,
  months = names(month_days),
  month_days = month_days,
  stringsAsFactors = FALSE,
  row.names = NULL
)
y2

     fruits    months month_days
1     apple    August         31
2    banana September         30
3 cherimoya   October         31

z <- tibble(                 # tibble() replaces the deprecated data_frame()
  fruits = fruits,
  months = names(month_days),
  month_days = month_days
)
z

# A tibble: 3 x 3
  fruits    months    month_days
  <chr>     <chr>          <dbl>
1 apple     August            31
2 banana    September         30
3 cherimoya October           31
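Once you have a data frame, you will want to pull rows and columns back out. A sketch with base R only, rebuilding `y2` with the values typed in directly so the example is self-contained:

```r
y2 <- data.frame(
  fruits = c("apple", "banana", "cherimoya"),
  months = c("August", "September", "October"),
  month_days = c(31, 30, 31),
  stringsAsFactors = FALSE
)

shape <- dim(y2)                                  # rows, then columns
days <- y2$month_days                             # one column as a vector
one_cell <- y2[2, "months"]                       # row 2 of the months column
long_months <- y2[y2$month_days == 31, "fruits"]  # rows where a condition holds

shape
one_cell
long_months
```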

Data Types Continued

Review

Type	Example
integer	`1L`
double	`3.14`, `1.23e-4`
character	`"apple"`
logical	`TRUE`, `FALSE`
vector	`c(...)`
list	`list(...)`
data.frame	`data.frame(...)`
tibble	`tibble(...)`
N/A	`NA`
null	`NULL`
factor	`factor(letters)`

Special Data Types

c(1, 2, NA, 4)

[1]  1  2 NA  4

c(1, 2, NULL, 4)

[1] 1 2 4
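The difference matters as soon as you count or summarize: `NA` holds a place in the vector, `NULL` does not. A short sketch:

```r
with_na <- c(1, 2, NA, 4)
with_null <- c(1, 2, NULL, 4)

n_na <- length(with_na)      # 4: the NA still takes up a slot
n_null <- length(with_null)  # 3: the NULL vanished

mean_default <- mean(with_na)                # NA: missing values spread
mean_skipping <- mean(with_na, na.rm = TRUE) # average of 1, 2 and 4

c(n_na, n_null)
mean_default
mean_skipping
```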

Factors

We’ll talk about factors later, but a factor is basically a vector whose values carry labels (called levels) and, optionally, an order.

Sneak Peek

factor(1:3)

[1] 1 2 3
Levels: 1 2 3

factor(1:3,
       levels = 1:3)

[1] 1 2 3
Levels: 1 2 3

factor(1:3,
       levels = 1:3,
       labels = c("a", "b", "c"))

[1] a b c
Levels: a b c

factor(1:3,
       levels = 1:3,
       labels = c("a", "b", "c"),
       ordered = TRUE)

[1] a b c
Levels: a < b < c

factor(1:3,
       levels = 3:1,
       labels = c("a", "b", "c"),
       ordered = TRUE)

[1] c b a
Levels: a < b < c
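To see what the "order" buys you, here is a small sketch with made-up labels. An ordered factor stores integer codes behind the labels and lets you compare values by level.

```r
sizes <- factor(
  c("low", "high", "medium"),
  levels = c("low", "medium", "high"),   # the order we want, not alphabetical
  ordered = TRUE
)

level_names <- levels(sizes)    # the labels, in order
codes <- as.integer(sizes)      # position of each value within the levels
below_high <- sizes < "high"    # comparisons follow the level order

level_names
codes
below_high
```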

Working with Data Types

What is this thing?

x <- 1L
y <- pi
z <- "apple"

class(x)

[1] "integer"

class(y)

[1] "numeric"

class(z)

[1] "character"

typeof(z)

[1] "character"

class(mtcars)

[1] "data.frame"
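`class()` and `typeof()` often agree, but not always: `class()` says how R treats an object, `typeof()` says how it is stored. A quick side-by-side sketch:

```r
pi_class <- class(pi)          # "numeric"
pi_type <- typeof(pi)          # "double"

df_class <- class(mtcars)      # "data.frame"
df_type <- typeof(mtcars)      # "list": a data frame is a list of columns

c(pi_class, pi_type, df_class, df_type)
```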

Are you this thing?

is.integer(1L)

[1] TRUE

is.numeric(pi)

[1] TRUE

is.numeric(1L)

[1] TRUE

is.double(1L)

[1] FALSE

is.character("one")

[1] TRUE

is.logical("TRUE")

[1] FALSE

Are you even there?

is.na(c(1, 2, NA, 4))

[1] FALSE FALSE  TRUE FALSE

is.null(c(1, 2, NULL, 4))

[1] FALSE

xyz <- NULL
is.null(xyz)

[1] TRUE

Turn you into this thing.

as.character(1)

[1] "1"

as.integer(pi)

[1] 3

as.double(10L)

[1] 10

as.logical(2)

[1] TRUE
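Conversions do not always go the way you hope. A sketch of a few edge cases; `suppressWarnings()` is used only to keep the output quiet, since R normally warns when a conversion produces `NA`.

```r
truncated <- as.integer(-1.9)                           # -1: drops the decimals, no rounding
not_a_number <- suppressWarnings(as.integer("three"))   # NA: text that is not a number
parsed <- as.numeric("1.5")                             # 1.5: numeric text converts fine
zero_is_false <- as.logical(0)                          # FALSE: only zero is FALSE
text_flag <- as.logical("TRUE")                         # TRUE: recognised text converts

truncated
not_a_number
```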

Workspaces & RStudio Projects

Working Directory

The working directory is the folder where R looks for a file it reads and where it puts a file it writes, whenever you give it a relative path.

You can check where your R process is “living” – i.e. your working directory – with

getwd()

and you can set it with

setwd("~/myCoolProject")

But this is not recommended!

You can also use More ▸ Set As Working Directory or Go To Working Directory in the Files pane to set the working directory, but this is also not recommended.

When should you set it by hand? When you get lost.
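If you are lost, it can help to see how R turns a relative path into a full one. A minimal sketch; the file name is just an example and does not need to exist.

```r
wd <- getwd()                                    # where R is "living" right now
rel_path <- file.path("data", "gapminder.csv")   # a relative path
abs_path <- file.path(wd, rel_path)              # the full path R will actually use

rel_path
abs_path

# Both spellings point at the same place, so they agree on whether the file exists
same_answer <- file.exists(rel_path) == file.exists(abs_path)
same_answer
```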

Using RStudio Projects

Without some kind of organization scheme, you’ll very quickly end up writing all of your R scripts in a single folder. Multiple analyses will write out files, exporting data and creating plots, all into the same folder.

Life without RStudio Projects

Instead, RStudio offers an excellent method of organization called Projects. With RStudio Projects, each analysis is self-contained and organized in its own way. It’s easy to switch from one project to another, knowing that your files will be organized, your environment will be clean, and you can pick everything back up where you left off.

Life with RStudio Projects

Create an RStudio Project

Select File ▸ New Project or choose New Project from the drop-down menu in the upper right corner of the RStudio window.

Select New Directory to create your project in a new directory. If you already have files in a directory that you want to use, choose Existing Directory.

Select the type of project you want to start – this will generally be New Project.

Choose the name for the folder that will be created to house your project and pick the folder where the project folder will be created.

Here we give the new project folder the name cds-r-course. This will also be the name of the project itself.

Your project will be created and you’ll be dropped into a new R/RStudio session.

Work with your project

Use the Files pane to create a folder called data in your project folder.

Run the following command to download the gapminder.csv file into your data folder.

download.file("https://gerkelab.github.io/core-r-course/materials/01/gapminder.csv", "data/gapminder.csv")

Create a new R script. Add the following code to it.

library(tidyverse)

patient_id <- 5554321
age_at_diagnosis <- 54
age_at_visit <- 54:58
tumor_size <- c(9.5, 9.5, 9.7, 9.9, 10.1)
site_code <- c("C220", "C400", "C412", "C220", "C400")

Save the file as example_single_patient.R in your project directory.

Source the file.

Close the project. Take a deep breath. Re-open the project. Everything is still there!
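Once the script has been sourced, the variables from it are available to work with. As one possible next step, here is a sketch that gathers them into a data frame; the names `visits` and `years_since_dx` are made up for this example.

```r
patient_id <- 5554321
age_at_diagnosis <- 54
age_at_visit <- 54:58
tumor_size <- c(9.5, 9.5, 9.7, 9.9, 10.1)
site_code <- c("C220", "C400", "C412", "C220", "C400")

# One row per visit; the single patient_id is repeated to fill all five rows
visits <- data.frame(
  patient_id, age_at_visit, tumor_size, site_code,
  stringsAsFactors = FALSE
)

# Vector arithmetic: subtract the same diagnosis age from every visit age
visits$years_since_dx <- visits$age_at_visit - age_at_diagnosis
visits
```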

Session 2

July 25, 2018
