Data Structures in R

Tony Yao-Jen Kuo

An overview

Data structures…

  • collect scalars
  • can be indexed
  • can be sliced
  • are iterable

We will cover six of them

  • vector
  • list
  • (optional) factor
  • data.frame
  • (optional) matrix
  • (optional) array

Vectors

Characteristics of a vector

  • element-wise operation
  • uniform class (every element shares the same class)
  • supports logical filtering

Why is there always a [1] before a printed scalar?
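
One way to approach the question: R has no separate scalar type, so a single value is simply a vector of length one, and the [1] is the index of the first element printed on that line. A minimal sketch (the variable name lucky_number is only illustrative):

```r
# A "scalar" in R is a vector that happens to have one element
lucky_number <- 87

is.vector(lucky_number)  # TRUE
length(lucky_number)     # 1
lucky_number[1]          # 87, it can be indexed like any other vector
```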

Using c() to create vectors

player_names <- c("Jeremy Lin", "Michael Jordan", "Shaquille O'Neal")
player_heights <- c(191, 198, 216)
player_weights <- c(91, 98, 148)
player_names
player_heights
player_weights
## [1] "Jeremy Lin"       "Michael Jordan"   "Shaquille O'Neal"
## [1] 191 198 216
## [1]  91  98 148

Using [INDEX] to index a value from a vector

player_names[1]
player_names[2]
player_names[3]
player_names[length(player_names)] # in case we have a long vector
## [1] "Jeremy Lin"
## [1] "Michael Jordan"
## [1] "Shaquille O'Neal"
## [1] "Shaquille O'Neal"

Using [c(INDICES)] to slice values from a vector

player_names[2:3]
player_names[c(1, 3)]
## [1] "Michael Jordan"   "Shaquille O'Neal"
## [1] "Jeremy Lin"       "Shaquille O'Neal"

What will happen if we set a NEGATIVE index?

# Try it yourself

Vectors are best known for their…

  • Element-wise operation
player_heights_m <- player_heights / 100
player_heights
player_heights_m
## [1] 191 198 216
## [1] 1.91 1.98 2.16

Practice: Using vector operations to compute players’ BMIs

player_bmis <- # ...

Beware of the types

# Name, height, weight, has_ring
mj <- c("Michael Jordan", 198, 98, TRUE)
mj
class(mj[1])
class(mj[2])
class(mj[3])
class(mj[4])
## [1] "Michael Jordan" "198"            "98"             "TRUE"          
## [1] "character"
## [1] "character"
## [1] "character"
## [1] "character"
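
The result above follows from coercion: c() converts every element to the most flexible type present, in the order logical < integer < numeric < character. A small sketch of that ordering (the variable names are illustrative only):

```r
# Mixing types in c() silently coerces to the most flexible type
mixed_lgl_int <- c(TRUE, 2L)    # logical + integer   -> integer
mixed_int_dbl <- c(2L, 3.5)     # integer + numeric   -> numeric
mixed_dbl_chr <- c(3.5, "a")    # numeric + character -> character

class(mixed_lgl_int)  # "integer"
class(mixed_int_dbl)  # "numeric"
class(mixed_dbl_chr)  # "character"
```

When each value needs to keep its own class, a list is the better container.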

How to generate vectors quickly

11:21
seq(from = 11, to = 21)
seq(from = 11, to = 21, by = 2)
seq(from = 11, to = 21, length.out = 6)
rep(7, times = 7)
##  [1] 11 12 13 14 15 16 17 18 19 20 21
##  [1] 11 12 13 14 15 16 17 18 19 20 21
## [1] 11 13 15 17 19 21
## [1] 11 13 15 17 19 21
## [1] 7 7 7 7 7 7 7
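
A few more shortcuts in the same spirit, shown as a sketch with illustrative variable names: rep() can repeat a whole vector or each element, and seq_len() / seq_along() generate index vectors.

```r
repeated_whole <- rep(c("a", "b"), times = 2)  # "a" "b" "a" "b"
repeated_each  <- rep(c("a", "b"), each = 2)   # "a" "a" "b" "b"
first_five     <- seq_len(5)                   # 1 2 3 4 5
positions      <- seq_along(c("x", "y", "z"))  # 1 2 3
```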

Getting logical values

player_heights <- c(191, 198, 216)
player_weights <- c(91, 98, 148)
player_bmis <- player_weights / (player_heights * 0.01)^2
player_bmis > 30
## [1] FALSE FALSE  TRUE

Logical filtering

player_bmis[player_bmis > 30]
## [1] 31.72154
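
A logical vector built from one vector can also filter another vector of the same length, and conditions can be combined with & and |. A minimal sketch reusing the players above (the result names heavy_players and tall_not_heavy are illustrative):

```r
player_names <- c("Jeremy Lin", "Michael Jordan", "Shaquille O'Neal")
player_heights <- c(191, 198, 216)
player_weights <- c(91, 98, 148)
player_bmis <- player_weights / (player_heights / 100)^2

# Filter names by a condition on BMIs
heavy_players <- player_names[player_bmis > 30]

# Combine two conditions element-wise with &
tall_not_heavy <- player_names[player_heights > 195 & player_bmis <= 30]

# which() returns the positions where the condition is TRUE
heavy_positions <- which(player_bmis > 30)
```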

Practice: Finding odd numbers in random_numbers

set.seed(87)
random_numbers <- sample(1:500, size = 100, replace = FALSE)

Vectors are iterable

for (ITERATOR in ITERABLE) {
  # do something iteratively until ITERATOR hits the end of ITERABLE
}

Iterators as values

player_heights <- c(191, 198, 216)
for (ph in player_heights) {
  print(ph*0.01)
}
## [1] 1.91
## [1] 1.98
## [1] 2.16

Not just printing it out…

player_heights <- c(191, 198, 216)
player_heights_m <- c()
for (ph in player_heights) {
  player_heights_m <- c(player_heights_m, ph*0.01)
}
player_heights_m
## [1] 1.91 1.98 2.16
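
Growing a vector with c() inside a loop works, but it copies the vector on every iteration. A common alternative, sketched here, is to pre-allocate the result with numeric() and fill it by index:

```r
player_heights <- c(191, 198, 216)

# Pre-allocate a numeric vector of the right length (filled with zeros)
player_heights_m <- numeric(length(player_heights))

for (i in seq_along(player_heights)) {
  player_heights_m[i] <- player_heights[i] * 0.01
}
player_heights_m
```

For three elements the difference is negligible; it tends to matter only for long vectors.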

Practice: Applying fizz_buzz() to 1:100

  • if the input is divisible by 15, return “fizz buzz” (check this case first)
  • if the input is divisible by 3, return “fizz”
  • if the input is divisible by 5, return “buzz”
  • otherwise, return the input itself
## [1] 1 2 "fizz" 4 "buzz" ... 14 "fizz buzz" 16 ... 98 "fizz" "buzz"

Iterators as indices

player_names <- c("Jeremy Lin", "Michael Jordan", "Shaquille O'Neal")
player_heights <- c(191, 198, 216)
# seq_along() is safer than 1:length(): it yields an empty sequence for an empty vector
for (i in seq_along(player_names)) {
  player_height_m <- player_heights[i] / 100
  print(sprintf("%s is %s meters tall", player_names[i], player_height_m))
}
## [1] "Jeremy Lin is 1.91 meters tall"
## [1] "Michael Jordan is 1.98 meters tall"
## [1] "Shaquille O'Neal is 2.16 meters tall"

Practice: Is x prime?

is_prime(87) ## FALSE
is_prime(89) ## TRUE
is_prime(91) ## FALSE

Practice: How many primes are there between x and y (inclusive)?

count_primes(5, 11) ## 3
count_primes(5, 13) ## 4
count_primes(5, 15) ## 4

Iterating in another style

while (CONDITION) {
  # do something repeatedly as long as CONDITION is TRUE
}

Iterators as indices

i <- 1
while (i <= length(player_names)) {
  player_height_m <- player_heights[i] / 100
  print(sprintf("%s is %s meters tall", player_names[i], player_height_m))
  i <- i + 1  # advance the index, otherwise the loop never ends
}
## [1] "Jeremy Lin is 1.91 meters tall"
## [1] "Michael Jordan is 1.98 meters tall"
## [1] "Shaquille O'Neal is 2.16 meters tall"

Practice: How many times do I have to flip a coin to get 6 heads?

Any for loop can be rewritten as a while loop; while is the natural choice when the number of iterations is not known in advance
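
One reading of the statement above is that whatever a for loop does can also be written with while, and that while additionally covers cases where the iteration count is unknown beforehand. A small sketch under that reading (all variable names are illustrative):

```r
# The same sum written with both loop styles
total_for <- 0
for (i in 1:10) {
  total_for <- total_for + i
}

total_while <- 0
i <- 1
while (i <= 10) {
  total_while <- total_while + i
  i <- i + 1
}

# A loop whose length is not obvious up front:
# halve x until it drops below 1 and count the steps
x <- 100
steps <- 0
while (x >= 1) {
  x <- x / 2
  steps <- steps + 1
}
```

Both sums come out to 55, and the halving loop stops after 7 steps.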

Practice: Fibonacci

  • Try using both types of loop to generate a Fibonacci sequence of a given length.
fibonacci(0, 1, 5) ## [1] 0 1 1 2 3
fibonacci(0, 1, 7) ## [1] 0 1 1 2 3 5 8
fibonacci(0, 1, 9) ## [1] 0 1 1 2 3 5 8 13 21

Practice: Poker card deck

suits <- c("Spade", "Heart", "Diamond", "Clover")
ranks <- c("Ace", 2:10, "Jack", "Queen", "King")

Lists

Characteristics of lists

  • Elements can have different classes
  • Support $ selection of named elements, much like attributes

Using list() to create a list

infinity_war <- list(
  "Avengers: Infinity War",
  2018,
  8.6,
  c("Action", "Adventure", "Fantasy")
)
class(infinity_war)
## [1] "list"

Check the appearance of a list

infinity_war
## [[1]]
## [1] "Avengers: Infinity War"
## 
## [[2]]
## [1] 2018
## 
## [[3]]
## [1] 8.6
## 
## [[4]]
## [1] "Action"    "Adventure" "Fantasy"

Using [[INDEX]] to index a list

for (i in seq_along(infinity_war)) {
  print(infinity_war[[i]])
}
## [1] "Avengers: Infinity War"
## [1] 2018
## [1] 8.6
## [1] "Action"    "Adventure" "Fantasy"
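
Double brackets matter here: [[INDEX]] extracts the element itself, while single brackets [INDEX] return a shorter list that still wraps the element. A minimal sketch using the same unnamed list (the result names are illustrative):

```r
infinity_war <- list(
  "Avengers: Infinity War",
  2018,
  8.6,
  c("Action", "Adventure", "Fantasy")
)

class_single <- class(infinity_war[2])    # "list": a sub-list of length 1
class_double <- class(infinity_war[[2]])  # "numeric": the element itself

# Chain [[ ]] and [ ] to reach inside a vector stored in the list
second_genre <- infinity_war[[4]][2]      # "Adventure"

# Single brackets slice a list, just like slicing a vector
title_and_genre <- infinity_war[c(1, 4)]
```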

Giving names to elements in a list

infinity_war <- list(
  movieTitle = "Avengers: Infinity War",
  releaseYear = 2018,
  rating = 8.6,
  genre = c("Action", "Adventure", "Fantasy")
)
infinity_war
## $movieTitle
## [1] "Avengers: Infinity War"
## 
## $releaseYear
## [1] 2018
## 
## $rating
## [1] 8.6
## 
## $genre
## [1] "Action"    "Adventure" "Fantasy"

Using [["ELEMENT"]] to index a list

for (e in names(infinity_war)) {
  print(infinity_war[[e]])
}
## [1] "Avengers: Infinity War"
## [1] 2018
## [1] 8.6
## [1] "Action"    "Adventure" "Fantasy"

Using $ELEMENT to index a list

infinity_war$movieTitle
infinity_war$releaseYear
infinity_war$rating
infinity_war$genre
## [1] "Avengers: Infinity War"
## [1] 2018
## [1] 8.6
## [1] "Action"    "Adventure" "Fantasy"
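
One practical difference between the two styles: [[ ]] accepts a name stored in a variable, whereas $ takes the name literally. A short sketch, where the variable name key is only illustrative:

```r
infinity_war <- list(
  movieTitle = "Avengers: Infinity War",
  releaseYear = 2018,
  rating = 8.6,
  genre = c("Action", "Adventure", "Fantasy")
)

key <- "rating"
via_brackets <- infinity_war[[key]]  # 8.6, the value of key is used as the name
via_dollar <- infinity_war$key       # NULL, looks for an element literally named "key"
```

This is why the loop over names(infinity_war) uses [[e]] rather than $e.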

Every element keeps its original class

for (e in names(infinity_war)) {
  print(class(infinity_war[[e]]))
}
## [1] "character"
## [1] "numeric"
## [1] "numeric"
## [1] "character"

Practice: Getting favorite players’ last names in upper case

Hint: use strsplit() to split each player’s name and toupper() to convert to upper case.

fav_players <- c("Steve Nash", "Paul Pierce", "Dirk Nowitzki", "Kevin Garnett", "Hakeem Olajuwon")
## [1] "NASH" "PIERCE" "NOWITZKI" "GARNETT" "OLAJUWON"

(optional) Factors

Characteristics of factors

  • Act like character vectors
  • Unique values are recorded as levels
  • Support ordered values; each value is encoded as an integer
  • Were the default class of a character column in a data.frame before R 4.0.0 (stringsAsFactors = TRUE)

Using factor() to create a factor

all_time_fantasy <- c("Steve Nash", "Paul Pierce", "Dirk Nowitzki", "Kevin Garnett", "Hakeem Olajuwon")
class(all_time_fantasy)
all_time_fantasy <- factor(all_time_fantasy)
class(all_time_fantasy)
## [1] "character"
## [1] "factor"

Unique values in a factor are recorded as levels

rgbs <- factor(c("red", "green", "blue", "blue", "green", "green"))
rgbs
## [1] red   green blue  blue  green green
## Levels: blue green red
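
The levels can also be inspected directly. A brief sketch using levels(), nlevels(), and table() on the same factor (the result names are illustrative):

```r
rgbs <- factor(c("red", "green", "blue", "blue", "green", "green"))

rgb_levels <- levels(rgbs)    # "blue" "green" "red", sorted alphabetically by default
rgb_nlevels <- nlevels(rgbs)  # 3
rgb_counts <- table(rgbs)     # blue: 2, green: 3, red: 1
```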

Supports ordinal values

temperatures <- factor(c("freezing", "cold", "cool", "warm", "hot"),
                       ordered = TRUE)
temperatures
temperatures[1] > temperatures[3]
## [1] freezing cold     cool     warm     hot     
## Levels: cold < cool < freezing < hot < warm
## [1] TRUE

Adjusting the order of a factor

temperatures <- factor(c("freezing", "cold", "cool", "warm", "hot"),
                       ordered = TRUE,
                       levels = c("freezing", "cold", "cool", "warm", "hot"))
temperatures
## [1] freezing cold     cool     warm     hot     
## Levels: freezing < cold < cool < warm < hot

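Elements in a factor are encoded as integers

temperatures <- c("freezing", "cold", "cool", "warm", "hot")
as.numeric(temperatures) # not an error: returns NA for every element, with a coercion warning
temperatures <- factor(c("freezing", "cold", "cool", "warm", "hot"))
as.numeric(temperatures) # the codes follow the alphabetically sorted levels
## [1] NA NA NA NA NA
## [1] 3 1 2 5 4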
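
A related pitfall appears when the labels of a factor look like numbers: as.numeric() still returns the level codes, not the labels. A minimal sketch, assuming a hypothetical factor release_years that is not part of the slides:

```r
# Hypothetical example: years stored as a factor
release_years <- factor(c("2018", "2012", "2015"))

codes <- as.numeric(release_years)                # 3 1 2: positions in the sorted levels
years <- as.numeric(as.character(release_years))  # 2018 2012 2015: the labels as numbers
```

Converting to character first is the usual way to recover the original numbers.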

Factors are sometimes hard to handle…

all_time_fantasy <- factor(c("Steve Nash", "Paul Pierce", "Dirk Nowitzki"