This lesson has passed peer-review! See the publication in JOSE.

Data Frame Manipulation

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What are data-frames, and how can we manage them?

Objectives
  • Understand what a data-frame is and learn to manipulate it.

Data-frames: The power of interdisciplinarity

Data-frames are among the most powerful data structures in R. Let’s begin by creating a mock data set:

> musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                         pieces = c(722, 187, 68),
                         likes = c(0, 1, 1))
> musician

The content of our new object:

    people pieces likes
1  Medtner    722     0
2 Radwimps    187     1
3  Shakira     68     1

We have just created our first data-frame. We can confirm this with the class() function:

> class(musician)
[1] "data.frame"

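A data-frame is a collection of vectors of equal length (stored internally as a list), one vector per column. All the values within a single column must be of the same data type, although different columns can have different types: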

Figure 3. Structure of the created data-frame: a table with columns named people, pieces, and likes, and rows named 1, 2, and 3.
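
The list-of-vectors structure can be checked directly. The following is a minimal sketch, assuming R 4.0 or later; stringsAsFactors = FALSE is added here (it is not in the lesson's original call) so that older R versions also keep the people column as text:

```r
# Recreate the lesson's data-frame.
# stringsAsFactors = FALSE is an assumption added for older R versions;
# it is already the default from R 4.0 on.
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1),
                       stringsAsFactors = FALSE)

# A data-frame is stored as a list with one element per column
is_list <- is.list(musician)

# Each column is a vector holding a single data type
column_types <- sapply(musician, typeof)

# Every column has the same length, which is the number of rows
column_lengths <- sapply(musician, length)

print(is_list)         # TRUE
print(column_types)    # people: "character", pieces: "double", likes: "double"
print(column_lengths)  # 3 for every column
```

The built-in str() function gives a similar overview in a single call: str(musician).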

We can begin to explore our new object by pulling out columns using the $ operator. In order to use it, you need to write the name of your data-frame, followed by the $ operator and the name of the column you want to extract:

> musician$people
[1] "Medtner"  "Radwimps" "Shakira" 

We can perform arithmetic on a column; the operation is applied to every element:

> musician$pieces + 20
[1] 742 207  88
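
Note that an operation like this returns a new vector and leaves the data-frame untouched. As a small sketch of how the result could be kept, the code below stores it in a new column; the column name pieces_plus_20 is a hypothetical choice for illustration, not part of the lesson:

```r
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# The arithmetic returns a new vector; the original column is unchanged
shifted <- musician$pieces + 20
print(musician$pieces)   # 722 187 68

# Assigning to a new column name stores the result in the data-frame
# (pieces_plus_20 is a hypothetical name used only for this sketch)
musician$pieces_plus_20 <- shifted

# Summary functions also work on a single column
total_pieces <- sum(musician$pieces)
print(total_pieces)      # 977
```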

We can also change the data type of a column. The code below converts the likes column from numbers to logical values, then reports whether each musician is popular:

> typeof(musician$likes)
[1] "double"
> musician$likes <- as.logical(musician$likes)
> paste("Is",musician$people, "popular? :", musician$likes, sep = " ")
[1] "Is Medtner popular? : FALSE" "Is Radwimps popular? : TRUE" "Is Shakira popular? : TRUE"
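
To make the conversion rule explicit, here is a short sketch of how as.logical() treats numbers: 0 becomes FALSE and any other number becomes TRUE. The variable names are illustrative only:

```r
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

type_before <- typeof(musician$likes)   # "double"

# Convert the numeric column to logical values
musician$likes <- as.logical(musician$likes)
type_after <- typeof(musician$likes)    # "logical"

# 0 maps to FALSE; every other number maps to TRUE
converted <- as.logical(c(0, 1, 5, -2)) # FALSE TRUE TRUE TRUE

# In arithmetic, TRUE counts as 1 and FALSE as 0,
# so sum() gives the number of popular musicians
n_popular <- sum(musician$likes)        # 2

print(type_before)
print(type_after)
print(converted)
print(n_popular)
```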

Finally, we can extract information from a specific place in our data by using the “matrix” notation [row, column], where the first number inside the brackets specifies the row number, and the second the column number:

Figure 4. Extraction of specific data in a data-frame and a matrix: [1,], [2,], and [3,] correspond to rows 1, 2, and 3; [,1], [,2], and [,3] correspond to columns 1, 2, and 3; and location [1,2] corresponds to the number 722.

> musician[1,2]  # The number of pieces that Nikolai Medtner composed
[1] 722

We can also retrieve the same value by referring to the column by its name:

> musician[1,"pieces"]  # The number of pieces that Nikolai Medtner composed
[1] 722
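
Leaving one side of the comma empty selects a whole row or a whole column, as the [1,] and [,1] labels in Figure 4 suggest. A minimal sketch, with variable names chosen only for illustration:

```r
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# Empty column position: the whole first row, returned as a one-row data-frame
first_row <- musician[1, ]

# Empty row position: the whole second column, returned as a plain vector
second_col <- musician[, 2]

# The same column selected by name, with the matrix notation and with [[ ]]
by_name <- musician[, "pieces"]
by_double_bracket <- musician[["pieces"]]

# Several rows and columns at once: rows 2 to 3 of two named columns
subset_df <- musician[2:3, c("people", "pieces")]

print(first_row)
print(second_col)   # 722 187 68
print(subset_df)
```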

Exercise 2:

Complete the lines of code to obtain the required information

Code                                  Information required
> musician[__,__]                     Pieces composed by Shakira
> (musician____)_2                    Pieces composed by all musicians if they were half as productive (half of their actual pieces)
> musician$___ <- c(___,___,___)      Redefine the likes column to make all the musicians popular!

がんばって! (ganbatte; good luck):

Solution

Code Information required
> musician[3,"pieces"]                     Pieces composed by Shakira
> (musician$pieces)/2                      Pieces composed by all musicians if they were half as productive (half of their actual pieces)
> musician$likes <- c(TRUE,TRUE,TRUE)      Redefine the likes column to make all the musicians popular!
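
The three solutions can be run together as a quick check. This is a sketch rather than the only valid answer; it also shows why the logical values are written without quotes, since quoted values would create a character column instead of a logical one:

```r
musician <- data.frame(people = c("Medtner", "Radwimps", "Shakira"),
                       pieces = c(722, 187, 68),
                       likes = c(0, 1, 1))

# Pieces composed by Shakira (row 3, column "pieces")
shakira_pieces <- musician[3, "pieces"]   # 68

# Half of each musician's pieces
half_pieces <- musician$pieces / 2        # 361.0 93.5 34.0

# Make every musician popular using logical values (no quotes)
musician$likes <- c(TRUE, TRUE, TRUE)
likes_type <- typeof(musician$likes)      # "logical"

# With quotes, the values are text rather than logical
quoted_type <- typeof(c("TRUE", "TRUE", "TRUE"))  # "character"

print(shakira_pieces)
print(half_pieces)
print(likes_type)
print(quoted_type)
```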

Key Points

  • Data-frames contain multiple columns that can hold different types of data, but each column holds a single type.