R - Data Structures
Understanding R's data structures is essential, because they are the objects you create and manipulate in everyday R work.
The most commonly used data structures in R are:
- Vectors
- Lists
- Matrices
- Arrays
- Factors
- Data Frames
R's base data structures can be categorized by their dimensionality (1-D, 2-D, or n-D) and by whether they are homogeneous (all elements of the same type) or heterogeneous (elements of different types).
Dimension | Homogeneous | Heterogeneous |
---|---|---|
1-D | Atomic vector | List |
2-D | Matrix | Data frame |
n-D | Array | |
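The classification in the table can be checked from code. The following is a minimal sketch; `describe()` is a hypothetical helper written for this illustration, not a base R function. It assumes that an object without a `dim` attribute counts as 1-D and uses `is.atomic()` to tell homogeneous from heterogeneous structures.

```r
# describe() is a hypothetical helper, not part of base R.
# It reports the dimensionality and homogeneity of an object.
describe <- function(x) {
  d <- dim(x)
  # objects without a dim attribute are treated as one-dimensional
  n_dim <- if (is.null(d)) 1L else length(d)
  kind <- if (is.atomic(x)) "homogeneous" else "heterogeneous"
  paste0(n_dim, "-D ", kind)
}

r_vector <- describe(c(1, 2, 3))                            # "1-D homogeneous"
r_list   <- describe(list(1, "a", TRUE))                    # "1-D heterogeneous"
r_matrix <- describe(matrix(1:6, nrow = 2))                 # "2-D homogeneous"
r_frame  <- describe(data.frame(x = 1:2, y = c("a", "b")))  # "2-D heterogeneous"
r_array  <- describe(array(1:24, dim = c(2, 3, 4)))         # "3-D homogeneous"

print(c(r_vector, r_list, r_matrix, r_frame, r_array))
```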
Vectors
Vectors are the most common and basic data structure in R. They are one-dimensional, homogeneous data structures. There are six types of atomic vectors: logical, integer, double, complex, character, and raw.
Example:
In the example below, a vector is created using the c() function.

    # creating a vector
    vec <- c("Red", "Blue", "Green")
    # printing the vector
    print(vec)
    # printing the class of the vector
    print(class(vec))

The output of the above code will be:

    [1] "Red"   "Blue"  "Green"
    [1] "character"

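The six atomic vector types and the homogeneity rule can be seen with typeof(). This is a small sketch; the variable names are chosen only for illustration. When values of different types are combined with c(), R coerces them to a single common type.

```r
# typeof() reports the storage type of an atomic vector
types <- c(
  typeof(c(TRUE, FALSE)),     # logical
  typeof(c(1L, 2L)),          # integer
  typeof(c(1.5, 2)),          # double
  typeof(c(1+2i, 3i)),        # complex
  typeof(c("a", "b")),        # character
  typeof(as.raw(c(1, 255)))   # raw
)
print(types)

# a vector is homogeneous, so mixed input is coerced to one type
mixed <- c(1, "two", TRUE)
print(typeof(mixed))  # "character"
print(mixed)          # "1" "two" "TRUE"
```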
Lists
A list is a heterogeneous data structure: its elements can be of different types. An element of a list can be a number, a character string, a vector, a matrix, an array, a function, or even another list.
Example:
In the example below, a list is created and printed.

    # creating atomic vectors using the c() function
    list1 <- c(10, 20, 30)
    list2 <- c("Red", "Blue", "Green")
    # creating an atomic vector of length one
    NoOfColors <- 3
    # combining all the created objects
    # into a list using the list() function
    MyList <- list(list1, list2, NoOfColors, 1:5)
    # printing the list
    print(MyList)

The output of the above code will be:

    [[1]]
    [1] 10 20 30

    [[2]]
    [1] "Red"   "Blue"  "Green"

    [[3]]
    [1] 3

    [[4]]
    [1] 1 2 3 4 5

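Since a list can hold elements of different types, including functions, it helps to see how individual elements are retrieved. The sketch below uses a named list; the element names (numbers, colors, square) are assumptions made for this illustration.

```r
# a named list holding a numeric vector, a character vector, and a function
NamedList <- list(
  numbers = c(10, 20, 30),
  colors  = c("Red", "Blue", "Green"),
  square  = function(x) x^2
)

# [[ ]] and $ extract the element itself
second_color <- NamedList$colors[2]       # "Blue"
total        <- sum(NamedList[["numbers"]])  # 60
squared      <- NamedList$square(4)       # 16

# [ ] returns a sub-list, not the element
sub_class <- class(NamedList["numbers"])  # "list"

print(second_color)
print(total)
print(squared)
print(sub_class)
```

Double brackets are usually what is wanted when the element's value is needed; single brackets are useful for selecting several elements while keeping the list structure.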
Matrices
Matrices are two-dimensional, homogeneous data structures. A matrix is not a separate type of object but simply an atomic vector with a dimension attribute that gives the number of rows and columns. As with atomic vectors, all elements of a matrix must be of the same data type.
A matrix can be created by passing a vector to the matrix() function.
Example:
In the example below, a matrix is created and printed.

    # creating a matrix, filled row by row
    mat <- matrix(c(10, 20, 30, 40, 50, 60),
                  nrow = 2, ncol = 3, byrow = TRUE)
    # printing the matrix
    print(mat)

The output of the above code will be:

         [,1] [,2] [,3]
    [1,]   10   20   30
    [2,]   40   50   60

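The statement that a matrix is an atomic vector with dimensions can be illustrated by setting the dim attribute directly. This is a minimal sketch with illustrative variable names; it also contrasts the default column-wise fill of matrix() with byrow = TRUE.

```r
# start with a plain atomic vector
v <- c(10, 20, 30, 40, 50, 60)

# assigning a dim attribute turns it into a 2 x 3 matrix,
# filled column by column
dim(v) <- c(2, 3)
print(is.matrix(v))  # TRUE
print(v[2, 3])       # 60

# matrix() fills by column unless byrow = TRUE is given
by_col <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2)
by_row <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2, byrow = TRUE)
print(by_col[1, 2])  # 30
print(by_row[1, 2])  # 20
```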
Arrays
Arrays are n-dimensional, homogeneous data structures. While matrices are confined to two dimensions, arrays can have any number of dimensions. For example, an array of dimensions (2, 3, 3) contains 3 rectangular matrices, each with 2 rows and 3 columns. The array() function takes a dim argument, a vector that gives the size of each dimension.
Example:
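In the example below, an array is created from a vector. Because the vector has only four elements, R recycles its values to fill all 18 cells of the 3 x 3 x 2 array.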

    # creating an array
    arr <- array(c(10, 20, 30, 40), dim = c(3, 3, 2))
    # printing the array
    print(arr)

The output of the above code will be:

    , , 1

         [,1] [,2] [,3]
    [1,]   10   40   30
    [2,]   20   10   40
    [3,]   30   20   10

    , , 2

         [,1] [,2] [,3]
    [1,]   20   10   40
    [2,]   30   20   10
    [3,]   40   30   20

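The (2, 3, 3) case described in the text can be sketched as follows. The values 1:18 and the variable names are chosen only for illustration; they make it easy to see that the array is filled column by column, one matrix after another.

```r
# an array of 3 matrices, each with 2 rows and 3 columns
arr3 <- array(1:18, dim = c(2, 3, 3))
print(dim(arr3))        # 2 3 3

# leaving an index empty selects everything along that dimension
first_slice <- arr3[, , 1]
print(dim(first_slice)) # 2 3

# index order is [row, column, matrix]
print(arr3[2, 3, 1])    # 6
print(arr3[1, 1, 2])    # 7
print(arr3[2, 3, 3])    # 18
```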
Factors
Factors are data objects used to categorize data and store it as levels. A factor stores the input vector together with its distinct values, which serve as labels. The labels are always stored as character strings, regardless of whether the input vector is numeric, character, or logical. Factors are useful in statistical modeling.
Factors are created using the factor() function. The nlevels() function returns the number of levels.
Example:
In the example below, a factor is created using a vector.

    # creating a vector
    gender <- c("Male", "Female", "Female", "Male", "Male")
    # creating a factor object
    fac <- factor(gender)
    # printing the factor
    print(fac)
    # printing the number of levels
    print(nlevels(fac))

The output of the above code will be:

    [1] Male   Female Female Male   Male
    Levels: Female Male
    [1] 2

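The point that labels are always character, even for numeric input, can be checked with levels(). The sketch below uses made-up numeric values and illustrative variable names. It also shows a common pitfall: as.integer() on a factor returns the internal level codes, not the original numbers.

```r
# a factor built from a numeric vector
num_fac <- factor(c(30, 10, 30, 20))

# the levels are stored as character strings, sorted
lv <- levels(num_fac)
print(lv)                 # "10" "20" "30"
print(is.character(lv))   # TRUE

# as.integer() gives the level codes, not the original values
codes <- as.integer(num_fac)
print(codes)              # 3 1 3 2

# to recover the numbers, convert through character first
values <- as.numeric(as.character(num_fac))
print(values)             # 30 10 30 20
```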
Data Frames
Data frames are used to store tabular data. They are two-dimensional, heterogeneous data structures: unlike in a matrix, each column of a data frame can hold a different type of data. Internally, a data frame is a list of vectors of equal length.
Data frames are created using the data.frame() function.
Example:
In the example below, a data frame is created which contains three columns.

    # creating a data frame
    Info <- data.frame(
      Name = c("John", "Marry", "Kim", "Ramesh"),
      City = c("London", "New York", "Paris", "Mumbai"),
      Age = c(28, 30, 25, 31)
    )
    # printing the data frame
    print(Info)

The output of the above code will be:

        Name     City Age
    1   John   London  28
    2  Marry New York  30
    3    Kim    Paris  25
    4 Ramesh   Mumbai  31

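That a data frame is a list of equal-length vectors, each with its own type, can be verified directly. This is a minimal sketch reusing the same data under the illustrative name People. stringsAsFactors = FALSE is set explicitly here as an assumption, so that the character columns stay character in R versions before 4.0 as well.

```r
People <- data.frame(
  Name = c("John", "Marry", "Kim", "Ramesh"),
  City = c("London", "New York", "Paris", "Mumbai"),
  Age = c(28, 30, 25, 31),
  stringsAsFactors = FALSE
)

# a data frame is a list underneath
print(is.list(People))  # TRUE

# each column keeps its own type
col_classes <- sapply(People, class)
print(col_classes)      # character, character, numeric

# rows and columns
print(dim(People))      # 4 3

# selecting the names of people older than 28
older <- People[People$Age > 28, "Name"]
print(older)            # "Marry" "Ramesh"
```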