R - Mean, Median & Mode
R has a rich set of built-in functions for statistical operations. This section covers the functions used to calculate the mean, median and mode in R.
Mean
The R mean() function is used to calculate the arithmetic mean of the elements of a given numeric vector. The syntax for using this function is given below:
Syntax
mean(x, trim = 0, na.rm = FALSE)
Parameters
x     | Required. Specify the numeric vector.
trim  | Optional. Specify a fraction (0 to 0.5) of observations to be trimmed from each end of x. It is used to drop some observations from both ends of the sorted vector. Default is 0.
na.rm | Optional. Specify TRUE to remove NA or NaN values before the computation. Default is FALSE.
Example:
The example below shows how to use the mean() function on a vector and on a matrix.
v <- c(10, 15, 20, 25, 30, 35)
cat("The vector contains:\n")
print(v)
cat("mean of all elements of the vector:", mean(v), "\n")

#mean() also accepts a matrix and averages all of its elements
m <- matrix(c(10, 20, 30, 40, 50, 60), ncol=2)
cat("\nThe matrix contains:\n")
print(m)
cat("mean of all elements of the matrix:", mean(m))
#m[,1] selects the first column of the matrix
cat("\nmean along first column of the matrix:", mean(m[,1]))
The output of the above code will be:
The vector contains:
[1] 10 15 20 25 30 35
mean of all elements of the vector: 22.5

The matrix contains:
     [,1] [,2]
[1,]   10   40
[2,]   20   50
[3,]   30   60
mean of all elements of the matrix: 35
mean along first column of the matrix: 20
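The arithmetic mean is simply the sum of the elements divided by their count. As a quick sanity check, the sketch below compares mean() with sum(v) / length(v), and uses the base functions colMeans() and rowMeans() to get the mean of every column or row of a matrix at once, instead of indexing one column at a time as in the example.

```r
# Vector and matrix from the example above
v <- c(10, 15, 20, 25, 30, 35)
m <- matrix(c(10, 20, 30, 40, 50, 60), ncol = 2)

# The arithmetic mean is the sum divided by the number of elements
manual_mean <- sum(v) / length(v)
cat("mean():", mean(v), "\n")
cat("sum/length:", manual_mean, "\n")

# colMeans() and rowMeans() return one mean per column / per row
cat("column means:", colMeans(m), "\n")
cat("row means:", rowMeans(m), "\n")
```

For this matrix the column means are 20 and 50, which matches mean(m[,1]) for the first column.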
Using trim parameter
The trim parameter takes a fraction between 0 and 0.5. That fraction of observations is removed from each end of the sorted vector before the mean is computed.
Example:
In the example below, the trim parameter is used to trim 25% of the observations from each end of the vector. Sorted, the vector is -45, -25, -10, 5, 12, 23, 23, 40. With 8 observations, trim=0.25 removes 2 values from each end (-45 and -25 from the low end, 23 and 40 from the high end), so the mean is calculated over -10, 5, 12, 23, which gives 30 / 4 = 7.5.
v <- c(-10, 12, 23, -25, 23, 5, 40, -45)
cat("The vector contains:\n")
print(v)
cat("mean of all elements of the vector:", mean(v), "\n")
#drop 25% of the observations from each end of the sorted vector
cat("mean after trimming 25% from each side:", mean(v, trim=0.25), "\n")
The output of the above code will be:
The vector contains:
[1] -10  12  23 -25  23   5  40 -45
mean of all elements of the vector: 2.875
mean after trimming 25% from each side: 7.5
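The number of values dropped from each end is floor(n * trim), rounded down, so a small trim on a short vector may drop nothing at all (a trim of 0.5 or more gives the median). The hypothetical helper trimmed_mean() below is a sketch that re-implements this rule for a vector without missing values, assuming 0 <= trim < 0.5, so it can be checked against mean(v, trim = 0.25) from the example.

```r
# Hypothetical helper: reproduces the trimming rule of mean(x, trim = ...)
# Assumes x has no NA/NaN values and 0 <= trim < 0.5
trimmed_mean <- function(x, trim) {
  n <- length(x)
  lo <- floor(n * trim) + 1   # position of the first value kept
  hi <- n + 1 - lo            # position of the last value kept
  kept <- sort(x)[lo:hi]      # floor(n * trim) values dropped from each end
  mean(kept)
}

v <- c(-10, 12, 23, -25, 23, 5, 40, -45)
cat("trimmed_mean(v, 0.25):", trimmed_mean(v, 0.25), "\n")
cat("mean(v, trim = 0.25):", mean(v, trim = 0.25), "\n")

# With 8 values, trim = 0.1 drops floor(0.8) = 0 values, so nothing is trimmed
cat("values dropped per end with trim = 0.1:", floor(length(v) * 0.1), "\n")
cat("mean(v, trim = 0.1):", mean(v, trim = 0.1), "\n")
```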
Using na.rm parameter
The na.rm parameter can be set to TRUE to remove NA or NaN values before the computation. Otherwise, any NA or NaN in the vector makes the result missing as well.
Example:
Consider the example below to see how the na.rm parameter is used.
v1 <- c(10, 20, NA)
v2 <- c(10, 20, NaN)

cat("mean of all elements of v1:", mean(v1), "\n")
cat("mean after removing NA:", mean(v1, na.rm=TRUE), "\n")

cat("\nmean of all elements of v2:", mean(v2), "\n")
cat("mean after removing NaN:", mean(v2, na.rm=TRUE), "\n")
The output of the above code will be:
mean of all elements of v1: NA
mean after removing NA: 15

mean of all elements of v2: NaN
mean after removing NaN: 15
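The reason na.rm = TRUE handles both cases is that is.na() returns TRUE for NA and for NaN alike, while is.nan() is TRUE only for NaN. The sketch below shows this and checks that dropping the missing values by hand with !is.na() gives the same result as na.rm = TRUE.

```r
v1 <- c(10, 20, NA)
v2 <- c(10, 20, NaN)

# is.na() flags both NA and NaN; is.nan() flags only NaN
print(is.na(v1))    # FALSE FALSE TRUE
print(is.na(v2))    # FALSE FALSE TRUE
print(is.nan(v1))   # FALSE FALSE FALSE

# Removing the missing values manually matches na.rm = TRUE
kept <- v2[!is.na(v2)]
cat("manual removal:", mean(kept), "\n")
cat("na.rm = TRUE:", mean(v2, na.rm = TRUE), "\n")
```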
Median
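The R median() function is used to calculate the median of a given numeric vector: the middle value of the sorted vector, or the average of the two middle values when the vector has an even number of elements. The syntax for using this function is given below: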
Syntax
median(x, na.rm = FALSE)
Parameters
x     | Required. Specify the numeric vector.
na.rm | Optional. Specify TRUE to remove NA or NaN values before the computation. Default is FALSE.
Example:
The example below shows how to use the median() function on a vector and on a matrix.
v <- c(10, 15, 20, 25, 30, 35)
cat("The vector contains:\n")
print(v)
cat("median of the vector:", median(v), "\n")

#median() also accepts a matrix and uses all of its elements
m <- matrix(c(10, 20, 30, 40, 50, 60), ncol=2)
cat("\nThe matrix contains:\n")
print(m)
cat("median of the matrix:", median(m))
#m[,1] selects the first column of the matrix
cat("\nmedian along first column of the matrix:", median(m[,1]))
The output of the above code will be:
The vector contains:
[1] 10 15 20 25 30 35
median of the vector: 22.5

The matrix contains:
     [,1] [,2]
[1,]   10   40
[2,]   20   50
[3,]   30   60
median of the matrix: 35
median along first column of the matrix: 20
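To make the odd/even rule concrete, the hypothetical helper manual_median() below sorts the vector and takes either the single middle value or the average of the two middle values. It is a sketch assuming a non-empty numeric vector without missing values, checked against median() on the even-length vector from the example and on a made-up odd-length vector.

```r
# Hypothetical helper: median of a non-empty numeric vector without NA/NaN
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  mid <- (n + 1) %/% 2            # middle position (lower middle when n is even)
  if (n %% 2 == 1) {
    s[mid]                        # odd length: the single middle value
  } else {
    (s[mid] + s[mid + 1]) / 2     # even length: average of the two middle values
  }
}

v_even <- c(10, 15, 20, 25, 30, 35)
v_odd  <- c(35, 10, 25, 15, 20)
cat("even length:", manual_median(v_even), "vs median():", median(v_even), "\n")
cat("odd length:", manual_median(v_odd), "vs median():", median(v_odd), "\n")
```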
Using na.rm parameter
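The na.rm parameter can be set to TRUE to remove NA or NaN values before the computation. Note that, unlike mean(), median() returns NA (not NaN) when the vector contains a NaN value and na.rm is FALSE.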
Example:
Consider the example below to see how the na.rm parameter is used.
v1 <- c(10, 20, NA)
v2 <- c(10, 20, NaN)

cat("median of v1:", median(v1), "\n")
cat("median after removing NA:", median(v1, na.rm=TRUE), "\n")

cat("\nmedian of v2:", median(v2), "\n")
cat("median after removing NaN:", median(v2, na.rm=TRUE), "\n")
The output of the above code will be:
median of v1: NA
median after removing NA: 15

median of v2: NA
median after removing NaN: 15
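The na.rm parameter is also handy when computing one median per column of a matrix that has gaps. One way to do this in base R is apply() over the columns (MARGIN = 2); any extra arguments given to apply() are passed on to median(), so na.rm = TRUE can be forwarded directly. The matrix below is made up for illustration.

```r
# A made-up matrix whose first column contains a missing value
m <- matrix(c(10, NA, 30, 40, 50, 60), ncol = 2)

# Without na.rm, the column containing NA has an NA median
print(apply(m, 2, median))

# Extra arguments to apply() are forwarded to median(), here na.rm = TRUE
print(apply(m, 2, median, na.rm = TRUE))
```

The first call gives NA for the first column and 50 for the second; with na.rm = TRUE the first column's median becomes 20, the median of 10 and 30.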
Mode
In statistics, the mode is the value that occurs most often in a given dataset. R does not have a built-in function to calculate the statistical mode (the built-in mode() function returns the storage mode of an object, such as "numeric" or "character"). However, we can define our own function to calculate the mode of a given dataset.
Example:
Consider the example below, where a Mode() function is defined to calculate the mode of its argument. It works with numeric vectors as well as character vectors. If several values share the highest count, it returns the one that appears first in the vector.
#creating a function to calculate mode
Mode <- function(x) {
  uniqx <- unique(x)
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

#creating a vector with numbers
v1 <- c(10, 20, 30, 40, 30, 30, 50)
#creating a vector with characters
v2 <- c("this", "is", "a", "dog", "this", "this")

#calculating mode of vector v1
cat("mode of vector v1:", Mode(v1), "\n")
#calculating mode of vector v2
cat("mode of vector v2:", Mode(v2), "\n")
The output of the above code will be:
mode of vector v1: 30
mode of vector v2: this
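The one-line body of Mode() chains four steps. The sketch below runs them one at a time on v1 so the intermediate values are visible. Because which.max() picks only the first maximum, a vector with a tie such as c(1, 2, 2, 3, 3) gets just one mode from Mode(); the hypothetical variant all_modes() returns every value tied for the highest count instead.

```r
v1 <- c(10, 20, 30, 40, 30, 30, 50)

# Step 1: the distinct values, in order of first appearance
uniqx <- unique(v1)              # 10 20 30 40 50
# Step 2: the position of each element of v1 within uniqx
idx <- match(v1, uniqx)          # 1 2 3 4 3 3 5
# Step 3: how many times each position occurs
counts <- tabulate(idx)          # 1 1 3 1 1
# Step 4: which.max() returns the first position holding the largest count
print(uniqx[which.max(counts)])  # 30

# Hypothetical variant: return every value tied for the highest count
all_modes <- function(x) {
  uniqx <- unique(x)
  counts <- tabulate(match(x, uniqx))
  uniqx[counts == max(counts)]
}

v3 <- c(1, 2, 2, 3, 3)
print(all_modes(v3))             # 2 3
```

Like Mode(), all_modes() also works on character vectors, since unique() and match() accept them.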