R - Mean, Median & Mode

R has a rich set of built-in functions for statistical operations. This section covers how to calculate the mean, median and mode in R.

Mean

The R mean() function is used to calculate the arithmetic mean of the elements of a given numeric vector. The syntax for using this function is given below:

Syntax

mean(x, trim = 0, na.rm = FALSE)

Parameters

`x`	`Required.` Specify the numeric vector.
`trim`	`Optional.` Specify the fraction (0 to 0.5) of observations to trim from each end of the sorted x before the mean is computed. Default is 0.
`na.rm`	`Optional.` Specify TRUE to remove NA or NaN values before the computation. Default is FALSE.

Example:

The example below shows the usage of the mean() function.

v <- c(10, 15, 20, 25, 30, 35)
cat("The vector contains:\n")
print(v)
cat("mean of all elements of the vector:", mean(v), "\n")

m <- matrix(c(10, 20, 30, 40, 50, 60), ncol=2)
cat("\nThe matrix contains:\n")
print(m)
cat("mean of all elements of the matrix:", mean(m))
cat("\nmean along first column of the matrix:", mean(m[,1]))

The output of the above code will be:

The vector contains:
[1] 10 15 20 25 30 35
mean of all elements of the vector: 22.5 

The matrix contains:
     [,1] [,2]
[1,]   10   40
[2,]   20   50
[3,]   30   60
mean of all elements of the matrix: 35
mean along first column of the matrix: 20
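
When the mean of every column or row is needed, indexing one column at a time becomes repetitive. The following is a minimal sketch using the base R functions colMeans(), rowMeans() and apply() on the same matrix as above; the variable names col_means and row_means are chosen here only for illustration.

```r
# Same matrix as in the example above
m <- matrix(c(10, 20, 30, 40, 50, 60), ncol = 2)

# Mean of every column and of every row in a single call
col_means <- colMeans(m)
row_means <- rowMeans(m)
print(col_means)
print(row_means)

# apply() with MARGIN = 2 applies mean() to each column
print(apply(m, 2, mean))
```

Here col_means[1] matches the value returned by mean(m[,1]) in the example above.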

Using trim parameter

The trim parameter takes a fraction from 0 to 0.5. That fraction of observations is removed from each end of the sorted vector before the mean is computed. Values outside this range are taken as the nearest endpoint.

Example:

In the example below, the trim parameter is used to trim 25% of the observations from each end of the vector. The vector has 8 elements, so 2 are dropped from each end. Sorted, the vector contains -45, -25, -10, 5, 12, 23, 23, 40. With trim=0.25, the values -45, -25, 23 and 40 are removed, so the mean is calculated over -10, 5, 12, 23, which gives 7.5.

v <- c(-10, 12, 23, -25, 23, 5, 40, -45)
cat("The vector contains:\n")
print(v)
cat("mean of all elements of the vector:", mean(v), "\n")
cat("mean after trimming 25% from each end:", 
       mean(v, trim=0.25), "\n")

The output of the above code will be:

The vector contains:
[1] -10  12  23 -25  23   5  40 -45
mean of all elements of the vector: 2.875 
mean after trimming 25% from each end: 7.5
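
The trimming step described above can be reproduced by hand, which makes it easier to see which observations are dropped. This is a sketch of the idea rather than R's internal implementation; the names sorted_v, k, kept and manual_mean are illustrative.

```r
v <- c(-10, 12, 23, -25, 23, 5, 40, -45)
trim <- 0.25

sorted_v <- sort(v)
n <- length(sorted_v)

# Number of observations dropped from each end: floor(8 * 0.25) = 2
k <- floor(n * trim)

# Keep only the middle part of the sorted vector
kept <- sorted_v[(k + 1):(n - k)]
print(kept)

manual_mean <- mean(kept)
cat("manual trimmed mean:", manual_mean, "\n")
cat("mean(v, trim=0.25): ", mean(v, trim = trim), "\n")
```

Both lines print the same value, 7.5, for this vector.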

Using na.rm parameter

The na.rm parameter can be set to TRUE to remove NA or NaN values before the computation.

Example:

Consider the example below to see the usage of the na.rm parameter.

v1 <- c(10, 20, NA)
v2 <- c(10, 20, NaN)

cat("mean of all elements of v1:", mean(v1), "\n")
cat("mean after removing NA:", mean(v1, na.rm=TRUE), "\n")

cat("\nmean of all elements of v2:", mean(v2), "\n")
cat("mean after removing NaN:", mean(v2, na.rm=TRUE), "\n")

The output of the above code will be:

mean of all elements of v1: NA 
mean after removing NA: 15 

mean of all elements of v2: NaN 
mean after removing NaN: 15
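
Setting na.rm=TRUE handles both NA and NaN because is.na() reports TRUE for both kinds of value, whereas is.nan() reports TRUE only for NaN. The sketch below illustrates this with a small vector made up for this example, and shows that filtering with is.na() by hand gives the same mean as na.rm=TRUE.

```r
# Illustrative vector containing both kinds of missing value
v <- c(10, 20, NA, NaN)

print(is.na(v))    # TRUE for both NA and NaN
print(is.nan(v))   # TRUE only for NaN

# Count the missing values before deciding how to handle them
n_missing <- sum(is.na(v))
cat("number of missing values:", n_missing, "\n")

# Dropping them manually is equivalent to na.rm = TRUE
manual <- mean(v[!is.na(v)])
cat("mean after manual removal:", manual, "\n")
cat("mean with na.rm=TRUE:     ", mean(v, na.rm = TRUE), "\n")
```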

Median

The R median() function is used to calculate the median of a given numeric vector. The syntax for using this function is given below:

Syntax

median(x, na.rm = FALSE)

Parameters

`x`	`Required.` Specify the numeric vector.
`na.rm`	`Optional.` Specify TRUE to remove NA or NaN values before the computation. Default is FALSE.

Example:

The example below shows the usage of the median() function.

v <- c(10, 15, 20, 25, 30, 35)
cat("The vector contains:\n")
print(v)
cat("median of the vector:", median(v), "\n")

m <- matrix(c(10, 20, 30, 40, 50, 60), ncol=2)
cat("\nThe matrix contains:\n")
print(m)
cat("median of the matrix:", median(m))
cat("\nmedian along first column of the matrix:", median(m[,1]))

The output of the above code will be:

The vector contains:
[1] 10 15 20 25 30 35
median of the vector: 22.5 

The matrix contains:
     [,1] [,2]
[1,]   10   40
[2,]   20   50
[3,]   30   60
median of the matrix: 35
median along first column of the matrix: 20
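
To see why the first vector gives 22.5 while the first column gives 20, the median can be computed by hand: with an odd number of elements it is the middle value of the sorted vector, and with an even number it is the average of the two middle values. The function manual_median() below is a hypothetical helper written for this illustration, not part of R, and it assumes a non-empty numeric vector without missing values.

```r
# Hypothetical helper: median of a non-empty numeric vector without NA values
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    # Odd length: take the middle element
    s[(n + 1) / 2]
  } else {
    # Even length: average the two middle elements
    (s[n / 2] + s[n / 2 + 1]) / 2
  }
}

even_case <- manual_median(c(10, 15, 20, 25, 30, 35))
odd_case  <- manual_median(c(10, 20, 30))
cat("even length:", even_case, "\n")
cat("odd length: ", odd_case, "\n")
```

In practice the built-in median() should be preferred; the helper only makes the rule explicit.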

Using na.rm parameter

The na.rm parameter can be set to TRUE to remove NA or NaN values before the computation.

Example:

Consider the example below to see the usage of the na.rm parameter.

v1 <- c(10, 20, NA)
v2 <- c(10, 20, NaN)

cat("median of v1:", median(v1), "\n")
cat("median after removing NA:", median(v1, na.rm=TRUE), "\n")

cat("\nmedian of v2:", median(v2), "\n")
cat("median after removing NaN:", median(v2, na.rm=TRUE), "\n")

The output of the above code will be:

median of v1: NA 
median after removing NA: 15 

median of v2: NA 
median after removing NaN: 15

Mode

In statistics, the mode is the value that occurs most often in a dataset. R does not have a built-in function to calculate the statistical mode; the built-in mode() function returns the storage mode (type) of an object instead. However, we can define our own function to calculate the mode of a dataset.

Example:

Consider the example below, where a Mode() function is defined to calculate the mode of its argument. It works with numeric vectors as well as character vectors. If several values are tied for the highest count, it returns the one that appears first in the vector.

#creating a function to calculate mode
Mode <- function(x) {
  uniqx <- unique(x)
  uniqx[which.max(tabulate(match(x, uniqx)))]
}

#creating a vector with numbers
v1 <- c(10, 20, 30, 40, 30, 30, 50)
#creating a vector with characters
v2 <- c("this", "is", "a", "dog", "this", "this")

#calculating mode of vector v1
cat("mode of vector v1:", Mode(v1), "\n")
#calculating mode of vector v2
cat("mode of vector v2:", Mode(v2), "\n")

The output of the above code will be:

mode of vector v1: 30 
mode of vector v2: this
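
A dataset can have more than one mode when several values share the highest count, and the Mode() function above reports only one of them. The sketch below is one possible variant that returns every tied value; all_modes() is a hypothetical name used for this illustration, and the tied vector is made up for the example.

```r
# Hypothetical variant of Mode() that returns all values tied for the top count
all_modes <- function(x) {
  uniqx <- unique(x)
  counts <- tabulate(match(x, uniqx))   # occurrences of each unique value
  uniqx[counts == max(counts)]
}

# 20 and 30 both occur twice, so both are returned
tied <- all_modes(c(10, 20, 20, 30, 30))
print(tied)

# With a single most frequent value the result matches Mode()
single <- all_modes(c("this", "is", "a", "dog", "this", "this"))
print(single)
```

The values are returned in the order in which they first appear in the input.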