Creating Functions

User-defined Functions

One of the great strengths of R is the user’s ability to add functions. Sometimes there is a small task (or series of tasks) you need done and you find yourself having to repeat it multiple times. In these types of situations, it can be helpful to create your own custom function. The structure of a function is given below:

name_of_function <- function(argument1, argument2) {
    statements or code that does something
    return(something)
}
  • First you give your function a name.
  • Then you assign value to it, where the value is the function.

When defining the function you will want to provide the list of arguments required (inputs and/or options to modify behaviour of the function), and wrapped between curly brackets place the tasks that are being executed on/using those arguments. The argument(s) can be any type of object (like a scalar, a matrix, a dataframe, a vector, a logical, etc), and it’s not necessary to define what it is in any way.

Finally, you can “return” the value of the object from the function, meaning pass the value of it into the global environment. The important idea behind functions is that objects that are created within the function are local to the environment of the function – they don’t exist outside of the function.

Let’s try creating a simple example function. This function will take in a numeric value as input, and return the squared value.

square_it <- function(x) {
    square <- x * x
    return(square)
}

Once you run the code, you should see a function named square_it in the Environment panel (located at the top right of Rstudio interface). Now, we can use this function as any other base R functions. We type out the name of the function, and inside the parentheses we provide a numeric value x:

square_it(5)
[1] 25

Pretty simple, right? In this case, we only had one line of code that was run, but in theory you could have many lines of code to get obtain the final results that you want to “return” to the user.

Do I always have to return() something at the end of the function?

In the example above, we created a new variable called square inside the function, and then return the value of square. If you don’t use return(), by default R will return the value of the last line of code inside that function. That is to say, the following function will also work.

square_it <- function(x) {
   x * x
}

However, we recommend always using return at the end of a function as the best practice.

We have only scratched the surface here when it comes to creating functions! We will revisit this in later lessons, but if interested you can also find more detailed information on this R-bloggers site, which is where we adapted this example from.

Exercise

Basic
  1. Let’s create a function temp_conv(), which converts the temperature in Fahrenheit (input) to the temperature in Kelvin (output).
    • We could perform a two-step calculation: first convert from Fahrenheit to Celsius, and then convert from Celsius to Kelvin.
    • The formula for these two calculations are as follows: temp_c = (temp_f - 32) * 5 / 9; temp_k = temp_c + 273.15. To test your function,
    • if your input is 70, the result of temp_conv(70) should be 294.2611.
  2. Now we want to round the temperature in Kelvin (output of temp_conv()) to a single decimal place. Use the round() function with the newly-created temp_conv() function to achieve this in one line of code. If your input is 70, the output should now be 294.3.
# Basic

# 1
temp_conv <- function(temp_f) {
  temp_c = (temp_f - 32) * 5 / 9
  temp_k = temp_c + 273.15
  return (temp_k)
}

# 2
round(temp_conv(70), digits = 1)
[1] 294.3
Advanced

The Fibonacci sequence is \(0, 1, 1, 2, 3, 5, 8, ...\) where the first two terms are 0 and 1, and for all other terms \(n^{th}\) term is the sum of the \((n-1)^{th}\) and \((n-2)^{th}\) terms. Note that for n=0 you should return 0 and for n=1 you should return 1 as the first 2 terms.

  1. Write a function fibonacci which takes in a single integer argument n and returns the \(n^{th}\) term in the Fibonacci sequence.

  2. Have your function stop with an appropriate message if the argument n is not an integer. Stop allows you to create your own errors in R. This StackOverflow thread contains useful information on how to tell if something is or is not an integer in R.

# Advanced
fibonacci <- function(n){
  
  # These next 3 lines are part 2
  if((n %% 1)!=0){
    stop("Must provide an integer to fibonacci")
  }
  fibs <- c(0,1)
  for (i in 2:n){
    fibs <- c(fibs, fibs[i-1]+fibs[i])
  }
  return(fibs[n+1])
}
Challenge

Re-write your fibonacci function so that it calculates the Fibonacci sequence recursively, meaning that it calls itself. Your function should contain no loops or iterative code.

You will need to define two base cases, where the function does not call itself.

#Challenge
fibonacci2 <- function(n){
  if((n %% 1)!=0){
    stop("Must provide an integer to fibonacci")
  }
  # We call these two if statement the 'base cases' of the recursion
  if (n==0){
    return(0)
  }
  if (n==1){
    return(1)
  }
  # And this is the recursive case, where the function calls itself
  return(fibonacci2(n-1)+fibonacci2(n-2))
}

Recursion isn’t relevant to most data analysis, as it is often significantly slower than a non-recursive solution in most programming languages.

However, setting up a solution as recursive sometimes allows us to perform an algorithmic strategy called dynamic programming and is fundamental to most sequence alignment algorithms.


The materials in this lesson have been adapted from work created by the HBC and Data Carpentry, as well as materials created by Laurent Gatto, Charlotte Soneson, Jenny Drnevich, Robert Castelo, and Kevin Rue-Albert. These are open access materials distributed under the terms of the Creative Commons Attribution license (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.