The goal of this module is to get everyone’s computers set up for the semester and to provide background and an introduction to the R programming language and the R environment.
The name R is a nod to the statistical programming language S (for “Statistics”) that inspired its creation. S was developed at Bell Laboratories by John Chambers and later sold to a small company that further developed it into S-Plus. R was then developed as an alternative to S by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland, New Zealand.
R is a high-level, interpreted language, like Python or Ruby, where commands are executed directly and sequentially, without previously compiling a program into machine-language instructions. Each statement is translated, on the fly, into a sequence of subroutines that have already been compiled into machine code.
R is open-source software, meaning that the source code for the program is freely available for anyone to modify, extend, and improve upon. R is also FREE (!) for anyone to use and distribute. The large and active community of users and developers is one of the reasons that R has become very popular in academia, science, engineering, and business, indeed in any field that requires data analytics. Developers have also built in the capacity for easily making production-quality graphics, making it a great tool for data visualization. There are thus many good reasons to learn and use R.
Here are a few of the main ones, in a nutshell:
R can be run in several ways:
- Interactively from the command line in a terminal window.
- In batch mode, by sourcing commands from an R script file (which is a simple text file).
- From within an R graphical user interface (or GUI) or integrated development environment (or IDE), which accommodates both of the above.
We are going to introduce several of these ways of working with R, but the easiest and most convenient is to use an IDE.
Open the R program from wherever you installed it… you should see the console window and the > prompt. Note that your screen may look slightly different from the screenshots below. You can also run R in a terminal window (MacOS or Unix) or from the Windows command shell after starting it with the command “R”.
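[Screenshot: the R console on MacOS]
- Use <- to assign a value, the results of an operation, or specific code to an object (e.g., a variable, a function, a complex data structure).
- You can also use = for assignment, but I prefer to use that only to assign values to function arguments (more on this later).
- To list the objects in your current workspace, use ls() with no arguments.
- To change the prompt, use options() with the prompt argument: options(prompt="") [where you supply, between the quotes, the text you want the prompt to show].
- To see your current working directory, use getwd(), which has no arguments.
- To change the working directory, use setwd("") [where you supply, between the quotes, the path to the desired directory].
- To send the current line or the highlighted lines of a script to the console, press ⌘-RETURN (Mac) or control-R (PC).
- To add a comment, begin it with #; R ignores everything from # to the end of the line.
- To remove objects from the workspace, use the rm() function [where an individual object’s name or a list of object names is included as the argument to rm()].
- To remove all objects from the workspace, use rm(list=ls()) [in this case, you are passing to rm() a list consisting of all the objects in the workspace, provided by the ls() function].
Try interacting with R via the command line or console window.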
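The workspace commands listed above can be tried together. The following is only a minimal sketch: the object names a and b, the prompt text, and the use of tempdir() as a destination directory are illustrative choices, not part of the original module.

```r
# Assign values with <- and list the objects in the workspace
a <- 3
b <- a * 2
ls()

# Change the prompt, then restore the previous setting
old <- options(prompt = "R> ")
options(old)

# Inspect and change the working directory
# (tempdir() is just a safe, always-available example location)
start_dir <- getwd()
setwd(tempdir())
getwd()
setwd(start_dir)

# Remove a single object and confirm that it is gone
rm(a)
"a" %in% ls()
```

Restoring the prompt and the working directory at the end keeps the session as you found it.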
Try doing some math in R by using it to evaluate the following expressions:
8 + 5
## [1] 13
10 - 6/2
## [1] 7
(10 - 6)/2
## [1] 2
10 * 5
## [1] 50
15/5
## [1] 3
10^5
## [1] 1e+05
3 * pi
## [1] 9.424778
Try working with assignments:
x <- 6
x
## [1] 6
y <- 5
y
## [1] 5
z <- x * y
z
## [1] 30
x2 <- x^2
x2
## [1] 36
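Try out some of R’s built-in functions:
- Assign the value 10 to x, then take its natural logarithm using the log() function.
- Calculate the factorial of x using the factorial() function.
- Assign the value 81 to y, then take its square root using the sqrt() function.
- Assign the value -8.349218 to z. Use ?round() to view the help file for the function round(), then round z to 3 decimal places.
- Use ?abs() to view the help file for the function abs(), then take the absolute value of z * y.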
x <- 10
log(x)
## [1] 2.302585
factorial(x)
## [1] 3628800
y <- 81
sqrt(y)
## [1] 9
z <- -8.349218
round(z, digits = 3)
## [1] -8.349
abs(z * y)
## [1] 676.2867
- Use the ls() function to list the variables currently stored in your active session. How many do you have?
ls()
## [1] "x" "x2" "y" "z"
- Use rm(list=ls()) to clear all the variables you have defined.
One of the fantastic things about R, and one of the reasons it is such a flexible tool for so many types of data analysis, is the ability to extend its functionality with packages. Packages are sets of reusable R functions created by the core development team or by users and are akin to libraries in other programming languages, like Python. They can be installed into R and then attached to a workspace (using the require() or library() functions), which then gives the user access to the functions contained therein.
If a package is loaded that has a function with the same name as one in a previously loaded package, then the older one is masked and the more recently attached one is used if called by a user. Either version of the function can be called explicitly by using the :: operator, with the construct package-name::function.
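A minimal sketch of masking and the :: operator, using only packages that ship with R. Here a user-defined function named sd (a hypothetical stand-in for a function from a newly attached package) masks the sd() function from {stats}; masking by a second package behaves in the same way.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# A user-defined sd() in the workspace masks the version in {stats}
sd <- function(v) "this is not the standard deviation!"
sd(x)          # calls the masking version
stats::sd(x)   # calls the {stats} version explicitly

# Clean up so that sd() refers to stats::sd() again
rm(sd)
sd(x)
```

Writing package-name::function also makes it clear to a reader of your script where a function comes from, even when nothing is masked.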
To install packages from the R console prompt or in a script, use install.packages("") [where you include, between the quotes, the name of the package you want to install]. This command installs the package, by default, to the USER level, though this can be changed by providing a path to install to using the lib argument. Other arguments for this function can be set to specify the repository to download from, etc.
Installing packages simply places them into a standard location on your computer. To actually use the functions they contain in an R session, you need to also load them into your R workspace.
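One common pattern, sketched here as an illustration rather than as the module's own method, is to install a package only if it is not already present and then load it. The package name "stats" is a placeholder chosen because it ships with R, so this sketch runs without downloading anything.

```r
# Replace "stats" with the name of the package you actually want
pkg <- "stats"

# requireNamespace() returns FALSE if the package is not installed
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# character.only = TRUE lets library() take the name from a variable
library(pkg, character.only = TRUE)

# Confirm that the package is now attached
pkg %in% (.packages())
```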
To load packages from the base MacOS GUI:
To load packages from the base Windows GUI:
To load packages from the R console prompt or in a script:
- Use library() or require() with the package name as an argument.
- The require() function is nearly identical to the library() function except that the former is safer to use inside functions because it will not throw an error if a package is not installed. require() also returns a value of TRUE or FALSE depending on whether the package loaded successfully or not. I almost always use the library() function in my scripts.
You can use the following to list the set of packages you have installed:
library()
installed.packages()
The command (.packages()) will give you a list of packages loaded in your current workspace.
The command detach(package:XXX) [where XXX, not in quotes, is the name of the package you want to unload] will unload a currently loaded package.
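The loading, listing, and unloading commands described above can be sketched as follows. The {splines} package is used only because it ships with R but is not attached by default, and "notARealPackage123" is a made-up name used to show how require() and library() differ when a package is missing.

```r
# Attach a package and confirm that it is loaded
library(splines)
"splines" %in% (.packages())   # TRUE

# Unload it again
detach(package:splines)
"splines" %in% (.packages())   # FALSE

# require() returns FALSE (with a warning) for a missing package...
ok <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
ok                             # FALSE

# ...whereas library() throws an error, caught here with tryCatch()
res <- tryCatch(
  library("notARealPackage123", character.only = TRUE),
  error = function(e) "library() failed"
)
res
```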
To update your installed packages to the latest version, you can use:
update.packages()
Apart from the GUIs included in the MacOS and Windows installations of R, there are several IDEs that connect to the R interpreter and provide lots of convenient functionality. One of the most versatile and easy to use (and my favorite) is RStudio.
Note that the workspace you see is divided into four separate panes (Source and Console panes on the left, two customizable panes on the right). You can modify the layout and appearance of the RStudio IDE to suit your taste by selecting Preferences from the RStudio menu (MacOS) or by selecting Global Options from the Tools menu (both MacOS and Windows).
The Source pane is where you work with and edit various file types (e.g., scripts), while the Console pane is where you send commands to the R interpreter and see the results of those commands. The other two customizable panes provide easy access to useful tools and overviews of your interactions with R. For example, the Environment tab lists all of the objects in your current workspace, the History tab shows the log of all of the commands you have sent to the interpreter, and the Packages tab provides a convenient interface for installing and loading packages.
Repeat Challenge 1 from above using the editor and console in RStudio.
NOTE: In both the base GUI that ships with the
R application and in
RStudio, the console supports code
completion. Pressing TAB
after starting to type a
function or variable name will give you suggestions as to how to
complete what you have begun to type. In
RStudio, this functionality is present also
when you are typing code in the text editor in the
Source pane. Also helpful in
RStudio are popup windows that accompany code
completion that show, for each function, what possible arguments that
function can take and their default values.
Almost everything in R can be thought of as an object, including variables, functions, and complex data structures.
Objects in R fall into different classes or types. There are a few basic (or “atomic”) classes for objects: numeric (real numbers), integer (integer numbers), character (for text), logical (Boolean values, i.e., TRUE or FALSE, represented as 1 and 0, respectively), complex (for complex numbers, which have real and imaginary parts), and factor (for defined levels of categorical variables… we will talk more about factors later on). Both built-in and user-defined functions have the class function.
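As a quick illustration of the classes named above, the class() function reports the class of any object. The values used here are arbitrary examples.

```r
class(3.14)                   # "numeric"
class(3L)                     # "integer" (the L suffix marks an integer)
class("three")                # "character"
class(TRUE)                   # "logical"
class(3 + 2i)                 # "complex"
class(factor(c("lo", "hi")))  # "factor"
class(sqrt)                   # "function"

# Logical values behave as 1 and 0 in arithmetic
TRUE + TRUE                   # 2
```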
R also supports a variety of data structure objects, the most basic of which is the vector. Vectors are variables with one or more values of the same type, e.g., students’ grades on an exam. The class of a vector has to be one of the atomic classes described above. [A scalar variable, such as a constant, is simply a vector with only 1 value.]
- Create vectors using the c() or “concatenate” command:
x <- c(15, 16, 12, 3, 21, 45, 23)
x
## [1] 15 16 12 3 21 45 23
y <- c("once", "upon", "a", "time")
y
## [1] "once" "upon" "a" "time"
z <- "once upon a time"
z
## [1] "once upon a time"
f <- function() {
  # code to be evaluated
}
# this is the minimal definition for a function
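- What are the classes of the objects x, y, z, and f? Use the class() function to check.
- What happens if you set x <- c("2", 2, "zombies")? What is the class of x now?
- What is the class of the built-in function mean()? To check, use class(mean)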
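Because every element of a vector must share one type, R silently converts ("coerces") mixed values to the most general type present. The sketch below uses arbitrary example values to show the order in which this happens, along with an explicit conversion back to numbers.

```r
class(c(TRUE, 2L))   # "integer": logical is promoted to integer
class(c(2L, 3.5))    # "numeric": integer is promoted to numeric
class(c(3.5, "a"))   # "character": numeric is promoted to character

# Character strings that look like numbers can be converted back explicitly
v <- c("2", "10")
as.numeric(v) + 1    # 3 11
```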
Another way to create a vector is to use the : sequence operator:
x <- 1:10
x
## [1] 1 2 3 4 5 6 7 8 9 10
Some objects in R also have attributes, which we can think of as metadata, or data describing the object. A useful attribute to query about a vector object is the number of elements in it.
length(x)
## [1] 10
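Beyond its length, a vector can carry other metadata, such as element names. A small sketch, using a hypothetical named vector called scores:

```r
# A named numeric vector (names and values are made up for illustration)
scores <- c(alice = 90, bob = 85, carol = 77)

length(scores)      # 3
names(scores)       # "alice" "bob" "carol"
attributes(scores)  # a list with one element, $names
```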
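Try some vector math using the console in RStudio:
- Assign the sequence of integers from 15 to 28 to a vector x, and the sequence from 1 to 4 to a vector y.
- Add x and y. What happens?
- Store the sum in a new vector s, then check its class using the class() function. What is the length of x?
- Assign the values 10 and 100 to a vector z, then multiply x by z.
- Use the mean() and sd() functions to calculate the mean and standard deviation of s.
x <- 15:28 # or x <- c(15, 16, 17...)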
y <- 1:4
x + y
## Warning in x + y: longer object length is not a multiple of shorter object
## length
## [1] 16 18 20 22 20 22 24 26 24 26 28 30 28 30
s <- x + y
## Warning in x + y: longer object length is not a multiple of shorter object
## length
s
## [1] 16 18 20 22 20 22 24 26 24 26 28 30 28 30
class(s)
## [1] "integer"
length(s)
## [1] 14
z <- c(10, 100)
x * z
## [1] 150 1600 170 1800 190 2000 210 2200 230 2400 250 2600 270 2800
mean(s)
## [1] 23.85714
sd(s)
## [1] 4.4003
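The results above come from R's recycling rule: when two vectors of different lengths are combined, the shorter one is reused from its start until it matches the longer one. R warns only when the longer length is not a multiple of the shorter, which is why x + y produced a warning but x * z did not. A smaller sketch with made-up vectors a and b:

```r
a <- 1:6
b <- c(10, 20)

# b is recycled three times: 10 20 10 20 10 20
a + b   # 11 22 13 24 15 26

# A single number is a vector of length 1, so it is recycled too
a * 2   # 2 4 6 8 10 12
```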
As noted above, scripts are text files that store R commands and they can be used to link together sets of commands to perform complete analyses and show results. R scripts may be constructed using lines of individual code to accomplish tasks, but if the code will be executed repeatedly, it is much better to organize the code into user-defined functions. A function is a bit of code that performs a specific task. It may take arguments or not, and it may return nothing, a single value, or any R object (e.g., a vector or a list, which is another data structure we will discuss later on). If care is taken to write functions that work under a wide range of circumstances, then they can be reused in many different places. Novel functions are the basis of the thousands of user-designed packages that are what make R so extensible and powerful.
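As a small illustration of a reusable function, here is a sketch of a hypothetical helper, rescale01(), that takes a numeric vector plus an optional argument with a default value and returns a vector. The name and behavior are invented for this example.

```r
# Rescale a numeric vector so that its values run from 0 to 1
rescale01 <- function(v, na.rm = TRUE) {
  rng <- range(v, na.rm = na.rm)     # smallest and largest values
  (v - rng[1]) / (rng[2] - rng[1])   # the last expression is returned
}

rescale01(c(2, 4, 6))    # 0.0 0.5 1.0
rescale01(c(10, 15, 30)) # 0.00 0.25 1.00
```

Because the function makes no assumptions about the particular values it receives, the same definition works on any numeric vector with at least two distinct values.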
If you save a script, you can then use the source()
function (with the path to the script file of interest as an argument)
at the console prompt (or in another script) to read and execute the
entire contents of the script file. In RStudio
you may also go to Code > Source to run an entire
script, or you can run lines in the script by opening it, highlighting
the lines of interest, and sending those lines to the console using the
Run button or the appropriate keyboard shortcut,
⌘-RETURN
(Mac) or control-R
(PC).
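A minimal sketch of source() in action. To keep the example self-contained, it first writes a two-line script to a temporary file; in practice you would pass the path to a script you have saved yourself. The names cube and result are made up for illustration.

```r
# Create a small script file in a temporary location
script <- tempfile(fileext = ".R")
writeLines(c("cube <- function(v) v^3",
             "result <- cube(1:3)"), script)

# Read and execute the entire contents of the script
source(script)

# Objects defined in the script are now available in the workspace
result   # 1 8 27
```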
Try writing a script containing a function.
- Write the sayhi() function, which adds a name to a greeting:
sayhi <- function(x) {
  hi <- paste("greetings, ", x, "!", sep = "")
  # the paste command allows string concatenation
  return(hi)
}
Now, send this function to R by
highlighting it and hitting ⌘-RETURN
(Mac) or
control-R
(PC) to automatically paste it to the
console.
name1 <- "septimus"
name2 <- "thomasina"
sayhi(name1)
## [1] "greetings, septimus!"
sayhi(name2)
## [1] "greetings, thomasina!"
Working in RStudio, you can save script files (which, again, are just plain text files) using standard dialog boxes.
When you go to quit R (by using the q() function or by trying to close RStudio), you will be asked whether you want to Save workspace image to XXX/.RData? [where XXX is the path to your working directory].
Saying “yes” will store all of the contents of your workspace in a single hidden file, .RData. The leading “.” hides this file in most operating systems, unless you deliberately choose to show hidden files. The next time you start R, this workspace will be loaded again automatically, provided you have not changed your working directory.
A second hidden file, .Rhistory, will also be stored in the same directory, which will contain a log of all commands you sent to the console during your R session.
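You can also save and restore a workspace image yourself, without quitting R. The sketch below assumes you are working at the top level of an R session and writes the image to a temporary file rather than to your working directory; the object name answer is made up for illustration.

```r
answer <- 42

# Save every object in the workspace to a file
img <- tempfile(fileext = ".RData")
save.image(file = img)

# Remove the object, then restore it from the saved image
rm(answer)
load(img)
answer   # 42
```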
R has been under continuous and active development since its inception in the late 1990s, and several updates are made available each year. These updates help to fix bugs, improve speed and computational efficiency, and add new functionality to the software. The following information on how to update R is based on this post from R Bloggers:
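# Step 1, BEFORE updating R: save the names of all installed packages
# that are neither base nor recommended (i.e., have no "Priority" value)
tmp <- installed.packages()
installedpkgs <- as.vector(tmp[is.na(tmp[, "Priority"]), 1])
save(installedpkgs, file = "installed_old.rda")
# Step 2, AFTER installing the new version of R: reload the saved list
# and work out which packages are missing from the new installation
load("installed_old.rda")
tmp <- installed.packages()
installedpkgs.new <- as.vector(tmp[is.na(tmp[, "Priority"]), 1])
missing <- setdiff(installedpkgs, installedpkgs.new)
# Run these last two lines separately... sometimes they require
# interactive commands:
install.packages(missing)
update.packages()
# Step 3, for packages from Bioconductor: choose a mirror, update, and
# then install whatever is still missing
chooseBioCmirror()
BiocManager::install()
load("installed_old.rda")
tmp <- installed.packages()
installedpkgs.new <- as.vector(tmp[is.na(tmp[, "Priority"]), 1])
missing <- setdiff(installedpkgs, installedpkgs.new)
# seq_along() avoids a spurious iteration when nothing is missing
for (i in seq_along(missing)) BiocManager::install(missing[i])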
If you’re using a Windows computer, you can use the package {installr} to update R and all your packages.
If you’re using a Mac, you can use the package {updateR}, but only from base R (not within RStudio).
Finally, to reassure yourself that you have done everything correctly, type these two commands in the RStudio console to see what you’ve got in terms of what version of R you are running and what packages you have installed:
version
packageStatus()
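If you only want a quick summary rather than the full output of the two commands above, the following sketch pulls out a few specifics. The choice of {stats} as the package to query is arbitrary.

```r
# The version of R you are running, as a single string
R.version.string

# The installed version of one particular package
packageVersion("stats")

# How many packages are installed, and the names of the first few
pkgs <- installed.packages()
nrow(pkgs)
head(rownames(pkgs))
```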