I installed the “Roasted Marshmallows” release of R (2.15.1) from www.r-project.org, which went smoothly on my MacBook Pro running Lion. I was happy to find that its REPL runs easily on the command line in Terminal.
R is a free implementation of a dialect of the S language, the statistics and graphics environment for which John Chambers won the ACM Software Systems award. S was consciously designed to blur the distinction between users and programmers. S is a high-level programming language, with similarities to Scheme and Python. It is a good system for rapid development of statistical applications.
— R fundamentals
R has amazing built-in primitives and libraries for what scientists like to do with data and incredible graphing options. I struggled to find good, simple resources to approach learning R as a programmer. I loved this tutorial. Here are my cliff notes:
The most fundamental objects are vectors, which are basically arrays whose indexes start at 1.
Names of objects are case sensitive.
Comments start with ‘#’.
n <- 25
v1 <- c(2,5,1,9)          # create a vector by combining a list of values
v2 <- numeric(4)          # initialize a vector of specific length (with zeros)
v3 <- rep(3, 10)          # initialize a vector with number: repeat 3, 10x
v4 <- 1:4                 # specify a vector using the range 1-4
v4 <- 4:1                 # you can count down, 4,3,2,1
v4 <- 3:-1                # even to negative numbers: 3,2,1,0,-1
v4 <- seq(1, 1.5, by=.1)  # easy to generate a sequence of equally spaced non-integers
v5 <- c(v3, v4, 7)        # c will combine or concatenate vectors
v2[1] <- 4      # use bracket notation to access (or set) an element
v2[1:3] <- 4:6  # you can even access (or set) a range
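To check what the bracket assignments actually do, here is a minimal sketch that rebuilds v2 step by step and reads values back out. It assumes the single-element assignment targets index 1, and the expected values in the comments follow from that assumption.

```r
# Start from a zero-filled vector of length 4.
v2 <- numeric(4)          # 0 0 0 0

# Set a single element (index 1 is the first element, not index 0).
v2[1] <- 4                # 4 0 0 0

# Set a range of elements in one go.
v2[1:3] <- 4:6            # 4 5 6 0
print(v2)

# The same bracket notation reads values back.
second <- v2[2]           # 5
middle <- v2[2:3]         # 5 6
print(second)
print(middle)

# seq() includes both endpoints here: 1.0, 1.1, ..., 1.5
v4 <- seq(1, 1.5, by = 0.1)
print(length(v4))         # 6
```

Note that indexing with a range returns another vector, so everything in R that works on a vector also works on a slice of one.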
For those of us who already know it from Math class (or computer graphics), vector math in R works the way you would expect:
- when an operation involves a vector and a number, the number is used to modify each element of the vector as specified by the operation
- when arithmetic involves two vectors of the same length, then the operation applies to elements with the same index.
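The two rules above can be sketched with a couple of small vectors. The names a and b are mine, chosen only for illustration.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# Vector and a number: the number is applied to every element.
doubled <- a * 2          # 2 4 6
shifted <- a + 1          # 2 3 4

# Two vectors of the same length: elements are paired up by index.
added <- a + b            # 11 22 33
multiplied <- a * b       # 10 40 90

print(doubled)
print(shifted)
print(added)
print(multiplied)
```

Note that a * b is element-by-element multiplication, not a dot product or a matrix product.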
Adding vectors of different lengths doesn’t really make sense in real life (although maybe there’s an application for that I don’t know about), but R conveniently defines that the shorter vector is repeated as often as needed to match the length of the longer vector.
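Here is a small sketch of that repeating behavior (R calls it recycling). The vector names are hypothetical. When the longer length is not a multiple of the shorter one, R still computes a result but emits a warning, which I suppress here only to keep the output clean.

```r
long <- 1:6
short <- c(10, 20)

# short is reused as 10 20 10 20 10 20 to match the length of long.
recycled <- long + short
print(recycled)           # 11 22 13 24 15 26

# 5 is not a multiple of 2, so the last repeat of short is cut off.
# R warns about this; suppressWarnings() hides the warning here.
uneven <- suppressWarnings(1:5 + c(10, 20))
print(uneven)             # 11 22 13 24 15
```

The vector-and-number case is really the same rule: a single number is a vector of length 1 that gets repeated for every element.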
Like many languages, and more importantly, like Math, a function call is a name followed by its arguments enclosed in parentheses. Here are some common ones:
max(v1)
mean(v1)
sum(v1)
prod(v1)
length(v1)
unique(v1)
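For reference, this is what those functions return for the v1 defined earlier. The extra vector dupes is my own addition, since v1 has no repeated values and unique() would simply return it unchanged.

```r
v1 <- c(2, 5, 1, 9)

biggest <- max(v1)        # 9
average <- mean(v1)       # 4.25
total   <- sum(v1)        # 17
product <- prod(v1)       # 90
count   <- length(v1)     # 4

# unique() drops repeated values and keeps first-seen order.
dupes <- c(2, 5, 2, 9, 5)
distinct <- unique(dupes) # 2 5 9

print(c(biggest, average, total, product, count))
print(distinct)
```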
By using parentheses for grouping, one can combine several function calls into a single expression. This one works out the sample variance of v2:
(sum(v2^2) - (sum(v2)^2)/length(v2)) / (length(v2) - 1)
A simpler way to get the same result would be to use the built-in var function: var(v2).
The standard deviation is computed by using the expression sqrt(var(v2)), which is what the built-in sd(v2) returns.
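To convince myself that the long expression really is the sample variance, here is a minimal check against R's built-in var() and sd(). It assumes v2 holds 4, 5, 6, 0, and compares with all.equal() rather than == because these are floating-point results.

```r
v2 <- c(4, 5, 6, 0)
n <- length(v2)

# Hand-written sample variance: (sum of squares - (sum^2)/n) / (n - 1)
manual_var <- (sum(v2^2) - (sum(v2)^2) / n) / (n - 1)

# Built-in equivalents.
builtin_var <- var(v2)
builtin_sd  <- sd(v2)

print(manual_var)         # 20.75 / 3, roughly 6.92
print(isTRUE(all.equal(manual_var, builtin_var)))        # TRUE
print(isTRUE(all.equal(sqrt(builtin_var), builtin_sd)))  # TRUE
```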
Comparisons are cool.
> 1 < 2
[1] TRUE
> 1:2 < 2:1
[1] TRUE FALSE
> v6 <- c(4,5,2,1,9)
> v6[v6 < 3]   # find all elements in a vector which are < 3
[1] 2 1
> length(v6[v6 < 3])   # count how many are < 3
[1] 2
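What makes this work is that a comparison on a vector returns a vector of TRUE/FALSE values, and that logical vector can itself be used as an index. A short sketch of the pieces follows. The name mask is mine, and sum() and which() are two standard functions I am adding as alternatives for counting and locating matches.

```r
v6 <- c(4, 5, 2, 1, 9)

# The comparison is applied element by element.
mask <- v6 < 3
print(mask)               # FALSE FALSE TRUE TRUE FALSE

# Indexing with a logical vector keeps the elements where it is TRUE.
small <- v6[mask]
print(small)              # 2 1

# TRUE counts as 1 in arithmetic, so sum() counts the matches.
how_many <- sum(mask)
print(how_many)           # 2

# which() gives the positions of the matches instead of the values.
positions <- which(mask)
print(positions)          # 3 4
```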