http://www.jeremymiles.co.uk/regressionbook/extras/appendix2/R/

Of course the R-manual is great:

http://cran.r-project.org/doc/manuals/r-release/R-intro.html

However, R is a huge system, and when you are getting started the entire manual can be daunting. That is why I prefer a tutorial to get me started, and only then start looking up more functionality in the manual.

Here are some tips and tricks that I commonly use:

**list all objects**

ls()

**remove an object**

rm()

**remove all objects**

rm(list = ls())
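As a quick illustration of the three commands above, here is a minimal sketch. The object names (demo_a, demo_b, demo_c) are made up for the demonstration. To avoid wiping a real workspace, the sketch passes an explicit vector of names to the list argument; rm(list = ls()) is the same call with every name in the workspace.

```r
# Hypothetical objects, created only for this demonstration
demo_a <- 1:3
demo_b <- "some text"
demo_c <- 3.14

print(ls())                  # the three names above appear in the listing

rm(demo_a)                   # remove a single object by name
print("demo_a" %in% ls())    # FALSE

# The list argument takes a character vector of object names;
# rm(list = ls()) simply passes every name that ls() returns
rm(list = c("demo_b", "demo_c"))
print(any(c("demo_b", "demo_c") %in% ls()))   # FALSE
```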

**load data from csv file**

reference: http://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html

data <- read.csv("myfile.csv")

- myfile.csv has headers for each column that are used as variable names
- if a column header starts with a number (e.g. "9") the corresponding variable name will start with an "X" (e.g. "X9")
- the variable "data" is a data frame that contains all file data; each column header becomes a named column of the data object
- for example data$frac_change accesses the data in column "frac_change" within myfile.csv
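To see the header handling in action without needing a real file, here is a small sketch that writes a made-up CSV to a temporary file and reads it back. The file contents and the column names other than frac_change are assumptions for illustration only. In R, a column of a data frame is read with `$` (or `[[ ]]`).

```r
# Write a small hypothetical CSV to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("frac_change,9,label",
             "0.01,1.5,a",
             "-0.02,2.5,b"),
           csv_path)

data <- read.csv(csv_path)

print(names(data))        # "frac_change" "X9" "label"
print(data$frac_change)   # column access with $
print(data[["X9"]])       # the header "9" became the variable X9
```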

**make an object the "root" of all subsequent calls**

rather than typing data$frac_change, data$X1, etc., you can attach the object "data" so that its columns can be referenced by name alone, as if "data$" were automatically prepended

attach(data)
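Here is a minimal sketch of what attach does, using a tiny made-up data frame. The detach() and with() calls are not part of my usual workflow above; I include them as assumptions about what you might want, since with() gives the same shorthand for a single expression without changing the search path. One caveat I am sure of: if an object with the same name already exists in your workspace, it takes precedence over the attached column.

```r
# Hypothetical data frame standing in for the one read from myfile.csv
data <- data.frame(frac_change = c(0.01, -0.02, 0.03),
                   X1 = c(1, 2, 3))

attach(data)
mean_attached <- mean(frac_change)   # found via the attached data frame
detach(data)                         # undo the attach when finished

# Alternative: evaluate one expression inside the data frame
mean_with <- with(data, mean(frac_change))

print(c(mean_attached, mean_with))   # the two values are the same
```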

**calculate linear regression**

fit the data in frac_change to a linear combination of the data in X1, X2, X3 and a constant

glm.linear = glm(frac_change ~ X1 + X2 + X3)

the object glm.linear now contains the best fit linear regression model.

Also the function bayesglm (from the arm package) carries out the regression but assumes priors for the coefficients and then...?
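To check that the regression call does what you expect, it can help to run it on simulated data where the true coefficients are known. The sketch below is only an illustration: the data frame name sim, the sample size, and the coefficient values are all made up. It passes the data frame through the data argument instead of relying on attach, and compares against lm, since glm with its default gaussian family fits the same least-squares model.

```r
set.seed(1)
n <- 200

# Hypothetical predictors and a response with known coefficients
sim <- data.frame(X1 = rnorm(n), X2 = rnorm(n), X3 = rnorm(n))
sim$frac_change <- 0.5 + 1.0 * sim$X1 - 2.0 * sim$X2 + 0.25 * sim$X3 +
  rnorm(n, sd = 0.1)

# Same formula as in the post; data = sim replaces attach(sim)
glm.linear <- glm(frac_change ~ X1 + X2 + X3, data = sim)
lm.linear  <- lm(frac_change ~ X1 + X2 + X3, data = sim)

print(coef(glm.linear))   # should be close to 0.5, 1.0, -2.0, 0.25
print(all.equal(coef(glm.linear), coef(lm.linear)))
```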

**see details of the linear regression model**

glm.linear

or

summary(glm.linear)
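If you want the numbers from the summary as data rather than printed text, the coefficient table can be pulled out as a matrix. This is a small sketch on made-up data; the names d, fit and coef_table are hypothetical.

```r
set.seed(2)

# Hypothetical data with one predictor
d <- data.frame(X1 = rnorm(50))
d$frac_change <- 1 + 2 * d$X1 + rnorm(50, sd = 0.2)

fit <- glm(frac_change ~ X1, data = d)

# coef() applied to the summary returns the coefficient table as a matrix
coef_table <- coef(summary(fit))
print(coef_table)

# Individual entries can then be picked out by row and column name
print(coef_table["X1", "Estimate"])
print(coef_table["X1", "Std. Error"])
```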

**calculate model's predicted values**

predictions <- fitted.values(glm.linear)
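fitted.values gives the predictions for the rows the model was fitted on. As a hedged aside, predict() with a newdata argument does the same for rows the model has not seen. The sketch below uses made-up data, and the names d, fit and new_pred are hypothetical.

```r
set.seed(3)

# Hypothetical data with one predictor
d <- data.frame(X1 = rnorm(30))
d$frac_change <- 0.1 + 0.5 * d$X1 + rnorm(30, sd = 0.05)

fit <- glm(frac_change ~ X1, data = d)

# Predictions for the rows used in the fit
predictions <- fitted.values(fit)

# For this default (gaussian) fit, predict() without newdata agrees
same <- all.equal(unname(predictions), unname(predict(fit)))
print(same)

# Predictions for new, unseen values of X1
new_pred <- predict(fit, newdata = data.frame(X1 = c(-1, 0, 1)))
print(new_pred)
```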

**histogram of data**

hist(frac_change)

hist(frac_change, 24)

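(asks for about 24 bins; R treats a single number as a suggestion and may pick a nearby "pretty" number of bins instead)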
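The second argument of hist is named breaks. A sketch of the difference between passing a single number and passing explicit bin edges follows; the simulated frac_change values and the edges vector are assumptions for illustration, and plot = FALSE is used so the bin counts can be inspected without drawing anything.

```r
set.seed(4)
frac_change <- rnorm(500, sd = 0.01)   # hypothetical data

# A single number is only a suggestion for the number of bins
h <- hist(frac_change, breaks = 24, plot = FALSE)
print(length(h$counts))        # the number of bins R actually chose

# Explicit edges force exactly 24 bins (25 edges)
edges <- seq(min(frac_change), max(frac_change), length.out = 25)
h_exact <- hist(frac_change, breaks = edges, plot = FALSE)
print(length(h_exact$counts))  # 24
```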

**graph / plot data**

plot(predictions, frac_change)

(a good quick graph to show if your predictions match the actual data)

**sorting data**

sorted = sort(predictions, index.return = TRUE)

index.return = TRUE tells R that you want the indexes (into the original data) of the sorted values returned as well

sorted$x contains the sorted values

sorted$ix contains the indexes into the original data of the sorted values
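A tiny worked example makes the two components easier to picture. The three numbers are made up; the elements of the returned list are read with `$`.

```r
predictions <- c(0.3, -0.1, 0.2)   # hypothetical values

sorted <- sort(predictions, index.return = TRUE)

print(sorted$x)    # -0.1  0.2  0.3
print(sorted$ix)   # 2 3 1

# Indexing the original vector with ix reproduces the sorted values
print(identical(predictions[sorted$ix], sorted$x))   # TRUE

# order() returns the same index vector directly
print(identical(sorted$ix, order(predictions)))      # TRUE
```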

**user-defined functions**

I keep this very simple: I just paste the function definition into R (I know there are much better ways to do it). A function I use frequently is below. It takes as inputs a vector of sort indexes (indexes of sorted values), a vector of values (not necessarily the values that were sorted), and a lower and an upper fraction. The function determines which sort indexes fall within that fractional range and then returns the values from the value vector at those indexes. For example, if lowerFrac = 0.0 and upperFrac = 0.10, then the sort indexes corresponding to the bottom 10% are used to pull the values from retrieveVector.

```
getSubset <- function(sortIndexes, retrieveVector, lowerFrac, upperFrac) {
  ## sortIndexes - indexes of sorted reference vector, will be used to pick range from retrieveVector
  lowerIndex = round(lowerFrac * length(sortIndexes))
  if (lowerIndex == 0) {
    lowerIndex = 1
  }
  upperIndex = round(upperFrac * length(sortIndexes))
  print("lowerIndex upperIndex")
  print(c(lowerIndex, upperIndex))
  subsetSortIndexes = sortIndexes[lowerIndex:upperIndex]
  result = list(x = retrieveVector[subsetSortIndexes], ix = subsetSortIndexes)
  result
}
```
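Here is a sketch of how the function might be called. The function body is repeated so the block runs on its own; the reference and values vectors are made up, chosen so that the result is easy to check by hand.

```r
# Same function as above, repeated so this block is self-contained
getSubset <- function(sortIndexes, retrieveVector, lowerFrac, upperFrac) {
  lowerIndex = round(lowerFrac * length(sortIndexes))
  if (lowerIndex == 0) {
    lowerIndex = 1
  }
  upperIndex = round(upperFrac * length(sortIndexes))
  print("lowerIndex upperIndex")
  print(c(lowerIndex, upperIndex))
  subsetSortIndexes = sortIndexes[lowerIndex:upperIndex]
  result = list(x = retrieveVector[subsetSortIndexes], ix = subsetSortIndexes)
  result
}

# Hypothetical data: sort by reference, retrieve from values
reference <- c(5, 1, 4, 2, 3, 10, 8, 7, 9, 6)
values    <- reference * 100

sorted <- sort(reference, index.return = TRUE)

# Bottom 20% of reference: its two smallest entries sit at positions 2 and 4
bottom <- getSubset(sorted$ix, values, 0.0, 0.2)
print(bottom$ix)   # 2 4
print(bottom$x)    # 100 200
```

Note that with these inputs the function needs upperFrac to be large enough that upperIndex is at least 1; I have not tried to handle smaller fractions here.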

**Vector manipulation**

1-based indexes, so to get the first element of a:

a[1]

to get a range of elements - in this case the 2nd through the 5th

a[2:5]

create a vector that is a sequence of integers, e.g. 2 through 11

2:11

creates: 2 3 4 5 6 7 8 9 10 11

create a vector that is a sequence of numbers, e.g. from -0.015 to 0.015 increment by 0.00125:

seq(-0.015, 0.015, 0.00125)

creates:

-0.01500 -0.01375 -0.01250 -0.01125 -0.01000 -0.00875 -0.00750 -0.00625 -0.00500 -0.00375 -0.00250 -0.00125 0.00000 0.00125 0.00250 0.00375 0.00500 0.00625 0.00750 0.00875 0.01000 0.01125 0.01250 0.01375 0.01500
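The vector operations above can be tried together in one short session. The vector a is a made-up example; the length.out form of seq at the end is an alternative I am adding for comparison, useful when you know how many points you want instead of the increment.

```r
a <- c(10, 20, 30, 40, 50, 60)   # hypothetical vector

print(a[1])     # 10 (indexes start at 1)
print(a[2:5])   # 20 30 40 50

print(2:11)     # 2 3 4 5 6 7 8 9 10 11

# Sequence defined by start, end and increment
s <- seq(-0.015, 0.015, 0.00125)
print(length(s))   # 25

# Same sequence defined by start, end and number of points
s2 <- seq(-0.015, 0.015, length.out = 25)
print(all.equal(s, s2))
```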

**Graphing**

plot

xlim = c(-2, 2) sets the x-axis limits to -2 and 2

ylim = c(-2, 2) does the same but for the y-axis

windows() creates a new window to plot in (this function is only available on Windows; dev.new() works on any platform)

points(), lines() act like plot but add the data to the existing plot
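Putting the graphing pieces together, here is a minimal sketch with simulated data. All values are made up. So that it also runs in a non-interactive session, the sketch draws into a temporary PDF file using pdf() and dev.off(); in an interactive session you would leave those two calls out and the plot would appear in a window.

```r
set.seed(5)

# Hypothetical predictions and actual values
predictions <- rnorm(100)
frac_change <- predictions + rnorm(100, sd = 0.3)

# Draw to a temporary PDF (assumption: skip pdf()/dev.off() when interactive)
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)

# Scatter plot with both axes limited to [-2, 2]
plot(predictions, frac_change, xlim = c(-2, 2), ylim = c(-2, 2))

# Add a second set of points to the same plot
points(predictions, -frac_change, col = "red")

# Add the line y = x as a reference for perfect predictions
lines(c(-2, 2), c(-2, 2), col = "blue")

dev.off()
```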
