Practice Set: Loading/Importing Files
This time we’ll talk about several little things, but all relate to importing your own data files. Most of us are not creating data.frames from scratch every time. Instead, our data is collected elsewhere, and stored as a file that a colleague shares with you, you download from the internet, etc.
Directories
Your computer is just a series of folders. Folders are also known as directories. R
will only look and save things to your working directory. To find your current working directory, you can type getwd()
in your console. To change your working directory to somewhere else, you can use setwd("/file/path/goes/here")
.
Fun Fact: The ~
sign helps shorten some of the really long file paths. Try setwd("~/rest/of/file/path")
. It usually points to the User’s directory. For instance, on my computer, it is set to /Users/Coop
.
Organize your files/projects!
YOU NEED to have some sort of structure to your projects. Play around with what’s right for you. Generally speaking, a good idea is something like this:
- Project Name (top level directory)
- Data (sub-directory of Project Name)
- Code (sub-directory)
- Results (sub-directory)
You can be even more organized and thorough, depending on your project. But those above are a good starting point. Here’s an example of the directory structure of one of my projects. I’ve changed the names to be meaningful here.
- Project Name
- data
- data from source 1
- data from source 2
- data from source 3
- code
- preprocessing code
- postprocessing code
- results
- figures
- manuscripts
- data
Again, things will change based on your specific projects. But keeping things organized and using directories to your advantage will help you in the long run!
WARNING: You need to save your files to the cloud. Absolutely do not save anything of importance to a Desktop or any old place on your computer. Box, Dropbox, Github, Open Science Framework, or any internal tool your school/company uses is fine. Always assume that your computer could get stolen tomrorow. You would be devastated if all of your work was suddenly gone.
Ok, I’ll get off the soapbox now…
RProjects
RProjects are a specific type of file. It’s not one that you edit directly, though. Instead, it contains metadata that is useful for keeping things organized. We are going to combine a RProject file along with a package called here
so that we never have to type all of these file paths again! Hooray!
You try! Follow along with Shelly as we create an RProject for this class. Together, we are going to do the following steps:
- Create a top level directory on Box. Call it
Psych4175
or something similar. - Create sub-directories (folders) for assignments, data, notes, and slides
- Download all datasets from Canvas and put it in the
data
sub-directory - Install the
here
package (only do once!) - Follow along as we open our project. See what happens when the whole thing moves to a totally different folder.
Importing Data
For a lot of us, our data will be stored as either .csv
or .txt
files. R
can handle many types of files but we’ll just practice with .csv
. (For more on importing other types, ask your old friend Google)
You can use the RStudio
interface to import your file (see the slides for how to do this). But it’s a lot of steps for something so simple! Let’s do it the lazy way.
To read in a .csv
The function for reading in a .csv
file is read.csv()
and the input is the file path.
To read in a .txt
The function for reading in a .txt
file is read.delim()
and the input is the file path.
File Path What?
Remember that long "/Users/Coop/Box Sync/Class/Data"
line that was our file path? We would have to type that every single time. But now that we have an RProject that contains metadata, we don’t need to do anything special! Instead, here’s what we’ll do:
library(here)
data <- read.csv(here("data/some_data_file.csv"))
Quick Quiz
Assume we have a project whose structure looks like this:
- Project Name
- data
- data from source 1
- data from source 2
- data from source 3
- code
- preprocessing code
- postprocessing code
- results
- figures
- manuscripts
- data
You try! (on your own computer; not in this browser)
- Open a new script file. Make sure it is in the
notes
sub-directory - Load in the
nhanes_small.csv
dataset