I. Introduction

One of the most basic things you need to know when using R is how to get data into R so that you can analyze it. This page demonstrates a simple and easy way to get data into R, reviews assignment operators, and discusses various conventions for naming variables.

II. Entering Data

There are a variety of ways to get data into R, but if your dataset is small, entering the data manually with the c() function is straightforward. The "c" is often said to stand for "concatenate," but R's own help page describes c() as a function to "Combine Values into a Vector or List," so it may be easiest to think of "c" as standing for combine :)

R operates on named data structures. The simplest such structure is the numeric vector, which is a single entity consisting of a collection of numbers. As an example, let's create a vector named x containing four numbers: 13, 21, 23.4, and 7.4. To create the vector and assign it to the variable x, use the following R code:

> # ASSIGN THE DATA TO VARIABLE x
> x = c(13, 21, 23.4, 7.4)
>
> # DISPLAY THE DATA STORED IN x
> x
[1] 13.0 21.0 23.4  7.4
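Because x is a single object, R functions can work on all four numbers at once. The short sketch below re-creates x and runs a few quick checks; length(), class(), and mean() are standard base R functions, and the block is only an illustration of treating the vector as one entity.

```r
# Re-create the vector from the example above
x = c(13, 21, 23.4, 7.4)

length(x)  # how many values x holds: 4
class(x)   # the kind of vector: "numeric"
mean(x)    # the average of all four values: 16.2
```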

Example 2:

Enter the following dataset into R: {11, 14, 22, 15}, and assign it to a variable named my.data.

> # ASSIGN DATA TO VARIABLE NAMED my.data
> my.data = c(11, 14, 22, 15)
>
> # VIEW THE DATA ASSIGNED TO THE VARIABLE my.data
> my.data
[1] 11 14 22 15
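Since c() combines whatever it is given, it can also join existing vectors end to end, not just single numbers. A minimal sketch, where first.half, second.half, and more.data are hypothetical names chosen only for this example:

```r
# c() combines single values into a vector
first.half = c(11, 14)
second.half = c(22, 15)

# c() also combines existing vectors, in order, into one longer vector
my.data = c(first.half, second.half)
my.data    # 11 14 22 15

# Vectors and single values can be mixed in the same call
more.data = c(my.data, 30)
more.data  # 11 14 22 15 30
```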

III. Assignment Operators

In R, either "=" or "<-" can be used to assign data to a variable. Note, however, that there are slight differences between the two; the most noticeable is that inside a function call, "=" names one of the function's arguments, while "<-" still creates a variable. For this website, we will stick with "=" because it is easier for beginning students to pick up. However, students who wish to become more advanced are encouraged to become familiar with both.

> # ASSIGN USING "="
> data_1 = c(1, 2, 3)
> data_1
[1] 1 2 3
>
> # ASSIGN USING "<-"
> data_2 <- c(4, 5, 6)
> data_2
[1] 4 5 6
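The difference only shows up inside a function call. The sketch below assumes a fresh R session and uses a hypothetical helper function, total(), defined just for this illustration. Writing "<-" inside a call like this is shown only to make the difference visible, not as a recommended habit.

```r
# Hypothetical helper function with one argument named nums
total = function(nums) sum(nums)

# Inside a call, "=" matches the argument nums; no variable is created
total(nums = c(1, 2, 3))                               # 6
exists("nums", envir = globalenv(), inherits = FALSE)  # FALSE in a fresh session

# Inside a call, "<-" creates my_values in the workspace and passes it along
total(my_values <- c(4, 5, 6))                         # 15
my_values                                              # 4 5 6
```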

IV. Naming Variables

In R, one has a great degree of flexibility when naming variables. However, certain conventions should generally be followed.

Convention 1: Avoid the short variable names commonly used in previous math classes, such as x, y, and z. Instead, choose names that describe the data. For example, if your data represents the relationship between height and weight, use height and weight as the variable names. This makes your code easier for you, and others, to understand.

# GOOD
height = c(57, 68, 55, 44)
weight = c(134, 153, 154, 122)

# BAD
x = c(57, 68, 55, 44)
y = c(134, 153, 154, 122)

Convention 2: In general, avoid capital letters. R is case-sensitive, so texas and Texas are two different variables. Once you start using capital letters, it becomes hard to remember which of your variable names contain them, so a good rule is simply not to use them :)

# GOOD
texas = c(57, 68, 55, 44)
steve_data = c(134, 153, 154, 122)

# BAD
Texas = c(57, 68, 55, 44)
Steve_data = c(134, 153, 154, 122)
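To see why mixing capital letters causes trouble, here is a small check, assuming a fresh session in which only the lowercase name has been created. exists() is a base R function that reports whether a name is defined; it is limited to the workspace here so that nothing from loaded packages is matched.

```r
# R is case-sensitive: texas and Texas are two separate names
texas = c(57, 68, 55, 44)

exists("texas", envir = globalenv(), inherits = FALSE)  # TRUE
exists("Texas", envir = globalenv(), inherits = FALSE)  # FALSE
# Typing Texas at the prompt would therefore fail with an
# "object 'Texas' not found" error
```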

Convention 3: When a variable name represents two or more words, separate the words with either a period or an underscore. (Note: This is a STATS4STEM suggestion followed by certain coding communities; other coding communities or educators may not agree with this convention.)

# GOOD
river.data = c(1, 2, 3, 4)
byron_smith = c(23, 24, 81, 91)
height.meters = c(1.34, 2.19, 2.22)
music_volume = c(23, 18, 89, 99)

# BAD
RiverData = c(1, 2, 3, 4)
ByronSmith = c(23, 24, 81, 91)
HEIGHTmeters = c(1.34, 2.19, 2.22)
musicVolume = c(23, 18, 89, 99)
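A separator is needed because an ordinary variable name cannot contain a space. The sketch below illustrates this with base R's make.names() function, which converts arbitrary text into a syntactically valid name; using it here is only a way to show R's rules, not part of the convention itself.

```r
# A space is not allowed in an ordinary variable name, so
#   river data = c(1, 2, 3, 4)
# is a syntax error. A period or underscore joins the words instead.
river.data = c(1, 2, 3, 4)
music_volume = c(23, 18, 89, 99)

# make.names() replaces the invalid space with a period
make.names("river data")  # "river.data"
```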

Convention 4: This is more of a rule than a convention. You can use numbers in your variable names. However, R prevents you from starting a variable name with a number.

# GOOD
row.1 = c(57, 68, 55, 44)
row.2 = c(134, 153, 154, 122)

> # BAD
> 1.row = c(57, 68, 55, 44)
Error: unexpected symbol in "1.row"
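If you are unsure whether a name is allowed, one option is to compare it with what make.names() produces: a syntactically valid name comes back unchanged. The helper below, is_valid_name(), is a hypothetical name used only for this sketch.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name = function(name) make.names(name) == name

is_valid_name("row.1")  # TRUE: numbers are fine after the first character
is_valid_name("1.row")  # FALSE: make.names() would rewrite it as "X1.row"
```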
