3-Data Types, Basic Commands and Charting
1. Creating and Manipulating Vectors and Basic Variables
To create any data object, the command:
- begins with a name for the new variable,
- followed by an assignment operator: <-
- and then the data or expression that defines the content of the variable.
  - This can include direct values, function calls, operations, or other variables.
variableName <- "word"
Variable names are never wrapped in quotes. String/character values being assigned to a variable must be wrapped in quotes. Variable names must begin with a letter (or a period that is not followed by a number) and can otherwise contain only letters, numbers, periods, and underscores (e.g., pig_13). Variable names cannot contain spaces, but string values can.
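As a small sketch of these naming rules: the variable names below are made up for illustration and are not used elsewhere in this activity. The base R function make.names() is used here only to show how R would repair a name that breaks the rules.

```r
# Valid names: start with a letter, then letters, numbers, periods, or underscores
pig_13 <- "Bart"
pig.name2 <- "Smith"

# make.names() turns an invalid name into a valid one
make.names("13pig")     # a name cannot start with a number
make.names("pig name")  # a name cannot contain a space
make.names("pig_13")    # already valid, so it is returned unchanged
```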
Definition - “Function”: A set of instructions defined to perform a specific task.
- E.g., help(): ‘help’ is a function to get information
Definition - “Function Call”: The act of executing a function with specific arguments, if required, to produce a result.
- E.g., help("integer")
  - This calls the ‘help’ function with the argument (aka parameter) "integer"
  - It will return information about the ‘integer’ object type.
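To make the difference between a function and a function call concrete, here is a minimal sketch using toupper(), a base R function that is not otherwise part of this activity. The variable name shout is hypothetical.

```r
# toupper is a function: a set of instructions for converting text to upper case.
# toupper("bart") is a function call: it runs those instructions on the argument "bart".
shout <- toupper("bart")

# Display the result of the function call
shout
```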
1.1 Variables and Basic Data Types
Let’s start by looking at types of variables.
Definition - “Basic Data Types”: Types of data representing the simplest forms of data.
Basic Data Types:
- Numeric: Decimal or floating-point numbers (e.g., 4.5, -3.2).
- Integer: Whole numbers (e.g., 1, -5, 20).
  - In R, whole numbers are stored as numeric unless they are explicitly marked as integers with an L suffix (e.g., 20L).
- Character: Text or strings (e.g., “hello”, “1234”).
- Logical: Boolean values, either TRUE or FALSE.
- Factor: Categorical data, or data as levels (e.g., “low”, “medium”, “high”).
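If you want to check which type R has given a value, the class() function reports it. The sketch below uses made-up values, and the variable name sizes is hypothetical.

```r
class(4.5)       # decimal number
class(20)        # whole number typed without the L suffix
class(20L)       # whole number explicitly marked as an integer
class("hello")   # text
class(TRUE)      # logical

# factor() turns a character vector into categorical data
sizes <- factor(c("low", "high", "medium", "low"))
class(sizes)
levels(sizes)    # the distinct categories, in alphabetical order by default
```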
Here we’ll look at basic operations with character variables.
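String arguments are almost always wrapped in quotes. If a command fails with an ‘object not found’ error, check whether a text value is missing its quotes. If a variable’s name is displayed instead of its value, check whether the variable name was wrapped in quotes by mistake.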
⭐ Task 1.1-1
Create a variable.
Create a variable for a pig’s first name. The first pig's first name is 'Bart'.
Check your code
#assign the first name 'Bart' to the first pig (pig1)
pig1.first_name <- "Bart"
⭐ Task 1.1-2
Create a variable.
Create a variable for Bart’s last name. Bart's last name is 'Smith'.
Check your code
#assign the last name 'Smith' to the first pig (pig1)
pig1.last_name <- "Smith"
⭐ Task 1.1-3
Create a variable.
Create a variable that equals Bart’s first and last name, then display the full name in the console
The paste() function combines strings and, by default, inserts a space between them. paste() accepts two or more arguments, like paste(string1, string2)
Check your code
#concatenate the first pig's (pig1) first ('Bart') and last name ('Smith')
pig1.full_name <- paste(pig1.first_name, pig1.last_name)
#after pig1.full_name has been created, print (display) Bart’s full name...
pig1.full_name
## [1] "Bart Smith"
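paste() can do a little more than join two strings with a space. The sketch below uses hypothetical variable names (first_name, last_name) and shows the sep argument, which sets the separator, plus a call with more than two arguments.

```r
first_name <- "Bart"
last_name <- "Smith"

# Default behaviour: strings are joined with a single space
paste(first_name, last_name)

# The sep argument replaces the space with another separator
paste(last_name, first_name, sep = ", ")

# More than two arguments can be combined in one call
paste("Mr.", first_name, last_name)
```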
Now we’ll look at basic operations with numeric and integer variables. First we’ll create height information for Bart and find out how much he’s grown in height.
⭐ Task 1.1-4
Create a variable.
Create a variable for Bart’s height as a piglet: 10
Check your code
#Assign the value of Bart’s piglet height
pig1.heightA <- 10
⭐ Task 1.1-5
Create a variable.
Create a variable for Bart’s height now: 22.3
Check your code
#Assign the value of Bart's current height
pig1.heightB <- 22.3
⭐ Task 1.1-6
Create a variable.
Now create a variable expressing the amount he’s grown.
Check your code
# Find the difference in height using the expression: 'heightB - heightA'
# using the subtraction operator.
pig1.heightGain <- pig1.heightB - pig1.heightA
#after pig1.heightGain has been created, print (display) the value of pig1.heightGain...
pig1.heightGain
## [1] 12.3
Hint: “Expressing” indicates that the value will require an expression, in this case, a mathematical operation.
pig1.heightA holds a whole number (10), but R stores it as a ‘numeric’ data type unless it is written as 10L.
pig1.heightB is a ‘numeric’ data type (decimal number).
R can perform arithmetic on whole and decimal numbers together, such as subtracting one from the other.
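A short sketch of how R handles arithmetic between an explicit integer and a decimal number. The variable names are hypothetical, and is.integer() is a base R function that tests whether a value is stored as an integer.

```r
height_a <- 10        # typed without L, so stored as numeric
height_a_int <- 10L   # the L suffix makes this an integer
height_b <- 22.3      # numeric

is.integer(height_a)      # FALSE
is.integer(height_a_int)  # TRUE

# Subtracting an integer from a numeric value gives a numeric result
gain <- height_b - height_a_int
class(gain)
gain
```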
📍 As you work through these activities, remember to save your script(s) regularly.
To remove data objects from your environment, execute the ‘remove’ function, rm(), in the console, e.g., rm(pig1.full_name).
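To see rm() at work without touching your own variables, this sketch creates a throwaway object (temp_name is a hypothetical name) and uses the base R function exists() to confirm it is gone afterwards.

```r
# Create a temporary object
temp_name <- "Bart Smith"
exists("temp_name")   # TRUE: the object is in the environment

# Remove it
rm(temp_name)
exists("temp_name")   # FALSE: the object is no longer available
```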
Time for logical or boolean values!
We can denote if Bart is small or large with a boolean value.
⭐ Task 1.1-7
Create two variables.
Create two variables (pig1.mini and pig1.large) which indicate that Bart is a large pig and not a mini pig.
Check your code
pig1.mini <- FALSE
pig1.large <- TRUE
Hint: Boolean values are either ‘TRUE’ or ‘FALSE’ (case sensitive).
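Logical values do not have to be typed in by hand; a comparison also produces TRUE or FALSE. In this sketch the variable names and the cutoff of 20 are assumptions made up for illustration, not part of the task.

```r
current_height <- 22.3

# A comparison returns a logical value (20 is a hypothetical cutoff for "large")
is_large <- current_height > 20
is_large

# The ! operator flips TRUE to FALSE and FALSE to TRUE
is_mini <- !is_large
is_mini
```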
1.2 Vectors
A vector is a 1-dimensional list of items that are of the same data type (all text, all whole numbers, etc.).
To create a vector object, you will use the c() function.
- The ‘c’ stands for ‘combine’.
- It’s used to create a vector by grouping individual values into a list-like structure.
- Think of it as placing items into a container where each item remains distinct and can be individually accessed.
- For example, vector1 <- c(value1, value2) creates a vector named ‘vector1’ containing the elements ‘value1’ and ‘value2’ as separate items in a sequence, not as a single merged item.
- A value in a vector can be accessed by using square brackets and its index (the value’s place in the vector), where 1 is the first index.
  - vector1[1] will output: ‘value1’
- Many functions and operations in R are designed to work naturally with vectors, as you might have seen if you looked up vectors with the help() function.
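As a rough illustration of what "work naturally with vectors" means, arithmetic is applied to every element at once. The vector name demo.weights and its values are made up for this sketch.

```r
demo.weights <- c(13.3, 17.2, 14.8)

# Access the first element by its index
demo.weights[1]

# Arithmetic is applied to each element in turn
demo.weights * 2
demo.weights + 0.5

# Many functions take the whole vector as one argument
sum(demo.weights)
```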
⭐ Task 1.2-1
Create a vector.
Make a vector for the following weight values of miniature goats. Name your variable ‘goat.weights’
Goat weights: 13.3, 17.2, 14.8, 14.6, 12.4
Check your code
# The period between 'goat' and 'weights' has no special purpose.
# It only shows the person reading the code that 'weights' is information that pertains to the goats
goat.weights <- c(13.3, 17.2, 14.8, 14.6, 12.4)
The command you just ran has now appeared in your console (bottom left window), and the goat.weights vector is now listed in the Environment tab (top right window) under Values.
⭐ Task 1.2-2
View variables.
Show the contents of the vector containing the goat weights.
If at any point you want to view the value of a variable, use the print() function with the name of the variable, then press ‘enter’ or ‘return’ to execute.
Check your code
print(goat.weights)
## [1] 13.3 17.2 14.8 14.6 12.4
⭐ Task 1.2-3
View variables.
Display the weight of the second goat in the vector.
Check your code
goat.weights[2]
## [1] 17.2
Hint: data_object_name[indexNumber]
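Square brackets accept more than a single index. The sketch below goes slightly beyond the task and uses a hypothetical vector named demo.weights to show a few common indexing forms.

```r
demo.weights <- c(13.3, 17.2, 14.8, 14.6, 12.4)

demo.weights[c(1, 3)]            # the first and third values
demo.weights[2:4]                # the second through fourth values
demo.weights[-1]                 # every value except the first
demo.weights[demo.weights > 14]  # only the values greater than 14
```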
You have just worked with numeric vectors. Now let’s move to string vectors.
⭐ Task 1.2-4
Create a vector.
Make a vector for the following name values of miniature goats.
Name your variable goat.name
Goat names: baby, pickles, cookie, sparkle, gabbie
Note: Text values must be wrapped in quotation marks. You can use double or single quotes, but the opening and closing quotes must match.
- Good: "text"
- Good: 'text'
- Bad: 'text"
Check your code
goat.name <- c("baby", "pickles", "cookie", "sparkle", "gabbie")
To get the length of a vector, we can use the length() function.
⭐ Task 1.2-5
View information about variables.
Print (display) the length of the vector of miniature goat names.
Note: When a script is run as a whole (e.g., with source()), or when code runs inside a function or loop, you need to use the print() function explicitly to see the output. In the console, R automatically displays the value of an expression when the command is executed.
Check your code
length(goat.name)
## [1] 5
Lists (Additional Information)
A ‘list’ can hold items of different types (even vectors), while items in a ‘vector’ must all be the same type.
To make a list, we’ll use the list() function.
Hint: Remember that all items in a vector must be the same type, but can be different types if in a list.
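A minimal sketch of the difference, using made-up values and hypothetical names (mixed.vector, pig.info): c() converts mixed items to one common type, while list() keeps each item's own type. Double square brackets pull a single item out of a list.

```r
# In a vector, mixed items are all converted to one type (here: character)
mixed.vector <- c("Bart", 22.3, TRUE)
class(mixed.vector)
mixed.vector

# In a list, each item keeps its own type
pig.info <- list("Bart", 22.3, TRUE)
class(pig.info[[1]])   # character
class(pig.info[[2]])   # numeric
class(pig.info[[3]])   # logical
```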
If you want a two-dimensional structure (a table of rows and columns), you can create a matrix using the matrix() function.
- For more on matrices, run help("matrix").
- Instead of creating our own matrices, we will be importing data later on.
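For reference, a small sketch of matrix(): the values and the name demo.matrix are made up, and nrow and ncol set the number of rows and columns. By default R fills a matrix column by column.

```r
# Six values arranged into 2 rows and 3 columns (filled column by column)
demo.matrix <- matrix(c(22, 27, 19, 25, 12, 22), nrow = 2, ncol = 3)
demo.matrix

# dim() reports the number of rows and columns
dim(demo.matrix)

# Access one value with [row, column]
demo.matrix[1, 2]
```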
📍 As you work through these activities, remember to save your script(s) regularly.
2. Descriptive Statistics
Statistics is the science of collecting, analyzing, and interpreting data to uncover patterns and trends, and of making informed decisions based on this data.
If you’re unfamiliar with statistics, you can learn more about it from the W3Schools Statistics Tutorial.
In this section, we’ll be focusing on:
- Basic statistical measures
- Presenting data in a histogram
  - More on presenting data will be covered in Activity 4-Data Visualization
- Importing data
2.1 Basic statistical measures
The function names for the following three statistical measures (mean, median, standard deviation) are quite intuitive.
Each is just the name or abbreviation of the statistical measure, and each takes as its argument the vector containing the set of values we are analyzing.
These three functions are designed for numeric values (whole or decimal numbers). On text values they do not produce a meaningful result; for example, mean() returns NA with a warning. Logical values are treated as 1 (TRUE) and 0 (FALSE), so mean() of a logical vector gives the proportion of TRUE values.
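A quick check of how mean() treats non-numeric vectors, using made-up values. suppressWarnings() is used here only to keep the warning that mean() gives for text out of the output.

```r
# Logical values count as 1 (TRUE) and 0 (FALSE)
mean(c(TRUE, FALSE, TRUE, TRUE))

# Text values cannot be averaged: mean() warns and returns NA
suppressWarnings(mean(c("a", "b")))
```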
⭐ Task 2.1-1
Create a vector.
For this task, we will use a new vector object containing weights for a set of pigs.
Create a vector object with the weights of a set of pigs. Name your variable ‘pigs.weight’
Weights of pigs: 22, 27, 19, 25, 12, 22, 18
Check your code
pigs.weight <- c(22, 27, 19, 25, 12, 22, 18)
⭐ Task 2.1-2
Get the mean (average) value.
Mean: the average value in a set.
The mean() function calculates the sum of the values in the set and divides the sum by the number of items in the set.
Write and execute a command that outputs the mean value of the pigs’ weights.
Check your code
# output the average weight of all of the pigs
mean(pigs.weight)
## [1] 20.71429
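To see where that number comes from, the mean can be rebuilt by hand with sum() and length(). This is only a sketch for checking your understanding; demo.weights and manual.mean are hypothetical names, and the values repeat the pigs' weights.

```r
demo.weights <- c(22, 27, 19, 25, 12, 22, 18)

# Sum of the values divided by the number of values
manual.mean <- sum(demo.weights) / length(demo.weights)
manual.mean

# Compare with the built-in function
all.equal(manual.mean, mean(demo.weights))
```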
⭐ Task 2.1-3
Get median value.
Write and execute a command that outputs the median value of the pigs’ weights
Median: The middle value in a sorted set (e.g., lowest to highest). The function is median()
Check your code
median(pigs.weight)
## [1] 22
The output tells you the weight of the pig that falls between the lighter half and the heavier half of the pigs.
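The same result can be found by hand: sort the values and pick the one in the middle. This sketch assumes an odd number of values, as with the seven pigs. For an even number, median() averages the two middle values instead. The names demo.weights, sorted.weights and middle.position are hypothetical.

```r
demo.weights <- c(22, 27, 19, 25, 12, 22, 18)

# Put the weights in order from lowest to highest
sorted.weights <- sort(demo.weights)
sorted.weights

# With 7 values, the middle position is the 4th
middle.position <- (length(sorted.weights) + 1) / 2
sorted.weights[middle.position]
```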
⭐ Task 2.1-4
Get standard deviation.
Standard deviation: Describes how spread out the data is. The function is sd()
Write and execute a command that outputs the standard deviation of the pigs’ weights
The output tells you how much the weights of the pigs vary from the average weight.
- A small standard deviation means that most pigs’ weights are close to the average, indicating uniformity in size.
- A large standard deviation suggests a wide range of weights.
Check your code
sd(pigs.weight)
## [1] 4.956958
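For the curious, sd() computes the sample standard deviation: the square root of the sum of squared distances from the mean, divided by one less than the number of values. A sketch of that calculation by hand, with hypothetical variable names:

```r
demo.weights <- c(22, 27, 19, 25, 12, 22, 18)

# How far each weight is from the average weight
deviations <- demo.weights - mean(demo.weights)

# Square the distances, add them up, divide by (n - 1), then take the square root
manual.sd <- sqrt(sum(deviations^2) / (length(demo.weights) - 1))
manual.sd

# Compare with the built-in function
all.equal(manual.sd, sd(demo.weights))
```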
⭐ Task 2.1-5
Get summary of value statistics.
Display a summary of values pertaining to the pigs’ weights
We can execute summary() to generate several descriptive statistics at the same time.
Check your code
summary(pigs.weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.00   18.50   22.00   20.71   23.50   27.00 
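Each value in the summary can also be requested on its own. The sketch below uses the base R functions min(), max() and quantile(), which are not covered elsewhere in this activity, on a hypothetical copy of the weights.

```r
demo.weights <- c(22, 27, 19, 25, 12, 22, 18)

min(demo.weights)              # Min.
quantile(demo.weights, 0.25)   # 1st Qu.: a quarter of the values fall below this
median(demo.weights)           # Median
mean(demo.weights)             # Mean
quantile(demo.weights, 0.75)   # 3rd Qu.: three quarters of the values fall below this
max(demo.weights)              # Max.
```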
2.2 Histogram Plot for Pig Weights
Histogram: A graph used for understanding and analysing the distribution of values in a vector.
A histogram illustrates:
- Where data points tend to cluster
- The variability of data
- The shape of the distribution
The histogram will appear in the Plots tab (bottom right quadrant if you haven’t modified your RStudio layout).
⭐ Task 2.2-1
Create a histogram.
Create a histogram for the pigs’ weights using the histogram function hist()
- Parameter: vector of pig weights
Check your code and see the histogram
hist(pigs.weight)
# The histogram will appear in the Plots tab.
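If you would like to see the numbers behind the bars, hist() can return them without drawing anything when given plot = FALSE. This is an optional sketch; demo.weights and weights.hist are hypothetical names.

```r
demo.weights <- c(22, 27, 19, 25, 12, 22, 18)

# plot = FALSE computes the histogram without drawing it
weights.hist <- hist(demo.weights, plot = FALSE)

weights.hist$breaks        # the edges of the bins R chose
weights.hist$counts        # how many weights fall in each bin
sum(weights.hist$counts)   # adds up to the number of weights
```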
We can also pass in additional parameters to control the way our plot looks.
Some of the frequently used parameters are:
- main: The title of the plot
  - e.g., main = "This is the Plot Title"
- xlab: The x-axis label
  - e.g., xlab = "The X Label"
- ylab: The y-axis label
  - e.g., ylab = "The Y Label"
Multiple parameters are given to a function by putting them in parentheses separated by commas: function_name(parameter1, parameter2)
- E.g., hist(dataset, xlab = "x-label", ylab = "y-label", main = "main title")
⭐ Task 2.2-2
Create a histogram.
Create a histogram for the pigs’ weights, with axes labels.
In your histogram for the pigs’ weights, use:
- X-label: “Weight”
- Y-label: “Frequency”
  - This is the default value.
  - You don’t have to specify it unless you would like a different label.
- Graph title: “Histogram of Pigs’ Weights”
Check your code
# The first parameter is the name of the data (vector) object
# 'main' is the graph title
# 'xlab' is the label of the x-axis
# Label parameters can be given in any order after the data object
# The y-label on a histogram defaults to "Frequency". Add 'ylab = "..."' if you'd like a different label.
hist(pigs.weight, main = "Histogram of Pigs' Weights", xlab = "Weight")
# The histogram will appear in the Plots tab.
📍 As you work through these activities, remember to save your script(s) regularly.
- File > Save (or cmd+s on Mac, ctrl+s on Windows)