Learning Objectives
Following this assignment students should be able to:
- ensure proficiency and understanding of course learning objectives
- affirm course success by completing challenge exercises
Exercises
Vector Review
The number of birds banded at a series of sampling sites has been counted by your field crew. The data are organized in two vectors. The first vector contains the alphanumeric code for each site and the second vector contains the number of birds banded per site. Cut and paste the vectors into your assignment and then answer the following questions by printing them to the screen.
sites <- c("A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "C1", "C2", "C3", "C4", "D1", "D2", "D3", "D4", "D5", "D6")
counts <- c(28, 32, 1, 0, 10, 22, 30, 19, 145, 27, 36, 25, 9, 38, 21, 12, 122, 87, 36, 3, 0, 5, 55, 62, 98, 32)
- How many sites are there?
- How many birds were counted at the 7th site?
- How many birds were counted at the last site?
- What is the total number of birds counted across all sites?
- What is the average number of birds seen on a site?
- What is the total number of birds counted on sites with codes beginning with C? Don’t just identify these sites by eye; in the real world there could be hundreds or thousands of sites.
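One way to select sites by code prefix rather than by eye is to build a logical vector from the codes and use it to index the counts. The sketch below uses made-up vectors (`plot_ids` and `plot_counts` are hypothetical, not the assignment data) and base R's `startsWith()`; `substr()` or `grepl()` would work just as well.

```r
# Hypothetical example data, not the assignment vectors
plot_ids <- c("X1", "X2", "Y1", "Y2", "Y3")
plot_counts <- c(4, 7, 10, 0, 5)

# TRUE wherever the code begins with "Y"
is_y <- startsWith(plot_ids, "Y")

# Use the logical vector to keep only the matching counts, then total them
y_total <- sum(plot_counts[is_y])
print(y_total)
```

Because the selection is computed from the codes themselves, the same two lines work unchanged whether there are five sites or five thousand.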
Data Management Review
Dr. Granger is interested in studying the relationship between the length of house-elves’ ears and aspects of their DNA. This research is part of a larger project attempting to understand why house-elves possess such powerful magic. She has obtained DNA samples and ear measurements from a small group of house-elves to conduct a preliminary analysis (prior to submitting a grant application to the Ministry of Magic) and she would like you to conduct the analysis for her (she might know everything there is to know about magic, but she sure doesn’t know much about computers). She has placed the data in a file on the web for you to download.
Write an R script that:
- Imports the data
- For each row in the dataset checks to see if the ear length is "large" (>10 cm) or "small" (<=10 cm) and determines the GC-content of the DNA sequence (i.e., the percentage of bases that are either G or C)
- Stores this information in a table where the first column has the ID for the individual, the second column contains the string "large" or the string "small" depending on the size of the individual’s ears, and the third column contains the GC-content of the DNA sequence.
- Exports this table to a csv (comma separated values) file titled grangers_analysis.csv.
- Prints the average GC-contents for large-eared elves and small-eared elves to the screen.
As you start to work on more complex problems it’s important to break them down into manageable pieces. One natural way to break this list of things down is: 1) import data; 2) determine size category; 3) determine GC-content; 4) calculate the size category and GC-content for each row of data and store it; 5) export this data to csv; 6) calculate and print the average GC-content for large and small ears. Use functions to break the code up into manageable pieces. Remember to document your code well.
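As a rough illustration of steps 2 and 3, each piece can be a small function that handles a single value. This is only a sketch: the function names `get_size_class` and `get_gc_content` are hypothetical, and the sequence is assumed to be a plain string of base letters.

```r
# Classify a single ear length (in cm) as "large" (> 10) or "small" (<= 10)
get_size_class <- function(ear_length) {
  if (ear_length > 10) {
    "large"
  } else {
    "small"
  }
}

# Percentage of bases in a DNA string that are G or C (case-insensitive)
get_gc_content <- function(sequence) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

get_size_class(12.5)
get_gc_content("ATGC")
```

Keeping each function focused on one value makes it easy to test on its own before applying it to every row.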
There are several different specific approaches you could take to doing calculations for each row of data. One is to use dplyr with the rowwise() function (here’s an example). Another is to loop over the rows in the data.frame using for (row in 1:nrow(data)){...}. A third is to break the data.frame into vectors and use sapply(). Ask your instructor if you have questions about the best choices.
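The loop and sapply() approaches can be compared side by side on a tiny table. This is a minimal base R sketch: the data frame `elves`, its column names (`id`, `earlength`, `dnaseq`), and the helper `gc_percent` are all assumptions made for illustration, not the layout of the real file.

```r
# Hypothetical data; the column names are assumptions, not the real file's
elves <- data.frame(
  id = c("e1", "e2", "e3"),
  earlength = c(8.2, 12.4, 10.0),
  dnaseq = c("ATGC", "GGCC", "ATAT"),
  stringsAsFactors = FALSE
)

# Percentage of G or C bases in one sequence
gc_percent <- function(sequence) {
  bases <- strsplit(toupper(sequence), "")[[1]]
  100 * sum(bases %in% c("G", "C")) / length(bases)
}

# Approach 1: loop over row indices, filling a preallocated vector
gc_loop <- numeric(nrow(elves))
for (row in seq_len(nrow(elves))) {
  gc_loop[row] <- gc_percent(elves$dnaseq[row])
}

# Approach 2: pull the column out as a vector and use sapply()
gc_sapply <- sapply(elves$dnaseq, gc_percent, USE.NAMES = FALSE)

# The size category can be computed for all rows at once with ifelse()
size_class <- ifelse(elves$earlength > 10, "large", "small")

results <- data.frame(id = elves$id, size = size_class, gc = gc_sapply)
print(results)
```

Both approaches give the same numbers; `seq_len(nrow(elves))` is used in place of `1:nrow(elves)` only because it also behaves sensibly on a table with zero rows.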
Unit Conversion Challenge
Measures of the amount of energy used by biological processes are critical to understanding many aspects of biology from cellular physiology to ecosystem ecology. There are many different units for energy use and their utilization varies across methods, research areas, and lab groups. Write a function, convert_energy_units(energy_value, input_unit, output_unit), to convert between the following energy units: Joules (J), Kilojoules (KJ), Calories (CAL), and Kilocalories (KCAL; this is the unit used for labeling the amount of energy contained in food). A Kilojoule is 1000 Joules, a Calorie is 4.1868 Joules, and a Kilocalorie is 4186.8 Joules. An example of a call to this function would look like:
energy_in_cal <- 200
energy_in_j <- convert_energy_units(energy_in_cal, "CAL", "J")
Make this function more efficient by linking if else statements. If either the input unit or the output unit does not match the four types given above, have the function print “Sorry, I don’t know how to convert ” + the name of the unit provided. Instead of writing an individual conversion between each pair of units (which would require 12 if statements), you could choose to convert all of the input units to a common scale and then convert from that common scale to the output units. This approach is especially useful because adding new units later will be much easier.
Use your function to answer the following questions:
- What is the daily metabolic energy used by a human (~2500 KCALs) in Joules?
- How many times more energy does a common seal use than a human? The common seal uses ~52,500 KJ/day (Nagy et al. 1999). Use the daily human metabolic cost given above.
- How many ergs (ERG) are there in one kilocalorie? Since we didn’t include the erg conversion this should trigger our ‘don’t know how to convert’ message.
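The common-scale idea can be sketched by routing every conversion through Joules. Note that this sketch swaps the chained if else statements for a named lookup vector, so it illustrates the design rather than the requested implementation; `joules_per_unit` and `convert_via_joules` are hypothetical names, and returning `NA` for an unknown unit is an assumption.

```r
# Joules per unit; adding a new unit later only needs one more entry here
joules_per_unit <- c(J = 1, KJ = 1000, CAL = 4.1868, KCAL = 4186.8)

# Hypothetical helper illustrating the common-scale idea
convert_via_joules <- function(energy_value, input_unit, output_unit) {
  # Reject any unit that is not in the lookup (assumption: return NA)
  for (unit in c(input_unit, output_unit)) {
    if (!(unit %in% names(joules_per_unit))) {
      print(paste("Sorry, I don't know how to convert", unit))
      return(NA)
    }
  }
  # Step 1: input unit -> Joules; step 2: Joules -> output unit
  joules <- energy_value * joules_per_unit[[input_unit]]
  joules / joules_per_unit[[output_unit]]
}

convert_via_joules(200, "CAL", "J")
```

With four units this needs four conversion factors instead of twelve pairwise rules, and a fifth unit adds one factor rather than eight new pairs.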
Tree Biomass Challenge
Understanding the total amount of biomass (the total mass of all individuals) in forests is important for understanding the global carbon budget and how the earth will respond to increases in carbon dioxide emissions. Measuring the mass of entire trees is difficult, and it’s pretty much impossible to weigh an entire forest even if we were willing to clear cut a forest for science, but fortunately we can estimate the mass of a tree based on its diameter.
There are lots of equations for estimating the mass of a tree from its diameter, but one good option is the equation M = 0.124 * D^2.53, where M is measured in kg of dry (above-ground) biomass and D is in cm d.b.h. (Brown 1997). We’re going to estimate the total tree biomass for trees in a 96 hectare area of the Western Ghats in India.
- Write a function that takes a vector of tree diameters as an argument and returns a vector of tree masses.
- The raw data is available on Ecological Archives, but unfortunately due to poor database structure using all of the trees would be a hassle. You could try to solve this problem yourself, but it turns out that someone else has already solved it for you. Install the EcoData Retriever and use it to download and clean up this data automatically (using the command line interface the command would be retriever install csv Ramesh2010 and the data will be stored in Ramesh2010-macroplots.csv) and import it into R.
- If you look at the file or the metadata carefully you’ll notice that the data is actually in girth (i.e., circumference, which is equal to pi * diameter) rather than diameter. Write a function that takes a vector of circumferences as an argument and returns a vector of diameters.
- Use the two functions you’ve written to estimate the total biomass (i.e., the sum of the masses) of trees in this dataset and print the result to the screen.
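The two conversions chain together naturally because both operate on whole vectors. The sketch below uses made-up girth values (not values from the Ramesh2010 data), and the function names `circumference_to_diameter` and `diameter_to_mass` are hypothetical.

```r
# Circumference (girth) to diameter: circumference = pi * diameter
circumference_to_diameter <- function(circumferences) {
  circumferences / pi
}

# Mass in kg from diameter in cm, using M = 0.124 * D^2.53
diameter_to_mass <- function(diameters) {
  0.124 * diameters^2.53
}

# Hypothetical girths in cm, not values from the real dataset
girths <- c(31.4, 62.8, 94.2)

# Convert girth -> diameter -> mass, then sum for total biomass
masses <- diameter_to_mass(circumference_to_diameter(girths))
total_biomass <- sum(masses)
print(total_biomass)
```

Since arithmetic in R is vectorized, neither function needs a loop; the same calls work on a full column read from the csv file.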