Learning Objectives

Following this assignment students should be able to:

  • understand the basic query structure of SQL
  • execute SQL commands to select, sort, group, and aggregate data
  • use joins to combine tables in SQL

Reading

Lecture Notes

  1. Basic Queries
  2. Aggregation
  3. Joins

Exercises

  1. SELECT (5 pts)

    For this and many of the following problems you will create queries that retrieve the relevant information from the Portal small mammal survey database. Download the data. As you begin to familiarize yourself with the database you will need to know some details about what it contains in order to answer the questions. For example, you may need to know what species is associated with a two-character species ID, or you may need to know the units for an individual’s weight. This type of information associated with data is called metadata, and the metadata for this dataset is available online at Ecological Archives.

    1. Write a query that displays all of the records for all of the fields (*) in the main table. Save it as a view named all_survey_data.
    2. We want to generate data for an analysis of body size differences (using both weight and hind foot length) between males and females of each species. We have decided that we can ignore the information related to when and where the individuals were trapped. Create a query that returns all of the necessary information, but nothing else. Save this as size_differences_among_sexes_data.
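
Saving a query as a view can be sketched on a toy table. This is a minimal illustration, not the exercise solution: the `sightings` table, its columns, and the `bird_sizes` view are hypothetical, and Python's built-in `sqlite3` module is used only so the example runs without the Portal file.

```python
import sqlite3

# Toy in-memory database; table, column, and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sightings (bird TEXT, site TEXT, wingspan_cm REAL)")
conn.executemany(
    "INSERT INTO sightings VALUES (?, ?, ?)",
    [("wren", "A", 15.0), ("heron", "B", 180.0)],
)

# A view stores the query itself, not a copy of the data.
# Only the columns named in the SELECT are exposed; `site` is left out.
conn.execute("CREATE VIEW bird_sizes AS SELECT bird, wingspan_cm FROM sightings")

# The view can then be queried like a table.
rows = conn.execute("SELECT * FROM bird_sizes ORDER BY bird").fetchall()
print(rows)  # [('heron', 180.0), ('wren', 15.0)]
```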
  2. WHERE (5 pts)

    A population biologist (Dr. Undomiel) who studies the population dynamics of Dipodomys spectabilis would like to use some data from the Portal Project, but she doesn’t know how to work with large datasets. Being the kind and benevolent person that you are, write a query to extract the data that she needs. She wants only the data for her species of interest (DS in the species_id column), when each individual was trapped, and what sex it was. She doesn’t care about the plot the individual was trapped on or the size of the individuals. She also doesn’t need the species codes because you’re only providing her with the data for one species, and since she isn’t looking at the database itself the two-character abbreviation would probably be confusing. Save this query as a view with the name spectabilis_population_data.

    Scrolling through the results of your query you notice that the data on sex is missing for some individuals. You send Dr. Undomiel a short e-mail* asking what she would like you to do regarding this complexity. Dr. Undomiel asks that you create two additional queries so that she can decide what to do about this issue later. Add a query that retrieves the same data as above, but only for cases where the sex is known to be male, and an additional query with the same data, but only where the sex is known to be female. Save these as views with the names spectabilis_population_data_males and spectabilis_population_data_females.

    *Short for elven-mail

  3. ORDER BY (5 pts)

    The graduate students that work at the Portal site are hanging out late one evening drinking… soda pop… and they decide it would be an epically awesome idea to put together a list of the 100 largest rodents ever sampled at the site. Since you’re the resident computer genius they text you, and since you’re up late working and this sounds like a lot more fun than the homework you’re working on (which isn’t really saying much, if you know what I’m saying) you decide you’ll make the list for them.

    The rules that the Portal students have come up with (and they did spend a sort of disturbingly long time coming up with these rules; I guess you just had to be there) are:

    1. The data should include the species_id, year, and the weight. These columns should be presented in this order.
    2. Individuals should be sorted in descending order with respect to mass.
    3. Since individuals often have the same mass, ties should be settled by sorting next by hindfoot_length and finally by the year.

    Since you need to limit this list to the top 100 largest rodents, you’ll need to add the SQL command LIMIT 100 to the end of the query. Save the final query as 100_largest_individuals.
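
Sorting on several keys and then truncating with LIMIT can be sketched on a toy table. The `catches` table and its columns are hypothetical, and sorting the tie-breakers in descending order is an assumption, since the rules above only fix the direction for mass. In SQLite a name that begins with a digit generally has to be wrapped in double quotes, which the view name here illustrates.

```python
import sqlite3

# Toy in-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catches (fish TEXT, mass_g REAL, length_cm REAL, year INTEGER)"
)
conn.executemany(
    "INSERT INTO catches VALUES (?, ?, ?, ?)",
    [
        ("trout", 900.0, 40.0, 2001),
        ("pike", 900.0, 55.0, 1999),
        ("perch", 300.0, 20.0, 2000),
        ("carp", 900.0, 55.0, 1995),
    ],
)

# ORDER BY keys are applied left to right: later keys only break ties
# left by earlier ones. A sort key does not have to be in the SELECT list.
query = """
    SELECT fish, year, mass_g
    FROM catches
    ORDER BY mass_g DESC, length_cm DESC, year DESC
    LIMIT 2
"""
top = conn.execute(query).fetchall()
print(top)  # [('pike', 1999, 900.0), ('carp', 1995, 900.0)]

# The view name starts with a digit, so it is double-quoted.
conn.execute('CREATE VIEW "2_heaviest_catches" AS ' + query)
from_view = conn.execute('SELECT * FROM "2_heaviest_catches"').fetchall()
```

Three fish tie on mass; length separates the trout from the other two, and year then orders the pike ahead of the carp.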

  4. DISTINCT (5 pts)

    Write a query that returns a list of the dates that mammal surveys took place at Portal with no duplicates. Save it as dates_sampled.

  5. Missing Data (5 pts)

    Write a query that returns the year, month, day, species_id, and weight for every record where there is no missing data in any of these fields. Save it as no_missing_data.
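
Filtering out missing values can be sketched as follows, assuming missing data is stored as NULL (an assumption about the database, worth checking against the metadata). The `readings` table and its columns are hypothetical. The sketch also shows why `= NULL` does not work as a test for missing values.

```python
import sqlite3

# Toy in-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (station TEXT, temp_c REAL, rain_mm REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("north", 12.5, 3.0),
        ("south", None, 1.2),   # Python None is stored as SQL NULL
        ("east", 9.0, None),
        (None, 7.0, 0.0),
    ],
)

# Each field needs its own IS NOT NULL test, combined with AND.
complete = conn.execute(
    """
    SELECT station, temp_c, rain_mm
    FROM readings
    WHERE station IS NOT NULL
      AND temp_c IS NOT NULL
      AND rain_mm IS NOT NULL
    """
).fetchall()
print(complete)  # [('north', 12.5, 3.0)]

# Comparing with = NULL never evaluates to true, so this matches nothing.
never_matches = conn.execute(
    "SELECT * FROM readings WHERE temp_c = NULL"
).fetchall()
print(never_matches)  # []
```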

  6. GROUP BY (5 pts)

    Using GROUP BY, write a query that returns a list of dates on which individuals of the species Dipodomys spectabilis (indicated by the DS species code) were trapped (with no duplicates). Sort the list in chronological order (from oldest to newest). Save it as dates_with_dipodomys_spectabilis.

  7. COUNT (10 pts)

    Write a query that returns the number of individuals trapped in each year. Count the species_id column so that you only include cases where an individual was identified to species. Name the count column total_abundance and sort it chronologically. Include the year in the output. Save it as total_abundance_by_year. There should only be one value for each year since this is a count of the individuals across all species in that year.
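
The difference between counting rows and counting a column can be sketched on a toy table: `COUNT(*)` counts every row in a group, while `COUNT(column)` skips rows where that column is NULL. The `visits` table and its columns are hypothetical.

```python
import sqlite3

# Toy in-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (year INTEGER, patient_id TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(2020, "a"), (2020, None), (2020, "b"), (2021, "c"), (2021, None)],
)

# GROUP BY produces one output row per year.
# COUNT(*) counts all rows; COUNT(patient_id) ignores NULL ids.
counts = conn.execute(
    """
    SELECT year,
           COUNT(*) AS n_rows,
           COUNT(patient_id) AS n_identified
    FROM visits
    GROUP BY year
    ORDER BY year
    """
).fetchall()
print(counts)  # [(2020, 3, 2), (2021, 2, 1)]
```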

  8. SUM (10 pts)

    Write a query that returns the number of individuals of each species captured in each year (total_abundance) and the total_biomass of those individuals (the sum of the weight). The units for biomass should be in kilograms. Include the year and species_id in the output. Sort the result chronologically by year and then alphabetically by species. Save as mass_abundance_data.
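
Grouping by two columns and converting units inside an aggregate can be sketched on a toy table. The `parcels` table and its columns are hypothetical. One thing to watch for in SQLite: dividing an integer sum by an integer truncates the result, so the divisor is written as `1000.0` here.

```python
import sqlite3

# Toy in-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (year INTEGER, depot TEXT, mass_g INTEGER)")
conn.executemany(
    "INSERT INTO parcels VALUES (?, ?, ?)",
    [(2020, "x", 700), (2020, "x", 800), (2020, "y", 250), (2021, "x", 1200)],
)

# One output row per (year, depot) pair.
# SUM(mass_g) / 1000 is integer division; / 1000.0 keeps the fraction.
totals = conn.execute(
    """
    SELECT year,
           depot,
           COUNT(*) AS n_parcels,
           SUM(mass_g) / 1000 AS kg_truncated,
           SUM(mass_g) / 1000.0 AS total_kg
    FROM parcels
    GROUP BY year, depot
    ORDER BY year, depot
    """
).fetchall()
for row in totals:
    print(row)
# (2020, 'x', 2, 1, 1.5)
# (2020, 'y', 1, 0, 0.25)
# (2021, 'x', 1, 1, 1.2)
```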

  9. Basic Join (10 pts)

    Write a query that returns the year, month, and day for each individual captured as well as its genus and species names. This can be accomplished by joining the species table to the surveys table using the species_id column in both tables. Save this query as species_captures_by_date.

  10. Multi-table Join (10 pts)

    The plots table in the Portal database can be joined to the surveys table by joining plot_id to plot_id and the species table can be joined to the surveys table by joining species_id to species_id.

    The Portal mammal data include data from a number of different experimental manipulations. You want to do a time-series analysis of the population dynamics of all of the species at the site, taking into account the different experimental manipulations. Write a query that returns the year, month, day, genus and species of every individual as well as the plot_id and plot_type of the plot they are captured on. Save this query as species_plot_data.
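
Chaining two joins onto one central table can be sketched as follows. The `orders`, `customers`, and `products` tables are hypothetical stand-ins for a main table with two lookup tables. Note that a plain (inner) JOIN drops rows whose key has no match in the other table, including rows where the key is NULL.

```python
import sqlite3

# Toy in-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE products (product_id INTEGER, title TEXT)")
conn.execute("CREATE TABLE orders (day TEXT, customer_id INTEGER, product_id INTEGER)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Lin")])
conn.executemany("INSERT INTO products VALUES (?, ?)", [(10, "lamp"), (11, "desk")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2024-01-02", 1, 10),
        ("2024-01-03", 2, 11),
        ("2024-01-04", 1, None),  # no product recorded
    ],
)

# Each JOIN ... ON clause links one lookup table to the central table.
joined = conn.execute(
    """
    SELECT orders.day, customers.name, products.title
    FROM orders
    JOIN customers ON orders.customer_id = customers.customer_id
    JOIN products ON orders.product_id = products.product_id
    ORDER BY orders.day
    """
).fetchall()
print(joined)
# [('2024-01-02', 'Ada', 'lamp'), ('2024-01-03', 'Lin', 'desk')]
```

The third order is absent from the result because its NULL product_id matches no row in `products`.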

  11. Filtered Join (10 pts)

    You are curious about what other kinds of animals get caught in the Sherman traps used to census the rodents. Write a query that returns a list of the genus, species, and taxa (from the species table) for non-rodent individuals that are caught on the Control plots. Non-rodents are indicated in the taxa column of the species table. You are only interested in which species are captured, so make this list unique (only one line for each species). Save this query as non_rodents_on_controls.

  12. Detailed Join (10 pts)

    We want to do an analysis comparing the size of individuals on the Control plots to the Long-term Krat Exclosures. Write a query that returns the year, genus, species, weight and the plot_type for all cases where the plot type is either Control or Long-term Krat Exclosure. Be sure to choose only rodents and exclude individuals that have not been identified to species (i.e., exclude species with sp. in the species column). Remove any records where the weight is missing. Save this query as size_comparison_controls_vs_krat_exclosures.
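
When an "either A or B" condition is combined with other filters, operator precedence matters: AND binds more tightly than OR. The sketch below shows the pitfall and one way around it using IN. The `items` table and its columns are hypothetical.

```python
import sqlite3

# Toy in-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, kind TEXT, price REAL)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "mug", None),
        (2, "mug", 4.0),
        (3, "bowl", None),
        (4, "bowl", 6.0),
        (5, "plate", 3.0),
    ],
)

# Pitfall: this is read as  kind = 'mug' OR (kind = 'bowl' AND price IS NOT NULL),
# so the mug with a missing price slips through.
unintended = conn.execute(
    """
    SELECT id FROM items
    WHERE kind = 'mug' OR kind = 'bowl' AND price IS NOT NULL
    ORDER BY id
    """
).fetchall()
print(unintended)  # [(1,), (2,), (4,)]

# IN (or explicit parentheses) keeps the alternatives together.
intended = conn.execute(
    """
    SELECT id FROM items
    WHERE kind IN ('mug', 'bowl') AND price IS NOT NULL
    ORDER BY id
    """
).fetchall()
print(intended)  # [(2,), (4,)]
```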

  13. Aggregated Join (10 pts)

    Write a query that displays the total number of rodent individuals sampled on each plot_type. Save this query as individuals_per_plot_type.