Contact me by email at ease — I typically respond within a day or so!
7.0.2 Announcements
CautionAnnouncements
Fill out the Midterm Conflicts Form (QR code in last week’s slides).
Complete the Project 1 Partner Matching Form.
Starting with Homework 04, we will be stricter about submission issues — please submit your written work properly.
Welcome to the Project 1 Discussion Worksheet! This week we will cover table methods like .group and .pivot, conditional statements, and iteration. As a reminder, make use of the Python Reference and the Table Functions Visualizer on the Data 8 website, as well as the Discussion Reference sheet you are provided with!
NoteMaking the Most of the Reference Sheet
Your reference sheet is a powerful tool—spend a couple minutes learning how to use it!
The columns tell you what the inputs and outputs of each method are.
tbl is just a placeholder name. When you write your own code, replace it with the name of your table.
Knowing how to quickly read the sheet will save you time on assignments and exams.
7.1 Table Matchmaking
Shown below are the chocolates and nutrition tables respectively.
Match the following table method calls to the resulting descriptions of tables.
Hint: Pay attention to the column names of the resulting tables! For example, what happens to the column names when you specify an aggregating function in group()? What happens when you only specify one column name? How does the behavior of pivot() differ from group()?
.group lets us summarize a table by grouping rows with the same value in a column.
With no aggregation function, you get:
the grouped column
a new count column
With an aggregation function (e.g. max), you get:
the grouped column
all other columns summarized with the function name added to their labels
Think of .group whenever you want to count or summarize rows by a specific column.
NoteJoining Tables
.join combines information from two tables when they share a column of values.
Example: chocolates.join(nutrition) allows us to bring in the Calories column, which .group or .pivot cannot do.
Watch out if the column names differ—you may need to specify a third argument to align them.
The join column itself (like Type) may not appear in the final joined table.
NotePivoting Tables
.pivot is like grouping on two columns, but the output is shaped into a grid.
Each row corresponds to one unique value from the second argument.
Each column corresponds to one unique value from the first argument.
Example:
chocolates.pivot("Color", "Shape")
creates a table where Shape values are rows, and Color values (Dark, Milk, White) become column names.
Answer
Bing’s Amazing Slides:
A = 5
group() with no aggregating function yields a table with just the column that was grouped on, and a count column.
chocolates.group("Shape")
Shape
count
Rectangular
3
Round
3
B = 1
group() with an aggregating function yields the grouped column, and the other columns of the original table with the function name added at the end of each one.
chocolates.group("Shape", max)
Shape
Color max
Amount max
Price ($) max
Rectangular
White
12
2
Round
Milk
7
1.75
C = 3
group() with multiple columns and an aggregating function yields the columns that were grouped on, and the remaining columns with the function name added at the end.
pivot() yields the second argument to pivot() (“Shape”), as well as all unique values in the column of the first argument (“Dark”, “Milk”, “White” from the “Color” column) as columns in the resulting table.
join() gives you all the columns from the two tables, except for the extra column that is being used to join the two tables (“Type” is dropped in the resulting table).
chocolates.join("Color", nutrition, "Type")
Color
Shape
Amount
Price ($)
Calories
Dark
Round
4
1.3
120
Dark
Round
7
1.75
120
Milk
Rectangular
6
1.2
130
Milk
Rectangular
9
1.4
130
Milk
Round
2
1
130
White
Rectangular
12
2
115
F = 6
group() with multiple columns and no aggregating function yields the columns that were grouped on, and a count column.
chocolates.group(make_array("Shape", "Color"))
Shape
Color
count
Rectangular
Milk
2
Rectangular
White
1
Round
Dark
2
Round
Milk
1
7.2 Chocolate!
7.2.1 (a)
The % operator returns the remainder if you divide by a certain number (e.g., 11 % 5 = 1). If a number n is odd, what will n % 2 return?
Answer
It will return 1. Note that if a number is even, n % 2 will return 0.
n =11n %2
1
7.2.2 (b)
What will the following code output? Format your answer in the same way you expect Python to output the result of running the following code.
NoteUnderstanding If vs Else
The computer evaluates conditions in order.
If the first if is True, it runs that block and skips the else.
You can nest if statements inside other ifs.
If the outer if is not satisfied, the inner if is not even checked.
Strings in Python
You can multiply or add strings:
"hi" * 3 → "hihihi"
"Jelly" + "Belly" → "JellyBelly"
Watch out for spaces when concatenating!
if vs elif vs multiple ifs
Multiple ifs: even if one condition is True, Python will still check the others.
With elif/else: once one condition is satisfied, the rest are skipped.
for c in np.arange(5):if c <2:print("Chocolate?")if c %2==0:print("Yes sir! With or without nuts?")else:if c %2==1:print("Again!?")else:print("CHOCOLATE"+ ("!"* (c *2-3)))
Answer
Code
for c in np.arange(5):if c <2:print("Chocolate?")if c %2==0:print("Yes sir! With or without nuts?")else:if c %2==1:print("Again!?")else:print("CHOCOLATE"+ ("!"* (c *2-3)))
Chocolate?
Yes sir! With or without nuts?
Chocolate?
CHOCOLATE!
Again!?
CHOCOLATE!!!!!
7.2.3 (c) (Bonus!)
Remember, we can “multiply” a string by a certain number (e.g., 'data8' * 2). What is the result of doing so? Take 'data8' * 2 as an example.
Answer
It will return 'data8data8'. More generally speaking, it will concatenate the string with itself that number of times with no empty space between them.
'data8'*2
'data8data8'
7.3 Getting Even
7.3.1 (a)
The count_evens function takes in an array of integers and returns the number of even integers in the array. Use a combination of iteration and conditionals to complete the skeleton code below.
The table squirrel below contains some information on reported squirrel sightings across the UC Berkeley campus. Each row in the squirrel table represents one unique squirrel sighting:
Squirrel ID (int): unique identification number for each squirrel.
Location (string): common name of the nearest campus landmark where the squirrel was spotted.
Month (int): numerical representation of the month when the squirrel was spotted.
Day (int): day of the month when the squirrel was spotted.
Write a line of code that evaluates to the proportion of Squirrel IDs in the table that are even.
NoteWorking with Proportions
Proportions often come from counting how many times a condition is True.
In Python, True is stored as 1 and False as 0.
This means we can calculate proportions in many ways:
np.mean(condition_array)
np.average(condition_array)
sum(condition_array) / len(condition_array)
With tables:
tbl.where(...).num_rows / tbl.num_rows
This flexibility means you can often solve the same problem in multiple ways.
Answer
np.mean(squirrel.column("Squirrel ID") % 2 == 0) Any equivalent code is acceptable, e.g. using np.average.
np.mean(squirrel.column("Squirrel ID") %2==0)
0.48599999999999999
7.4.2 (b)
Alankrita wants to find the best location where she is most likely to find a squirrel. Write a line of code that evaluates to the location with the most squirrel sightings.
NoteCommon Patterns in Data 8
There are a few coding patterns you’ll see again and again.
Finding the largest or smallest value:
tbl.sort("column").column("column").item(0)
Sort the table, pull out the column, then grab a single value.
Watching data types:
Many problems move through this chain:
Table → Array → Single Value
Knowing what type of object you have at each step helps you decide what methods you can use next.
Alankrita is interested in how many squirrels were sighted at every location during every month. Create a table called sightings where each cell contains the number of squirrel sightings that occurred in 2023 at each location during each month. Note: Each row should be in a different location.
Alankrita now has access to another table, species, that contains information about the species of each squirrel. Some of the rows from this table are shown below.
NoteBreaking Down Multi-Step Problems
Some problems ask you to combine several table methods in sequence, like .where(), .group(), .sort(), or .join().
How to Tackle These
Take it step by step. Don’t try to write the whole solution in one go.
Check what each step produces. Is it still a Table? Or did it turn into an Array or a single Value?
Build gradually. Run one line at a time and confirm the output before moving on.
This approach makes complicated problems much easier to manage and helps you avoid getting lost.
Write lines of code to find the least observed species in 2024. If multiple species are tied for the least sightings, find the species that comes first alphabetically.