7 Discussion 05: Table Methods, Conditionals & Iteration (from Fall 2025)

7.0.1 Contact Information

Name	Wesley Zheng
Pronouns	He/him/his
Email	wzheng0302@berkeley.edu
Discussion	Wednesdays, 12–2 PM @ Etcheverry 3105
Office Hours	Tuesdays/Thursdays, 2–3 PM @ Warren Hall 101

Contact me by email at ease — I typically respond within a day or so!

7.0.2 Announcements

Announcements

Fill out the Midterm Conflicts Form (QR code in last week’s slides).
Complete the Project 1 Partner Matching Form.
Starting with Homework 04, we will be stricter about submission issues — please submit your written work properly.

Welcome to the Project 1 Discussion Worksheet! This week we will cover table methods like .group and .pivot, conditional statements, and iteration. As a reminder, make use of the Python Reference and the Table Functions Visualizer on the Data 8 website, as well as the Discussion Reference sheet you are provided with!

Making the Most of the Reference Sheet

Your reference sheet is a powerful tool—spend a couple minutes learning how to use it!

The columns tell you what the inputs and outputs of each method are.
tbl is just a placeholder name. When you write your own code, replace it with the name of your table.

Knowing how to quickly read the sheet will save you time on assignments and exams.

7.1 Table Matchmaking

Shown below are the chocolates and nutrition tables respectively.

chocolates

Color	Shape	Amount	Price ($)
Dark	Round	4	1.30
Milk	Rectangular	6	1.20
White	Rectangular	12	2.00
Dark	Round	7	1.75
Milk	Rectangular	9	1.40
Milk	Round	2	1.00

nutrition

Type	Calories
Dark	120
Milk	130
White	115
Ruby	120

Code

from datascience import *
import random
import numpy as np

chocolates = Table().with_columns(
    'Color', ['Dark', 'Milk', 'White', 'Dark', 'Milk', 'Milk'],
    'Shape', ['Round', 'Rectangular', 'Rectangular', 'Round', 'Rectangular', 'Round'],
    'Amount', [4, 6, 12, 7, 9, 2],
    'Price ($)', [1.30, 1.20, 2.00, 1.75, 1.40, 1.00]
)

nutrition = Table().with_columns(
    'Type', ['Dark', 'Milk', 'White', 'Ruby'],
    'Calories', [120, 130, 115, 120]
)

Match the following table method calls to the resulting descriptions of tables.

Hint: Pay attention to the column names of the resulting tables! For example, what happens to the column names when you specify an aggregating function in group()? What happens when you only specify one column name? How does the behavior of pivot() differ from group()?

Letter	Function Call
A	`chocolates.group("Shape")`
B	`chocolates.group("Shape", max)`
C	`chocolates.group(make_array("Shape", "Color"), max)`
D	`chocolates.pivot("Color", "Shape", "Price ($)", max)`
E	`chocolates.join("Color", nutrition, "Type")`
F	`chocolates.group(make_array("Shape", "Color"))`

Number	Columns	# of Rows
1	Shape, Color max, Amount max, Price ($) max	2
2	Shape, Dark, Milk, White	2
3	Shape, Color, Amount max, Price ($) max	4
4	Color, Shape, Amount, Price ($), Calories	6
5	Shape, count	2
6	Shape, Color, count	4

A: ____________ C: ____________ E: ____________
B: ____________ D: ____________ F: ____________

Grouping Tables

.group lets us summarize a table by grouping rows with the same value in a column.

With no aggregation function, you get:
- the grouped column
- a new count column
With an aggregation function (e.g. max), you get:
- the grouped column
- all other columns summarized with the function name added to their labels

Think of .group whenever you want to count or summarize rows by a specific column.

Joining Tables

.join combines information from two tables when they share a column of values.

Example: chocolates.join(nutrition) allows us to bring in the Calories column, which .group or .pivot cannot do.
Watch out if the column names differ—you may need to specify a third argument to align them.
The join column itself (like Type) may not appear in the final joined table.

Pivoting Tables

.pivot is like grouping on two columns, but the output is shaped into a grid.

Each row corresponds to one unique value from the second argument.
Each column corresponds to one unique value from the first argument.

Example:

chocolates.pivot("Color", "Shape")

creates a table where Shape values are rows, and Color values (Dark, Milk, White) become column names.

Answer

Bing’s Amazing Slides:

A = 5

group() with no aggregating function yields a table with just the column that was grouped on, and a count column.

chocolates.group("Shape")

Shape	count
Rectangular	3
Round	3

B = 1

group() with an aggregating function yields the grouped column, and the other columns of the original table with the function name added at the end of each one.

chocolates.group("Shape", max)

Shape	Color max	Amount max	Price ($) max
Rectangular	White	12	2
Round	Milk	7	1.75

C = 3

group() with multiple columns and an aggregating function yields the columns that were grouped on, and the remaining columns with the function name added at the end.

chocolates.group(make_array("Shape", "Color"), max)

Shape	Color	Amount max	Price ($) max
Rectangular	Milk	9	1.4
Rectangular	White	12	2
Round	Dark	7	1.75
Round	Milk	2	1

D = 2

pivot() yields the second argument to pivot() (“Shape”), as well as all unique values in the column of the first argument (“Dark”, “Milk”, “White” from the “Color” column) as columns in the resulting table.

chocolates.pivot("Color", "Shape", "Price ($)", max)

Shape	Dark	Milk	White
Rectangular	0	1.4	2
Round	1.75	1	0

E = 4

join() gives you all the columns from the two tables, except for the extra column that is being used to join the two tables (“Type” is dropped in the resulting table).

chocolates.join("Color", nutrition, "Type")

Color	Shape	Amount	Price ($)	Calories
Dark	Round	4	1.3	120
Dark	Round	7	1.75	120
Milk	Rectangular	6	1.2	130
Milk	Rectangular	9	1.4	130
Milk	Round	2	1	130
White	Rectangular	12	2	115

F = 6

group() with multiple columns and no aggregating function yields the columns that were grouped on, and a count column.

chocolates.group(make_array("Shape", "Color"))

Shape	Color	count
Rectangular	Milk	2
Rectangular	White	1
Round	Dark	2
Round	Milk	1

7.2 Chocolate!

7.2.1 (a)

The % operator returns the remainder if you divide by a certain number (e.g., 11 % 5 = 1). If a number n is odd, what will n % 2 return?

Answer

It will return 1. Note that if a number is even, n % 2 will return 0.

n = 11
n % 2

7.2.2 (b)

What will the following code output? Format your answer in the same way you expect Python to output the result of running the following code.

Understanding If vs Else

The computer evaluates conditions in order.
- If the first if is True, it runs that block and skips the else.
You can nest if statements inside other ifs.
- If the outer if is not satisfied, the inner if is not even checked.

Strings in Python

You can multiply or add strings:
- "hi" * 3 → "hihihi"
- "Jelly" + "Belly" → "JellyBelly"
Watch out for spaces when concatenating!

if vs elif vs multiple ifs

Multiple ifs: even if one condition is True, Python will still check the others.
With elif/else: once one condition is satisfied, the rest are skipped.

for c in np.arange(5):
  if c < 2:
    print("Chocolate?")
    if c % 2 == 0:
      print("Yes sir! With or without nuts?")
  else:
    if c % 2 == 1:
      print("Again!?")
    else:
      print("CHOCOLATE" + ("!" * (c * 2 - 3)))

Answer

Code

for c in np.arange(5):
  if c < 2:
    print("Chocolate?")
    if c % 2 == 0:
      print("Yes sir! With or without nuts?")
  else:
    if c % 2 == 1:
      print("Again!?")
    else:
      print("CHOCOLATE" + ("!" * (c * 2 - 3)))

Chocolate?
Yes sir! With or without nuts?
Chocolate?
CHOCOLATE!
Again!?
CHOCOLATE!!!!!

7.2.3 (c) (Bonus!)

Remember, we can “multiply” a string by a certain number (e.g., 'data8' * 2). What is the result of doing so? Take 'data8' * 2 as an example.

Answer

It will return 'data8data8'. More generally speaking, it will concatenate the string with itself that number of times with no empty space between them.

'data8' * 2

'data8data8'

7.3 Getting Even

7.3.1 (a)

The count_evens function takes in an array of integers and returns the number of even integers in the array. Use a combination of iteration and conditionals to complete the skeleton code below.

def count_evens(arr):
  num_evens = 0
  for ________________________________________________:
  if ________________________________________________:
    ________________________________________________
  return ________________________________________________

Counting with Loops and Arrays

Why initialize with num_evens = 0?

This variable keeps track of how many even numbers have been seen.

For loops

Sometimes we care about the loop variable, sometimes we don’t.
Here, we only increment num_evens when the current number is even.
- Controlled with an if condition.

Using arrays instead of loops

Arrays often make counting easier than loops.
Ways to count evens in an array:
- sum(condition)
- np.count_nonzero(condition)
- len(array) - number_of_odds
Many approaches lead to the same answer.

Answer

def count_evens(an_array):
  num_evens = 0
  for num in an_array:
    if num % 2 == 0:
        num_evens = num_evens + 1
  return num_evens

arr = make_array(1, 2, 3, 4, 5, 6, 7)
count_evens(arr)

7.3.2 (b)

Now, let’s see how we can write the same function using array operations instead of iteration.

def count_evens(arr):
  remainder_array = ________________________________________________
  return ________________________________________________

Answer

def count_evens(arr):
  remainder_array = arr % 2
  return np.count_nonzero(remainder_array == 0)

arr = make_array(1, 2, 3, 4, 5, 6, 7)
count_evens(arr)

def count_evens(arr):
  iseven = (arr % 2) == 0
  return np.count_nonzero(iseven)

arr = make_array(1, 2, 3, 4, 5, 6, 7)
count_evens(arr)

Note it is acceptable to use sum or np.sum instead of np.count_nonzero for the above two solutions.

def count_evens(arr):
  remainder_array = arr % 2
  return len(remainder_array) - sum(remainder_array)

arr = make_array(1, 2, 3, 4, 5, 6, 7)
count_evens(arr)

7.4 Squirrel Sightings (Optional)

The table squirrel below contains some information on reported squirrel sightings across the UC Berkeley campus. Each row in the squirrel table represents one unique squirrel sighting:

Squirrel ID (int): unique identification number for each squirrel.
Location (string): common name of the nearest campus landmark where the squirrel was spotted.
Month (int): numerical representation of the month when the squirrel was spotted.
Day (int): day of the month when the squirrel was spotted.
Year (int): year when the squirrel was spotted.

Code

squirrel_ids = [2937, 8421, 472, 239, 2937]
locations = ["Wheeler Hall", "East Asian Library", "Etcheverry Hall", "Campbell Hall", "Moffitt Library"]
days = [17, 28, 7, 4, 7]
months = [3, 9, 1, 10, 6]
years = [2024, 2022, 2024, 2023, 2021]

location_pool = [
    "Wheeler Hall", "East Asian Library", "Etcheverry Hall",
    "Campbell Hall", "Moffitt Library", "Doe Library",
    "Cory Hall", "Soda Hall", "Evans Hall", "Haas Pavilion",
    "Stanley Hall", "Physics North", "Physics South"
]

for _ in range(995):
    squirrel_ids.append(random.randint(100, 9999))
    locations.append(random.choice(location_pool))
    days.append(random.randint(1, 28))
    months.append(random.randint(1, 12))
    years.append(random.choice([2021, 2022, 2023, 2024]))

squirrel = Table().with_columns(
    "Squirrel ID", squirrel_ids,
    "Location", locations,
    "Day", days,
    "Month", months,
    "Year", years
)

squirrel.show(5)

Squirrel ID	Location	Day	Month	Year
2937	Wheeler Hall	17	3	2024
8421	East Asian Library	28	9	2022
472	Etcheverry Hall	7	1	2024
239	Campbell Hall	4	10	2023
2937	Moffitt Library	7	6	2021

... (995 rows omitted)

7.4.1 (a)

Write a line of code that evaluates to the proportion of Squirrel IDs in the table that are even.

Working with Proportions

Proportions often come from counting how many times a condition is True.

In Python, True is stored as 1 and False as 0.
This means we can calculate proportions in many ways:
- np.mean(condition_array)
- np.average(condition_array)
- sum(condition_array) / len(condition_array)
With tables:
```
tbl.where(...).num_rows / tbl.num_rows
```
This flexibility means you can often solve the same problem in multiple ways.

Answer

np.mean(squirrel.column("Squirrel ID") % 2 == 0) Any equivalent code is acceptable, e.g. using np.average.

np.mean(squirrel.column("Squirrel ID") % 2 == 0)

0.48599999999999999

7.4.2 (b)

Alankrita wants to find the best location where she is most likely to find a squirrel. Write a line of code that evaluates to the location with the most squirrel sightings.

Common Patterns in Data 8

There are a few coding patterns you’ll see again and again.

Finding the largest or smallest value:

tbl.sort("column").column("column").item(0)

Sort the table, pull out the column, then grab a single value.
Watching data types:
Many problems move through this chain:
- Table → Array → Single Value

Knowing what type of object you have at each step helps you decide what methods you can use next.

Answer

squirrel.group("Location").sort("count", descending = True).column("Location").item(0)

'Doe Library'

7.4.3 (c)

Alankrita is interested in how many squirrels were sighted at every location during every month. Create a table called sightings where each cell contains the number of squirrel sightings that occurred in 2023 at each location during each month. Note: Each row should be in a different location.

squirrels_2023 = ________________________________________________________________________________________
sightings = _____________________________________________________________________________________________

Making Sense of Pivot Tables

Pivot tables reorganize data so you can make comparisons more easily.

The first argument becomes the set of new column labels.
The second argument becomes the set of row labels.
Each cell shows the value(s) associated with that row–column pair.

Example:

chocolates.pivot("Color", "Shape")

This creates a table where each row corresponds to a shape, and each column corresponds to a chocolate color (Dark, Milk, White).

Answer

squirrels_2023 = squirrel.where("Year", are.equal_to(2023))
sightings = squirrels_2023.pivot("Month", "Location")

sightings

Location	1	2	3	4	5	6	7	8	9	10	11	12
Campbell Hall	2	1	4	3	2	1	1	2	3	2	0	4
Cory Hall	2	1	3	1	1	2	3	1	0	1	1	1
Doe Library	2	1	3	0	3	2	2	0	2	1	2	2
East Asian Library	4	2	3	1	1	3	1	5	1	2	1	0
Etcheverry Hall	1	0	2	1	2	2	1	2	2	1	1	3
Evans Hall	2	0	1	4	2	0	1	0	2	2	2	3
Haas Pavilion	1	2	1	4	2	1	0	2	2	0	2	1
Moffitt Library	2	3	2	5	0	1	0	2	0	2	2	2
Physics North	0	0	1	4	4	1	4	1	2	1	2	2
Physics South	3	1	1	1	1	2	3	0	1	2	3	1

... (3 rows omitted)

7.4.4 (d)

Alankrita now has access to another table, species, that contains information about the species of each squirrel. Some of the rows from this table are shown below.

Breaking Down Multi-Step Problems

Some problems ask you to combine several table methods in sequence, like .where(), .group(), .sort(), or .join().

How to Tackle These

Take it step by step. Don’t try to write the whole solution in one go.
Check what each step produces. Is it still a Table? Or did it turn into an Array or a single Value?
Build gradually. Run one line at a time and confirm the output before moving on.

This approach makes complicated problems much easier to manage and helps you avoid getting lost.

Code

species_pool = [
    "eastern gray squirrel",
    "western gray squirrel",
    "fox squirrel",
    "Douglas squirrel",
    "red squirrel"
]

squirrel_ids_species = [2937, 8421, 472, 239]
species_array = [
    "eastern gray squirrel",
    "fox squirrel",
    "western gray squirrel",
    "western gray squirrel"
]

for sid in squirrel.column("Squirrel ID")[4:]:
    squirrel_ids_species.append(sid)
    species_array.append(random.choice(species_pool))

species = Table().with_columns(
    "Squirrel ID", squirrel_ids_species,
    "Species", species_array
)

species.show(4)

Squirrel ID	Species
2937	eastern gray squirrel
8421	fox squirrel
472	western gray squirrel
239	western gray squirrel

... (996 rows omitted)

Write lines of code to find the least observed species in 2024. If multiple species are tied for the least sightings, find the species that comes first alphabetically.

squirrels_2024 = ____________________________________________
species_counts = ____________________________________________
least_observed = ____________________________________________

Answer

squirrels_2024 = squirrel.where("Year", 2024)
species_counts = squirrels_2024.join("Squirrel ID", species).group("Species")
least_observed = species_counts.sort("Species").sort("count").column("Species").item(0)

least_observed

'red squirrel'