7  Discussion 05: Table Methods, Conditionals & Iteration (from Fall 2025)

7.0.1 Contact Information

Name Wesley Zheng
Pronouns He/him/his
Email wzheng0302@berkeley.edu
Discussion Wednesdays, 12–2 PM @ Etcheverry 3105
Office Hours Tuesdays/Thursdays, 2–3 PM @ Warren Hall 101

Contact me by email at ease — I typically respond within a day or so!


7.0.2 Announcements

CautionAnnouncements
  • Fill out the Midterm Conflicts Form (QR code in last week’s slides).
  • Complete the Project 1 Partner Matching Form.
  • Starting with Homework 04, we will be stricter about submission issues — please submit your written work properly.

Welcome to the Project 1 Discussion Worksheet! This week we will cover table methods like .group and .pivot, conditional statements, and iteration. As a reminder, make use of the Python Reference and the Table Functions Visualizer on the Data 8 website, as well as the Discussion Reference sheet you are provided with!

NoteMaking the Most of the Reference Sheet

Your reference sheet is a powerful tool—spend a couple minutes learning how to use it!

  • The columns tell you what the inputs and outputs of each method are.
  • tbl is just a placeholder name. When you write your own code, replace it with the name of your table.

Knowing how to quickly read the sheet will save you time on assignments and exams.

7.1 Table Matchmaking

Shown below are the chocolates and nutrition tables respectively.

chocolates

Color Shape Amount Price ($)
Dark Round 4 1.30
Milk Rectangular 6 1.20
White Rectangular 12 2.00
Dark Round 7 1.75
Milk Rectangular 9 1.40
Milk Round 2 1.00

nutrition

Type Calories
Dark 120
Milk 130
White 115
Ruby 120
Code
from datascience import *
import random
import numpy as np

chocolates = Table().with_columns(
    'Color', ['Dark', 'Milk', 'White', 'Dark', 'Milk', 'Milk'],
    'Shape', ['Round', 'Rectangular', 'Rectangular', 'Round', 'Rectangular', 'Round'],
    'Amount', [4, 6, 12, 7, 9, 2],
    'Price ($)', [1.30, 1.20, 2.00, 1.75, 1.40, 1.00]
)

nutrition = Table().with_columns(
    'Type', ['Dark', 'Milk', 'White', 'Ruby'],
    'Calories', [120, 130, 115, 120]
)

Match the following table method calls to the resulting descriptions of tables.

Hint: Pay attention to the column names of the resulting tables! For example, what happens to the column names when you specify an aggregating function in group()? What happens when you only specify one column name? How does the behavior of pivot() differ from group()?

Letter Function Call
A chocolates.group("Shape")
B chocolates.group("Shape", max)
C chocolates.group(make_array("Shape", "Color"), max)
D chocolates.pivot("Color", "Shape", "Price ($)", max)
E chocolates.join("Color", nutrition, "Type")
F chocolates.group(make_array("Shape", "Color"))
Number Columns # of Rows
1 Shape, Color max, Amount max, Price ($) max 2
2 Shape, Dark, Milk, White 2
3 Shape, Color, Amount max, Price ($) max 4
4 Color, Shape, Amount, Price ($), Calories 6
5 Shape, count 2
6 Shape, Color, count 4


A: ____________  C: ____________  E: ____________
B: ____________  D: ____________  F: ____________
NoteGrouping Tables

.group lets us summarize a table by grouping rows with the same value in a column.

  • With no aggregation function, you get:
    • the grouped column
    • a new count column
  • With an aggregation function (e.g. max), you get:
    • the grouped column
    • all other columns summarized with the function name added to their labels

Think of .group whenever you want to count or summarize rows by a specific column.

NoteJoining Tables

.join combines information from two tables when they share a column of values.

  • Example: chocolates.join(nutrition) allows us to bring in the Calories column, which .group or .pivot cannot do.
  • Watch out if the column names differ—you may need to specify a third argument to align them.
  • The join column itself (like Type) may not appear in the final joined table.
NotePivoting Tables

.pivot is like grouping on two columns, but the output is shaped into a grid.

  • Each row corresponds to one unique value from the second argument.
  • Each column corresponds to one unique value from the first argument.

Example:

chocolates.pivot("Color", "Shape")

creates a table where Shape values are rows, and Color values (Dark, Milk, White) become column names.

Answer

Bing’s Amazing Slides:

A = 5

group() with no aggregating function yields a table with just the column that was grouped on, and a count column.

chocolates.group("Shape")
Shape count
Rectangular 3
Round 3

B = 1

group() with an aggregating function yields the grouped column, and the other columns of the original table with the function name added at the end of each one.

chocolates.group("Shape", max)
Shape Color max Amount max Price ($) max
Rectangular White 12 2
Round Milk 7 1.75

C = 3

group() with multiple columns and an aggregating function yields the columns that were grouped on, and the remaining columns with the function name added at the end.

chocolates.group(make_array("Shape", "Color"), max)
Shape Color Amount max Price ($) max
Rectangular Milk 9 1.4
Rectangular White 12 2
Round Dark 7 1.75
Round Milk 2 1

D = 2

pivot() yields the second argument to pivot() (“Shape”), as well as all unique values in the column of the first argument (“Dark”, “Milk”, “White” from the “Color” column) as columns in the resulting table.

chocolates.pivot("Color", "Shape", "Price ($)", max)
Shape Dark Milk White
Rectangular 0 1.4 2
Round 1.75 1 0

E = 4

join() gives you all the columns from the two tables, except for the extra column that is being used to join the two tables (“Type” is dropped in the resulting table).

chocolates.join("Color", nutrition, "Type")
Color Shape Amount Price ($) Calories
Dark Round 4 1.3 120
Dark Round 7 1.75 120
Milk Rectangular 6 1.2 130
Milk Rectangular 9 1.4 130
Milk Round 2 1 130
White Rectangular 12 2 115

F = 6

group() with multiple columns and no aggregating function yields the columns that were grouped on, and a count column.

chocolates.group(make_array("Shape", "Color"))
Shape Color count
Rectangular Milk 2
Rectangular White 1
Round Dark 2
Round Milk 1

7.2 Chocolate!

7.2.1 (a)

The % operator returns the remainder if you divide by a certain number (e.g., 11 % 5 = 1). If a number n is odd, what will n % 2 return?

Answer

It will return 1. Note that if a number is even, n % 2 will return 0.

n = 11
n % 2
1

7.2.2 (b)

What will the following code output? Format your answer in the same way you expect Python to output the result of running the following code.

NoteUnderstanding If vs Else
  • The computer evaluates conditions in order.
    • If the first if is True, it runs that block and skips the else.
  • You can nest if statements inside other ifs.
    • If the outer if is not satisfied, the inner if is not even checked.

Strings in Python

  • You can multiply or add strings:
    • "hi" * 3"hihihi"
    • "Jelly" + "Belly""JellyBelly"
  • Watch out for spaces when concatenating!

if vs elif vs multiple ifs

  • Multiple ifs: even if one condition is True, Python will still check the others.
  • With elif/else: once one condition is satisfied, the rest are skipped.
for c in np.arange(5):
  if c < 2:
    print("Chocolate?")
    if c % 2 == 0:
      print("Yes sir! With or without nuts?")
  else:
    if c % 2 == 1:
      print("Again!?")
    else:
      print("CHOCOLATE" + ("!" * (c * 2 - 3)))
Answer
Code
for c in np.arange(5):
  if c < 2:
    print("Chocolate?")
    if c % 2 == 0:
      print("Yes sir! With or without nuts?")
  else:
    if c % 2 == 1:
      print("Again!?")
    else:
      print("CHOCOLATE" + ("!" * (c * 2 - 3)))
Chocolate?
Yes sir! With or without nuts?
Chocolate?
CHOCOLATE!
Again!?
CHOCOLATE!!!!!

7.2.3 (c) (Bonus!)

Remember, we can “multiply” a string by a certain number (e.g., 'data8' * 2). What is the result of doing so? Take 'data8' * 2 as an example.

Answer

It will return 'data8data8'. More generally speaking, it will concatenate the string with itself that number of times with no empty space between them.

'data8' * 2
'data8data8'

7.3 Getting Even

7.3.1 (a)

The count_evens function takes in an array of integers and returns the number of even integers in the array. Use a combination of iteration and conditionals to complete the skeleton code below.

def count_evens(arr):
  num_evens = 0
  for ________________________________________________:
  if ________________________________________________:
    ________________________________________________
  return ________________________________________________
NoteCounting with Loops and Arrays

Why initialize with num_evens = 0?

  • This variable keeps track of how many even numbers have been seen.

For loops

  • Sometimes we care about the loop variable, sometimes we don’t.
  • Here, we only increment num_evens when the current number is even.
    • Controlled with an if condition.

Using arrays instead of loops

  • Arrays often make counting easier than loops.
  • Ways to count evens in an array:
    • sum(condition)
    • np.count_nonzero(condition)
    • len(array) - number_of_odds
  • Many approaches lead to the same answer.
Answer
def count_evens(an_array):
  num_evens = 0
  for num in an_array:
    if num % 2 == 0:
        num_evens = num_evens + 1
  return num_evens
arr = make_array(1, 2, 3, 4, 5, 6, 7)
count_evens(arr)
3

7.3.2 (b)

Now, let’s see how we can write the same function using array operations instead of iteration.

def count_evens(arr):
  remainder_array = ________________________________________________
  return ________________________________________________
Answer
def count_evens(arr):
  remainder_array = arr % 2
  return np.count_nonzero(remainder_array == 0)
arr = make_array(1, 2, 3, 4, 5, 6, 7)
count_evens(arr)
3

or

def count_evens(arr):
  iseven = (arr % 2) == 0
  return np.count_nonzero(iseven)
arr = make_array(1, 2, 3, 4, 5, 6, 7)
count_evens(arr)
3

Note it is acceptable to use sum or np.sum instead of np.count_nonzero for the above two solutions.

or

def count_evens(arr):
  remainder_array = arr % 2
  return len(remainder_array) - sum(remainder_array)
arr = make_array(1, 2, 3, 4, 5, 6, 7)
count_evens(arr)
3

7.4 Squirrel Sightings (Optional)

The table squirrel below contains some information on reported squirrel sightings across the UC Berkeley campus. Each row in the squirrel table represents one unique squirrel sighting:

  • Squirrel ID (int): unique identification number for each squirrel.
  • Location (string): common name of the nearest campus landmark where the squirrel was spotted.
  • Month (int): numerical representation of the month when the squirrel was spotted.
  • Day (int): day of the month when the squirrel was spotted.
  • Year (int): year when the squirrel was spotted.
Code
squirrel_ids = [2937, 8421, 472, 239, 2937]
locations = ["Wheeler Hall", "East Asian Library", "Etcheverry Hall", "Campbell Hall", "Moffitt Library"]
days = [17, 28, 7, 4, 7]
months = [3, 9, 1, 10, 6]
years = [2024, 2022, 2024, 2023, 2021]

location_pool = [
    "Wheeler Hall", "East Asian Library", "Etcheverry Hall",
    "Campbell Hall", "Moffitt Library", "Doe Library",
    "Cory Hall", "Soda Hall", "Evans Hall", "Haas Pavilion",
    "Stanley Hall", "Physics North", "Physics South"
]

for _ in range(995):
    squirrel_ids.append(random.randint(100, 9999))
    locations.append(random.choice(location_pool))
    days.append(random.randint(1, 28))
    months.append(random.randint(1, 12))
    years.append(random.choice([2021, 2022, 2023, 2024]))

squirrel = Table().with_columns(
    "Squirrel ID", squirrel_ids,
    "Location", locations,
    "Day", days,
    "Month", months,
    "Year", years
)

squirrel.show(5)
Squirrel ID Location Day Month Year
2937 Wheeler Hall 17 3 2024
8421 East Asian Library 28 9 2022
472 Etcheverry Hall 7 1 2024
239 Campbell Hall 4 10 2023
2937 Moffitt Library 7 6 2021

... (995 rows omitted)

7.4.1 (a)

Write a line of code that evaluates to the proportion of Squirrel IDs in the table that are even.

NoteWorking with Proportions

Proportions often come from counting how many times a condition is True.

  • In Python, True is stored as 1 and False as 0.

  • This means we can calculate proportions in many ways:

    • np.mean(condition_array)
    • np.average(condition_array)
    • sum(condition_array) / len(condition_array)
  • With tables:

    tbl.where(...).num_rows / tbl.num_rows

    This flexibility means you can often solve the same problem in multiple ways.

Answer

np.mean(squirrel.column("Squirrel ID") % 2 == 0) Any equivalent code is acceptable, e.g. using np.average.

np.mean(squirrel.column("Squirrel ID") % 2 == 0)
0.48599999999999999

7.4.2 (b)

Alankrita wants to find the best location where she is most likely to find a squirrel. Write a line of code that evaluates to the location with the most squirrel sightings.

NoteCommon Patterns in Data 8

There are a few coding patterns you’ll see again and again.

  • Finding the largest or smallest value:

    tbl.sort("column").column("column").item(0)
  • Sort the table, pull out the column, then grab a single value.

  • Watching data types:

  • Many problems move through this chain:

    • Table → Array → Single Value

Knowing what type of object you have at each step helps you decide what methods you can use next.

Answer
squirrel.group("Location").sort("count", descending = True).column("Location").item(0)
'Doe Library'

7.4.3 (c)

Alankrita is interested in how many squirrels were sighted at every location during every month. Create a table called sightings where each cell contains the number of squirrel sightings that occurred in 2023 at each location during each month. Note: Each row should be in a different location.

squirrels_2023 = ________________________________________________________________________________________
sightings = _____________________________________________________________________________________________
NoteMaking Sense of Pivot Tables

Pivot tables reorganize data so you can make comparisons more easily.

  • The first argument becomes the set of new column labels.
  • The second argument becomes the set of row labels.
  • Each cell shows the value(s) associated with that row–column pair.

Example:

chocolates.pivot("Color", "Shape")

This creates a table where each row corresponds to a shape, and each column corresponds to a chocolate color (Dark, Milk, White).

Answer
squirrels_2023 = squirrel.where("Year", are.equal_to(2023))
sightings = squirrels_2023.pivot("Month", "Location")
sightings
Location 1 2 3 4 5 6 7 8 9 10 11 12
Campbell Hall 2 1 4 3 2 1 1 2 3 2 0 4
Cory Hall 2 1 3 1 1 2 3 1 0 1 1 1
Doe Library 2 1 3 0 3 2 2 0 2 1 2 2
East Asian Library 4 2 3 1 1 3 1 5 1 2 1 0
Etcheverry Hall 1 0 2 1 2 2 1 2 2 1 1 3
Evans Hall 2 0 1 4 2 0 1 0 2 2 2 3
Haas Pavilion 1 2 1 4 2 1 0 2 2 0 2 1
Moffitt Library 2 3 2 5 0 1 0 2 0 2 2 2
Physics North 0 0 1 4 4 1 4 1 2 1 2 2
Physics South 3 1 1 1 1 2 3 0 1 2 3 1

... (3 rows omitted)


7.4.4 (d)

Alankrita now has access to another table, species, that contains information about the species of each squirrel. Some of the rows from this table are shown below.

NoteBreaking Down Multi-Step Problems

Some problems ask you to combine several table methods in sequence, like .where(), .group(), .sort(), or .join().

How to Tackle These

  • Take it step by step. Don’t try to write the whole solution in one go.
  • Check what each step produces. Is it still a Table? Or did it turn into an Array or a single Value?
  • Build gradually. Run one line at a time and confirm the output before moving on.

This approach makes complicated problems much easier to manage and helps you avoid getting lost.

Code
species_pool = [
    "eastern gray squirrel",
    "western gray squirrel",
    "fox squirrel",
    "Douglas squirrel",
    "red squirrel"
]

squirrel_ids_species = [2937, 8421, 472, 239]
species_array = [
    "eastern gray squirrel",
    "fox squirrel",
    "western gray squirrel",
    "western gray squirrel"
]

for sid in squirrel.column("Squirrel ID")[4:]:
    squirrel_ids_species.append(sid)
    species_array.append(random.choice(species_pool))

species = Table().with_columns(
    "Squirrel ID", squirrel_ids_species,
    "Species", species_array
)

species.show(4)
Squirrel ID Species
2937 eastern gray squirrel
8421 fox squirrel
472 western gray squirrel
239 western gray squirrel

... (996 rows omitted)

Write lines of code to find the least observed species in 2024. If multiple species are tied for the least sightings, find the species that comes first alphabetically.

squirrels_2024 = ____________________________________________
species_counts = ____________________________________________
least_observed = ____________________________________________
Answer
squirrels_2024 = squirrel.where("Year", 2024)
species_counts = squirrels_2024.join("Squirrel ID", species).group("Species")
least_observed = species_counts.sort("Species").sort("count").column("Species").item(0)
least_observed
'red squirrel'