Discussion 10: Reverse Indexing with Hashing

In today’s discussion, you’ll flip a map of class rosters from courses → students (that is, associating each course with its list of enrolled students) into students → courses (that is, associating each student with their list of enrolled courses) using Java’s hashing collections (HashMap, LinkedHashMap). This gives practice with hash-based maps and stable iteration order while enforcing invariants

Learning Outcomes

  1. Write programs utilizing Sets and Maps.
  2. Analyze the efficiency of Map implementations backed by arrays, sorted arrays, (balanced) binary search trees, and/or hash tables.
  3. Implement data structures backed by chaining or probing hash tables.

Reminder: Discussion Guidelines

The work that you complete in discussion serves as a formative assessment tool; it offers the opportunity to assess your understanding of the material and for our course staff to get a “pulse” on how things are going, so we can make adjustments in future classes. Your grade in discussion is based on your attendance, active participation, and completion of that day’s activity. More information about the grading and expectations can be found in the syllabus.

Since discussion activities are not graded for correctness, we do not restrict what resources you may use to complete them, which include notes, books, unrestricted conversations with other students, internet searches, and the use of large language models or other generative AI. We advise you to be pragmatic about your use of these resources and think critically about whether they are enhancing your learning. Discussion activities are intended to serve as "strength training" for programming tasks we will expect on assignments and exams (and that you will encounter in future courses and careers), and their main benefit comes from thinking critically to "puzzle" them out.

Background: Reverse Indexing with Hash Maps

Today, you are given a forward index (hash map of rosters): Map<Course, List<Student>> rosters that you’ll clean (creating a new Map<Course, List<Student>> coursesToStudents) and use to generate a reverse index (hash map of schedules): Map<Student, List<Course>> studentsToCourses Then, you’ll use both of these indices to efficiently answer queries about their data.

Concretely, we are trying to answer the inverse question efficiently:

HashMaps shine here because they provide expected \(O(1)\) inserts/queries and do not force you to sort or scan all entries.

Ordering within hashed and nested structures

When we created the forward index (and later will traverse it to construct the reverse index), there may have been some significance to the order in which the courses were added (for example, sorted by course number, arranged by college, etc.). We might want to preserve this order and also have it reflected in the students’ course lists in the reverse index.

In general, we don’t have any guarantee of how the (key, value) pairs will be arranged in the underlying memory (i.e., the buckets). In particular, we won’t be able to easily recover the insertion order of these pairs given just the array of buckets. We generally think about the hash function as “scrambling” up the entries, which could cause us to lose track of this order. As a result, the Java library includes a LinkedHashMap class that supports the iteration over its keys/entries in insertion order.

Data cleaning: duplicate and null entries

Real rosters may include some repeated and missing data. Some students may have enrolled in the course, dropped it, and re-enrolled later. Other students may have transferred out of the university after registering for classes, leaving an empty spot in the roster. When you construct your index you should skip over null entries. You should also remove duplicate entries so that each student appears at most once on the roster of each course and each course appears at most once on the schedule of each student. You should preserve the order of first occurrence of each student.

Your Tasks

Exercise 1: Reverse Index by Hand
In this exercise you will convert the (uncleaned) forward map below to its reverse counterpart. As you do this, keep track of what steps you are taking. Which tasks are you repeating for each table row, or for each entry in one of the lists in the second column? These will correspond to loops in your code. What checks are you performing before writing new information into the reverse map?

Notes:
  • As mentioned above, any null student should be ignored.
  • As mentioned above, a student may appear multiple times in a course list; keep only the first occurrence of that student.
  • The first-seen student order is determined by scanning the table top-to-bottom, left-to-right.
  • Forward map (courses → students)

    Course Enrolled Students (in order)
    "CS 1110" ["s1111", "s1234", "s9001"]
    "CS 2110" ["s1234", "s9001", "s1111", "s9001", "s2222"]
    "CS 2800" ["s7777", "s1111", null, "s1234"]
    "CS 5154" ["s9001", "s2222", "s7777"]

    Reverse map (students → courses)

    (You may not need all of the rows.)
    Student Enrolled Courses (in order)
Exercise 2: Programming the Reverse Index
Now that we have a high level understanding, lets get ready to implement this! Download the starter code and take a couple minutes to read through the functionality that is already provided. The provided Course and Student record are very simplistic, but could be expanded with additional fields and functionality to build up a more realistic course registrar system (like Cornell's Student Center).

All of the code that you will write will be in EnrollmentIndex.java.
(a)

EnrollmentIndex Constructor

Most of the work of the EnrollmentIndex class is done by its constructor, which initializes both its forward and reverse indices to satisfy their class invariants. Complete the definition of this constructor to build these indices in one pass over the rosters map. It may help to reference the Map interface documentation to remind yourself what operations are available.

(b)

Index Query Methods

Complete the definitions of the isEnrolled(), courseCountOf(), and enrollmentOf() methods, which run small queries against the indices that you built. Determine which index (forward or reverse) to use to get the best performance. Make sure that you carefully consider the case that a student or course may not be present in the index to avoid propagating any exceptions.

(c)

Runtime Analysis

Determine the runtimes of your implementations of the methods from part (b) in terms of \(|S|\), the total number of students, and \(|C|\), the total number of courses.

If you only had the forward index coursesToStudents, which of these operations would have a strictly worse runtime? Explain your answer.

(d)

Iterator Methods

Complete the definitions of the students(), coursesOf(), and studentsIn() methods, which each return an Iterator. In the case of a student or course that is not present in the index, return Collections.emptyIterator(). You should not need to define any nested iterator classes to implement these methods.

Once you have completed your code, you can run the EnrollmentIndexDemo to confirm visually that your constructor correctly builds the reverse index and that your accessor methods work as intended. You can also run the provided test cases in tests/EnrollmentIndexTest.java to confirm the correctness of your implementation.

Submission

At the end of class, your group’s primary author should create a PDF with your written answers and upload this to the “Discussion 10” Gradescope assignment, tagging all other members of the group onto the submission.