Discussion 10: Reverse Indexing with Hashing
In today’s discussion, you’ll flip a map of class rosters from courses → students (that is, associating each course with its list of enrolled students) into students → courses (that is, associating each student with their list of enrolled courses) using Java’s hashing collections (HashMap, LinkedHashMap). This gives practice with hash-based maps and stable iteration order while enforcing invariants
Learning Outcomes
- Write programs utilizing Sets and Maps.
- Analyze the efficiency of Map implementations backed by arrays, sorted arrays, (balanced) binary search trees, and/or hash tables.
- Implement data structures backed by chaining or probing hash tables.
Reminder: Discussion Guidelines
The work that you complete in discussion serves as a formative assessment tool; it offers the opportunity to assess your understanding of the material and for our course staff to get a “pulse” on how things are going, so we can make adjustments in future classes. Your grade in discussion is based on your attendance, active participation, and completion of that day’s activity. More information about the grading and expectations can be found in the syllabus.
Background: Reverse Indexing with Hash Maps
Today, you are given a forward index (hash map of rosters): Map<Course, List<Student>> rosters that you’ll clean (creating a new Map<Course, List<Student>> coursesToStudents) and use to generate a reverse index (hash map of schedules): Map<Student, List<Course>> studentsToCourses Then, you’ll use both of these indices to efficiently answer queries about their data.
Concretely, we are trying to answer the inverse question efficiently:
- Given a course, find its students. (Forward index you already have.)
- Given a student, find their courses. (Reverse index you’ll build.)
HashMaps shine here because they provide expected \(O(1)\) inserts/queries and do not force you to sort or scan all entries.
Ordering within hashed and nested structures
When we created the forward index (and later will traverse it to construct the reverse index), there may have been some significance to the order in which the courses were added (for example, sorted by course number, arranged by college, etc.). We might want to preserve this order and also have it reflected in the students’ course lists in the reverse index.
In general, we don’t have any guarantee of how the (key, value) pairs will be arranged in the underlying memory (i.e., the buckets). In particular, we won’t be able to easily recover the insertion order of these pairs given just the array of buckets. We generally think about the hash function as “scrambling” up the entries, which could cause us to lose track of this order. As a result, the Java library includes a LinkedHashMap class that supports the iteration over its keys/entries in insertion order.
Data cleaning: duplicate and null entries
Real rosters may include some repeated and missing data. Some students may have enrolled in the course, dropped it, and re-enrolled later. Other students may have transferred out of the university after registering for classes, leaving an empty spot in the roster. When you construct your index you should skip over null entries. You should also remove duplicate entries so that each student appears at most once on the roster of each course and each course appears at most once on the schedule of each student. You should preserve the order of first occurrence of each student.
Your Tasks
Notes:
- As mentioned above, any
nullstudent should be ignored. - As mentioned above, a student may appear multiple times in a course list; keep only the first occurrence of that student.
- The first-seen student order is determined by scanning the table top-to-bottom, left-to-right.
Forward map (courses → students)
| Course | Enrolled Students (in order) |
|---|---|
| "CS 1110" | ["s1111", "s1234", "s9001"] |
| "CS 2110" | ["s1234", "s9001", "s1111", "s9001", "s2222"] |
| "CS 2800" | ["s7777", "s1111", null, "s1234"] |
| "CS 5154" | ["s9001", "s2222", "s7777"] |
Reverse map (students → courses)
| Student | Enrolled Courses (in order) |
|---|---|
Course and Student record are very simplistic, but could be expanded with additional fields and functionality to build up a more realistic course registrar system (like Cornell's Student Center).All of the code that you will write will be in
EnrollmentIndex.java.
EnrollmentIndex Constructor
Most of the work of the EnrollmentIndex class is done by its constructor, which initializes both its forward and reverse indices to satisfy their class invariants. Complete the definition of this constructor to build these indices in one pass over the rosters map. It may help to reference the Map interface documentation to remind yourself what operations are available.
Index Query Methods
Complete the definitions of the isEnrolled(), courseCountOf(), and enrollmentOf() methods, which run small queries against the indices that you built. Determine which index (forward or reverse) to use to get the best performance. Make sure that you carefully consider the case that a student or course may not be present in the index to avoid propagating any exceptions.
Runtime Analysis
Determine the runtimes of your implementations of the methods from part (b) in terms of \(|S|\), the total number of students, and \(|C|\), the total number of courses.
If you only had the forward index coursesToStudents, which of these operations would have a strictly worse runtime? Explain your answer.
Iterator Methods
Complete the definitions of the students(), coursesOf(), and studentsIn() methods, which each return an Iterator. In the case of a student or course that is not present in the index, return Collections.emptyIterator(). You should not need to define any nested iterator classes to implement these methods.
Once you have completed your code, you can run the EnrollmentIndexDemo to confirm visually that your constructor correctly builds the reverse index and that your accessor methods work as intended. You can also run the provided test cases in tests/EnrollmentIndexTest.java to confirm the correctness of your implementation.
Submission
At the end of class, your group’s primary author should create a PDF with your written answers and upload this to the “Discussion 10” Gradescope assignment, tagging all other members of the group onto the submission.