On this page:
Diagram Conventions
Throughout CS 2110, we introduce many different types of diagrams to help us visualize the execution or organization of our programs. This page serves as a comprehensive reference for our diagramming conventions. As we describe below, many of the diagramming conventions can be very particular; this is intentional, as carefully adhering to the guidelines will help us avoid some common pitfalls or misunderstandings.
Object Diagrams
In the first few weeks of this course, we go over how to diagram the memory utilization of our programs in a way that is both accurate and easily understandable. These object diagrams (or memory diagrams) visualize the state of a program’s memory at one particular point during its execution (as opposed to trying to diagram its evolution over time).
The Runtime Stack
When a method is called, space is allocated for its parameters and local variables on the runtime stack (sometimes called just the stack or the call stack). All of these variables (for a single method) are grouped in a unit known as a call frame (sometimes called just a frame or an activation record).
We draw the runtime stack as an open column on the left of our diagrams.
We draw call frames as rectangles within the runtime stack column, labeled on the left with the name of the method with which they are associated. We often omit the parameters from this label, but sometimes include them for recursive methods to help disambiguate multiple frames for the same method.
Within the call frame rectangle, we draw a smaller box for each of the method’s parameters and each of the method’s local variables, one per line. We label these boxes on the left with the variable’s name, followed by a colon, followed by its static type (sometimes written on a second line to conserve space). We will often call these labeled boxes the entries of the call frame. The order of the entries can be chosen arbitrarily, though we typically write the parameters (in order) at the top of the frame and then write the local variables (in declaration order) below these.
|
|
|
|
Program execution begins with the main() method, which is the bottom frame on the runtime stack. Whenever another method is called, the call frame for its invocation is added on top of the previous call frame. The arguments passed into this method are evaluated and stored in the corresponding parameter variables, and execution proceeds in the new call frame.
As an example, the stack frames for the following code immediately after entering method() would be diagrammed as follows (note that for now, we omit the visualization of args in this diagram; we’ll discuss this below):
|
|
|
|
Also note that if a method’s parameter itself comes from a method call (as in the example above), methods are run from inside to outside. In the example above, method() must return before method2() begins.
When a method completes execution (i.e., it executes a return statement or a void method reaches the end of its body), its call frame is destroyed, and its return value is passed back to the previous call frame, where execution continues. In this class, we do not include any record of previously destroyed call frames in our memory diagrams.
Runtime Stack Rules
-
The stack frames always start at the bottom and grow upward as new methods are called. Newer method call frames are “stacked” on top of the call frames for the methods that called them.
-
All call frames are labeled with the method’s name.
-
The first (i.e., lowest) stack frame is almost always
main()(except in the case of multi-threaded programs, which we will discuss at the end of the course). -
Call frames include entries for every parameter and local variable in the method throughout the entire duration of the method’s execution, whether or not that variable has been initialized, accessed, or has gone out of scope.
-
All call frame entries are labeled with the variable name and its static type.
-
As soon as a method returns, we do not visualize it in our memory diagram. The runtime stack visualization only includes call frames for currently active methods.
Primitive Types
There are eight primitive types in Java: int, double, boolean, char, short, long, float, and byte. In this class, we will mostly utilize only the first four of these. Variables with primitive types directly store their values. In our diagrams, we write the value of a variable with a primitive type directly inside its box. This is true both for primitive parameters or local variables on the runtime stack and for primitive fields of objects (which we discuss more below). This idea is illustrated for the values of param1 and param2 in the previous diagram.
Objects and the Memory Heap
The memory heap (often shortened to just the heap) is a separate region of memory from the runtime stack. The memory for all objects constructed during a program’s execution is allocated on the memory heap. In our object diagrams, we visualize the heap as the open area on the right side of the diagram.
Any variable that is not declared with one of the eight primitive types listed in the previous section has a reference type. Some examples of reference types are:
StringScannerCounterdouble[]int[][]List<String>Map<K,V>
There is no limit on the number of different reference types, since we can create our own. Instances of reference types are called objects, and variables with reference types store a reference to these objects.
Objects are drawn as rounded rectangles or ovals on the memory heap with labeled (dynamic) types. Inside the rounded rectangle for an object, we include a depiction of the object’s state. We symbolically represent a reference using an arrow from within a variable’s box pointing to the edge of the object (on the heap).
In the following method(), the String variable str would hold an arrow referencing a String object on the heap, labeled with its contents.
|
|
|
|
An object is included in our diagram as long as it is reachable from the runtime stack, meaning that there is at least one incoming arrow pointing to the object from a variable on the runtime stack or from a field of another reachable object. As soon as an object becomes unreachable, it is not visualized on the memory heap.
Memory Heap Rules
-
Only objects can be drawn on the memory heap, visualized as rounded rectangles.
-
Objects are no longer visualized as soon as they become “unreachable” (i.e., have no incoming reference arrows).
-
Objects are always labeled outside of the rounded rectangle with their (dynamic) type.
-
There are never “free boxes” floating on the heap. All variable boxes are on the stack or are contained within an object to visualize its fields (see below).
Object Reference Rules
-
Boxes never contain the contents of reference types. A box can only ever contain a single primitive value or a single arrow tail.
-
Rounded rectangles or other boxes are never drawn inside of boxes.
-
Arrow tails must always start from within a box.
-
Arrow heads must always point to the edge of a rounded rectangle (i.e., an object).
Visualizing Reference Types
In this section, we review the conventions for diagramming specific types of objects.
Strings
We depict String objects by writing the string literal inside their rounded rectangle. An example of this is shown in the previous figure.
We use a similar convention to visualize the current contents of a StringBuilder object.
Arrays
An array stores a fixed number of elements of a single type in a particular (one-dimensional) order. We depict an array as a connected row of boxes. There are as many boxes as the length of the array. The boxes are indexed (usually underneath) from [0] counting up. Inside each box is the value at the corresponding index of the array.
Arrays are default-initialized as follows:
-
Arrays of numerical types (e.g.,
int,double, andchar) are, by default, initially filled with0s (or, more specifically,0s,0.0s, and'\u0000's, respectively). -
Arrays of
booleans are, by default, initially filled withfalse. -
Arrays of reference types are, by default, initially filled with
null.
The following example demonstrates how some arrays should be visualized. We’ll assume that the program is run with no command-line arguments, meaning args is an empty array (i.e., args.length == 0).
|
|
|
|
Multidimensional Arrays
A multidimensional array is an array of arrays. Since arrays are reference types, each index of the “outer” array holds a reference to an inner array object (whose contents have the declared type). The following example depicts a String[][] multidimensional array, which includes two levels of indirection (i.e., arrows from a field of one heap object to another heap object).
|
|
|
|
Notice here that we moved some of the dynamic type labels to beneath the depiction of the object to avoid a collision with the arrow. Doing this is fine (as long as it’s still clear which object the label belongs to).
Other Java Library Objects
For other Java library objects (i.e., objects whose classes we did not define), we have less information about their internal state. We can (unless otherwise noted) visualize these objects as labeled, empty rounded rectangles.
|
|
|
|
User-Defined Reference Types
When we create a new reference type by defining its class, we have a complete understanding of its internal state (i.e., its fields). We visualize these fields as variables within the object labeled with their name and (static) type, analogous to call frame entries.
For example, if we define the following Person class:
|
|
|
|
Then we can visualize a Person object as follows:
|
|
|
|
Visualizing Reference Semantics
In this final section on object diagrams, we’ll outline a few more special circumstances for diagramming reference types.
null
null is a special value that any reference type can take and indicates the absence of an object reference. We indicate null variables in our object diagrams by drawing a diagonal slash through the box for that variable.
For example, the call frame at the end of this method() would be diagrammed as follows.
|
|
|
|
Aliasing
Recall that when we assign an expression with a reference type to another reference type variable, this creates a copy of the reference (i.e., another arrow starting from a different box but pointing to the same object). This is called an alias reference and is common in Java programs, especially for methods with reference type parameters. In this case, the copying of the argument into the parameter’s call frame entry will result in two different frames pointing to the same object.
|
|
|
|
Instance Methods and this
When we define an instance method, the keyword this is used to refer to the object on which the method was invoked (i.e., its target). We visualize this by adding an entry for this to the instance method’s call frame storing this reference.
|
|
|
|
At the end of the execution of the happyBirthday() method, we would visualize:
Array Diagrams
Now that we know how to depict the memory state of our programs, we can consider how to depict other parts of our programs in more detail. In particular, it is useful to diagram properties of various ranges of an array. We will do this using array diagrams. Rather than focusing on low-level details like the memory diagrams above, array diagrams focus more on the array itself. Array diagrams are particularly useful in describing loop invariants for loops over arrays. As we will see below, unlike memory diagrams, we often use multiple array diagrams together to depict the state of an array over time.
Range Notation in Text
Before we begin diagramming different ranges of an array, let us first discuss how we denote array ranges in text. While range notation is not inherently part of array diagramming, it is useful enough for describing array diagrams that we include it here. Through the rest of this section, we will assume an array a with a.length at least 4.
To denote a range of a, we’ll describe its first and last indices. Usually, the first index must be less than or equal to the last index (we will see the exception at the end of this section). Everything in between and including these indices is part of the range. Using the notation, we write a[start..end] to denote the range of a from start to end, inclusive of both start and end. We dissect each part of the notation below.
First, we declare which array we are talking about by writing the name of the array. Then, we have an opening bracket followed by the first index of our range. We use two dots to denote that we are representing a range. Finally, we have the last index of our range followed by a closing bracket.
One of the simplest array ranges is the entire array, which we denote a[0..a.length-1].
Exclusive Range Boundaries
If we want an exclusive range boundary, then instead of using a bracket, we use a parenthesis. For example, the range a[0..a.length-1] can be equivalently written as a[0..a.length). This is all the elements of a from 0 to a.length, including 0 and not including a.length.
Shorthand
We will commonly adopt the shorthand that when a range boundary lies on the edge of an array, we need not write the number of the boundary, and will instead merely use a bracket with no number. For example, the range a[0..a.length) can be written more simply as a[..].
Concrete Example
As a more concrete example, let char[] b = {'H', 'E', 'L', 'L', 'O', ' ', 'w', 'o', 'r', 'l', 'd', '!'}. Then, to describe just the characters for 'HELLO', we can use the notation b[0..5). The notations b[0..4] and b[..5) can also be used. (There is one more reasonable way to write this. Can you think of what it is?)
Now, we can use range notation to state truths about parts of arrays. For example, we can say something like “b[..5) are all capital English letters”.
Empty Range
As stated above, there is one case when the start index can be greater than the end index. This will sometimes occur when we depict an empty region of an array. Usually, we use indices that are one apart. For example, a[2..1] represents an empty array range.
Empty ranges can also be depicted in other ways using our range notation conventions. For example, a(3..3), a(3..4), and a[..0) all represent empty ranges.
Depicting an Array
We depict an array as a long (length-wise) and short (height-wise) rectangular box. We label the box with three labels. To the left of the box, we label the name of the array. On top of the box, we label the start and end indices.
There are a few important details to note here. Somewhat unimportantly, the array name label is followed by a colon. More crucially, no index label lies on a boundary. The labels for 0 and a.length both lie to the right of the respective boundaries they represent. To see why, consider what a boundary line represents. A boundary line represents a division between two parts of an array. Since an element (such as a[0]) must be in one part of the array, its index must be on one side of the partition line.
Let us give a more concrete example. It is important to note that this diagram is not a normal array diagram, but rather a mix of an array diagram and a memory diagram. Let’s say int[] b = {10, 11, 12, 13}. Then, b.length == 4. Looking at the diagram below, we can see that putting the labels for 0 and b.length on the right of their respective boundaries is correct.
Depicting Ranges
We use vertical lines to delineate ranges of an array and use index labels at the top of these lines to clarify these delineations. Once again, since these labels represent indices, they must be either to the left or to the right of a boundary line.
For example, if we want to split a immediately after index 3, we depict it as follows, with the label 3 on the left of the boundary line. The first range is a[..3] and the second range is a(3..].
Oftentimes, these labels will not be int literals, but will instead be variables that take valid index values. This is especially common when we use array diagrams in tandem with loops.
Each range usually must be labeled with properties about the range. For example, the following array diagram depicts an int array in which all the elements before and at i are even and all the elements after i are odd.
We can and often will have more than two ranges, in which case we simply add another boundary line. In addition, we might not know any properties about a particular region. If this is the case, we simply label that region of the array with a question mark. Both of these features are depicted in the next array diagram.
Loop Invariants
Array diagrams are useful to reason about loops and loop invariants. Through the rest of this section, we will illustrate this with the following makeZero() method.
|
|
|
|
When reasoning about loops, we should first think about what we know to be true before the loop begins. We call this the “Pre” diagram. In this case, we know i = 0 and nothing about the array a. We diagram this:
Now, let us consider what we would like a to look like after the loop is complete. We call this the “Post” diagram. In this case, we want a to consist of all zeros. In addition, after the loop is complete, the loop guard i < a.length must be false. Since i starts less than a.length and increments by one every iteration, the loop guard will become false exactly when i == a.length. We diagram this:
Note how in the above two diagrams, i is stacked above 0 (in the “Pre” diagram) and above a.length (in the “Post” diagram). This shows that i == 0 and i == a.length respectively. Sometimes, loop variables like i are excluded in “Pre” and “Post” diagrams when they don’t correspond to separate boundaries from the ends of the array; however, it is never incorrect to include them (and can be helpful for coding up a loop using the diagrams).
Finally, let us diagram the invariant, which we call the “Inv” diagram. We know i is somewhere between 0 and a.length. We know that elements with indices before i have been set to zero, and the remaining elements are unknown. We diagram this:
There is one additional part of the “Inv” diagram that we did not discuss: the boundary between the two regions. Over the duration of the loop, the boundary moves from left to right. So, we will draw an arrow to depict this movement and its direction. This arrow is necessary for any moving boundaries.
Just as with loop invariants, we will require any variable modified in the loop to be somehow depicted in the “Inv” diagram.
Important Notes
-
All diagrams require a minimum of three labels: the array name, the starting index, and the ending index.
-
Indices never go directly above boundary lines. They should each be either to the left or to the right of a boundary line.
-
Each range of an array diagram must be labeled with the known properties of the range that differentiate it from other ranges. If nothing is known, the region should be labeled with a question mark.
-
Any boundary line whose position changes over the course of a loop must be labeled below with an arrow pointing in its direction of motion as the loop progresses.
-
All variables whose values are reassigned within the loop body must be depicted in the “Inv” diagram. If they don’t naturally correspond to one (or more) of the array ranges, list their invariants directly below or beside the diagram.
Node Diagrams
At this point, we have learned how to diagram the memory state of our programs at a fixed snapshot in time and how to diagram the state of an array (perhaps over the course of a loop). Now we will learn to diagram linked data, using node diagrams. Just like array diagrams, node diagrams abstract away some of the detail present in memory diagrams. While array diagrams do this to more readily represent array ranges, node diagrams do this to place focus on the nodes and links of linked data.
There are multiple types of node diagrams that we can draw, each with its own conventions. We will present them in the order introduced in lecture: linked lists, then trees, then graphs.
Linked Lists
Consider the following SinglyLinkedList and array:
|
|
|
|
Then, we can draw a memory diagram immediately before fromArr() returns as follows:
But this is really complicated! It is also hard to see the structure of the data without looking at the diagram with more care. (You may notice that in the above diagram, some types are different from what you might have expected. This is because Java has very specific typing rules when it comes to generic types. The details of this are outside the scope of this course.)
We aim to simplify this using node diagrams. We denote our node diagram for singly linked lists as follows:
We will draw a node as a rounded rectangle with two parts, separated by a vertical bar.
The first (left) part will consist of the data stored in the node. If this is easily displayed (e.g., Integer, String), then we can draw the data directly into the node. Otherwise, we draw the data as its own object with a reference pointing from the data section of the node to the data object.
The second (right) part will consist of the next node in the list. This will just be a reference (arrow) from this section of the node to the next node, or null in the case of tail. As with object diagrams, an arrow should start from within a node’s next section and end on the outer edge of another node (or an object’s rounded rectangle, if the reference is for the data section of a node).
In addition to nodes, a linked list has a pointer to the head and tail of the list. We draw these in as references to the first and last elements of the list. Note here that there is no box drawn. This is the only diagram we have seen so far in which boxes are not drawn for variables.
Usually, when we draw these diagrams, the size of the list will be apparent, so we will not depict size as a separate field.
Doubly Linked Lists
So far, we have discussed diagramming singly linked lists. Doubly linked lists have the same setup, but instead of two parts, we split each node into three (equal-sized) parts. The first (left) part will represent the link to the previous node, the second (middle) part will represent the data stored in the node, and the third (right) part will represent the link to the next node. For example, this diagram (from the A6 assignment handout) represents a doubly linked list. The labels prev, data, and next are optional. Note that here tail is defined differently than in the SinglyLinkedList we defined in class.
Trees
We will begin by describing the diagramming conventions for binary trees and then generalize this to general trees.
Similar to a linked list, a binary tree has nodes. However, while each node of a (singly) linked list has a next node, this is not the case for a node in a binary tree. Instead, a node in a binary tree can have up to two children. Usually, we will call these children the left child and the right child. Accordingly, when we diagram a binary tree, we will draw the left child on the “left” of a node and the right child on the “right” of a node.
Consider the following binary tree, which has root node with data 10 and two children with data 11 and 12 (left and right, respectively).
|
|
|
|
A memory diagram for this tree might look something like this:
As before, this is useful if we want to see how the tree is represented in memory, but less useful if we want to easily see properties of the tree itself. When we draw a tree node, we will simply draw a circle with the data inside, similar to the left half of a node in a linked list.
To draw children, we will draw arrows pointing downward. The left child will be to the left of the parent node and the right child will be to the right of the parent node. For example, the tree above can be written in full as:
By the nature of a tree, it is fairly obvious which node is the root, so we will not explicitly depict a reference to the root. It is important to note that the way we depict trees is inverted from what trees actually look like; in particular, the root is at the top of the tree and leaves are at the bottom. This is common in computer science. (Non-CS people joke that this convention was adopted because computer scientists never go outside and so never see trees, so don’t know they’re wrong!)
General Trees
So far, we have discussed diagramming binary trees. Diagramming general trees is similar. Nodes in general trees do not have left or right children; rather, they often have a variable number of children. For example, in the following tree, some nodes have one child and others have three children. Assume a GeneralTree class has been made, which takes in an array to represent children.
|
|
|
|
We will diagram this tree in the same way we diagrammed binary trees. Instead of a left and a right arrow for the children of a node, each node will have the same number of “child arrows” as it has children. For example, the node with data 10 has three children, so it will have three “child arrows”.
We will typically draw the children of a general tree in the order in which they were added, if the order is known. We will also typically draw the children of a general tree so that they are symmetrically positioned, no matter how many children there are.
Note that this is different from a fixed-arity tree. With a binary tree, there are fixed left and right children. If the left child is null and the right child is not, then the right child should still be depicted on the “right side” of the parent node. Similarly, with a ternary tree (each node has up to three children), there are fixed left, middle, and right children. If the middle and right children are null and the left child is not, then the left child should be depicted on the “left side” of the parent node.
Graphs
Graphs are similar to trees, but instead of having a hierarchical structure, vertices (i.e., “graph nodes”) in a graph can point to each other in more complex ways, allowing for features like loops. In fact, we can think of all trees as a subset of all graphs and all (singly) linked lists as a subset of all trees.
In a tree, nodes have children. In a graph, vertices have edges to other vertices. These edges can be directed or undirected. A directed edge e = (u,v) represents a connection from vertex u to vertex v. An undirected edge e = {u,v} represents a connection between vertices u and v, which can go in either direction. Accordingly, we will depict directed edges as arrows and undirected edges as lines. A directed edge (u,v) will be depicted as an arrow from u to v. An undirected edge {u,v} will be depicted as a line between u and v.
Any undirected edge {u,v} can be represented nearly equivalently as a pair of directed edges (u,v), (v,u). Because of this, we will only use directed graphs (graphs in which all edges are directed) in this course. In addition, we will disallow edges of the form (v,v) (self-loops) and will allow at most one edge of the form (u,v) (that is, given two vertices u and v, we will only allow zero or one edges from u to v).
Consider the simple graph created in the following code.
|
|
|
|
Then, this graph can be diagrammed as:
We can also diagram more complicated graphs. For example, consider the following graph, which would be quite time-intensive to draw a memory diagram for.
|
|
|
|
This can be diagrammed simply as:
Some graphs will also have weights or costs (or, perhaps, lengths) on its edges. We (somewhat predictably) call these graphs weighted graphs. When edge weights exist, we will depict them by writing the value of the weight next to the edge to which it corresponds. Each labeled edge weight should be unambiguous as to which edge it belongs to.
For example, consider the following code. Assume the addEdge() method has been modified to take in int weight as a third parameter.
|
|
|
|
This can be diagrammed:
Important Notes
-
For linked list node diagrams, arrows should start from within a left (for
prev) or right (fornext) section of a node and end at the boundary of another node. -
For tree node diagrams, arrows should start from the bottom half of the circle of a parent node and end at the top of the circle of the child node. In fixed-arity (e.g., binary, ternary) trees, the order and placement of children matter.
-
For graph node diagrams, directed edge arrows should start from the boundary of the source vertex and end at the boundary of the destination vertex.
Main Takeaways:
Object Diagrams
- Primitive types are diagrammed in boxes, while reference types are diagrammed in rounded rectangles in the heap.
- Arrows for reference types should always begin within a variable box and end at the edge of an object.
- Labeled array indices always go on one side of a boundary, never on a boundary line.
- Moving boundary lines should be represented with arrows showing the direction of motion.
- As with objects in memory diagrams, nodes in linked list node diagrams should be rounded. Tree nodes and graph vertices should be circles.