Topological Sort

Intro

Linearize the dependencies. Or the right the sequence to take courses. DFS (the highest finish time should go first).

BFS (Khan's Algorithm)

I personally prefer to think in terms of this approach because people naturally think will perform the topological sort manually this way. That might be partly the reason why this is the first algorithm (1962) for topo sort.

add vertices with 0 incoming edge to queue Q
while !Q.isEmpty
  current = Q.pop, add current to the end of list
  for each neighbor of current
    delete edge current → neighbor
    if neighbor has 0 incoming edge, add neighbor to queue

BFS Code

We don't want to modify the original graph (aka deleting edges). So instead we count the indegree for each node.

public ArrayList<DirectedGraphNode> topologicalSort(ArrayList<DirectedGraphNode> graph) {
    ArrayList<DirectedGraphNode> topoOrder = new ArrayList<>();
    if (graph == null || graph.size() == 0)
        return topoOrder;
    // Step 1 count indegree (incoming edges)
    Map<DirectedGraphNode, Integer> indegree = countIndegree(graph);
    // Step 2 topological sort with bfs
    topoSort(graph, indegree, topoOrder);
    return topoOrder;
}
private Map<DirectedGraphNode, Integer> countIndegree(ArrayList<DirectedGraphNode> graph) {
    Map<DirectedGraphNode, Integer> indegree = new HashMap<>();
    for (DirectedGraphNode v : graph)
        indegree.put(v, 0);
    for (DirectedGraphNode v : graph)
        for (DirectedGraphNode neighbor : v.neighbors)
            indegree.put(neighbor, indegree.get(neighbor) + 1);
    return indegree;
}

Another way is to define queue as an ArrayList in the beginning. Then for (int i = 0; i < queue.size(); i++) to get the ith element in queue as current. This way we don't have to add current to topoOrder every time.

private void topoSort(ArrayList<DirectedGraphNode> graph, 
    Map<DirectedGraphNode, Integer> indegree, ArrayList<DirectedGraphNode> topoOrder) {
    Queue<DirectedGraphNode> queue = new LinkedList<>();
    // add vertices with no incoming edge to queue
    for (DirectedGraphNode v : graph)
        if (indegree.get(v) == 0)
            queue.offer(v);
    // standard bfs
    while (!queue.isEmpty()) {
        DirectedGraphNode current = queue.poll();
        topoOrder.add(current);
        // for each neighbor delete edge
        for (DirectedGraphNode neighbor : current.neighbors) {
            indegree.put(neighbor, indegree.get(neighbor) - 1);
            // if v has no incoming edge, add v to queue
            if (indegree.get(neighbor) == 0)
                queue.offer(neighbor);
        }
    }
}

Follow Up #1

What if there is no guarantee that the graph contains a valid topological order? So in this case, the graph is no longer acyclic. Will code above run into an infinite loop? NO. That's why we count indegree for each node. When a cycle exists, the starting node in that cycle will have an indegree greater than 0 and therefre will not be added to the queue.

Before we return the result, we can add a simple check if (topoOrder.size() == graph.size()). If yes then we done, if no then there is no topological ordering in this graph.

Follow Up #2

How about finding all possible topological orderings? (ex. all possible course sequences to finish a degree)

Find all orderings/permutations/combinations, we need to use backtracking. We want to backtrack when we delete edges. To try all possibilities.

DFS

DFS on the root. The topo sort has the leaf nodes start first. So we have to reverse the finish time.

I wouldn't say this approach is counter intuitive but we don't do sort dependenties this way manually. This algorithm was introduced at a later time (1976).

for each unvisited vertex u
  DFS(u)
    for each neighbor v of u
      if v is unvisited, DFS(v)
      else skip v;
    finish DFS(u), add u to the back of list
reverse list

DFS Code

public ArrayList<DirectedGraphNode> topSort(ArrayList<DirectedGraphNode> graph) {
    ArrayList<DirectedGraphNode> topoOrder = new ArrayList<>();
    Set<DirectedGraphNode> visited = new HashSet<>();
    for (DirectedGraphNode v : graph)
        if (!visited.contains(v))
            dfs(v, visited, topoOrder);
    Collections.reverse(topoOrder);
    return topoOrder;
}

private void dfs(DirectedGraphNode current, Set<DirectedGraphNode> visited, 
    ArrayList<DirectedGraphNode> topoOrder) {
    // previsit
    visited.add(current);
    // dfs
    for (DirectedGraphNode neighbor : current.neighbors)
        if (!visited.contains(neighbor))
            dfs(neighbor, visited, topoOrder);
    // postvisit
    topoOrder.add(current);
}

Summary

In most graph questions, we can use either BFS or DFS. BFS is easier to think and maybe longer to write. While DFS is shorter to write, the thought process behind it can be lengthy. Also, the size of the system call stack is something we need to keep in mind. Unlike using a queue (in the heap space), the system stack is limited on how the JVM allocates it. Stack overflow can happen if the call stack is too deep.

Reference

Last updated