Linearize the dependencies. Or the right the sequence to take courses. DFS (the highest finish time should go first).
BFS (Khan's Algorithm)
I personally prefer to think in terms of this approach because people naturally think will perform the topological sort manually this way. That might be partly the reason why this is the first algorithm (1962) for topo sort.
add vertices with 0 incoming edge to queue Q
while !Q.isEmpty
current = Q.pop, add current to the end of list
for each neighbor of current
delete edge current → neighbor
if neighbor has 0 incoming edge, add neighbor to queue
BFS Code
We don't want to modify the original graph (aka deleting edges). So instead we count the for each node.
private Map<DirectedGraphNode, Integer> countIndegree(ArrayList<DirectedGraphNode> graph) {
Map<DirectedGraphNode, Integer> indegree = new HashMap<>();
for (DirectedGraphNode v : graph)
indegree.put(v, 0);
for (DirectedGraphNode v : graph)
for (DirectedGraphNode neighbor : v.neighbors)
indegree.put(neighbor, indegree.get(neighbor) + 1);
return indegree;
}
Another way is to define queue as an ArrayList in the beginning. Then for (int i = 0; i < queue.size(); i++) to get the ith element in queue as current. This way we don't have to add current to topoOrder every time.
private void topoSort(ArrayList<DirectedGraphNode> graph,
Map<DirectedGraphNode, Integer> indegree, ArrayList<DirectedGraphNode> topoOrder) {
Queue<DirectedGraphNode> queue = new LinkedList<>();
// add vertices with no incoming edge to queue
for (DirectedGraphNode v : graph)
if (indegree.get(v) == 0)
queue.offer(v);
// standard bfs
while (!queue.isEmpty()) {
DirectedGraphNode current = queue.poll();
topoOrder.add(current);
// for each neighbor delete edge
for (DirectedGraphNode neighbor : current.neighbors) {
indegree.put(neighbor, indegree.get(neighbor) - 1);
// if v has no incoming edge, add v to queue
if (indegree.get(neighbor) == 0)
queue.offer(neighbor);
}
}
}
Follow Up #1
What if there is no guarantee that the graph contains a valid topological order? So in this case, the graph is no longer acyclic. Will code above run into an infinite loop? NO. That's why we count indegree for each node. When a cycle exists, the starting node in that cycle will have an indegree greater than 0 and therefre will not be added to the queue.
Before we return the result, we can add a simple check if (topoOrder.size() == graph.size()). If yes then we done, if no then there is no topological ordering in this graph.
Follow Up #2
How about finding all possible topological orderings? (ex. all possible course sequences to finish a degree)
Find all orderings/permutations/combinations, we need to use backtracking. We want to backtrack when we delete edges. To try all possibilities.
DFS
DFS on the root. The topo sort has the leaf nodes start first. So we have to reverse the finish time.
I wouldn't say this approach is counter intuitive but we don't do sort dependenties this way manually. This algorithm was introduced at a later time (1976).
for each unvisited vertex u
DFS(u)
for each neighbor v of u
if v is unvisited, DFS(v)
else skip v;
finish DFS(u), add u to the back of list
reverse list
DFS Code
public ArrayList<DirectedGraphNode> topSort(ArrayList<DirectedGraphNode> graph) {
ArrayList<DirectedGraphNode> topoOrder = new ArrayList<>();
Set<DirectedGraphNode> visited = new HashSet<>();
for (DirectedGraphNode v : graph)
if (!visited.contains(v))
dfs(v, visited, topoOrder);
Collections.reverse(topoOrder);
return topoOrder;
}
private void dfs(DirectedGraphNode current, Set<DirectedGraphNode> visited,
ArrayList<DirectedGraphNode> topoOrder) {
// previsit
visited.add(current);
// dfs
for (DirectedGraphNode neighbor : current.neighbors)
if (!visited.contains(neighbor))
dfs(neighbor, visited, topoOrder);
// postvisit
topoOrder.add(current);
}
Summary
In most graph questions, we can use either BFS or DFS. BFS is easier to think and maybe longer to write. While DFS is shorter to write, the thought process behind it can be lengthy. Also, the size of the system call stack is something we need to keep in mind. Unlike using a queue (in the heap space), the system stack is limited on how the JVM allocates it. Stack overflow can happen if the call stack is too deep.
Reference
StackOverflow - for checking StackOverflowException