Word Ladder

Intro

String editing can also be viewed as an implicit graph where the vertex is a word and the edge is the letter transformation.

Word Ladder I (LC.127)

Given a startWord, an endWord, and a wordList, find the length of the shortest transformation path from start to end.

Constraints:

  1. Only one letter can be changed at a time

  2. each transformation must be a word in the list

Return 0 if no possible transformation path. All words have the same length. Lower case letters only.

Example

I:

startWord = "hit"
endWord = "cog"
wordList = ["hot","dot","dog","lot","log","cog"]

draw the word graph

O: 

5 because hit -> hot -> dot -> dog -> cog or hit -> hot -> lot -> log -> cog 

I: 

beginWord = "hit"
endWord = "cog" 
wordList = ["hot","dot","dog","lot","log"]

O: 

0 because endWord does not exit in the dictionary 

Level BFS

shortest transformation path = shortest path in a word graph

we have to construct this graph and search at the same time

stop if we visited every possible node

Code

public int ladderLength(String startWord, 
                        String endWord, 
                        List<String> wordList) {
    if (startWord.equals(endWord)) return 0;
    // preprogess
    Set<String> wordSet = new HashSet<>();
    for (String word : wordList) {
        wordSet.add(word);
    }    
    // init queue
    Queue<String> queue = new LinkedList<>();
    Set<String> visited = new HashSet<>();
    queue.offer(startWord);
    visited.add(startWord);
    // level bfs
    int levels = 1;
    String current;
    while (!queue.isEmpty()) {
        int queueSize = queue.size();
        for (int i = 0; i < queueSize; i++) {
            current = queue.poll();
            // search all neighbors
            for (String neighbor : getNeighbors(current, wordSet)) {
                if (visited.contains(neighbor)) {
                    continue;
                }
                if (neighbor.equals(endWord)) {
                    return levels + 1;
                }
                queue.offer(neighbor);
                visited.add(neighbor);
            }
        }
        levels++;
    }
    return 0; // explored the whole graph but cannot reach end goal
}

private List<String> getNeighbors(String currentWord, Set<String> wordSet) {
    List<String> neighbors = new ArrayList<>();
    for (int i = 0; i < currentWord.length(); i++) {
        for (int l = 'a'; l < 'z'; l++) {
            // if same letter continue;
            char[] wordArr = currentWord.toCharArray();
            wordArr[i] = (char) l;
            String nextWord = new String(wordArr);
            if (wordSet.contains(nextWord)) {
                neighbors.add(nextWord);
            }
        }
    }
    return neighbors;
}
from collections import deque 

# string.ascii_lowercase
LOWER_CASE_LETTERS = [
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
]

class Solution:
    
    def _next_word(self, current_word: str) -> Iterable[str]:
        for i in range(len(current_word)):
            for l in LOWER_CASE_LETTERS:
                yield current_word[:i] + l + current_word[i+1:]

    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        
        if beginWord == endWord:
            return 1
        
        word_set = set(wordList)
        
        if endWord not in word_set:
            return 0
        
        # init bfs 
        
        queue: Deque[str] = deque()
        visited: Set[str] = set()
        
        queue.append(beginWord)
        visited.add(beginWord)
        
        depth = 0
        
        # start searching and creating the word graph 
        
        while len(queue) > 0:
            
            breadth = len(queue)

            for i in range(breadth):
                
                # visit current word 
                current_word = queue.popleft()
                
                if current_word == endWord:
                    return depth + 1

                # expanding current node to adjacent nodes   
                # check visited in pre visit
                for next_word in self._next_word(current_word):
                    if next_word not in word_set:
                        continue
                    if next_word in visited:
                        continue
                    queue.append(next_word)
                    visited.add(next_word) 
            
            depth += 1
            
        return 0

Optimized BFS

The time complexity of a raw BFS is O(V+E)O(V+E). In this case, the number vertices are the valid words in the word list and the number edges are the brute force computed substitution for each letter for each word. The end result is something like O(n+n×26×l)O(n + n \times 26 \times l )where nn is the length of the word list and llis the longest word length.

The idea to optimize is to shrink the implicit graph that we are constructing. One way to do it to precompute a dictionary graph with only valid edges in the beginning (i.e. "bot" will not be an edge to visit only "hot", "dot", "lot" are edges). This will shrink the brute force factor 26 significantly with the cost of a little more memory.

Precomputing is alway a good idea when the dictionary size is small in a search problem.

The dictionary graph looks like this for the first example.

dict_graph: defaultdict(<class 'list'>, {
    '*ot': ['hot', 'dot', 'lot'], 
    'h*t': ['hot'], 
    'ho*': ['hot'], 'd*t': ['dot'], 
    'do*': ['dot', 'dog'], 
    '*og': ['dog', 'log', 'cog'], 
    'd*g': ['dog'], 
    'l*t': ['lot'], 
    'lo*': ['lot', 'log'], 
    'l*g': ['log'], 
    'c*g': ['cog'], 
    'co*': ['cog']
})

Optimized Code


from collections import defaultdict, deque 

class Solution:
                
    def _next_key(self, current_word: str) -> Iterable[str]:
        for i in range(len(current_word)):
            yield current_word[:i] + '*' + current_word[i+1:]
            
    def _construct_dict_graph(self, word_list: List[str]) -> Dict[str, List[str]]:
        
        dict_graph: Dict[str, List[str]] = defaultdict(list)
        
        for word in word_list:
            for search_key in self._next_key(word):
                dict_graph[search_key].append(word)
        
        return dict_graph
    
    
    def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:
        
        if beginWord == endWord:
            return 1
        
        word_set = set(wordList)
        
        if endWord not in word_set:
            return 0
        
        dict_graph = self._construct_dict_graph(wordList)
        
        # init bfs 
        
        queue: Deque[str] = deque()
        visited: Set[str] = set()
        
        queue.append(beginWord)
        visited.add(beginWord)
        
        depth = 0
        
        # start searching and creating the word graph 
        
        while len(queue) > 0:
            
            breadth = len(queue)

            for i in range(breadth):
                
                # visit current word 
                current_word = queue.popleft()
                
                if current_word == endWord:
                    return depth + 1

                # pre visit adjacent nodes  
                # check visited in pre visit 
                for search_key in self._next_key(current_word):
                    for next_word in dict_graph[search_key]:
                        if next_word in visited:
                            continue
                        queue.append(next_word)
                        visited.add(next_word)

            depth += 1
            
        return 0
        

Approach

Time

Space

Brute force BFS

516ms

15.3mb

Optimized BFS

176ms

17.5mb

Bidirectional BFS

A bidirectional search is a lot more efficient when start and end are well defined. One big triangle to two much smaller triangles. The less than / less or equal than condition always worth going through an example by hand.

TODO: Write brute force bidirectional BFS then refactor it

def ladderLength(self, beginWord: str, endWord: str, wordList: List[str]) -> int:

    if beginWord == endWord:
        return 1

    word_set = set(wordList)

    if endWord not in word_set:
        return 0

    dict_graph = self._construct_dict_graph(wordList)

    # init bidirectional bfs

    q1: Deque[str] = deque()
    visited_1: Set[str] = set()

    q1.append(beginWord)
    visited_1.add(beginWord)

    q2: Deque[str] = deque()
    visited_2: Set[str] = set()

    q2.append(endWord)
    visited_2.add(endWord)

    depth = 0

    while len(q1) > 0 and len(q2) > 0:

        if len(q1) <= len(q2):
            q = q1
            visited = visited_1
            other_visited = visited_2
            target = endWord
        else:
            q = q2
            visited = visited_2
            other_visited = visited_1
            target = beginWord

        breadth = len(q)

        for i in range(breadth):

            current_word = q.popleft()

            if current_word in other_visited:
                return depth + 1

            if current_word == target:
                return depth + 1

            for search_key in self._next_key(current_word):
                for next_word in dict_graph[search_key]:
                    if next_word in visited:
                        continue
                    q.append(next_word)
                    visited.add(next_word)
        depth += 1

    return 0

TODO: Figure out a way to write bidirectional BFS in a concurrent way. It should be possible as long as maintain the visited set as a dictionary that maps word to depth. When one go routine finds the words in another dictionary, it returns that? How does the other go routine terminates?

Time Complexity

The worst case for a bidirectional BFS happens when the start and end cannot meet somewhere in the middle. The search graph and the time complexity then are the same as a regular BFS. But on average, the performance for a bidirectional BFS is a lot better. The search depth will be reduced by a factor 2. That is a huge optimization because the number of nodes will be reduced exponentially.

Approach

Time

Space

Optimized BFS

176ms

17.5mb

Bidirectional BFS

108ms

17.6mb

Reading

Last updated