Next / Previous / Contents / TCC Help System / NM Tech homepage

4. Design of the script

Before examining the code, here is an overview of the design.

4.1. Definitions of terms

cell

One of the locations where a letter will be placed in the final puzzle. The blank spaces within the puzzle are not considered cells.

slot

A slot is a sequence of two or more contiguous cells that will be occupied by a word in the final puzzle.

4.2. Data structures

Here is an entity-relationship diagram illustrating the relationships between puzzles, cells, and slots.

We represent the puzzle internally using instances of four corresponding classes.

  • An instance of class Puzzle represents the entire puzzle, and contains the primary logic for solving it. It is a container for a collection of multiple cells, slots, and words.

  • A Cell instance represents the location of one cell of the puzzle. A given cell may be related to two slots (if it is the intersection between a horizontal slot and a vertical slot), or it may be related to only one slot.

  • A Slot instance represents the location of one word slot in the puzzle.

  • A Word instance represents one of the words that fits in some slot.

To describe the geometry of the puzzle, we will use row and column indexes. An instance of helper class Coord represents one location in the puzzle as a row index and a column index, both counting from zero. The upper left corner of the puzzle is row 0, column 0.

For various reasons, we'll define a complete ordering relation on Coord instances. This ordering is the usual scan order for European languages: starting in the upper left corner, across the first row, then left to right in the second row, and so on.

We also define a complete ordering for Slot instances. Slots are ordered according to the row and column coordinates of their first cell, with their orientation as a secondary key: if two slots share the same first cell, the horizontal slot is considered before the vertical slot. We define numbers for the two orientations: 0 for horizontal, 1 for vertical.

4.3. Overview of the solution algorithm

Note in the entity-relationship diagram in Section 4.2, “Data structures” that each word is shown as being related to exactly one slot, and vice versa. This is true for the puzzle in a solved state. The problem we have to solve is how to get the puzzle into the solved state—or states, since a poorly-constructed puzzle may have multiple solutions.

Probably the simplest algorithm to implement would be a brute force recursive algorithm. Once we have defined an order for the slots, we would try every word that fits in the first slot, and for each possible choice, recursively call a solver that tries all the options for the next slot, and so forth until we either reach a contradiction (where there are no words that will fit in some slot), or reach a solution state.

For an example of a similar puzzle solver using this technique, see A sudoku solver. For some of the hardest sudoku puzzles, the solution on relatively modern processors may take several minutes.

Although this is not an impractical interval, the author wished to try for major efficiency gains by reducing the number of candidates for each slot before starting the recursive process. For almost all published puzzles, a technique of cyclic reduction (described below) is sufficient to solve them, but the author has encountered a few expert-level puzzles that cyclic reduction would not solve the puzzle.

The next three sections describe three phases of solution. The goal of the program is to build a data structure representing a solution, that is, it shows only one word for each slot.

4.4. Initialization

The purpose of the initialization phase is to assign to each slot in the puzzle a list of all the possible words that may be used to fill it.

Ignoring for the moment any initial clues provided with the puzzle, the primary criterion for selecting words is their length. So the first step is to associate all three-letter words with each three-letter slot; all four-letter words with each four-letter slot; and so forth.

As for initial clues, there may be none—we may get only an empty framework. However, in general, the initial puzzle may show any number of cells filled in. The most common case is that one slot will have its entire word filled in, but the author has seen puzzles with more than one initial word placed, and also puzzles that had only a sprinkling of single letters here and there.

Conceptually, then, in the general case the initial clues can be represented as a set of cells that have their initial letters filled in. These initial clue cells can be used to eliminate some of the possibilities for one or both slots that pass through that cell.

4.5. Cyclic reduction

Once we have a tentative set of the word choices for each slot, we can reduce the number of choices for each slot by three processes of elimination:

  • Intersection test: where two slots intersect, they must have the same letter in that position.

    So, if we look at each cell that is an intersection between two slots, clearly the letter at that cell must be one that is available in both the choices for the horizontal slot and the choices for the vertical slot.

  • Unique-word test: if there is only one word that fits in a given slot, that word may not be used in any other slot.

  • Unique-slot test: if there is only one slot that has a particular word in its set of choices, that slot must contain that word.

The “cyclic reduction” phase of solution, then, consists of walking through all the intersection cells and trimming the lists of choices for slots by these three mechanisms.

Each time we eliminate some choices from a slots, there is a ripple effect—changes propagate to other slots, allowing us to eliminate more choices, which propagates to other slots, and so on.

Therefore, whatever we do to eliminate choices, we must repeat this process until we either solve the puzzle, or until another pass through the puzzle does not eliminate any more choices.

If we reach a solution, all we need to do is display the current of the puzzle and terminate execution.

If instead we reach a point where some slots still have multiple words that may fit, and cyclic reduction does not eliminate any choices, then we'll have to proceed to Section 4.6, “The recursive algorithm”.

One design question remains: how do we sequence the three kinds of tests that we use to eliminate choices? For example, while we are reducing choices with the intersection test, whenever the count of the word choices for a given slot goes down to one, we can immediately remove that word from all other slots. However, these further removals will affect other intersection tests, and so on, with the potential for changes that ripple through the entire puzzle.

Just to keep things conceptually simple, the approach will be to define a reduction cycle that applies all three tests at least once to every component of the puzzle. This cycle will be repeated until we either solve the puzzle or stop making progress.

Here's an outline of the reduction cycle:

  1. Step through all the cells in the puzzle. For every cell that is related to both a horizontal and vertical slot, reduce the choices for both slots by applying the intersection test.

  2. Step through every slot in the puzzle. If any slot has only one choice, remove that choice from all other slots: this is the unique-word test.

  3. Step through every word in the puzzle. If any word is a choice for only one slot, remove all other choices from that slot, and remove that choice from all other slots.

Note that this steps need to know three things: what slots intersect a given cell; what word choices remain for a given slot; and which slots have a given word as a choice. See Section 4.7, “The WordBank class” for the data structures that support these features.

4.6. The recursive algorithm

As discussed in Section 4.3, “Overview of the solution algorithm”, recursive solution relies on organizing the set of slots into an ordered set. Once we know which slot is considered the first slot, we try all the different choices for the first slot, then try all the different choices for the second slot, and so forth until we either reach a solution, or we reach a contradiction—where there are no remaining candidates for a given slot.

The recursive method for a given slot assumes that all previous slots have been solved (that is, that each has exactly one word choice remaining). The recursive method is a generator: it generates all the solutions that start at the current state of the puzzle, if there are any solutions based on the current state.

The basis case for recursion is when we reach the last slot and there is only word that fits that slot. This means that the current state is a solution to the puzzle, so the method yields self, which is yielded back through all the recursive layers and finally yielded by the first-level method of the recursive algorithm.

If we reach a slot and there are no choices left for the slot, the recursive method raises a StopIteration exception to signify that there are no solutions based on the current state.

The general case is when the current slot has multiple choices remaining. This case is easy to describe in general: we try each of the remaining choices for the current slot, and for each choice, try recursively to solve the remaining slots.

This is tricky because we may have to backtrack some of the actions that occur during recursive solution.

For example, suppose we only have three unsolved slots: call them A, B, and C, in order.

So we pick one of the words that is still a choice for slot A, and recur until we reach the recursion for slot B. At that point we pick the first choice for slot B and recur. But then suppose later we find that with those choices for slots A and B, none of the choices for slot C work anymore.

At that point, the choices that we removed from slot C must be put back. This is where the backtracking comes in.

So the backtracking is handled in the recursive algorithm in this way. At this point, we are assuming that the current slot is not the last slot, and that there are two or more choices for that slot.

  1. Remember the current set of choices for the slot.

  2. Temporarily remove all those choices from the slot.

  3. For each of the original choices:

    1. Reinstate that choice for this slot.

    2. Recursively call the solver method starting at the following slot.

    3. Remove that same choice from the slot's choices.

  4. Reinstate all the original choices for this slot.

4.7. The WordBank class

The algorithms described in the previous sections need two subtle features:

Accordingly, we define a class WordBank that contains these mechanisms.

  • An instance of this class is primarily a container for the word list.

  • This instance is the logical place to hold the current list of word choices for each slot. We can keep a running count of the total number of choices for the entire puzzle. When this count is exactly the same as the number of slots, we know that there is only one choice per slot, hence the puzzle is solved. The obvious data structure here is a dictionary whose keys are slot instances, and each related value is a list of the word choices for that slot.

    Furthermore, the cyclic reduction algorithm can record the value of this running count before each pass through the set of slots, and then check it again after the pass. If the value has not changed, we know we have reached the point where we are no longer eliminating choices.

  • Building and reducing the list of choices for each slot will be done by calling methods on this instance. Method .mayBe() tells the instance that a given word may be the solution to a given slot; method .isNot() tells the instance that a given word is not the solution to a given slot.

  • Applying the unique-slot test (see Section 4.5, “Cyclic reduction”) means that we need to know the set of slots that include a given word in their choices. This will be implemented by a dictionary whose keys are the words in the puzzle, and each corresponding value is a list of the slots for which this word is still one of the choices.