## 3. Design overview

The author has been tinkering with automated maze generators for many decades, but all his previous efforts tended to produce mazes that were too easy; that is, the paths from entrance to exit were too short. The image at the top of this document is an example of a more satisfying maze: the solution path visits over half the cells of the maze, and gets into all four corners.

The current effort implements an idea that the author believes is original. There may be others who have invented this technique, but the author is not aware of any.

The algorithm as implemented builds on a well-known technique, Kruskal's algorithm. We will first review the existing algorithm, then discuss the improvement.

### 3.1. Kruskal's algorithm

The Wikipedia article on maze generation is a useful survey of some of the commonly used algorithms. The “randomized Kruskal's algorithm” is easily implemented, and produces a maze with no loops. As it is described in the reference article:

1. Create a list of all walls, and create a set for each cell, each containing just that one cell.

2. Visit each wall in some random order. For each wall that adjoins two cells that are in different sets, remove that wall and merge the sets of the adjoining cells into one union set.

To translate this to an object-oriented design, we will need three classes: `Maze` as a container for the entire puzzle, `Cell` to represent a cell, and `Wall` to represent a wall. Instances of the `Wall` class will have an attribute `.solid` that is initially `True`, meaning that the wall has not been removed. Kruskal's algorithm will change the `.solid` attribute of the removed walls to `False`, and the code that renders the image will not draw those walls.

To simplify life for the solver of the maze, we will construct a rectangular maze with the start cell at a random location on the left-hand wall and the goal at a random location on the right-hand wall.

The constructor for the `Maze` class will initially place all solid walls, including the exterior walls, and make them all solid.

### 3.2. Improving the solution path

In the author's opinion, the better mazes are the ones with the longer solution paths. The more circuitous the route, the more challenging.

The author's original idea was to make maze generation a two-stage process.

1. Generate a random route from the start cell to the goal cell, and break just the walls on this route using Kruskal's algorithm. After this step, all the cells on the solution route will be in the same set, and all other cells will still be in their original, singleton sets.

The algorithm for this step should often produce routes that are significantly longer than the shortest route from start to goal. For the details, see Section 3.3, “Generating the solution route”.

2. Run Kruskal's algorithm to break other walls randomly until all cells are in the same set.

### 3.3. Generating the solution route

The geometry of the maze is straightforward. We define cell positions as tuple (row, column) where horizontal rows are numbered starting at 0 with the top row, and vertical columns are numbered starting at 0 with the left-hand row.

In Python terms, the solution route is represented as an instance of class `Path` containing a sequence of (row, column) tuples. There are only four legal moves from a given cell: up, down, left, and right.

The author's first attempt at a route generation algorithm was flawed. It sometimes produced satisfyingly long routes, but in other cases the compute time was prohibitive. In practical terms, it looked like it was in a hard loop.

The flawed algorithm:

1. If the last cell of the path is the goal cell, we are done.

2. To extend the path, we need to add a new cell that is an immediate neighbor of the last cell in the path. If the last cell has any neighbors that are not already part of the path, add to the path one of those neighbors, randomly chosen, and recursively extend the path until we either reach the goal or reach a cell that has no neighbors not already part of the path.

The problem with this approach is that when there is no route from the end of the current path to the goal cell, the program can waste astronomical amounts of time backtracking, especially when there are thousands of cells.

The solution to this problem is to add one rule to the above algorithm: never add a new cell to the path unless there is a route from that new cell to the goal cell that does not use any cell already in the path.

The serendipitous result of this improvement is that the path often wanders into nearly closed areas of the maze, but never enters a closed area unless there is a way to get out of it. As currently constituted, the program generates 100×100 mazes in well under a minute.