
7. Source code for bigfiles.py

The script starts out by making up a list of directory trees to be visited. If there are any command line arguments, these arguments make up the list of directories. If there are no arguments, we default to a list containing one entry, ".".
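The defaulting rule described above can be sketched in a few lines. This is only an illustration, not the script's actual code; the function name directories_to_visit is hypothetical.

```python
def directories_to_visit(argv):
    """Return the command line arguments, or ["."] when there are none."""
    args = argv[1:]    # argv[0] is the script name, not an argument
    return args if args else ["."]
```

A script would typically call this as directories_to_visit(sys.argv) and then loop over the result.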

The following steps are done for each directory in the list:

  1. Look at every file in and under that directory. Use PathInfo to take a snapshot of each file, and build a list of these PathInfo instances. This process of “walking the directory tree” is easy because the os.path module has a function called walk() that visits every directory in the tree. (This function exists only in Python 2; Python 3 removed it in favor of os.walk().) For documentation on this function, see the Python Library Reference.

  2. Sort this list in descending order by the size attribute.

    Python makes it easy to sort a list: all list objects have a .sort() method that sorts the list in place. But how do we get a list of PathInfo objects to sort in descending order by size? The .sort() method can take as an optional argument a function that compares two objects. However, a more Pythonic way to do it is to define a new class called BigInfo that inherits from the PathInfo class. In that class we define a special method named .__cmp__() that tells Python how to order two objects of that class. (Both the comparison-function argument and .__cmp__() are Python 2 features; Python 3 uses the key argument of .sort() and rich comparison methods such as .__lt__() instead.)

  3. Print a heading for this section of the report, then go through the sorted list and print one line for each PathInfo instance in that list.
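The three steps above might look roughly like the following sketch. It is written for Python 3, so it swaps in os.walk() for os.path.walk() and a .__lt__() method for .__cmp__(). The classes FileSnapshot and BigSnapshot are hypothetical stand-ins for PathInfo and BigInfo, whose real interfaces are not shown in this section; the function names and the report layout are assumptions as well.

```python
import os


class FileSnapshot:
    """Hypothetical stand-in for PathInfo: remembers a path and its size."""

    def __init__(self, path):
        self.path = path
        # lstat() so that a dangling symbolic link does not raise an error.
        self.size = os.lstat(path).st_size


class BigSnapshot(FileSnapshot):
    """Hypothetical stand-in for BigInfo: bigger files sort first."""

    def __lt__(self, other):
        # list.sort() only asks whether self should come before other.
        return self.size > other.size


def snapshots_under(root):
    """Step 1: build one BigSnapshot per file in and under root."""
    found = []
    for dir_path, _dir_names, file_names in os.walk(root):
        for name in file_names:
            found.append(BigSnapshot(os.path.join(dir_path, name)))
    return found


def report_lines(root):
    """Steps 2 and 3: sort by descending size, then format the report."""
    files = snapshots_under(root)
    files.sort()    # ordering comes from BigSnapshot.__lt__
    lines = ["Files under %s, largest first:" % root]
    for snap in files:
        lines.append("%12d  %s" % (snap.size, snap.path))
    return lines
```

Printing the report for the current directory is then a matter of looping over report_lines(".") and printing each line.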

One of the goals of object-oriented programming is to minimize, if not eliminate, the use of global variables. An earlier version of this program used a global variable to hold the list of PathInfo objects. A better way is to define a class that holds this list. The methods in the class have access to the list, but no code outside the class needs such access. This is in accord with the generally accepted software design principle of “information hiding”: we will feed the class constructor the name of a directory, and it will return an object that has everything we need to generate the final report.

We'll call this class BigReport, because it represents a report on big files. With this design, the overall program flow for each directory becomes:

  1. Instantiate a BigReport object. Pass the name of the directory to its constructor.

  2. Call the object's .genFiles() method, which generates the lines of the report.
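To make the information-hiding idea concrete, here is a minimal Python 3 sketch of such a class. It is not the script's actual BigReport: the name BigReportSketch, the private attribute _files, the line format, and the choice to make .genFiles() a generator of strings are all assumptions. For brevity it sorts with a key function rather than a BigInfo subclass.

```python
import os


class BigReportSketch:
    """Owns the file list for one directory tree; no global list is needed."""

    def __init__(self, dir_name):
        self.dir_name = dir_name
        # (size, path) pairs, kept inside the object.
        self._files = []
        for dir_path, _dir_names, file_names in os.walk(dir_name):
            for name in file_names:
                full = os.path.join(dir_path, name)
                self._files.append((os.lstat(full).st_size, full))
        # Largest files first.
        self._files.sort(key=lambda pair: pair[0], reverse=True)

    def genFiles(self):
        """Yield one report line per file, largest first."""
        for size, path in self._files:
            yield "%12d  %s" % (size, path)


def print_reports(dir_list):
    """Overall flow: one report object per directory, then print its lines."""
    for dir_name in dir_list:
        print("=== %s ===" % dir_name)    # hypothetical heading format
        for line in BigReportSketch(dir_name).genFiles():
            print(line)
```

Code outside the class only ever calls the constructor and .genFiles(); the list itself stays an implementation detail.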