The script starts out by making up a list of directory
trees to be visited. If there are any command line
arguments, these arguments make up the list of directories.
If there are no arguments, we default to a list containing
one entry, ".".
The following steps are done for each directory in the list:
Look at every file in and under that directory. Use
PathInfo to take a snapshot of that file, and build a list
of these PathInfo instances. This process of
“walking the directory tree” is easy
because the os.path module has
a function called walk() that
handles the process of visiting every directory in the
tree. For documentation on this function, see the
Python Library Reference.
Sort this list in descending order by the size attribute.
Python makes it easy to sort a list: all list objects
have a .sort() method that
sorts the list in place. But how do we get a list of
PathInfo objects to sort in descending order by size? The
.sort() method can take as an
optional argument a function that compares two objects.
However, a more Pythonic way to do it is to define a
new class that inherits from the PathInfo class called
BigInfo. In that class we
define a special method named .__cmp__() that tells Python how to order
two objects of that class.
Print a heading for this section of the report, then go
through the sorted list and print one line for each
PathInfo instance in that list.
One of the goals of object-oriented programming is to
minimize, if not eliminate, the use of global variables.
An earlier version of this program used a global variable
to hold the list of PathInfo objects. A better way is to
define a class that holds this list. The methods in the
class have access to the list, but no code outside the
class needs such access. This is in accord with the
generally accepted software design principle of
“information hiding”: we will feed the class
constructor the name of a directory, and it will return an
object that has everything we need to generate the final
report.
We'll call this class BigReport,
because it represents a report on big files. With this
design, the overall program flow for each directory
becomes:
Instantiate a BigReport
object. Pass the name of the directory to its
constructor.
The BigReport object has a
method named .genFiles() that
generates the lines of the report. (Generators are a
relatively new feature of Python, since version 2.2.
See the Python
language reference section on generators.)
The actual code for the bigfiles.py script starts with the Linux
“pound bang line” that makes the script
self-executing.
#!/usr/bin/env python #================================================================ # bigfiles.py: Script to show files in descending order by size. # For documentation in "literate programming" style, see: # http://www.nmt.edu/help/lang/python/examples/pathinfo/ #---------------------------------------------------------------- SCRIPT_NAME = "bigfiles.py" EXTERNAL_VERSION = "1.1"
Next we need to import a few Python modules: sys for command line arguments and standard
streams; os for numerous file- and
directory-related functions; and of course pathinfo.py.
#================================================================ # Imports #---------------------------------------------------------------- import sys, os import pathinfo