9. Main program

Here is the main program and its intended function.

deduper
# - - - - -   m a i n

def main():
    """Main.

      [ if (the command line is valid) and
        (the effective directory specified by the command line
        exists) ->
          sys.stdout  +:=  report of sets of two or more files in
              or under that directory that have the same hash
          sys.stderr  +:=  report of files in or under that
              directory that are unreadable
        else ->
          sys.stderr  +:=  error message ]
    """

The first step is to check the command line arguments and digest them into an argparse.Namespace instance. See Section 10, “checkArgs(): Process the command line arguments”.

deduper
    #-- 1
    # [ if the command line arguments are valid ->
    #     baseDir  :=  the effective value of the DIR argument
    #     minSize  :=  the effective value of the SIZE argument
    #   else ->
    #     sys.stderr  +:=  error message
    #     stop execution ]
    args = checkArgs()
    baseDir = getattr(args, DIR_ATTR)
    minSize = getattr(args, SIZE_ATTR)
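
The getattr() calls above work because argparse stores each parsed value as an attribute of the Namespace, under the name given by the argument's destination. The following is a minimal sketch of that pattern, not the actual checkArgs() from Section 10. The attribute values "dir" and "size", the "--size" option spelling, the defaults, and the name check_args_sketch() are all assumptions made for illustration.

```python
import argparse

# Hypothetical values; the real DIR_ATTR and SIZE_ATTR are defined elsewhere.
DIR_ATTR = "dir"
SIZE_ATTR = "size"


def check_args_sketch(argv):
    """Parse argv into an argparse.Namespace (illustrative only)."""
    parser = argparse.ArgumentParser(description="Report duplicate files")
    # Optional argument: its value is stored under the attribute SIZE_ATTR.
    parser.add_argument("--size", dest=SIZE_ATTR, type=int, default=1,
                        help="ignore files smaller than SIZE bytes")
    # Positional argument: its name doubles as the attribute name.
    parser.add_argument(DIR_ATTR, nargs="?", default=".",
                        help="directory to search")
    return parser.parse_args(argv)


args = check_args_sketch(["--size", "1024", "/tmp"])
base_dir = getattr(args, DIR_ATTR)   # "/tmp"
min_size = getattr(args, SIZE_ATTR)  # 1024
```

When parse_args() rejects the command line, argparse itself writes a usage message to sys.stderr and raises SystemExit, which matches the "else" branch of the intended function for step 1.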

Next we construct the database; see Section 17, “class FileData: The database”.

deduper
    #-- 2
    # [ fileData  :=  a new FileData instance representing an empty
    #                 database with minimum size (minSize) ]
    fileData = FileData(minSize)

Python's os.path.walk() function (available in Python 2 only; Python 3 removed it in favor of os.walk()) takes care of recursively visiting all subdirectories. It takes three arguments:

  1. The starting directory.

  2. A “visitor function” that will be called to process each directory, including the starting directory: see Section 12, “visitor(): Process one directory's contents”.

  3. An arbitrary state item that is passed to the visitor function on every call, so that the visitor can accumulate whatever data it is collecting. In this case, the FileData instance accumulates rows representing qualifying files.

deduper
    #-- 3
    # [ fileData  +:=  rows representing files no smaller than
    #       fileData.minSize that are located in or under baseDir
    #   sys.stderr  +:=  error messages for unreadable files
    #       in or under baseDir ]
    os.path.walk(baseDir, visitor, fileData)
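
For readers on Python 3, where os.path.walk() no longer exists, the same traversal can be written with os.walk(), which yields one tuple per directory instead of calling a visitor function. The sketch below is an assumption-laden stand-in, not this program's visitor(): the name collect_files(), the plain list used in place of the FileData instance, and the wording of the error message are all hypothetical.

```python
import os
import sys


def collect_files(base_dir, min_size):
    """Return paths of regular files of at least min_size bytes
    located in or under base_dir (illustrative only)."""
    found = []
    # os.walk() visits base_dir and then every subdirectory beneath it.
    for dir_path, dir_names, file_names in os.walk(base_dir):
        for name in file_names:
            path = os.path.join(dir_path, name)
            try:
                if os.path.isfile(path) and os.path.getsize(path) >= min_size:
                    found.append(path)
            except OSError as exc:
                # Report the problem file and keep walking.
                sys.stderr.write("*** Can't examine %s: %s\n" % (path, exc))
    return found
```

Here the accumulated state is simply a local list, which plays the role that the third argument of os.path.walk() plays in the code above.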

For the report generation logic, see Section 15, “report(): Generate the report”.

deduper
    #-- 4
    # [ sys.stdout  +:=  report of sets of two or more rows in fileData
    #       that have the same hash, ordered by the lowest path in
    #       each such set ]
    report(fileData)
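
The intended function for step 4 amounts to grouping rows by hash, discarding groups with fewer than two members, and ordering the remaining groups by their lowest path. As a rough sketch of that logic only: the function name duplicate_sets() and the representation of rows as (path, hash) tuples are assumptions, and the real report() in Section 15 works against the FileData database instead.

```python
from collections import defaultdict


def duplicate_sets(rows):
    """Given (path, hash) pairs, return a list of path lists, one per
    hash shared by two or more paths (illustrative only)."""
    by_hash = defaultdict(list)
    for path, digest in rows:
        by_hash[digest].append(path)
    # Keep only hashes with at least two files; sort paths within each set.
    groups = [sorted(paths) for paths in by_hash.values() if len(paths) >= 2]
    # Order the sets by the lowest path in each.
    groups.sort(key=lambda paths: paths[0])
    return groups


rows = [("b/x", "h1"), ("a/y", "h2"), ("c/z", "h1"),
        ("d/q", "h3"), ("a/a", "h3")]
for group in duplicate_sets(rows):
    print(" ".join(group))
```

The file "a/y" does not appear in the output because no other row shares its hash.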