17. class FileData: The database

This class is our interface to a small SQLite database, as described in Section 5, “Design”. Conceptually, it is a container for instances of the PathHash class (see Section 17.2, “FileData.PathHash: The mapped class”), each of which represents one row of the table. Here is its interface:

deduper
# - - - - -   c l a s s   F i l e D a t a

class FileData(object):
    '''Represents the SQLite database.

      Exports:
        FileData(minSize):
          [ minSize is a positive int ->
              return a new, empty database with minimum file size
              (minSize) ]
        .minSize:         [ as passed to constructor, read-only ]
        .add(path, hash, size):
          [ (path is an absolute path name) and
            (hash is a sha256 hex digest) and
            (size is the file's size in bytes as an int) ->
              self  :=  self + (a new row made from those values) ]
        .genPaths():
          [ generate the PathHash instances in self in ascending order
            by .path ]
        .genByHash(hash):
          [ generate the PathHash instances in self for .hash=(hash)
            in ascending order by .path ]

We store the minimum file size as an attribute of this class because of the way os.path.walk() works. The “visitor function”, which in our case is visitor() (see Section 12, “visitor(): Process one directory's contents”), receives as its first argument whatever was passed as the third argument to os.path.walk(). Only that one object can be passed through, and the visitor needs both the minimum file size and access to the FileData instance, so we made the minimum file size an attribute of FileData.
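The single-argument constraint can be illustrated with a minimal sketch. Because os.path.walk() exists only in Python 2, the sketch uses a hypothetical stand-in, walk_with_arg(), built on os.walk(), that hands one extra object to the visitor in the same way. StubFileData and the body of visitor() shown here are also assumptions made for illustration; they are not the code from Section 12.

```python
import os


class StubFileData(object):
    '''Hypothetical stand-in for FileData: carries only .minSize
    and a list recording the paths the visitor accepts.
    '''
    def __init__(self, minSize):
        self.minSize = minSize
        self.seen = []


def walk_with_arg(top, func, arg):
    '''Stand-in for Python 2's os.path.walk(top, func, arg).

    Calls func(arg, dirName, nameList) once per directory.  The
    single object (arg) is all the caller can pass to the visitor.
    '''
    for dirName, subdirs, files in os.walk(top):
        func(arg, dirName, sorted(subdirs + files))


def visitor(fileData, dirName, nameList):
    '''Record each regular file of at least fileData.minSize bytes.'''
    for name in nameList:
        path = os.path.join(dirName, name)
        if not os.path.isfile(path):
            continue
        if os.path.getsize(path) >= fileData.minSize:
            fileData.seen.append(path)
```

A call such as walk_with_arg(startDir, visitor, StubFileData(1024)), where startDir is any directory name, gives the visitor both the size threshold and the place to record results through one object.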

Internal to the class are the table and PathHash, the mapped class that is connected to that table and represents one row: a class within a class.

deduper
      State/Invariants:
        .path_table:  [ schema.Table ]
        ._engine:  [ engine.Engine instance ]
        ._Session:  [ session constructor ]
        .s:  [ an instance of self._Session ]
        .PathHash:  [ represents one row ]
          .path:  [ path name column ]
          .hash:  [ hash column ]
          .size:  [ size column ]
    '''
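To make the exported interface concrete, here is a rough sketch of a class that satisfies the same contract. It is not the implementation described in this document, which uses a mapped class; it swaps in the standard sqlite3 module directly. The class name SketchFileData, the table name paths, the column layout, and the namedtuple standing in for PathHash are all assumptions made for illustration.

```python
import sqlite3
from collections import namedtuple

# Hypothetical plain-tuple stand-in for the mapped class PathHash.
PathHash = namedtuple('PathHash', 'path hash size')


class SketchFileData(object):
    '''Illustrative stand-in for FileData using sqlite3 directly.'''

    def __init__(self, minSize):
        self._minSize = minSize
        # An in-memory database starts out empty.
        self._conn = sqlite3.connect(':memory:')
        self._conn.execute(
            'CREATE TABLE paths '
            '(path TEXT NOT NULL, hash TEXT NOT NULL, '
            'size INTEGER NOT NULL)')

    @property
    def minSize(self):
        '''Read-only: there is no setter.'''
        return self._minSize

    def add(self, path, hash, size):
        '''Add one row made from the three values.'''
        self._conn.execute(
            'INSERT INTO paths (path, hash, size) VALUES (?, ?, ?)',
            (path, hash, size))
        self._conn.commit()

    def genPaths(self):
        '''Generate all rows in ascending order by path.'''
        cursor = self._conn.execute(
            'SELECT path, hash, size FROM paths ORDER BY path')
        for row in cursor:
            yield PathHash(*row)

    def genByHash(self, hash):
        '''Generate the rows with a given hash, ascending by path.'''
        cursor = self._conn.execute(
            'SELECT path, hash, size FROM paths '
            'WHERE hash = ? ORDER BY path', (hash,))
        for row in cursor:
            yield PathHash(*row)
```

Two or more rows generated by genByHash() for the same hash are the candidate duplicates; a single row means that file's content is unique in the database.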