
8. Manifest constants

deduper
# - - - - -   M a n i f e s t   c o n s t a n t s

Because Python has no constants per se, we use the convention that manifest constants have names in ALL_CAPS with “_” as the word separator.

When we hash a file, we read it in big blocks to minimize I/O overhead. Here is the block size we use: one million bytes.

deduper
# [ BLOCK_SIZE  :=  how much of a file to read at once while hashing ]
BLOCK_SIZE = 1000*1000
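To show how a constant like BLOCK_SIZE is typically used, here is a minimal sketch of hashing a file one block at a time. The function name hash_file and the choice of SHA-1 are assumptions for illustration only; this section does not say which hash algorithm deduper uses or how its hashing routine is structured.

```python
import hashlib

BLOCK_SIZE = 1000*1000

def hash_file(path, block_size=BLOCK_SIZE):
    """Hypothetical helper: return the hex digest of the file at path.

    SHA-1 is an assumption; any hashlib algorithm works the same way.
    """
    hasher = hashlib.sha1()
    with open(path, 'rb') as infile:
        while True:
            # Read at most block_size bytes, so memory use stays bounded
            # no matter how large the file is.
            block = infile.read(block_size)
            if not block:          # read() returns b'' at end of file
                break
            hasher.update(block)
    return hasher.hexdigest()
```

Feeding the hasher block by block gives the same digest as hashing the whole file at once, so the block size affects only performance, not the result.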

These next constants define the external forms of the command line options and the name of the corresponding attribute in the argparse.Namespace instance that results from command line processing.

deduper
# [ SIZE_LONG_ARG  :=  long name for the minimum-size command line
#                      option
#    SIZE_SHORT_ARG  :=  short name for that option
#    SIZE_ATTR  :=  argparse.Namespace attribute name for that
#                   option
#    SIZE_DEFAULT  :=  default value for that option ]
SIZE_LONG_ARG = '--size'
SIZE_SHORT_ARG = '-s'
SIZE_ATTR = 'size'
SIZE_DEFAULT = 100*1000   # I.e., 100k
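A minimal sketch of how these constants might be registered with argparse follows. The helper build_parser is hypothetical, and type=int is a simplification: the real option presumably accepts suffixed sizes such as "2.5M", which this sketch does not handle.

```python
import argparse

SIZE_LONG_ARG = '--size'
SIZE_SHORT_ARG = '-s'
SIZE_ATTR = 'size'
SIZE_DEFAULT = 100*1000   # I.e., 100k

def build_parser():
    """Hypothetical helper: register the minimum-size option."""
    parser = argparse.ArgumentParser()
    # dest=SIZE_ATTR names the attribute on the resulting Namespace.
    # type=int is an assumed simplification (no "K"/"M"/"G" suffixes).
    parser.add_argument(SIZE_SHORT_ARG, SIZE_LONG_ARG,
                        dest=SIZE_ATTR, type=int, default=SIZE_DEFAULT,
                        help='minimum file size to consider')
    return parser

args = build_parser().parse_args(['-s', '5000'])
print(getattr(args, SIZE_ATTR))   # 5000
```

Reading the value with getattr(args, SIZE_ATTR) rather than args.size keeps the attribute name defined in exactly one place.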

Next come the constants for the DIR command line argument: its attribute name, its external name as shown in usage messages, and its default value.

deduper
# [ DIR_ATTR  :=  argparse.Namespace attribute name for the DIR
#                 argument
#   DIR_METAVAR  :=  external name of the path argument
#   DIR_DEFAULT  :=  default path argument ]
DIR_ATTR = 'dir'
DIR_METAVAR = 'DIR'
DIR_DEFAULT = '.'
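One plausible way to declare the optional DIR argument with argparse is sketched below; the program's actual add_argument call is not shown in this section, so nargs='?' and the help text are assumptions.

```python
import argparse

DIR_ATTR = 'dir'
DIR_METAVAR = 'DIR'
DIR_DEFAULT = '.'

parser = argparse.ArgumentParser()
# For a positional argument, the first name given to add_argument()
# becomes the Namespace attribute, so DIR_ATTR goes there; metavar
# controls how the argument is displayed in usage messages.
# nargs='?' (assumed) makes the argument optional, falling back to default.
parser.add_argument(DIR_ATTR, metavar=DIR_METAVAR, nargs='?',
                    default=DIR_DEFAULT, help='directory to search')

print(getattr(parser.parse_args([]), DIR_ATTR))          # .
print(getattr(parser.parse_args(['/tmp']), DIR_ATTR))    # /tmp
```

Separating DIR_ATTR from DIR_METAVAR lets the usage message say "DIR" while the code reads the lowercase attribute dir.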

Next is a table of the suffixes used to specify sizes in the conventional form, e.g., “2.5M” for two and a half megabytes. The keys in this table must match the regular expression that checks the argument. That expression accepts a non-negative decimal number optionally followed by one of the same suffix codes, in either upper or lower case; because the table's keys are uppercase, a matched suffix must be uppercased before it is looked up. A successful match has groups named NUMBER_GROUP and SUFFIX_GROUP containing the numeric part and the (possibly empty) suffix, respectively. Note that this regex will not match a number like “.0” that has no digits before the decimal point.

deduper
# [ SIZE_CODE_MAP  :=  a dict whose keys are the uppercased suffix codes
#       for size multipliers, and each related value is the multiplier
#       as an int
#   SIZE_RE  :=  a compiled regular expression that matches a float
#       optionally followed by those same codes, case-insensitive,
#       and returns a match object with the float in a group named
#       NUMBER_GROUP and the suffix in a group named SUFFIX_GROUP ]
SIZE_CODE_MAP = {
    '':  1,
    'K': 1000,
    'M': 1000*1000,
    'G': 1000*1000*1000,
 }
NUMBER_GROUP = 'n'
SUFFIX_GROUP = 's'
SIZE_RE = re.compile(
    r'(?P<{n}>'   # Start NUMBER_GROUP
      r'\d+'              # Matches one or more digits.
      r'('                # Start optional fraction subgroup
        r'\.'               # Matches a decimal point
        r'\d*'              # Matches zero or more digits
      r')?'               # End fraction subgroup
    r')'                # End NUMBER_GROUP
    r'(?P<{s}>'   # Start SUFFIX_GROUP
      r'[kKmMgG]?'        # Matches an optional suffix code
    r')'                # End SUFFIX_GROUP
    r'$'                # Ensure the match extends to the end of the string
.format(n=NUMBER_GROUP, s=SUFFIX_GROUP))
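Here is a minimal sketch of how SIZE_RE and SIZE_CODE_MAP might work together to turn a string such as “2.5M” into a byte count. The function parse_size is hypothetical, and the regex is restated on one line (equivalent to the commented version above) so the block runs on its own. Because SIZE_RE has no leading anchor, the sketch uses SIZE_RE.match(), which anchors at the start of the string; SIZE_RE.search() would wrongly accept input like “x5M”.

```python
import re

SIZE_CODE_MAP = {'': 1, 'K': 1000, 'M': 1000*1000, 'G': 1000*1000*1000}
NUMBER_GROUP = 'n'
SUFFIX_GROUP = 's'
SIZE_RE = re.compile(r'(?P<{n}>\d+(\.\d*)?)(?P<{s}>[kKmMgG]?)$'
                     .format(n=NUMBER_GROUP, s=SUFFIX_GROUP))

def parse_size(text):
    """Hypothetical helper: convert e.g. '2.5M' to 2500000 bytes.

    Raises ValueError if text is not a valid size.
    """
    m = SIZE_RE.match(text)   # match() anchors the start; '$' anchors the end
    if m is None:
        raise ValueError('invalid size: {!r}'.format(text))
    number = float(m.group(NUMBER_GROUP))
    # The map's keys are uppercase, so normalize 'k' -> 'K' etc.
    multiplier = SIZE_CODE_MAP[m.group(SUFFIX_GROUP).upper()]
    # Round rather than truncate, so float round-off cannot lose a byte.
    return int(round(number * multiplier))

print(parse_size('2.5M'))   # 2500000
print(parse_size('100k'))   # 100000
```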