
6.25. GENUS_SPECIES_RE

A regular expression that matches the scientific name on a standard forms line. It will match either “Genus species” or “Genus (Subgenus) species”.

We reuse the rank codes of two of these ranks (Section 6.10, “GENUS_CODE” and Section 6.12, “SPECIES_CODE”) as the regular expression group names. We can't use SUBGENUS_CODE as a group name because it contains a hyphen, and the re module accepts only valid Python identifiers as group names; so we define a constant SUBGENUS_FIELD for that group.
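The restriction on group names can be checked directly: re.compile raises re.error when a named group's name is not a valid identifier. The sketch below uses the hypothetical names "sub-genus" and "sg" for illustration; the actual value of SUBGENUS_CODE is defined elsewhere and is not reproduced here, and group_name_ok is a hypothetical helper.

```python
import re

def group_name_ok(name):
    """Return True if re.compile accepts name as a named-group name.

    Hypothetical helper for illustration only.
    """
    try:
        re.compile(r'(?P<%s>[A-Z][a-z]+)' % name)
        return True
    except re.error:
        # Raised for names that are not valid Python identifiers.
        return False

print(group_name_ok("sg"))         # True
print(group_name_ok("sub-genus"))  # False: the hyphen is rejected
```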

When extracting matched strings from a Match object M, keep in mind that the subgenus group is optional, so M.group(SUBGENUS_FIELD) returns None if the subgenus is omitted.

nomcompile3
SUBGENUS_FIELD = "sg"
GENUS_SPECIES_RE = re.compile (
    r'(?P<%s>'          # Start group GENUS_CODE
      r'[A-Z]'            # Matches a capital letter
      r'[a-z]+'           # Matches one or more lowercase letters
    r')'                # End group GENUS_CODE
    r'\s+'              # Matches one or more whitespace characters
    r'('                # Start optional group
      r'\('               # Matches '('
      r'(?P<%s>'          # Start group SUBGENUS_FIELD
        r'[A-Z]'            # Matches a capital letter
        r'[a-z]+'           # Matches one or more lowercase letters
      r')'                # End group SUBGENUS_FIELD
      r'\)'               # Matches ')'
      r'\s+'              # Matches one or more whitespace characters
    r')?'               # End optional group
    r'(?P<%s>'          # Start group SPECIES_CODE
      r'[a-z]+'           # Matches one or more lowercase letters
    r')'                # End group SPECIES_CODE
    % (GENUS_CODE, SUBGENUS_FIELD, SPECIES_CODE) )
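A minimal usage sketch, assuming the hypothetical values "gen" and "sp" for GENUS_CODE and SPECIES_CODE (their real values come from Sections 6.10 and 6.12). It restates the same pattern in compact form, matches both forms of a scientific name, and shows the subgenus group yielding None when it is absent. The helper parse_name is hypothetical.

```python
import re

# Hypothetical stand-ins for constants defined elsewhere in the program.
GENUS_CODE = "gen"
SPECIES_CODE = "sp"
SUBGENUS_FIELD = "sg"

# Same structure as GENUS_SPECIES_RE above, written more compactly.
GENUS_SPECIES_RE = re.compile(
    r'(?P<%s>[A-Z][a-z]+)'             # Genus: capital, then lowercase
    r'\s+'                             # Whitespace separator
    r'(\((?P<%s>[A-Z][a-z]+)\)\s+)?'   # Optional "(Subgenus) "
    r'(?P<%s>[a-z]+)'                  # species: all lowercase
    % (GENUS_CODE, SUBGENUS_FIELD, SPECIES_CODE))

def parse_name(line):
    """Return (genus, subgenus, species), or None if line does not
    start with a scientific name. Hypothetical helper."""
    m = GENUS_SPECIES_RE.match(line)
    if m is None:
        return None
    # group() returns None for the subgenus when that group did not match.
    return (m.group(GENUS_CODE), m.group(SUBGENUS_FIELD), m.group(SPECIES_CODE))

print(parse_name("Anas platyrhynchos"))         # ('Anas', None, 'platyrhynchos')
print(parse_name("Anas (Anas) platyrhynchos"))  # ('Anas', 'Anas', 'platyrhynchos')
```

Because match() anchors only at the start of the string, any text after the species name is left unmatched rather than causing the match to fail.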