Thanks to Richard Polt, who finagled me a copy of Dirk Schumann’s “Patentbase” DVD, the two major chunks of research conducted by Schumann in the 1990s are now reunited (and it feels so good!).
The original Patentbase was put together by Schumann by digging through the old US Patent Office website and hand-compiling an Excel spreadsheet of every typewriter-related US patent he could find, linked to directories of TIFF images of the patents themselves. It was an incredible amount of work, and the result came to about 4 GB with its 51,000+ images, but it’s largely been rendered obsolete by Google Patents, where you can find the images online, horribly OCR’d though they are.
The really important thing that remains of Dirk’s Patentbase is the fact that he’d singled out the typewriter-related patents. As you might know from searching Google Patents, it’s not easy to find particular kinds of patents, because the OCR is so terrible that text-matching is largely hit or miss. Thus, I saw value in scraping the data from the Patentbase and combining it with Google Patents to create a new Patentbase, linked to the Google Patents documents. (Thanks to Richard for that idea. I was at first trying to figure out how to convert 51,000 TIFFs into PNGs, which was dumb.)
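For the curious, here is a rough sketch of what that linking step could look like in Python. It is not the actual script used for the new Patentbase, just an illustration under a few assumptions: that the spreadsheet has been exported to CSV, that it has a column I'm calling `patent_number` (a made-up name), and that Google Patents pages follow the `https://patents.google.com/patent/US<number>` URL pattern. The sample rows are invented for the demo.

```python
import csv
import io
import re

# Assumed URL pattern for Google Patents pages; adjust if the site changes.
GOOGLE_PATENTS_URL = "https://patents.google.com/patent/US{number}"


def normalize_patent_number(raw):
    """Turn '1,234,567' or 'D 12345' into '1234567' or 'D12345'.

    Returns None when no usable number is found.
    """
    cleaned = re.sub(r"[^0-9A-Za-z]", "", raw or "").upper()
    # Optional 'US', optional letter prefix (e.g. D, RE), leading zeros, digits.
    match = re.match(r"^(?:US)?([A-Z]{0,2})0*(\d+)$", cleaned)
    if match is None:
        return None
    prefix, digits = match.groups()
    return prefix + digits


def link_rows(csv_text, number_column="patent_number"):
    """Add a 'google_patents_url' field to every row with a usable number."""
    linked = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        number = normalize_patent_number(row.get(number_column))
        if number is None:
            continue  # skip rows we cannot link rather than guessing
        row["google_patents_url"] = GOOGLE_PATENTS_URL.format(number=number)
        linked.append(row)
    return linked


if __name__ == "__main__":
    # Invented sample data, standing in for the exported spreadsheet.
    sample = 'patent_number,title\n"1,234,567",Example machine\nD 12345,Example design\n'
    for row in link_rows(sample):
        print(row["title"], row["google_patents_url"])
```

The nice part of this approach is that nothing has to be converted or re-hosted: the hand-picked list of patent numbers is the valuable bit, and the images stay where they already live.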