Thanks! That got me a lot of the way. You were right: miles of garbage. I started thinking about cleaning it up, which made me realize the text in the PDF is probably easier to grab, since there's less source metadata mixed in. Searching around, I found a really neat project:
Apache PDFBox® - A Java PDF Library
The Apache PDFBox® library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.
And then the Apache PDFBox | Command-Line Tools page, which gives this usage for extracting text:
java -jar pdfbox-app-2.y.z.jar ExtractText [OPTIONS] <inputfile> [Text file]
That basically did what I wanted. Thanks again, @draloff, you reminded me this was easier than I thought it would be.
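If you have a whole folder of PDFs, you can loop the same ExtractText command from Java. Below is a minimal sketch that is not part of PDFBox itself. It simply shells out to the command above once per file. The `PdfBatchExtract` class, its helper methods, and the "same name, .txt extension" output naming are my own hypothetical choices. It also assumes `java` is on your PATH and the PDFBox app jar is in the working directory. Replace `2.y.z` with your actual version.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: batch-run the PDFBox ExtractText command-line tool over several PDFs.
// Assumption: the PDFBox app jar is at JAR_PATH; "2.y.z" is a placeholder for your version.
class PdfBatchExtract {
    static final String JAR_PATH = "pdfbox-app-2.y.z.jar";

    // Hypothetical naming helper: "docs/report.pdf" -> "docs/report.txt" (same directory).
    static Path txtPathFor(Path pdf) {
        String name = pdf.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = (dot > 0) ? name.substring(0, dot) : name;
        return pdf.resolveSibling(base + ".txt");
    }

    // Mirrors: java -jar pdfbox-app-2.y.z.jar ExtractText <inputfile> [Text file]
    static List<String> buildCommand(String jarPath, Path pdf, Path txt) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("ExtractText");
        cmd.add(pdf.toString());
        cmd.add(txt.toString());
        return cmd;
    }

    // Usage: java PdfBatchExtract.java a.pdf b.pdf ...
    public static void main(String[] args) throws IOException, InterruptedException {
        for (String arg : args) {
            Path pdf = Path.of(arg);
            Path txt = txtPathFor(pdf);
            Process p = new ProcessBuilder(buildCommand(JAR_PATH, pdf, txt))
                    .inheritIO() // show PDFBox's own output/errors in this console
                    .start();
            int exit = p.waitFor(); // non-zero usually means that file failed
            System.out.println(pdf + " -> " + txt + " (exit " + exit + ")");
        }
    }
}
```

Shelling out keeps the sketch dependent only on the command-line usage shown above. If you'd rather call PDFBox as a library, it also exposes a text-extraction API. Check the docs for your version before relying on specific class names.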