talkgroup

Need to grab text from InDesign file (.indd)

I have an InDesign file I need to pull text from to format for a website. I thought Inkscape might figure it out, but no go. I don’t need editing abilities, just a way to grab the text without the selection running across columns, as tends to happen with multi-column layouts.

So, which free software opens .indd files? :slight_smile:

I have access to Fedora and Ubuntu.

Try something like strings <your-file-name>.indd | less and search for the start of the readable text you’re looking for. You’re going to get nine miles of garbage before and after, but the text itself should be visible in the output.
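
If you would rather filter the output in a script than page through it, the same idea can be sketched in Python. This is only a rough approximation of what strings does (runs of printable ASCII of at least four bytes); the function name printable_runs and the min_len parameter are made up for illustration, and it assumes the text you want is stored as plain ASCII in the file.

```python
import re


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (plus tab) at least min_len bytes long.

    Hypothetical helper: a rough stand-in for `strings`, not a parser
    for the InDesign format.
    """
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


# Small in-memory demo: binary junk around two readable runs.
sample = b"\x00\x01Hello world\x00ab\x00text\xff"
for run in printable_runs(sample):
    print(run)
```

To use it on a real file, read the bytes first, e.g. printable_runs(open("your-file-name.indd", "rb").read()), and raise min_len to cut down on the garbage.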


Thanks! That got me a lot of the way. You were correct, miles of garbage. I started thinking about cleaning it up, and that got me thinking that the text in the PDF is probably technically easier to grab (less source metadata). Searching around, I found a really neat project:

https://pdfbox.apache.org

Apache PDFBox® - A Java PDF Library

The Apache PDFBox® library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.

And then Apache PDFBox | Command-Line Tools

Usage: java -jar pdfbox-app-2.y.z.jar ExtractText [OPTIONS] <inputfile> [Text file]

That basically did what I wanted. Thanks again, @draloff, you reminded me this was easier than I thought it would be. :slight_smile:
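
For anyone who wants to run this over a batch of files, here is a minimal sketch that builds the same command line from Python. It assumes a PDFBox 2.x app jar as in the usage line above and that java is on the PATH; the helper name extract_text_command is hypothetical, the jar name is left as the placeholder from the docs, and no options are passed.

```python
import subprocess
from typing import Optional


def extract_text_command(jar: str, pdf: str, out: Optional[str] = None) -> list[str]:
    """Build the ExtractText command from the usage line (hypothetical helper)."""
    cmd = ["java", "-jar", jar, "ExtractText", pdf]
    if out is not None:
        cmd.append(out)  # optional output text file
    return cmd


def extract_text(jar: str, pdf: str, out: Optional[str] = None) -> None:
    """Run the command; raises CalledProcessError if java exits non-zero."""
    subprocess.run(extract_text_command(jar, pdf, out), check=True)


# Show the command without running it (file names are placeholders).
print(" ".join(extract_text_command("pdfbox-app-2.y.z.jar", "input.pdf", "output.txt")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.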

Oh good! I’m glad. PDFs are often easier to get at, since the file structure is mostly plain text: PostScript-like formatting commands plus the content. The content streams are frequently compressed, though, so how much you can read varies from file to file. It’s still pretty easy to use less if you just want to take a peek at them. Good ol’ Apache for maintaining such a useful little tool tho. I’m bookmarking that for later use :slight_smile:
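
A quick way to guess whether less will show readable text for a given PDF is to check for compressed streams. This is a crude heuristic sketch, not a PDF parser: it only looks for the /FlateDecode filter name anywhere in the raw bytes, and the function name looks_compressed is made up for illustration.

```python
def looks_compressed(pdf_bytes: bytes) -> bool:
    """Heuristic: True if any stream declares the /FlateDecode filter.

    Assumption: a plain substring search is good enough for a quick peek;
    it does not parse objects or handle other filters.
    """
    return b"/FlateDecode" in pdf_bytes


# In-memory examples: a stream dictionary with a filter, and a bare text operator.
print(looks_compressed(b"<< /Length 42 /Filter /FlateDecode >>"))
print(looks_compressed(b"BT (Hello) Tj ET"))
```

If it reports compressed streams, a tool like PDFBox is the easier route, since it decompresses the streams for you.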
