If you have a document or book manuscript with inline images, you may need to extract those images and give each one a figure reference. If you also want the names to reflect the chapter in which each image appears, this can turn out to be more complicated than you'd expect.

In this example, I have a large book manuscript with several hundred images. I want to extract them to Drive with names that reflect which chapter they are in, and index them in a spreadsheet.

Each chapter starts with a paragraph whose heading is DocumentApp.ParagraphHeading.HEADING1.

The approach

Using the technique described in Sorting bookmarks in a document, I can identify where all the images and all the paragraphs appear in the document. From those positions it is easy to deduce which chapter a given image belongs to.

Code Walkthrough

Setup – it’s a container bound script for the active document.

Identify the paragraphs that are the chapter headers and figure out their chapter numbers and positions in the document.
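Here is a minimal sketch of this step, not the original code. It assumes the V8 runtime, that chapters are simply numbered in order of appearance, and that a position function is passed in. The names findChapters and positionOf are hypothetical.

```javascript
/**
 * Pick out the chapter heading paragraphs and number them in document order.
 * @param {Array} paragraphs e.g. body.getParagraphs(), which is in document order
 * @param {*} chapterHeading e.g. DocumentApp.ParagraphHeading.HEADING1
 * @param {function} positionOf hypothetical helper returning a comparable position
 * @return {Array} one {chapter, title, position} object per chapter
 */
function findChapters(paragraphs, chapterHeading, positionOf) {
  return paragraphs
    .filter(function (paragraph) {
      return paragraph.getHeading() === chapterHeading;
    })
    .map(function (paragraph, index) {
      return {
        chapter: index + 1, // assumption: numbered by order of appearance
        title: paragraph.getText(),
        position: positionOf(paragraph)
      };
    });
}

// In the container bound script this might be called as:
// const body = DocumentApp.getActiveDocument().getBody();
// const chapters = findChapters(
//   body.getParagraphs(), DocumentApp.ParagraphHeading.HEADING1, positionOf);
```

Passing the heading type and the position function in as arguments keeps the function free of globals, so it can be tried outside Apps Script with stand-in objects.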

Get all the images and figure out which chapters they are in by comparing their document positions with that of each chapter.
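The comparison could look something like the sketch below. It assumes each chapter and image has already been given a comparable position (a number or a sortable string), and that images appearing before the first chapter are assigned chapter 0. The name assignChapters and the object shapes are hypothetical.

```javascript
/**
 * Attach a chapter number to each image by finding the last chapter heading
 * that sits at or before the image in the document.
 * @param {Array} chapters objects like {chapter: 1, position: ...}
 * @param {Array} images objects like {image: InlineImage, position: ...}
 * @return {Array} copies of the image objects with a chapter property added
 */
function assignChapters(chapters, images) {
  // Work on a sorted copy so the caller's array is left alone.
  const ordered = chapters.slice().sort(function (a, b) {
    return a.position < b.position ? -1 : a.position > b.position ? 1 : 0;
  });
  return images.map(function (image) {
    let chapterNumber = 0; // assumption: 0 means "before the first chapter"
    ordered.forEach(function (chapter) {
      if (chapter.position <= image.position) chapterNumber = chapter.chapter;
    });
    return Object.assign({}, image, { chapter: chapterNumber });
  });
}

// The images themselves would come from the document body, for example:
// const images = DocumentApp.getActiveDocument().getBody().getImages();
```

A linear scan per image is fine for a few hundred images and a few dozen chapters; a binary search would only matter for much larger documents.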

Index each image by its relative position within its chapter. This index is used as the figure number and in the filename.
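As a rough sketch of the numbering, assuming each image already carries a chapter number and a comparable position: sort by position, then count up within each chapter. The function name numberFigures and the fig-chapter-figure.jpg filename pattern are my own illustrative choices, not necessarily what the original used.

```javascript
/**
 * Give each image a figure number within its chapter and a filename.
 * @param {Array} placedImages objects like {chapter: 1, position: ...}
 * @return {Array} copies sorted by position, with figure and name added
 */
function numberFigures(placedImages) {
  const counts = {}; // running figure count per chapter
  return placedImages
    .slice()
    .sort(function (a, b) {
      return a.position < b.position ? -1 : a.position > b.position ? 1 : 0;
    })
    .map(function (item) {
      counts[item.chapter] = (counts[item.chapter] || 0) + 1;
      const figure = counts[item.chapter];
      return Object.assign({}, item, {
        figure: figure,
        // hypothetical naming pattern, e.g. fig-3-7.jpg
        name: 'fig-' + item.chapter + '-' + figure + '.jpg'
      });
    });
}
```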

Write them all out, converting them to jpg – this could be preceded by deleting all the files in the receiving folder, but I’ve taken that out for safety in case you use this code as is.
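A minimal sketch of the write step follows, assuming each item holds an InlineImage plus the name, chapter and figure worked out earlier, and that the target Drive folder has already been found. The name writeImages and the shape of the returned records are hypothetical. It does not delete anything from the folder.

```javascript
/**
 * Convert each image to jpg and create it as a file in the given folder.
 * @param {Array} figures objects like {image: InlineImage, name, chapter, figure}
 * @param {Folder} folder the Drive folder to write into
 * @return {Array} one record per file written, including the new file id
 */
function writeImages(figures, folder) {
  return figures.map(function (item) {
    // getAs converts the image blob; setName returns the blob for chaining
    const blob = item.image.getAs('image/jpeg').setName(item.name);
    const file = folder.createFile(blob);
    return {
      name: item.name,
      chapter: item.chapter,
      figure: item.figure,
      fileId: file.getId()
    };
  });
}
```

Note that Drive allows several files with the same name in one folder, so running this twice would leave duplicates rather than overwrite the earlier files.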

Record the results to a sheet along with some control information.
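Something along these lines would record the index. This is only a sketch: the column layout, and the choice of document name and extraction time as the control information, are assumptions on my part, and buildIndexRows and writeIndex are hypothetical names.

```javascript
/**
 * Build the rows for the index sheet, header first.
 * @param {Array} written records like {chapter, figure, name, fileId}
 * @param {string} documentName control information (assumed)
 * @param {string} extractedAt control information (assumed), e.g. an ISO timestamp
 * @return {Array} a 2D array suitable for Range.setValues
 */
function buildIndexRows(written, documentName, extractedAt) {
  const header = ['chapter', 'figure', 'filename', 'fileId', 'document', 'extracted'];
  const rows = written.map(function (r) {
    return [r.chapter, r.figure, r.name, r.fileId, documentName, extractedAt];
  });
  return [header].concat(rows);
}

/**
 * Replace the contents of a sheet with the given values.
 * @param {Sheet} sheet e.g. SpreadsheetApp.openById(id).getSheetByName(name)
 * @param {Array} values 2D array with at least a header row
 */
function writeIndex(sheet, values) {
  sheet.clearContents();
  sheet.getRange(1, 1, values.length, values[0].length).setValues(values);
}
```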

Utility function for figuring out position in document
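One way to get a comparable position, sketched here rather than copied from the original utility, is to walk up from the element to the body, collecting the child index at each level, and turn that path into a zero-padded string. Strings built this way sort in document order, and a container sorts before anything inside it. The name positionKey, the rootType argument and the six-digit padding (which assumes fewer than a million children at any level) are my own choices, and padStart needs the V8 runtime.

```javascript
/**
 * Build a sortable key describing where an element sits in the document.
 * @param {Element} element a paragraph, inline image, etc.
 * @param {*} rootType where to stop, e.g. DocumentApp.ElementType.BODY_SECTION
 * @return {string} e.g. '000012.000003' for child 3 of body child 12
 */
function positionKey(element, rootType) {
  const path = [];
  let child = element;
  let parent = child.getParent();
  while (parent) {
    // record where this child sits inside its parent
    path.unshift(parent.getChildIndex(child));
    if (parent.getType() === rootType) break; // reached the body
    child = parent;
    parent = child.getParent();
  }
  return path
    .map(function (index) {
      return String(index).padStart(6, '0');
    })
    .join('.');
}

// Example call inside Apps Script:
// const key = positionKey(image, DocumentApp.ElementType.BODY_SECTION);
```

Because the keys are plain strings, they can be compared with < and <= or sorted without a custom comparison of paths.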

Utility function for figuring out Drive folder from folder path name
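A folder lookup along these lines would do the job. Treat it as a sketch: folderFromPath and its create flag are hypothetical, and where Drive holds several folders with the same name it simply takes the first match.

```javascript
/**
 * Walk a path like '/books/images' down from a starting folder.
 * @param {Folder} root where to start, e.g. DriveApp.getRootFolder()
 * @param {string} path folder names separated by '/'
 * @param {boolean} create whether to create missing folders along the way
 * @return {Folder|null} the folder found (or created), or null if missing
 */
function folderFromPath(root, path, create) {
  return path
    .split('/')
    .filter(Boolean) // drop empty parts from leading or doubled slashes
    .reduce(function (folder, name) {
      if (!folder) return null; // an earlier part of the path was missing
      const matches = folder.getFoldersByName(name);
      if (matches.hasNext()) return matches.next();
      return create ? folder.createFolder(name) : null;
    }, root);
}

// Example call inside Apps Script (the path is made up):
// const folder = folderFromPath(DriveApp.getRootFolder(), '/books/images', true);
```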

The output

Here’s a clip of the directory with some of the generated images.


and the index sheet.

The code

The whole thing
