This is now deprecated following the demise of Gadgets on Sites.

This was also part of each evening's scheduled run. The site was serialized and each analytics entry was matched to a page on the site. The whole project was heavily recursive, from the site organization through to the final output, where rankings were shown both for the page and for the topic to which the page belongs.
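To make the recursion concrete, here is a minimal sketch in Python of matching analytics entries to pages and rolling the numbers up to the topic level. It is an illustration under assumptions rather than the code that ran each evening: the `Page` class, the `match_and_roll_up` function, the example paths and the view counts are all hypothetical, and a "topic" is taken to mean a page together with everything beneath it.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """One node in the serialized site tree (hypothetical shape)."""
    path: str
    children: list["Page"] = field(default_factory=list)
    page_views: int = 0    # views matched to this page alone
    topic_views: int = 0   # views for this page plus all of its descendants


def match_and_roll_up(page: Page, views_by_path: dict[str, int]) -> int:
    """Attach analytics to each page, then sum the views up the tree."""
    # Pages with no analytics entry simply count as zero views.
    page.page_views = views_by_path.get(page.path, 0)
    page.topic_views = page.page_views + sum(
        match_and_roll_up(child, views_by_path) for child in page.children
    )
    return page.topic_views


# Made-up site structure and analytics data, for illustration only.
site = Page("/", [
    Page("/vba", [Page("/vba/json"), Page("/vba/charts")]),
    Page("/apps-script"),
])
views = {"/vba": 10, "/vba/json": 25, "/vba/charts": 5, "/apps-script": 40}
total = match_and_roll_up(site, views)
```

After the call, every node carries two numbers, one for the page itself and one for its topic, which is what a ranking at both levels needs.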

Here’s how the pages were fetched
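As a rough sketch of the approach, written in Python rather than as the actual script, the fetch can be done as a recursive walk in which every call to the service is wrapped in a retry with exponential backoff. The names `RateLimitError`, `with_backoff`, `fetch_tree` and `get_children` are assumptions made for this illustration, and the stand-in service at the end exists only so the example runs on its own.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical error raised when the service asks us to slow down."""


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry call() on rate-limit errors, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of retries, let the caller see the error
            # Wait 1s, 2s, 4s, ... plus some jitter so retries don't sync up.
            sleep(base_delay * 2 ** attempt + random.random())


def fetch_tree(path, get_children, sleep=time.sleep):
    """Recursively fetch a page and all of its descendants."""
    child_paths = with_backoff(lambda: get_children(path), sleep=sleep)
    return {
        "path": path,
        "children": [fetch_tree(p, get_children, sleep) for p in child_paths],
    }


# Stand-in for the real service: the second request hits a rate limit.
fake_site = {"/": ["/a", "/b"], "/a": ["/a/x"], "/b": [], "/a/x": []}
calls = {"n": 0}


def flaky_get_children(path):
    calls["n"] += 1
    if calls["n"] == 2:
        raise RateLimitError()
    return fake_site[path]


waits = []  # record the delays instead of actually sleeping
tree = fetch_tree("/", flaky_get_children, sleep=waits.append)
```

Passing `sleep` in as a parameter keeps the example fast to run; in real use the default `time.sleep` applies and the walk simply pauses before retrying.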

Again I’m using exponential backoff to avoid those tiresome rate limit errors.
Now I have a nice tree of all pages on my site, ready to be matched.