Scraper wiki is an environment which allows you to put some structure around data published in web pages. In addition, it has an sqlLite datastore through which many users of scraperWiki make data from their endeavours available. In this blog post, I showed how to use the scraperWiki API to pull data about public projects published on scraperWiki. Now let's look at getting data out of the scraperwiki datastore into Excel and Google Docs using the scraper wiki API. This page will cover the Excel implementation, with the Google Apps script covered here. However, as with all projects on this site, the implementations are as close as the language syntax allows. All code can be downloaded and is in the cDataSet.xlsm file.
You can scrape directly from VBA as well - see Regex, scraping and visualizing for an example
Before we get started on a walkthrough, this ends up being just a neat one liner to create a table in Excel from a given scraperWiki - it looks like this
So we'll assign a button to that, to ask which one you want
Using Rest to Excel library we can easily populate a page with all known scrapers. Here's a snippet from a list of about 1000 below.
The short_name is the key to all scrapers in scraperWiki. Not all scrapers have valid data though, so first let's take a pass through this table and check out which ones have. To do that, we use the API again, this time accessing the sqlLite database schema to see if there are any data tables. For this and for retrieving data we will use this new rest library entry
For simplicity we'll add an extra column showing the default sql that would be needed to get data from the first data table in each scraperWiki's datastore. That gives us something like this
Loop through each row in the scraperWiki directory and construct a default sql statement for those rows that have a dataStore with a table in it
Get any tables that exist for this shortname, and construct an sql
Access the sqlLite datastore using the API and see what tables we can find.
Now we've analyzed the whole directory for scrapers with any data, we can pick one and execute it
For the purposes of this test I've chosen something at random called 'fantasy_premier_league_player_stats_4', and here's a snippet of the data it returns
You can get the code at this gist , but you it is also contained in cdataSet.xlsm, which you can download to include all the needed libraries
scraperWiki is a very powerful tool with which you can create your own datasets, as well as access public datasets of others, and get them into Excel or see Scraperwiki data to Google Docs. Take a look at how the Excel Rest Library for more like this. In the meantime why not join our forum,follow the blog or follow me on twitter to ensure you get updates when they are available.
Services > Desktop Liberation - the definitive resource for Google Apps Script and Microsoft Office automation > excel to json and back > Rest to Excel library >