So this just happened. BigQuery integration with Sheets!. This is great; now you can use a spreadsheet as a federated datasource for BigQuery, and therefore query (and join with other massive datasets) your spreadsheet data in real time.
It’s good, but it’s not quite all that yet, so here’s a few things I found out that might save you some time as you start to enjoy this awesomeness. I assume that it’s not fully released yet, and these things will be sorted out soon.
It’s a CSV
Actually, the Google Sheets doesn’t exist as an option, so the only selection that works is CSV.
The url
The ? in the BigQuery UI says the format is
You can do that, but it’s probably more convenient to use the url from the sheets editor.
https://docs.google.com/spreadsheets/d/1HLrLO2xnbk_1vezB_kmPvDFv1oX50HcD1P1lQfv7KKY/edit#gid=0
Selecting the sheet
The gid= parameter is used to identify which sheet to use, but this is apparently ignored. The first sheet is used regardless of the gid specified. So this means that only the first sheet in the workbook is able to be BigQuerified.
The schema
You’d think that the sheet would be able to automatically figure out its own schema and data type, using the heading row of the sheet, but it doesn’t. You have to specify the schema manually – entering each field. It’s a good idea to go into ‘edit as text’ once you’ve set the schema up and save the definition somewhere – which will look something like this, since you can’t seem to be able to edit the setting (or even get to this option) once the table has been created.
ipnumber:STRING,name:STRING
Skipping the header row
Since this is being treated as a CSV, the header row is nothing special – and you need to tell it to skip it – since you’ve set up the schema manually anyway, otherwise the header row will end up as part of your data.
Changing the sheet structure
If you ,say, add a column, after creating the schema
that’s to say my schema says this
but I’ve since added a new column in my sheet
This query
SELECT * FROM [airports.bb]
returns the data based on the original schema
which means that the schema is position based and not in any way dynamic. It’s not surprising since the heading row is not used anyway, but it’s certainly something to look out for.
Summary
All this means that what we are able to do today doesn’t seem to exactly match the expectations raised by the released blogpost – Hopefully when we see the version with the Google Sheets input option – all this will be taken care of.