Data wrangling with named columns in Google Spreadsheet

In Using crossfilter with Google Apps Script I showed how to use the powerful crossfilter library to do some data wrangling. But using some of JavaScript's Advanced Array functions, and working with named columns instead of column numbers gives you a lot of wrangling power straight out of the box. 

In the example, we are going to play around with some of the data I used in Eurovision results with crossfilter and dc.js.

The data 

We have several thousand rows of data that starts like this

year country received from
2000 ISR 6 FRA
2000 ISR 1 MKD
2000 NLD 8 ISR
2000 NLD 2 FRA
2000 NLD 5 MLT
2000 NLD 8 BEL
2000 NLD 5 CYP
2000 NLD 1 ISL
2000 NLD 4 ESP
2000 NLD 1 DEU
2000 NLD 2 HRV
2000 NLD 3 TUR
2000 NLD 1 IRL
2000 GBR 1 ISR

I'd like to create a summary sheet showing the total votes received for each country, and also sort this as well as the original data into reverse order of total votes received. You can do this is in juts a few lines of code.

Here's the start of the summary data result


country total votes
RUS 1949
GRC 1750
UKR 1572
SWE 1481
TUR 1427
DNK 1252
SRB 1199
NOR 1149
AZE 1089
ROU 1059
BIH 1035
DEU 943
EST 733
ESP 731

And the original data, resorted in the same order of countries.


year country received from
2008 RUS 6 MKD
2008 RUS 12 UKR
2008 RUS 7 DEU
2008 RUS 12 EST
2008 RUS 4 BIH
2008 RUS 6 ALB
2008 RUS 3 BEL
2008 RUS 12 LVA
2008 RUS 6 BGR
2008 RUS 10 SRB
2008 RUS 12 ISR
2008 RUS 8 CYP
2008 RUS 10 MDA
2008 RUS 10 ROU
2008 RUS 6 PRT
2008 RUS 5 NOR
2008 RUS 10 HUN
2008 RUS 5 AND
2008 RUS 6 POL


The code


function myFunction() {
  
  // get all the data on the sheet
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName("votes");
  var summarySheet = ss.getSheetByName("summary");
  var values = sheet.getDataRange().getValues();
  
  // separate the headers & data
  var headers = values[0];
  var data = values.slice(1);

  // map the header names to column numbers
  var headMap = {};
  headers.forEach ( function (d,i) {
    var name = d ? d : 'column '+(i+1);
    headMap[name] = i;
  });
  
  // do things with the data - lets sum up all the votes by country
  var votesByCountry = data.reduce(function(summary,row) {
    
    if (!summary[row[headMap.country]]) {
      summary[row[headMap.country]] = 0;
    }
    summary[row[headMap.country]] += row[headMap.received];
    return summary;
  },{});
  

  // lets sort the oroginal data by total votes received by country, in reverse order, and re output the result
  sheet.getRange(2,1,data.length,data[0].length).setValues( 
    data.sort ( function (a,b) {
      return votesByCountry[b[headMap.country]] - votesByCountry[a[headMap.country]];
    }));
    
    
  // lets put the summary result to another sheet and sort in reverse order
  summarySheet.getRange(1,1,Object.keys(votesByCountry).length+1,2).setValues(
    [['country','total votes']]
    .concat(Object.keys(votesByCountry).map (function(k) {
      return [k,votesByCountry[k]] 
    })
    .sort ( function (a,b) {
      return b[1] - a[1];
    })));
}


Walkthrough


First separate the data into header and data, then create an object that maps column names to the physical column they appear in the data. This means we can forget all about column numbers, or even change the shape of the sheet, and our subsequent code will still work.

  // get all the data on the sheet
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sheet = ss.getSheetByName("votes");
  var summarySheet = ss.getSheetByName("summary");
  var values = sheet.getDataRange().getValues();
  
  // separate the headers & data
  var headers = values[0];
  var data = values.slice(1);

  // map the header names to column numbers
  var headMap = {};
  headers.forEach ( function (d,i) {
    var name = d ? d : 'column '+(i+1);
    headMap[name] = i;
  });


Now create a summary object all votes received by all countries


  // do things with the data - lets sum up all the votes by country
  var votesByCountry = data.reduce(function(summary,row) {
    
    if (!summary[row[headMap.country]]) {
      summary[row[headMap.country]] = 0;
    }
    summary[row[headMap.country]] += row[headMap.received];
    return summary;
  },{});

And we're ready to output the result!

First sort the original data based on the total votes received for each country, and write it back to the original sheet in a new order


  // lets sort the oroginal data by total votes received by country, in reverse order, and re output the result
  sheet.getRange(2,1,data.length,data[0].length).setValues( 
    data.sort ( function (a,b) {
      return votesByCountry[b[headMap.country]] - votesByCountry[a[headMap.country]];
    }));

Now write the summary results, including a heading row


  // lets put the summary result to another sheet and sort in reverse order
  summarySheet.getRange(1,1,Object.keys(votesByCountry).length+1,2).setValues(
    [['country','total votes']]
    .concat(Object.keys(votesByCountry).map (function(k) {
      return [k,votesByCountry[k]] 
    })
    .sort ( function (a,b) {
      return b[1] - a[1];
    })));

You want to learn Google Apps Script?

Learning Apps Script, (and transitioning from VBA) are covered comprehensively in my my book, Going Gas - from VBA to Apps script, available All formats are available now from O'Reilly,Amazon and all good bookshops. You can also read a preview on O'Reilly

If you prefer Video style learning I also have two courses available. also published by O'Reilly.
Google Apps Script for Developers and Google Apps Script for Beginners.




For help and more information join our forumfollow the blogfollow me on twitter

Comments