Comparing Google and Microsoft - image emotion detection

In the examples for Some OAUTH2 walkthroughs, I used the Google Cloud Vision API to demonstrate various authentication scenarios. The result was an evolving webapp which could do some cool things applying that API to images. I thought it might be fun to expand it a little to do the same thing using Microsoft's Project Oxford Vision API, which has many similar capabilities.

All the code for this is on GitHub and is a development of the examples shown in Some OAUTH2 walkthroughs. I'm not going to dig into the code too much in this post. You are welcome to take a look and play around with it yourself.

Authorization and payment

Both APIs need authorization
  • Cloud Vision uses OAuth2 (you can use either service accounts or the web app flow)
  • Project Oxford provides an API key, but it took me on a complex trip to sign up for a free Azure trial first (which I think is mandatory, but I'm not 100% sure as things didn't go that smoothly).
You need to pay for both APIs, but each has a free tier.
  • I needed to enable billing on the Google Cloud project I used for OAuth2
  • I had to provide my credit card to sign up for Azure.

Capabilities

Both have similar capabilities such as image classification, face detection and emotion detection. I'm going to focus on emotion detection in this post, since it builds on face detection anyway.

There's a video version of this post here.

My opinion

The Google one allows you to do more things at once, but the result is that it's more complicated to use. It also returns categorized emotion scores, whereas Microsoft sticks its neck out and gives you an actual value. Microsoft is also more ambitious in its emotional analysis range, attempting to detect more subtle emotions. Although neither is perfect, I think the Microsoft one gives more balanced and subtle results, and it's easier to use. So for once, it's Microsoft for me. Let me know what you think from the examples below.

Results

Before I get into the details, here are the side-by-side results of emotion analysis on a couple of images.


Some notes on results

The emotions detected (they are called slightly different things, but I've normalized them for comparison) are a little different. Both detect joy, sorrow, anger and surprise, but Microsoft goes a little further and tries to look for contempt, disgust, fear and neutrality. Google offers some functional measures instead, such as headwear detection and whether the image is underexposed or blurred.

Measures

Whereas Microsoft returns values between 0 and 1 (actually a couple of results came back as very tiny negative numbers),



Google returns a classification.



So that I could compare them as a chart, I assigned these weights to the categorizations.

    scales: {
      "VERY_LIKELY":0.95, 
      "VERY_UNLIKELY":0,
      "POSSIBLE":0.5,
      "LIKELY":0.7,
      "UNLIKELY":0.3,
      "UNKNOWN":0
    }
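
For illustration, here's roughly how those weights can be applied to what the Vision API returns. This is just a sketch rather than the exact code in the app; the function name is made up, although joyLikelihood and the other xxxLikelihood fields are the names the Vision API uses in its faceAnnotations.

    // sketch only - turn a Vision API faceAnnotation's likelihood categories
    // into numbers using the scales above, so they can be charted alongside
    // Microsoft's 0-1 scores
    function scaleGoogleEmotions (faceAnnotation) {
      var scales = {
        "VERY_LIKELY":0.95,
        "VERY_UNLIKELY":0,
        "POSSIBLE":0.5,
        "LIKELY":0.7,
        "UNLIKELY":0.3,
        "UNKNOWN":0
      };
      return {
        joy: scales[faceAnnotation.joyLikelihood] || 0,
        sorrow: scales[faceAnnotation.sorrowLikelihood] || 0,
        anger: scales[faceAnnotation.angerLikelihood] || 0,
        surprise: scales[faceAnnotation.surpriseLikelihood] || 0
      };
    }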

Some more results


Let's take a look at a few more comparisons. Google (on the right) got this one completely wrong - Bernie doesn't look too happy to me in this picture. He looks angry and somewhat disgusted - Microsoft got it right.


Google did better in this one, capturing both the surprise and anger in this Trump image.

In this image Microsoft did better, picking up the Trump trademark surprise and anger.

I love this photo, which is 100% joy, recognized by both. Strangely, Google didn't detect she was wearing a hat.

There really isn't a category to describe this dopey image of Francois Hollande, but they both concluded he was happy, with Microsoft throwing in some surprise.

Google completely failed to notice any emotion in this Trump picture, whereas Microsoft picked up the clear sorrow, anger and even contempt in the expression. Google just focused on the hat. In fact it often thinks people are wearing hats when they are not (see the picture from Homeland at the beginning of this post).

Both did well in this Hillary image, picking up both the surprise and joy.

Face and label detection


I've covered Google label detection in other posts in Some OAUTH2 walkthroughs, but it's interesting that Google certainly likes to detect hair or headwear,

even giving a first classification of hair to Trump when he's wearing a hat.



I haven't tried Microsoft label detection. Face detection is of course a precursor to being able to analyze emotion. Google's face detection and emotion detection are wrapped up together, so both are returned from the same query, whereas Microsoft has a separate API call for each.
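
To make that difference concrete, here's a rough sketch of what the two request shapes look like from Apps Script. It's not the app's actual code: the Cloud Vision endpoint and feature types are the documented ones, the Microsoft URL is the Project Oxford emotion endpoint (check the current documentation, as these endpoints do move around), and the accessToken and emotionKey arguments are assumed to come from your own authentication setup (see below).

  // sketch: one Cloud Vision request can ask for faces and labels together
  function googleAnnotate (base64Image, accessToken) {
    var body = {
      requests: [{
        image: { content: base64Image },
        features: [
          { type: 'FACE_DETECTION' },   // this is where the emotion likelihoods come from
          { type: 'LABEL_DETECTION' }
        ]
      }]
    };
    var response = UrlFetchApp.fetch ('https://vision.googleapis.com/v1/images:annotate', {
      method: 'post',
      contentType: 'application/json',
      headers: { Authorization: 'Bearer ' + accessToken },
      payload: JSON.stringify (body)
    });
    return JSON.parse (response.getContentText());
  }

  // sketch: Microsoft needs a separate call per analysis type - this one does emotion only
  function microsoftEmotion (imageBytes, emotionKey) {
    var response = UrlFetchApp.fetch ('https://api.projectoxford.ai/emotion/v1.0/recognize', {
      method: 'post',
      contentType: 'application/octet-stream',
      headers: { 'Ocp-Apim-Subscription-Key': emotionKey },
      payload: imageBytes
    });
    return JSON.parse (response.getContentText());
  }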

Some implementation notes


Since the main purpose of this app was to demonstrate authentication scenarios, it's a bit rough as a finished app. For example, it will not handle cases where no emotion can be detected, nor multiple faces in the same photo, so make sure your folder contains images appropriate for the type of analysis being done. If you decide to use the code and enhance it, I'd love to host your write-up in the guest post section of this site. Please let me know if you'd like to do this.

All the code for this is on GitHub.

This is what has been implemented in the app as it stands today.


Using Cache

I would recommend you always use the cache, since this avoids a call to the API. It will be faster, and might avoid some charges. Google allows you to send multiple queries in one request, whereas Microsoft needs a separate request for each query. I've cached each result separately, so the only queries that are actually made are for new or changed images (or if the cache has expired for that image).
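
The pattern is roughly this (a sketch, not the app's exact code: it assumes CacheService and a key built from the file's id and last-updated date, and the helper names are made up).

  // sketch: cache each image's analysis separately, keyed on the file id and
  // its last updated date, so only new or changed images actually hit the API
  function getCachedAnalysis (file, fetchFromApi) {
    var cache = CacheService.getScriptCache();
    var key = 'emotion-' + file.getId() + '-' + file.getLastUpdated().getTime();
    var cached = cache.get (key);
    if (cached) return JSON.parse (cached);

    var result = fetchFromApi (file);                    // only called on a cache miss
    cache.put (key, JSON.stringify (result), 21600);     // 6 hours is the maximum expiry
    return result;
  }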

Setting up authentication

I've covered this a number of times in Some OAUTH2 walkthroughs, and all flavors of that are in this example. Once you have your credentials, create and run a one-off function that looks like this, substituting your own credentials and ids as appropriate.
  // web account for cloud vision - taken from downloaded credentials
  cGoa.GoaApp.setPackage (propertyStore , 
    cGoa.GoaApp.createPackageFromFile (DriveApp , {
      packageName: 'cloudvision',
      fileId:'0B92ExLh4POiZUjIzbXFreTVPdjQ',
      scopes : cGoa.GoaApp.scopesGoogleExpand (['cloud-platform','drive']),
      service:'google',
      apiKey:'AIzaxxxxh632MbPE',
      msEmotionKey:'e9xxxxxxxxxab2' // you can store arbitrary properties in goa too.
    }));
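
Once that package is in place, picking the credentials up at run time looks something like this. This is a sketch based on the usual cGoa pattern rather than the app's exact code, so check the cGoa documentation if the accessor names have changed.

  // sketch: get the Cloud Vision token and the stored Microsoft key at run time
  var goa = cGoa.make ('cloudvision', PropertiesService.getScriptProperties());
  if (!goa.hasToken()) throw 'run the one-off setup function and authorize first';
  var accessToken = goa.getToken();                      // OAuth2 token for Cloud Vision
  var msEmotionKey = goa.getProperty ('msEmotionKey');   // the arbitrary property stored above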

Folder of Images

You should put the images you want to analyze in a folder and select it in the app. I put a limit on the number of images it will analyze to avoid accidentally generating charges. You'll find the maxFiles property in the Cloudvision namespace.
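
For illustration, the selection amounts to something like this (a sketch only; the function name is made up, and maxFiles and the folder id would come from the app's own settings).

  // sketch: collect image files from the selected folder, stopping at maxFiles
  function getImageFiles (folderId, maxFiles) {
    var files = DriveApp.getFolderById (folderId).getFilesByType (MimeType.JPEG);
    var list = [];
    while (files.hasNext() && list.length < maxFiles) {
      list.push (files.next());
    }
    return list;
  }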

For more like this, see Google Apps Scripts snippets. Why not join our forum, follow the blog or follow me on Twitter to ensure you get updates when they are available.

You want to learn Google Apps Script?

Learning Apps Script (and transitioning from VBA) is covered comprehensively in my book, Going Gas - from VBA to Apps Script. All formats are available from O'Reilly, Amazon and all good bookshops. You can also read a preview on O'Reilly.

If you prefer video-style learning, I also have two courses available, also published by O'Reilly: Google Apps Script for Developers and Google Apps Script for Beginners.

