Pre-trained Machine Learning APIs

Machine Learning with Sara Robinson

Sara Robinson presents two examples of computer vision classification tasks. First, she shows an orange and an apple and asks how best to classify the two fruits visually. One possible approach would be to use the color distribution of the pixels in the two images. Next, she presents images of sheepdogs and mops, many of which are hard to distinguish even with the human eye. Writing heuristic rules to tell dogs from mops would clearly be difficult. Instead, she posits that the right approach is to train machine learning models to figure out the differences automatically.

Next, she introduces the two ways Google Cloud can help users add machine learning to their applications: custom models and pre-trained models.

The tools available to help construct custom machine learning models are:

  • TensorFlow is an open-source library from the Google Brain team. It allows you to train your own ML models using your own data.
  • Machine Learning Engine is a means of running TensorFlow models on managed Google infrastructure.

Sara also introduces “friendly machine learning,” which consists of a set of pre-trained APIs to give you access to pre-trained models with a single REST API request. The APIs include the following:

  • Vision API,
  • Speech API,
  • Translation API,
  • Natural Language API, and
  • Video Intelligence API.

Pre-trained ML APIs

Cloud Vision

Cloud Vision is an API that lets the user perform complex image detection with a single REST API request. As an example, GIPHY is an app that lets users search for GIFs across the web and share them on social channels. GIPHY uses the Vision API for OCR, or optical character recognition.

At its core, Cloud Vision provides label detection, which tells the user what an image is a picture of. Web detection goes further, searching for similar images across the web and extracting content from those images to return additional details.

  • OCR: extracts text from images.
  • Logo detection: finds company logos.
  • Landmark detection: determines whether an image contains a common landmark, such as the Golden Gate Bridge, and provides the latitude and longitude of that landmark.
  • Crop hints: suggests crops that focus a photo on a particular subject.
  • Explicit content detection: determines whether an image contains inappropriate content.
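Several of these features can be requested in one annotate call. A minimal sketch of the request body, following the v1 `images:annotate` JSON shape (the image URI below is a made-up placeholder):

```python
import json

# Hypothetical image URI; substitute your own Cloud Storage path.
IMAGE_URI = "gs://my-bucket/photo.jpg"

# A single annotate request can ask for several feature types at once.
request_body = {
    "requests": [{
        "image": {"source": {"imageUri": IMAGE_URI}},
        "features": [
            {"type": "LABEL_DETECTION", "maxResults": 5},
            {"type": "LOGO_DETECTION", "maxResults": 3},
            {"type": "TEXT_DETECTION"},
            {"type": "SAFE_SEARCH_DETECTION"},
        ],
    }]
}

print(json.dumps(request_body, indent=2))
```

This same body can be POSTed to the API directly or passed to the client library, as the lab below does.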

All these tools can be tried from the browser before writing any code: visit the Vision API product page and upload sample images.

Cloud Video Intelligence

Cloud Video Intelligence is an API that lets users understand their videos at the shot, frame, or video level. Among the things Cloud Video Intelligence provides are:

  • Label detection: tells users what the video is about, at both the video level and the shot level.
  • Video- and scene-level annotations.
  • Shot change detection: provides a timestamp every time the camera changes shots.
  • Explicit content detection: finds inappropriate scenes.
  • Regionalization: allows users to specify the region where video API requests should be executed.
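Shot change detection returns a start and end offset for each shot. A rough sketch of pulling those out of a response, using an invented response fragment whose field names follow the API's shot-annotation shape (the timing values are made up for the example):

```python
# Illustrative response fragment; values are invented for the example.
response = {
    "annotationResults": [{
        "shotAnnotations": [
            {"startTimeOffset": "0s",   "endTimeOffset": "4.2s"},
            {"startTimeOffset": "4.2s", "endTimeOffset": "9.7s"},
            {"startTimeOffset": "9.7s", "endTimeOffset": "15.0s"},
        ]
    }]
}

# Each shot annotation marks where the camera changed shots.
shots = response["annotationResults"][0]["shotAnnotations"]
for i, shot in enumerate(shots):
    print("shot %d: %s -> %s" % (i, shot["startTimeOffset"], shot["endTimeOffset"]))
```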

Cloud Speech

Cloud Speech enables speech-to-text transcription in over 100 languages. It takes an audio file as input and returns a text transcript. It also supports speech timestamps, which provide start and end times for every word in the transcription, enabling quick searching within the recording. It includes profanity filtering. Lastly, it works with either batch or streaming transcription.
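Word-level timestamps are what make a transcript searchable. A sketch of finding when a word is spoken, using an invented response fragment shaped like the per-word output the API returns when word time offsets are enabled:

```python
# Invented fragment; the 'words' list mirrors the shape returned when
# word time offsets are enabled on the recognition config.
alternative = {
    "transcript": "how old is the Brooklyn Bridge",
    "words": [
        {"word": "how",      "startTime": "0.0s", "endTime": "0.3s"},
        {"word": "old",      "startTime": "0.3s", "endTime": "0.6s"},
        {"word": "is",       "startTime": "0.6s", "endTime": "0.8s"},
        {"word": "the",      "startTime": "0.8s", "endTime": "0.9s"},
        {"word": "Brooklyn", "startTime": "0.9s", "endTime": "1.4s"},
        {"word": "Bridge",   "startTime": "1.4s", "endTime": "1.8s"},
    ],
}

# Find every occurrence of a search term in the recording.
hits = [w for w in alternative["words"] if w["word"].lower() == "brooklyn"]
for w in hits:
    print("%s spoken at %s" % (w["word"], w["startTime"]))
```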

Cloud Translation

Cloud Translation provides a means for developers to translate text into over 100 languages. It can also detect the language of the input text.

Cloud Natural Language

Cloud Natural Language enables developers to understand text with a single API request. To accomplish this, it performs several distinct functions.

  • Extract entities: identifies the people, places, and things mentioned in text.
  • Detect sentiment: tells you whether a sentence is positive or negative.
  • Analyze syntax: delves into linguistic syntax to extract parts of speech. It also provides the “lemma” (root form) of each word, e.g. helps -> help.
  • Classify content: sorts content into different categories; over 700 categories are possible.
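Entity extraction and sentiment detection combine naturally for feedback routing. A sketch using an invented response fragment (field names follow the v1 sentiment/entity shape; the scores and entity names are made up):

```python
# Invented response fragment; scores and entities are illustrative only.
nl_response = {
    "documentSentiment": {"score": -0.6, "magnitude": 0.6},
    "entities": [
        {"name": "dashboard", "type": "OTHER", "salience": 0.7},
        {"name": "pricing",   "type": "OTHER", "salience": 0.3},
    ],
}

# Sentiment score is negative for negative text, positive for positive text.
score = nl_response["documentSentiment"]["score"]
label = "positive" if score > 0 else "negative"

# Entities tell us what the feedback is about, so it can be routed.
topics = [e["name"] for e in nl_response["entities"]]
print("%s feedback about: %s" % (label, ", ".join(topics)))
```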

Wootric is a company that helps its clients analyze user feedback. It provides this service using entity and sentiment analysis, which lets clients route and respond to feedback in near real time. It can narrow the scope of the feedback to categories such as “pricing too high,” “usability,” “product problem,” “product cost,” or “delivery-shipping.” Wootric can then inform a client that a certain user is angry about usability, for example. This is a huge improvement over reviewing every piece of feedback manually.

Lab: Machine Learning APIs

The notebook for these examples is already hosted in a GitHub repository. To begin, I run the following command from within Datalab, which clones the repository.

%bash
git clone https://github.com/GoogleCloudPlatform/training-data-analyst
rm -rf training-data-analyst/.git

To begin, I visit the API console, and choose credentials on the left-hand menu. “Create Credentials” generates an API key for the application. Keys can be restricted by IP address to prevent abuse.

APIKEY="AIzaSyB6X9rfuYhi380gujcGw_w-AGMWhVnQIu0"

Next, I enable the APIs that will be used in this lab, also from the API console.

I will invoke the APIs from Python, so I run the following command to install the Python package.

!pip install --upgrade google-api-python-client

Translate API

# running Translate API
from googleapiclient.discovery import build
service = build('translate', 'v2', developerKey=APIKEY)

# use the service
inputs = ['is it really this easy?', 'amazing technology', 'wow']
outputs = service.translations().list(source='en', target='fr', q=inputs).execute()
# print outputs
for input, output in zip(inputs, outputs['translations']):
  print u"{0} -> {1}".format(input, output['translatedText'])

The output of this code follows.

is it really this easy? -> est-ce vraiment si simple?
amazing technology -> technologie étonnante
wow -> sensationnel

Vision API

import base64
IMAGE="gs://cloud-training-demos/vision/sign2.jpg"
vservice = build('vision', 'v1', developerKey=APIKEY)
request = vservice.images().annotate(body={
        'requests': [{
                'image': {
                    'source': {
                        'gcs_image_uri': IMAGE
                    }
                },
                'features': [{
                    'type': 'TEXT_DETECTION',
                    'maxResults': 3,
                }]
            }],
        })
responses = request.execute(num_retries=3)

The image processed by the preceding code, stored at gs://cloud-training-demos/vision/sign2.jpg, is a photo of a sign with Chinese text.

Then, I print the response, which follows.

foreigntext = responses['responses'][0]['textAnnotations'][0]['description']
foreignlang = responses['responses'][0]['textAnnotations'][0]['locale']
print foreignlang
print foreigntext

‘zh’ indicates this is Chinese text.

zh
请您爱护和保
护卫生创建优
美水环境

The translation of the sign is given by the following code.

inputs=[foreigntext]
outputs = service.translations().list(source=foreignlang, target='en', q=inputs).execute()
# print outputs
for input, output in zip(inputs, outputs['translations']):
  print u"{0} -> {1}".format(input, output['translatedText'])
请您爱护和保
护卫生创建优
美水环境
 -> Please care for and protect the health to create a beautiful water environment

Sentiment Analysis using Language API

lservice = build('language', 'v1beta1', developerKey=APIKEY)
quotes = [
  'To succeed, you must have tremendous perseverance, tremendous will.',
  'It’s not that I’m so smart, it’s just that I stay with problems longer.',
  'Love is quivering happiness.',
  'Love is of all passions the strongest, for it attacks simultaneously the head, the heart, and the senses.',
  'What difference does it make to the dead, the orphans and the homeless, whether the mad destruction is wrought under the name of totalitarianism or in the holy name of liberty or democracy?',
  'When someone you love dies, and you’re not expecting it, you don’t lose her all at once; you lose her in pieces over a long time — the way the mail stops coming, and her scent fades from the pillows and even from the clothes in her closet and drawers. '
]
for quote in quotes:
  response = lservice.documents().analyzeSentiment(
    body={
      'document': {
         'type': 'PLAIN_TEXT',
         'content': quote
      }
    }).execute()
  polarity = response['documentSentiment']['polarity']
  magnitude = response['documentSentiment']['magnitude']
  print('POLARITY=%s MAGNITUDE=%s for %s' % (polarity, magnitude, quote))
POLARITY=1 MAGNITUDE=0.9 for To succeed, you must have tremendous perseverance,
  tremendous will.
POLARITY=-1 MAGNITUDE=0.5 for It’s not that I’m so smart, it’s just that I stay
  with problems longer.
POLARITY=1 MAGNITUDE=0.9 for Love is quivering happiness.
POLARITY=1 MAGNITUDE=0.9 for Love is of all passions the strongest,
  for it attacks simultaneously the head, the heart, and the senses.
POLARITY=1 MAGNITUDE=0.2 for What difference does it make to the dead, the
  orphans and the homeless, whether the mad destruction is wrought under the
  name of totalitarianism or in the holy name of liberty or democracy?
POLARITY=-1 MAGNITUDE=0.4 for When someone you love dies, and you’re not
  expecting it, you don’t lose her all at once; you lose her in pieces over a
  long time — the way the mail stops coming, and her scent fades from the
  pillows and even from the clothes in her closet and drawers.

Speech API

The Speech API is run on an audio file stored in Cloud Storage at gs://cloud-training-demos/vision/audio.raw.

sservice = build('speech', 'v1beta1', developerKey=APIKEY)
response = sservice.speech().syncrecognize(
    body={
        'config': {
            'encoding': 'LINEAR16',
            'sampleRate': 16000
        },
        'audio': {
            'uri': 'gs://cloud-training-demos/vision/audio.raw'
            }
        }).execute()
print response
{u'results':
  [{u'alternatives':
    [{u'confidence':
        0.98360395,
      u'transcript':
        u'how old is the Brooklyn Bridge'}]}]}

Print the confidence.

print response['results'][0]['alternatives'][0]['transcript']
print 'Confidence=%f' % response['results'][0]['alternatives'][0]['confidence']
how old is the Brooklyn Bridge
Confidence=0.983604

Summary

The following is a summary of the entire first course, entitled “How Google Does Machine Learning.”

The aim of this specialization is to teach those completing it how to build production machine learning models. The specialization is intended for varied audiences, from Python programmers to data scientists, and its courses provide a practical, real-world introduction to machine learning.

Google views machine learning as a way to replace heuristic rules that tend to build up over time. As an example, there was a time when Google used heuristic rules to determine what results to return if a user googled something potentially location-sensitive, like “giants.” Now, those rules have been replaced by ML models.

One of the best use cases for ML is personalizing business offerings for customers. An example of this is Google Maps. A deterministic algorithm could provide the most basic maps functionality, but machine learning is required for Maps to infer what the user might want to know or see. Those types of decisions and insights are impossible at scale without machine learning.

Next, Google employees shared the “secret sauce”: the organizational know-how Google has acquired over many years of managing more value-generating ML systems than any other company in the world. In particular, there are five phases a business process goes through before it is ready for machine learning: (1) individual contributor, (2) delegation, (3) digitization, (4) big data and analytics, and (5) machine learning.

Another lesson Google shared was the importance of recognizing that unconscious human biases can be amplified by ML models, if models are not trained using data that is gathered very conscientiously.

Finally, the instructor introduced the Qwiklabs system and Datalab, which will be the means of performing most of the labs in this specialization. Then, Lak discussed how Cloud Storage and Compute Engine provide the storage and compute necessary for distributed notebooks. Python notebooks were demonstrated running BigQuery queries on thousands of machines simultaneously. Last, various ML APIs were demonstrated. As ML matures, many repeatable tasks will be available in pre-trained form through such APIs.