Machine Learning in the Marketplace

This content is taken from my notes on the Coursera course “How Google does Machine Learning.” It is part of the “Machine Learning with TensorFlow on Google Cloud Platform” specialization.

The specialization is sponsored by Google Cloud and this particular course is presented by Valliappa Lakshmanan, or “Lak,” a Technical Lead for Google Cloud’s Big Data and Machine Learning professional services.

AI versus ML

  • AI is a discipline, like Physics
  • ML is a toolset, like Newton’s Laws of Mechanics

Newton’s Laws can be used to calculate how long a ball will take to reach the ground after you drop it off a cliff. In the same way, Machine Learning is a specific way of solving AI problems, in contrast to Expert Systems, for example. In ML, machines don’t start intelligent; they become intelligent.

Stages of Machine Learning

The form of machine learning this specialization focuses on is “Supervised Learning.” This is the most mature form of machine learning.

Stage 1: Train an ML model with examples. Training data consists of an input, like a picture of a cat, and a label, “cat.” The ML model itself is a mathematical function with adjustable parameters. Training the model consists of making many tiny adjustments to the model function so the output becomes closer to the label for appropriate inputs.

Stage 2: Predict (or “Infer”) the label for an image it has never seen before. The key to correct predictions is in a model’s ability to generalize about new data. The fuel that powers this generalization is lots of data.
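The two stages can be sketched in a few lines of plain Python. This is a toy illustration, not anything shown in the course: the model is a straight line rather than an image classifier, the data is made up (labels follow y = 2x + 1), and the names predict, train, and learning_rate are my own.

```python
# Toy supervised learning. The "model" is a function with two adjustable
# parameters (w, b). Training nudges them so outputs move toward the labels.

def predict(x, w, b):
    """The model: a mathematical function with adjustable parameters."""
    return w * x + b

def train(examples, steps=2000, learning_rate=0.05):
    """Stage 1: make many tiny adjustments to w and b (gradient descent)."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w and b.
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in examples) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in examples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Made-up training data: (input, label) pairs.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train(examples)

# Stage 2: predict the label for an input the model has never seen.
unseen_prediction = predict(10.0, w, b)
```

The prediction for the unseen input 10.0 lands close to 21 because the learned parameters generalize the pattern in the examples rather than memorizing them.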

Similar to his recommendation that data scientists not lose sight of data serving while they are modeling, Lak also says data scientists must not lose sight of the inference stage of machine learning. A major pitfall many data science teams fall into is overemphasizing model training at the expense of prediction/inference. This specialization will emphasize the processes associated with putting ML models into production. Google calls this “end to end machine learning.”

ML in Google Products

Neural networks are the focus of this specialization, but there are many other common mathematical models used in machine learning, including linear methods, decision trees, radial basis functions, tree ensembles, radial basis functions followed by linear methods, etc.

Neural networks have proven themselves to be best or nearly the best in a wide variety of tasks. They have existed since the 1970s, but at that time they only had a single hidden layer. Three factors have led to the popularity of deep learning today:

  1. Computational power - training deep neural networks takes a lot of computing power, which we now have.
  2. Data availability - deep neural networks require much more data to train than models with a single hidden layer.
  3. Computational tricks - researchers had to develop a variety of techniques to prevent layers becoming all zero or blowing up, becoming NaN.

With these problems solved, deep learning has become very popular, enabling the solution of hard problems like language translation, image classification, speech understanding, and so on. At Google circa 2012, there were almost no deep learning models in production. As of Q1 2017, there were over 4000. In fact, machine learning forms a part of almost every product Google offers. YouTube, Play, Chrome, Gmail, Hangouts… all use ML models, and most use dozens of ML models.

Just because there is one business problem does not mean that there should be one neural network to solve that problem. Some business problems require many neural networks to answer them completely. In the case of Google, YouTube does not incorporate a single ML model, but rather dozens.

As an example, consider the business problem of predicting stock-outs at a store. Individual neural networks may be required to (1) predict product demand, (2) predict inventory, and (3) predict restocking time.

In this course, we will train, deploy, and predict with a single model. In practice though, data scientists may develop many ML models to solve a use-case. Lak: “Avoid the trap of pursuing a single, monolithic, one-model-solves-a-whole-problem solution.”
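One way to picture the stock-out example is as three small models behind one decision. The sketch below is my own assumption about how they might be combined. The class StockOutPredictor, its field names, and the rule "expected sales before restock exceed inventory" are all hypothetical, and the three models are replaced by lambdas returning made-up numbers.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class StockOutPredictor:
    # Each field stands in for a separately trained model.
    demand_per_day: Callable[[str], float]     # (1) predicted product demand
    inventory_on_hand: Callable[[str], float]  # (2) predicted inventory
    restock_days: Callable[[str], float]       # (3) predicted restocking time

    def will_stock_out(self, product_id: str) -> bool:
        # Assumed combination rule: a stock-out is likely if expected sales
        # before the next restock exceed the predicted inventory.
        expected_sales = self.demand_per_day(product_id) * self.restock_days(product_id)
        return expected_sales > self.inventory_on_hand(product_id)

# Stand-in "models" returning made-up values.
predictor = StockOutPredictor(
    demand_per_day=lambda product_id: 12.0,
    inventory_on_hand=lambda product_id: 40.0,
    restock_days=lambda product_id: 5.0,
)
at_risk = predictor.will_stock_out("sku-123")
```

Because each model sits behind its own function, any one of them can be retrained or replaced without touching the other two.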

Demo: ML in Google products

Google Photos might be the single best example of the possibilities machine learning models create. Users can upload photos to Google Cloud, and Google’s machine learning models will automatically tag them. Later, users can search their photos using keywords that will be matched to those tags.

Google Translate allows users to point their cell phone camera at a sign and read a translation of the sign in their native language. Google Translate uses multiple machine learning models to accomplish this, including (1) identifying the sign, (2) reading the sign using OCR, (3) identifying the language, (4) translating the language, (5) superimposing text, and (6) selecting the correct font.

Gmail’s “Smart Reply” feature provides users with three possible responses to a received email. In Lak’s view, this is the most sophisticated ML model in production today.

Replacing Heuristics

The following comment from Eric Schmidt was in the context of his discussion about Google becoming an “AI-first” company.

“Machine learning. This is the next transformation… the programming paradigm is changing. Instead of programming a computer, you teach a computer to learn something and it does what you want.” - Eric Schmidt, Executive Chairman of the Board, Google

Lak points out that there is nothing in Eric’s comment about “data.” Google conceptualizes ML as being more about logic and replacing heuristic rules. It is ultimately a means of replacing programming itself.

As an example, Lak discusses Google Search some years back, which consisted of many hard-coded rules about what content to display to a user. For example, location-based rules would dictate the results a user saw if they searched for “giants.” Bay Area users would receive San Francisco Giants results, New York users would receive New York Giants results. Over time, the resulting rule-based, heuristic code base grew unwieldy. RankBrain is the deep neural network for search ranking that replaced that large set of heuristic rules, dramatically improving search results in the process.
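To make the “giants” heuristic concrete, here is what one such hand-written rule might look like. This is purely illustrative and is not Google’s code: the function name, the region strings, and the fallback value are all made up.

```python
def giants_rule(query, user_region):
    """A hand-coded, location-based rule for a single ambiguous query."""
    if query.strip().lower() == "giants":
        if user_region == "bay_area":
            return "San Francisco Giants"
        if user_region == "new_york":
            return "New York Giants"
    # Every other query and region would need its own rule.
    return "generic results"

bay_area_result = giants_rule("giants", "bay_area")
new_york_result = giants_rule("Giants", "new_york")
elsewhere_result = giants_rule("giants", "chicago")
```

Each new ambiguous query or region means another branch, which is how a rule base like this grows unwieldy.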

What problems can ML solve today? Lak’s answer: “Anything for which you are writing rules.” So, it’s not just about predictive analytics. This is a much more expansive field of possibilities for machine learning to be applied to. Google considers ML a means to scale, automate, and personalize. An important caveat to all this is that for ML to be a viable solution, data is required.

It’s All About Data

A few years ago Google found that location-based queries were becoming more common, like “Japanese toys in San Francisco,” “Live lobster in Kissimmee,” and “vegan donuts near me.” Consider the query, “coffee near me.” How can this be turned into a machine learning problem? Recall ML is about taking a bunch of examples and converting that knowledge into future predictions.

The prediction in this case is whether or not the user will appreciate and/or act on the result provided by Google. But, there are a huge number of considerations that need to be taken into account in order to deliver the best result to the user. Among these: How far is too far? How much does the rating of the restaurant matter? How about the service time?

Rather than guessing at all these, Google gathers a lot of data about user behavior. Recall that this data is essentially labeled training data for an ML model. In this situation, the input is the distance to the shop and the label is, does the user like the result? A massive amount of data is aggregated and then the model is fit. Because of the large amount of data available, many of the aforementioned idiosyncrasies of consumer behavior are taken into account.
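As a rough sketch of fitting a model to this kind of labeled data, the example below uses logistic regression with distance as the only input and a 0/1 “liked” label. The data points are invented and the choice of logistic regression is my assumption. The course does not say what model Google uses.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(examples, steps=5000, learning_rate=0.1):
    """Fit P(liked | distance) = sigmoid(w * distance + b) by gradient descent."""
    w, b = 0.0, 0.0
    n = len(examples)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for distance_km, liked in examples:
            error = sigmoid(w * distance_km + b) - liked
            grad_w += error * distance_km
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b

# Made-up behavior data: (distance to the shop in km, did the user like the result?)
examples = [(0.2, 1), (0.5, 1), (1.0, 1), (1.5, 0),
            (2.0, 1), (3.0, 0), (4.0, 0), (6.0, 0)]
w, b = fit(examples)

p_near = sigmoid(w * 0.3 + b)  # estimated chance a user likes a shop 0.3 km away
p_far = sigmoid(w * 5.0 + b)   # estimated chance a user likes a shop 5 km away
```

The answer to “how far is too far?” is never written down as a rule. It falls out of the fitted weight on distance, and with more inputs (rating, service time) the same approach would weigh those as well.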

Framing an ML Problem

The following are three examples of how to frame an ML problem. The first two examples are my work; the final one is Lak’s own response.

The Approach

First, each use-case is framed as an ML problem, answering
(1) What is being predicted?
(2) What data is needed?
Second, each use-case is framed as a software problem, answering
(3) What is the API for the problem during prediction?
(4) Who will use this service?
Third, each use-case is framed as a data problem, answering
(5) What data are we analyzing?
(6) What data are we predicting?
(7) What data are we reacting to?

Retail Industry: Customer ROI and lifetime value
ML: What is being predicted? The lifetime value of customers and the probable ROI of reaching customers by various advertising channels.
What data is needed? Which customers were reached by which channel? How much was invested in each advertising channel?
Software: What is the API? ROI(marketSegment, productSegment, channel, month): an API to generate expected ROI for a given market segment and a given marketing channel.
Who will use this service? This would be figured out by internal or external marketing analysts using a variety of internal or purchased, proprietary databases. Actual APIs may or may not be used.
Data: What data are we analyzing? Sales data, as aggregated by member ID or something similar. Marketing investment data.
What data are we predicting? Future value of customers, specifically, their expected spending going forward. Spending as a function of marketing investment, as well.
What data are we reacting to? Real-time sales data. Real-time inventory data.

Financial Services: Credit worthiness evaluation
ML: What is being predicted? Whether or not a given customer will be profitable for the bank.
What data is needed? The standard creditworthiness metrics: Age, Years at Employer, Years at Address, Income, Credit Card Debt, Automobile Debt
Software: What is the API? loanApproved(Age, YearsAtEmployer, YearsAtAddress, Income, CreditCardDebt, AutomobileDebt)
Currently banks purchase creditworthiness ratings from the three credit bureaus or FICO. Some may also rely on internal, proprietary systems.
Who will use this service? Investment advisors and personal bankers, when evaluating customers for loans or lines of credit
Data: What data are we analyzing? Customer creditworthiness data, possibly real-time payments received and capital availability data as well.
What data are we predicting? Future profitability of given credit applicants.
What data are we reacting to? Customer creditworthiness data, capital availability data

Lak’s Walkthrough:

Manufacturing: Demand Forecasting
ML: What is being predicted?
How many units should be produced this month?
What data is needed?
Historical data on units sold (sales price, returns)
Competing product price
If selling a subassembly of a larger product, how many of a larger product were sold?
Economic figures for this month last year (consumer confidence, interest rate)
Software: What is the API?
predictDemand(productID, month)
Lak notes that the ancillary data that the ML model relies on are not included as inputs to the API. It is assumed that the model has that built-in.
Who will use this service?
Product managers, logistics managers
How are they doing it today?
Examine trends of phone sales, overall economy, trade publications, etc.
Data: What data are we analyzing?
Collect: economic data, industry data, competing product data, and internal figures
What data are we predicting?
Build features that current experts are using for traditional analytics, then build these in as inputs to the model
What data are we reacting to?
Depending on the predicted demand, there may be an automatic reaction to place orders from various suppliers
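Lak’s point about the API shape can be sketched as follows. The caller passes only a product ID and a month, and the ancillary data is looked up behind the API. Everything here is a hypothetical stand-in: the lookup tables, their values, and the one-line “model” are invented for illustration, and predict_demand is just a Python-style spelling of predictDemand.

```python
# Hypothetical stand-ins for data the deployed model would have access to.
SALES_LAST_YEAR = {("widget-1", 3): 1200}  # (product_id, month) -> units sold
CONSUMER_CONFIDENCE = {3: 95.0}            # month -> index value (made up)

def _stand_in_model(units_last_year, consumer_confidence):
    # Placeholder for a trained model: scale last year's sales by confidence.
    return units_last_year * consumer_confidence / 100.0

def predict_demand(product_id, month):
    """Callers supply only product_id and month; ancillary data is internal."""
    units_last_year = SALES_LAST_YEAR[(product_id, month)]
    confidence = CONSUMER_CONFIDENCE[month]
    return _stand_in_model(units_last_year, confidence)

forecast = predict_demand("widget-1", 3)
```

Keeping the economic and sales inputs out of the signature means product and logistics managers do not need to know which features the model relies on.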

ML in Applications

Lak presents a case study where a company called Aucnet, a Japanese car auction company, uses a machine learning system to determine a car’s make, model, and price from a series of photos that dealers upload. The system saves significant time over the previous, manual way that dealers obtained and entered this information. The system is an example of a highly-engineered, custom ML system.

Pre-trained Models

In stark contrast to the previous case study, one of the easiest ways to incorporate ML into an existing application is to use pre-trained models, which are off-the-shelf solutions. An example of this type of solution is NLP, or Natural Language Processing, which enables companies to determine the sentiment expressed in an email or other text communication. With this information, they can determine how their customer contact staff should prioritize the communication.

Gartner (the world’s leading research and advisory company) estimates that 50% of enterprises will be spending more per annum on bots and chatbot creation than traditional mobile app development, by 2021.

The trend is for interfaces to move away from traditional point-and-click web interfaces and toward natural-language chat interfaces. These are so-called “conversational interfaces.”

There are a variety of domains where Google exposes Machine Learning Services that are trained on Google’s own data. Among these are the Vision API, Speech API, Jobs API, Translation API, Natural Language API, and Video Intelligence API.

The ML Marketplace is Evolving

The trend is toward increasing levels of ML abstraction, following the traditional trend in software where applications are fairly low-level while the technology is new and maturing. As the technology matures, things get more abstract and high-level.

This specialization is about building custom models. But, going forward, increasingly it will be possible to build applications that use machine learning by leveraging APIs created by someone else. Of course, these APIs will need to be built by someone, and that someone may be the people completing this course.

A Data Strategy

Lak poses a question: what constitutes machine learning? All three of the following:

  1. Routing from point A to point B subject to a set of constraints can be solved by the A* algorithm - not complex.
  2. Google Maps ascertains that the user is on floor 2 of a structure and provides that information, unprompted. Whatever data the system uses to determine this, it is obvious this cannot be accomplished by simple heuristics.
  3. Maps recommends visiting a nearby location that may be of interest to the user, based on information the system knows about what the user likes. ML has turned the maps application into a personal assistant.
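For reference, item 1 really is a classical algorithm with no learning involved. Below is a minimal A* sketch on a small grid, where blocked cells play the role of constraints. The grid, the unit step cost, and the Manhattan-distance heuristic are my choices for illustration, and real road routing works on a graph rather than a grid.

```python
import heapq

def a_star(grid, start, goal):
    """Length of the shortest 4-connected path on a grid of 0 (free) and
    1 (blocked) cells, or None if the goal cannot be reached."""
    def heuristic(cell):
        # Manhattan distance never overestimates on a unit-cost grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    best_cost = {start: 0}
    frontier = [(heuristic(start), 0, start)]  # (cost + heuristic, cost, cell)
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost[cell]:
            continue  # stale queue entry, a cheaper route was already found
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    heapq.heappush(
                        frontier,
                        (new_cost + heuristic((nr, nc)), new_cost, (nr, nc)),
                    )
    return None

grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
steps = a_star(grid, (0, 0), (2, 0))  # must detour through the gap at (1, 2)
```

Nothing in this code improves with data, which is what separates it from items 2 and 3.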

Again, machine learning is about scaling beyond simple hand-written rules. But, with abstraction, you get to the point that you are able to do things that you could never do with hand-written rules. The ultimate promise is to be able to create totally custom experiences for each of your users depending upon that particular user’s likes and preferences. Lak notes that the most complex situation, item 3, above, includes on the screen a question asking the user, “Is this card useful right now?” Soliciting feedback from the user is necessary to continue improving the model.

Lak: If ML is a rocket engine, data is the fuel. Lots of data are required.

A truism in machine learning is that “data wins every time.” In other words, “Simple ML and more data > Complex ML and small data.” So, given the choice between gathering more and more varied data and building a more complex model, gather more data. An ML strategy is first-and-foremost a data strategy.

Training and Serving Skew

How does an organization get started with machine learning? The most well-trod route Google’s customers take is to start from a use case where the customer is currently performing manual data analysis and then translate that into an ML problem.

There are several reasons to progress from manual data analysis to machine learning.

  1. If you are doing manual data analysis, it is likely that you have the data already. Collecting data is often the longest and most difficult part of a machine learning project.
  2. It is useful to go through a manual analysis stage before progressing to machine learning, because if you cannot analyze your data well manually, you are unlikely to be able to analyze it with the added complexity of machine learning. Manual analysis helps you fail fast.
  3. Building a good machine learning model requires a deep knowledge of the data. The first step should be to get to know the data, which is facilitated by performing manual data analysis.
  4. ML is a journey toward automation and scale. If you can’t do analytics, you will not be able to do ML.

Many ML projects fail because of “training-serving skew.” This is the situation where a company has a certain system for processing historical data and training models, and there is a different system, possibly written by a different group, doing prediction. Fundamentally, the problem is that the model needs to see the exact same data during serving as it was used to seeing during training, or else the predictions are going to be off.

One way to resolve this is to take the same code that was used to process historical data during training and reuse it during prediction. To do this, it is necessary for your data pipelines to process both batch and stream. This is the key insight behind Cloud Dataflow, which is a way to author data pipelines in Python, Java, or even visually with Cloud Dataprep. The open-source version is Apache Beam, where “B” stands for Batch, and “eam” stands for stream.

Another reason to use the same model for both training and prediction is performance. During training, the key metric is the ability to scale to large data volumes, usually accomplished via distributed training. During prediction, the key performance element is speed of response. The system needs to be able to handle large numbers of queries per second. This is a key insight behind TensorFlow.
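The idea of sharing preprocessing code can be sketched in plain Python. This is not the Dataflow or Beam API. It only shows the principle that one function feeds both the batch (training) path and the per-request (serving) path. The field names and feature definitions are invented.

```python
import math

def preprocess(raw):
    """Single source of truth for turning a raw record into model features."""
    return {
        "log_distance_km": math.log1p(raw["distance_m"] / 1000.0),
        "is_weekend": 1.0 if raw["day_of_week"] in ("sat", "sun") else 0.0,
    }

def build_training_rows(historical_records):
    """Batch path: historical records become (features, label) pairs."""
    return [(preprocess(record), record["label"]) for record in historical_records]

def serve(raw_request, model):
    """Streaming path: a single live request goes through the same function."""
    return model(preprocess(raw_request))

record = {"distance_m": 1500.0, "day_of_week": "sat", "label": 1}
training_features, _ = build_training_rows([record])[0]
# An identity "model" lets us inspect what a real model would see at serving time.
serving_features = serve(record, model=lambda features: features)
```

If the two paths each had their own copy of the feature logic, a change to one copy (say, switching from meters to kilometers) would silently skew predictions.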

Many ML frameworks exist for training, but not so many are capable of operationalizing.

An ML Strategy

Lak’s one takeaway for this section: “The magic of ML comes with quantity, not complexity.” The best ML approach is to connect many small, simple models. This enables your team to try many things, fail quickly, and iterate.

90% of enterprise data is unstructured: emails, videos, texts, reports, catalogs, events, news, etc. Dealing with unstructured data has become much easier with ML APIs, which extract meaning from the mess. That meaning can then be passed into other models.

So, don’t start with unstructured data, start with ML APIs and use their outputs as inputs to your custom ML models.
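The pattern looks roughly like this. To keep the sketch self-contained, the pre-trained API is replaced by a keyword-counting stub. It is not a real API call, and the function names, word list, and weights are all made up.

```python
NEGATIVE_WORDS = {"broken", "refund", "angry", "late"}

def sentiment_api_stub(text):
    """Stand-in for a pre-trained sentiment API: returns a score from
    -1.0 (very negative) to 0.0 (neutral) using a crude keyword count."""
    words = [word.strip(".,!?") for word in text.lower().split()]
    hits = sum(1 for word in words if word in NEGATIVE_WORDS)
    return -min(1.0, hits / 2)

def priority_model(features):
    """Hypothetical custom model with made-up weights over structured inputs."""
    return 0.7 * -features["sentiment"] + 0.3 * features["is_premium_customer"]

def prioritize(email_text, is_premium):
    # Unstructured text goes to the API first. Only its structured output
    # reaches the custom model.
    features = {
        "sentiment": sentiment_api_stub(email_text),
        "is_premium_customer": 1.0 if is_premium else 0.0,
    }
    return priority_model(features)

urgent = prioritize("My order arrived broken and I want a refund!", True)
routine = prioritize("Thanks, all good.", False)
```

The custom model never touches raw text, so it can stay small and simple while the hard part of understanding language is delegated.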

Transform Your Business

One of the best uses of ML in today’s world is to surprise and delight users. Fulfill user intent in interesting ways, using the capabilities of ML.

There are three ways your business can benefit from ML:

  1. Infuse your apps with ML. Simplify user input, and adapt to the user.
  2. Use ML to fine-tune your business. Streamline your processes, create new business opportunities.
  3. Use ML to delight your users. Anticipate their needs, and creatively fulfill their intent.
