Machine learning: building the bigger picture by leveraging data

What you will learn:

  • The different types of machine learning.
  • The main supervised and unsupervised approaches to machine learning.

Machine learning (ML) is a data analysis method that automates the building of analytical models. It is a branch of artificial intelligence (AI) based on the idea that systems can learn from data, identify patterns and make decisions with minimal human intervention. ML algorithms build a model from sample data, known as training data, in order to make predictions or decisions without being explicitly programmed to perform a particular task.

Such algorithms are used in numerous applications, including medicine, autonomous vehicles, speech recognition and machine vision, where it is difficult or unfeasible to use traditional algorithms to perform the required tasks. It’s also behind chatbots and predictive text, language translation apps, and even the shows and movies recommended by Netflix.

When companies use artificial intelligence programs, chances are they are using machine learning. The two are so closely linked that the terms are often used interchangeably, and ML is sometimes treated, ambiguously, as an all-encompassing form of AI. This subfield aims to create computer models that exhibit intelligent behavior similar to that of humans, meaning they can recognize a visual scene, understand a text written in natural language, or perform an action in the real world.

Forms of Machine Learning

ML is related to computational statistics, which focuses on making predictions using computers, but not all ML is statistical learning. Some implementations of ML use data and neural networks in a way that mimics the workings of a biological brain.

The study of mathematical optimization provides methods, theory and application domains for ML. Data mining is another related field, focused on exploratory data analysis through unsupervised learning.

To this end, learning algorithms operate on the premise that strategies, algorithms, and inferences that have worked well in the past are likely to continue to work well in the future. These inferences may be obvious, such as “since the sky is blue today, it will most likely be blue tomorrow.”

They can also be more nuanced, extrapolating from a sample to cases not yet observed. For example, if X percent of the families studied so far contain geographically separated species with different color variants, there is a good chance that similar, as-yet-undiscovered variants exist in other families as well.

ML methods

Machine learning uses a decision-making process that produces results based on the input data, which can be labeled or unlabeled. Most algorithms are also equipped with an error function that evaluates the model's predictions.

If known examples are available, the error function can compare them with the model's estimates to assess its accuracy. To make the model fit the data points in the training set more closely, the weights are then adjusted to reduce the differences between the known examples and the model's estimates. The algorithm repeats this evaluation and optimization process, updating the weights autonomously, until a certain level of accuracy is reached.
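
To make this loop concrete, here is a minimal sketch in plain Python with NumPy (the toy data, the one-variable linear model and all parameter values are assumptions for illustration, not something from the article): a mean-squared error function is evaluated and the weights are adjusted step by step until the error is small enough.

```python
import numpy as np

# Toy training data: inputs x and known correct outputs y, roughly y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=100)

# Model: a single weight and bias, initialized arbitrarily.
w, b = 0.0, 0.0
learning_rate = 0.01

for step in range(1000):
    y_pred = w * x + b                        # the model's current estimate for each input

    # Error function: mean squared difference between estimate and known output.
    error = np.mean((y_pred - y) ** 2)
    if error < 0.5:                           # stop once accuracy is acceptable
        break

    # Adjust the weights to reduce the difference (a gradient descent step).
    grad_w = 2 * np.mean((y_pred - y) * x)
    grad_b = 2 * np.mean(y_pred - y)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}, error={error:.3f}")
```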

The methods used to achieve that accurate result fall into four main categories:

Supervised learning

Supervised learning is defined by the use of labeled data sets to train algorithms that classify data or predict outcomes accurately. The learning algorithm receives a series of inputs together with the corresponding correct outputs, and it learns by comparing its actual output with the correct output to find errors. It then adjusts the model accordingly. A cross-validation process is typically used afterwards to ensure that the model neither overfits nor underfits.

Supervised learning helps organizations solve many real-world problems at scale, such as filtering spam into a folder separate from your inbox. Methods used in supervised learning include neural networks, naive Bayes, linear regression, logistic regression, random forests, support vector machines (SVM), and more.
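
As a rough illustration of this workflow, the sketch below assumes scikit-learn is available and trains a logistic regression classifier on a synthetic labeled dataset standing in for a spam-filtering problem; the dataset, split sizes and parameters are invented for demonstration.

```python
# Supervised learning sketch: labeled examples in, a trained classifier out.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

# Synthetic labeled dataset standing in for, e.g., spam vs. not-spam features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out part of the data to check for over- or underfitting.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)            # learn from inputs plus the correct outputs

print("test accuracy:", model.score(X_test, y_test))
# Cross-validation gives a more robust estimate of generalization.
print("cv accuracy:", cross_val_score(model, X_train, y_train, cv=5).mean())
```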

Unsupervised Learning

Unsupervised learning is applied to data without historical labels, meaning the system is not given the correct answer and the algorithm has to work out what the data shows. The goal is to explore the data and find a structure or pattern hidden within it. This method works well on transactional data.

For example, it can identify segments of customers with similar characteristics that can then be treated similarly in marketing campaigns. Or it can find the key characteristics that separate customer segments.

Popular techniques include self-organizing maps, nearest-neighbor mapping, k-means clustering, and singular value decomposition. These algorithms are also used to segment text topics, recommend items, and identify data outliers. In addition, they are used to reduce the number of features in a model through dimensionality reduction, with principal component analysis (PCA) and singular value decomposition (SVD) among the most common approaches. Other algorithms applied to unsupervised learning include neural networks, probabilistic clustering methods, and more.
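
One way to picture this, assuming scikit-learn and NumPy are available, is the sketch below: k-means clustering applied to made-up customer features, with no labels ever shown to the algorithm.

```python
# Unsupervised learning sketch: k-means discovers customer segments on its own.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical customer features: [annual spend, visits per month]
customers = np.vstack([
    rng.normal([200, 2], [50, 1], size=(100, 2)),     # occasional shoppers
    rng.normal([1500, 12], [300, 3], size=(100, 2)),  # frequent, high-spend shoppers
])

X = StandardScaler().fit_transform(customers)         # put features on a common scale
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

print("cluster sizes:", np.bincount(kmeans.labels_))
print("cluster centers (standardized):", kmeans.cluster_centers_)
```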

Semi-supervised learning

This approach to ML offers a happy medium between the supervised and unsupervised methods. During training, it uses a smaller labeled dataset to guide classification and feature extraction from a larger, unlabeled dataset.

This type of learning can be used with methods such as classification, regression and prediction, and it addresses the problem of not having enough labeled data (or not being able to label enough data) to train a supervised learning algorithm. It is also useful when labeling costs are too high to allow for a fully labeled training process. Examples of semi-supervised learning include facial and object recognition.
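
As an illustrative sketch only, the example below uses scikit-learn's SelfTrainingClassifier (one of several possible semi-supervised techniques) on a synthetic dataset in which only about 5% of the labels are kept; unlabeled targets are marked with -1, and all numbers are made up.

```python
# Semi-supervised sketch: a small labeled set guides learning on a larger unlabeled set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Pretend labeling is expensive: keep labels for only about 5% of the samples.
rng = np.random.default_rng(0)
y_partial = y.copy()
unlabeled = rng.random(len(y)) > 0.05
y_partial[unlabeled] = -1                 # -1 means "label unknown"

base = LogisticRegression(max_iter=1000)
model = SelfTrainingClassifier(base).fit(X, y_partial)

# Evaluate against the true labels we held back (only possible in a toy setting).
pred = model.predict(X[unlabeled])
print("accuracy on originally unlabeled points:", (pred == y[unlabeled]).mean())
```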

Reinforcement learning

Reinforcement learning is often associated with robotics, autonomous vehicles, gaming and navigation. This method allows the algorithm to discover through trial and error which actions yield the most significant rewards.

Three primary components are associated with this type of learning: the agent (the learner or decision maker), the environment (everything the agent interacts with), and actions (what the agent can do). The goal is for the agent to choose actions that maximize the expected reward over a period of time. By following a good policy, the agent can reach the goal quickly, so the aim of reinforcement learning is to learn the best policy.
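
A minimal, illustrative sketch of these three components is shown below: a tabular Q-learning agent, written in plain Python, that learns by trial and error to walk right along a tiny corridor environment toward a rewarded goal state. The environment, actions and parameter values are invented for demonstration.

```python
import random

N_STATES = 5                 # positions 0..4 in a tiny corridor; state 4 is the goal
ACTIONS = [-1, +1]           # the agent can step left or right
alpha, gamma, epsilon = 0.5, 0.9, 0.1

# Q-table: the agent's running estimate of future reward for each (state, action).
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]

def best_action(state):
    best = max(Q[state])
    return random.choice([i for i, q in enumerate(Q[state]) if q == best])

for episode in range(500):
    state = 0
    while state != N_STATES - 1:
        # Trial and error: usually exploit the best known action, sometimes explore.
        a = random.randrange(len(ACTIONS)) if random.random() < epsilon else best_action(state)

        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == N_STATES - 1 else 0.0

        # Q-learning update: nudge the estimate toward reward + discounted future value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

# The learned policy: from every non-goal state the best action is to step right (+1).
print([ACTIONS[best_action(s)] for s in range(N_STATES - 1)])
```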

Dimensionality Reduction

Dimensionality reduction is the task of reducing the number of features in a data set. Often there are too many variables to handle in ML tasks, such as regression or classification. These variables are also called features: the higher the number of features, the more difficult it is to model them. In addition, some of these features may be redundant, adding unnecessary noise to the data set.

Dimensionality reduction decreases the number of random variables under consideration by obtaining a set of principal variables. Approaches to it can be divided into feature selection and feature extraction.
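
As a small, hedged example, the sketch below assumes scikit-learn is installed and uses principal component analysis, a feature-extraction technique, to compress the 64 pixel features of the bundled digits dataset into 10 components while reporting how much variance is retained.

```python
# Dimensionality-reduction sketch: PCA keeps a few informative components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)       # 1797 samples, 64 pixel features each

pca = PCA(n_components=10)                # feature extraction: build 10 new variables
X_reduced = pca.fit_transform(X)

print("original shape:", X.shape)         # (1797, 64)
print("reduced shape:", X_reduced.shape)  # (1797, 10)
print("variance retained:", pca.explained_variance_ratio_.sum())
```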

Applications

Many real-world applications use machine learning, including artificial neural networks (ANN), which are modeled after their biological counterparts. These consist of thousands or millions of densely interconnected processing nodes and are used for many tasks, including speech recognition and translation, gaming, social networking, medical diagnosis and more.
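
The toy sketch below, written in plain NumPy with random, untrained weights, is only meant to show the structure such networks share: layers of nodes that each multiply their inputs by weights, add a bias, and apply a nonlinearity. A real network would learn these weights from data.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(inputs, weights, biases):
    # Each node sums its weighted inputs, adds a bias, and applies a ReLU nonlinearity.
    return np.maximum(0, inputs @ weights + biases)

x = rng.normal(size=(1, 8))                         # one input example with 8 features
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)     # 8 inputs -> 16 hidden nodes
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)      # 16 hidden nodes -> 4 output nodes

hidden = layer(x, W1, b1)
output = layer(hidden, W2, b2)
print("output shape:", output.shape)                # (1, 4)
```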

For example, Facebook uses ML to personalize how each member’s feed is delivered. If a member frequently stops to read posts from certain groups, the algorithm will prioritize that group’s activity earlier in the feed.

In addition, ML is used in speech applications, including speech-to-text, which uses natural language processing (NLP) to convert human language into text. It can also be found with digital assistants such as Siri and Alexa, which use speech recognition for application interaction. Automated customer service, recommendation engines, computer vision, climate science, and even agriculture are among the many other uses.
