Hands-On Artificial Intelligence for Beginners

Supervised learning algorithms

Supervised algorithms rely on human knowledge, in the form of labeled examples, to complete their tasks. Let's say we have a dataset related to loan repayment that contains several demographic indicators, as well as whether each loan was paid back or not:

The Paid column, which tells us whether a loan was paid back, is called the target; it is what we would like to predict. The columns that describe the applicant's background are known as the features of the dataset. In supervised learning, algorithms learn to predict the target from the features. In other words, they learn which indicators give a high probability that an applicant will pay back a loan. Mathematically, this process looks as follows:

$$y = f(X) + \epsilon$$

Here, we are saying that our label $y$ is a function of the input features $X$, plus some amount of error $\epsilon$ that is caused naturally by the dataset. We know that a certain set of features will likely produce a certain outcome. In supervised learning, we set up an algorithm to learn the function that correctly maps a set of features to an outcome.
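To make the idea of learning a function from noisy data concrete, here is a minimal sketch in plain Python. It is not taken from the book's repository: the "true" function, the noise level, and the names true_f and learned_f are all assumptions chosen for illustration, and ordinary least squares on a straight line stands in for whatever learning algorithm is actually used.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def true_f(x):
    # Hypothetical "true" relationship between feature and label (an assumption)
    return 2.0 * x + 1.0

# Observations follow label = f(feature) + error, with Gaussian noise as the error
xs = [i / 10 for i in range(100)]
ys = [true_f(x) + random.gauss(0.0, 0.5) for x in xs]

# Estimate the function with ordinary least squares for a straight line
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
variance = sum((x - mean_x) ** 2 for x in xs)
slope = covariance / variance
intercept = mean_y - slope * mean_x

def learned_f(x):
    # The learned approximation of true_f
    return slope * x + intercept

print(f"learned slope={slope:.2f}, intercept={intercept:.2f}")
```

The learned slope and intercept should land close to the values used in true_f, but not exactly on them, because the error term hides part of the underlying relationship.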

To illustrate how supervised learning works, we are going to use a famous toy dataset in the machine learning field, the Iris dataset. It has four features: Sepal Length, Sepal Width, Petal Length, and Petal Width. In this dataset, our target variable (sometimes called a label) is Name, the species of each flower. The dataset is available in the GitHub repository that corresponds with this chapter:

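import pandas as pd
# Load the Iris dataset from the CSV file in the chapter's repository
data = pd.read_csv("iris.csv")
# Preview the first five rows
data.head()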

The preceding code generates the following output:

Now that we have our data ready to go, let's jump into some supervised learning!
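As a first step, it can help to see how the features and the target are separated in code. The sketch below is an illustration rather than the book's own listing: it types in six rows of the Iris dataset by hand so that it runs without iris.csv, and the column names (SepalLength, SepalWidth, PetalLength, PetalWidth, Name) are an assumption about how the file is laid out. Adjust them to match the actual CSV header.

```python
import pandas as pd

# Six rows of the Iris dataset, entered by hand so the sketch is self-contained.
# The column names are an assumption; check them against the header of iris.csv.
data = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "SepalWidth": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "PetalLength": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "PetalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Name": [
        "Iris-setosa", "Iris-setosa",
        "Iris-versicolor", "Iris-versicolor",
        "Iris-virginica", "Iris-virginica",
    ],
})

# Features: every column except the target
X = data.drop(columns=["Name"])

# Target: the column we want to predict
y = data["Name"]

print(X.shape)
print(y.unique())
```

With the full dataset loaded through pd.read_csv, the same two lines that build X and y apply unchanged, provided the target column is indeed called Name.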