What is Machine Learning?
In everyday language, machine learning is the field in which you become the computer's teacher.
Arthur Samuel gave an older, informal definition: “the field of study that gives computers the ability to learn without being explicitly programmed.”
Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”
Example: driving a car.
E = the experience of driving a car in real situations.
T = the task of driving a car.
P = the probability that the program will drive a car perfectly.
Machine learning is a sub-field of AI, and it in turn contains neural networks and deep learning as sub-fields.
Most machine learning problems fall into one of 2 broad categories:
1 Supervised Machine Learning
2 Unsupervised Machine Learning
1 Supervised Machine Learning
In supervised learning the computer learns from known outputs: we give it a set of inputs together with their correct outputs, and from these it tries to work out the answer for new test cases.
Take a closer look at the car driving example. Here we give the machine a road map, along with some code that tells it how to reverse and how to turn left or right by a certain angle.
So we give data to the machine, and the correct output we want is that the machine reaches the correct destination. In this scenario we do not consider traffic problems or the possibility of accidents.
Examples
Identifying the ZIP code from handwritten digits on an envelope
Detecting fraudulent activity in credit card transactions.
Supervised machine learning is normally divided into 2 types:
i Regression problem
ii Classification problem
i Regression problem
Let's understand it with an example:
In a regression problem the output is a continuous number. Suppose we want to understand the behaviour of the bitcoin price, or find the best day of the month to buy bitcoin. We can plot past prices on a graph and fit a linear or parabolic curve to them to estimate the price on a given day.
(We will look at this in a more digestible way later.)
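To make the idea of fitting a line concrete, here is a minimal sketch in plain Python. The prices are made-up numbers, not real bitcoin data, and the helper names fit_line and predict are only illustrative. It fits a straight line by least squares and uses it to estimate a value for a new day.

```python
# Hypothetical data: day of the month and a made-up price on that day.
days = [1, 2, 3, 4, 5]
prices = [102.0, 104.0, 106.0, 108.0, 110.0]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict a continuous value for a new input."""
    return slope * x + intercept


slope, intercept = fit_line(days, prices)
print(slope, intercept)
print(predict(6, slope, intercept))
```

The output is a number on a continuous scale, which is what separates regression from classification. Real prices are far noisier than this toy data, so a straight line is only a starting point.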
ii Classification problem
In a classification problem the output is one of a fixed set of discrete choices.
Suppose you are given the results of a team's last 50 matches and have to predict whether the team will win, lose, or draw against an opponent.
Here we have 3 choices, which we can encode as 0, 1, 2, and predict from the given data.
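The encoding of the three outcomes as 0, 1, 2 can be sketched as follows. The match features (average goals scored and conceded) and the choice of a nearest-centroid classifier are assumptions made for illustration; the text does not prescribe a particular method or feature set.

```python
import math

# Encode the three possible results as integers, as described above.
LABELS = {"loss": 0, "draw": 1, "win": 2}
NAMES = {code: name for name, code in LABELS.items()}

# Hypothetical past matches: (avg goals scored, avg goals conceded) -> result
matches = [
    ((2.5, 0.5), "win"),
    ((2.0, 1.0), "win"),
    ((1.0, 1.0), "draw"),
    ((1.2, 1.1), "draw"),
    ((0.5, 2.0), "loss"),
    ((0.8, 2.4), "loss"),
]


def fit_centroids(samples):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, result in samples:
        code = LABELS[result]
        total = sums.setdefault(code, [0.0] * len(features))
        for k, value in enumerate(features):
            total[k] += value
        counts[code] = counts.get(code, 0) + 1
    return {code: [v / counts[code] for v in total] for code, total in sums.items()}


def predict(centroids, features):
    """Return the class code whose centroid is closest to the features."""
    return min(centroids, key=lambda code: math.dist(centroids[code], features))


centroids = fit_centroids(matches)
code = predict(centroids, (2.2, 0.7))
print(code, NAMES[code])
```

The prediction is always one of the three codes, never a value in between, which is the defining property of a classification problem.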
2 Unsupervised Machine Learning
In unsupervised learning, only the input data is known and there is no known
output data given to the algorithm. While there are many successful applications of
these methods as well, they are usually harder to understand and evaluate.
Examples of unsupervised learning include:
• Identifying topics in a set of blog posts.
• Grouping emails into categories when no labels are given in advance (deciding spam or not spam from labeled examples is a supervised task).
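As a small illustration of learning without labels, the sketch below groups one-dimensional numbers with k-means, one of the algorithms named later in this post. The values and the starting centers are made up, and the function name k_means_1d is only illustrative.

```python
def k_means_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given initial centers (Lloyd's algorithm).

    Returns the final centers and the clusters from the last assignment step.
    """
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data with two visible groups.
values = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
print(centers)
print(clusters)
```

Note that the algorithm is never told which group a value belongs to. It only sees the inputs and finds the grouping on its own.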
For both supervised and unsupervised learning tasks, it is important to have a representation of your input data that a computer can understand. Often it is helpful to think of your data as a table. Each data point that you want to reason about (each email, each customer, each transaction) is a row, and each property that describes that data point (say the age of a customer, the amount or location of a transaction) is a column.
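The table view can be sketched in a few lines of Python. The customer records and the column names below are hypothetical and exist only to show the row and column layout.

```python
# Hypothetical records: each dict is one data point (one row).
customers = [
    {"age": 34, "amount": 120.5, "visits": 3},
    {"age": 51, "amount": 80.0, "visits": 1},
    {"age": 27, "amount": 310.0, "visits": 7},
]

# Fix the column order once so every row uses the same layout.
feature_names = ["age", "amount", "visits"]

# Build the table: one inner list per row, one entry per column.
X = [[record[name] for name in feature_names] for record in customers]

n_rows, n_columns = len(X), len(X[0])
print(n_rows, n_columns)

# A single column is the same property taken across all rows.
ages = [row[feature_names.index("age")] for row in X]
print(ages)
```

This is the same layout that the shapes such as (112, 4) later in the post describe: the first number counts rows, the second counts columns.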
A few tools that you can use for learning or applying machine learning are listed below:
TensorFlow (with Python, Java, Go and C)
DeepLearn.js (yes, you guessed it: a JavaScript framework)
You can also use MATLAB or Octave.
Some famous machine learning algorithms are as follows:
- Naïve Bayes Classifier Algorithm
- K Means Clustering Algorithm
- Support Vector Machine Algorithm
- Apriori Algorithm
- Linear Regression
- Logistic Regression
- Artificial Neural Networks
- Random Forests
- Decision Trees
- Nearest Neighbors (We will discuss it later)
As we have seen, ML is the foundation of technologies such as modern AI, deep learning, computer vision and neural networks.
Now let's take a small tour of TensorFlow and deeplearn.js.
TensorFlow
TensorFlow is mainly used for deep learning. It is an open source AI library that uses data flow graphs to build models. It allows developers to create large-scale neural networks with many layers, and it is widely used by ML developers at Google. TensorFlow is mainly used for classification, perception, understanding, discovering, prediction and creation.
TensorFlow is commonly used for image and voice recognition, text-based applications, and video detection.
It is also used for projects like the Model Zoo: https://github.com/tensorflow/models
Deeplearn.js
deeplearn.js was originally developed by the Google Brain PAIR team to build powerful interactive machine learning tools for the browser.
As mentioned earlier, deeplearn.js is a JavaScript library: an open source, hardware-accelerated library for machine intelligence. It brings performant machine learning building blocks to the web, allowing you to train neural networks in a browser or run pre-trained models in inference mode.
Application
The deeplearn.js library performs machine learning in the browser, on the client machine. Examples are available here:
https://deeplearnjs.org/index.html#demos
Next, we demonstrate machine learning on the Iris dataset.
The name will be familiar to anyone who has started learning data science.
First, let's understand what Iris is.
We use Python 3 with the sklearn and matplotlib libraries, so make sure they are installed correctly.
History
The Iris dataset was introduced by R.A. Fisher in a 1936 paper; the copy distributed today was donated to the UCI Machine Learning Repository in July 1988. It is perhaps the best known database in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day.
Basic Info:
The data set contains 3 classes of 50 instances each, where each class refers to a
type of iris plant. One class is linearly separable from the other 2.
First, let's understand how we identify the type of plant.
It uses 4 features:
- petal length
- petal width
- sepal length
- sepal width
In Python, thanks to the sklearn library, you don't need to supply the data yourself.
It ships with several pre-loaded datasets, such as Iris, Boston housing (removed in recent scikit-learn releases) and breast cancer.
Now take a look at the code:
from sklearn import datasets
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
As discussed earlier, we import datasets for the data, numpy for managing arrays, train_test_split for splitting the data, matplotlib for the graphical representation of the training data, and KNeighborsClassifier for the model.
iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris['data'], iris['target'], random_state=0)
print(iris.keys())
print(iris['DESCR'][:100000] + "\n...")
print(iris['target_names'])
print(iris['feature_names'])
print(X_train.shape, X_test.shape)
Here the 1st line loads the dataset and stores it in iris.
If you look closely at the 2nd line, you will see 4 variables: X_train, X_test, y_train, y_test. They are filled by the train_test_split function, which shuffles the dataset and divides it into a training set and a test set holding 75% and 25% of the samples respectively.
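To show what a shuffle-and-split step does, here is a minimal sketch in plain Python. It is not scikit-learn's implementation and will not reproduce the same shuffle; the function name simple_train_test_split and the stand-in data are assumptions for illustration.

```python
import math
import random


def simple_train_test_split(X, y, test_size=0.25, seed=0):
    """Shuffle the samples, then split them into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Rounding the test size up gives 38 test samples out of 150.
    n_test = math.ceil(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Stand-in data with the same size as Iris: 150 samples, 4 features each.
X = [[float(i)] * 4 for i in range(150)]
y = [i // 50 for i in range(150)]

X_train, X_test, y_train, y_test = simple_train_test_split(X, y)
print(len(X_train), len(X_test))
```

Shuffling matters here because the Iris samples are stored grouped by class. Without it, the last 25% would contain only one type of plant.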
3rd line output:
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names'])
The dataset contains the pre-loaded data, the target values (0, 1, 2), the target names, a description of the dataset, and the feature names.
5th, 6th, 7th lines output:
['setosa' 'versicolor' 'virginica']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
(112, 4) (38, 4)
Now let's move on to the graphical representation:
fig, ax = plt.subplots(3, 3, figsize=(15, 15))
plt.suptitle("iris_pairplot")
for i in range(3):
    for j in range(3):
        ax[i, j].scatter(X_train[:, j], X_train[:, i + 1], c=y_train, marker='^', s=40)
        ax[i, j].set_xticks(())
        ax[i, j].set_yticks(())
        if i == 2:
            ax[i, j].set_xlabel(iris['feature_names'][j])
        if j == 0:
            ax[i, j].set_ylabel(iris['feature_names'][i + 1])
        if j > i:
            ax[i, j].set_visible(False)
plt.show()
Here we take 2 variables, fig and ax, create a 3 by 3 grid of 9 subplots in a figure of size (15, 15), and give it the title iris_pairplot.
After that we draw a scatter plot in each subplot using scatter, and remove the ticks from the X and Y axes by passing an empty tuple to set_xticks and set_yticks.
When you run the code you will see only 6 plots, even though we created 9 subplots. This happens because we set the visibility to False wherever j > i.
The resulting figure is a pair plot of the training data, with each point colored by its class.
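The count of 6 visible plots can be checked without drawing anything. The sketch below walks the same 3 by 3 grid and lists which pair of features each visible panel shows; the feature names are taken from the output printed earlier.

```python
feature_names = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

visible = []
for i in range(3):          # row index: the y-axis uses feature i + 1
    for j in range(3):      # column index: the x-axis uses feature j
        if j > i:
            continue        # these panels are hidden by set_visible(False)
        visible.append((feature_names[j], feature_names[i + 1]))

print(len(visible))
for x_name, y_name in visible:
    print(x_name, "vs", y_name)
```

With 4 features there are exactly 6 distinct pairs, so the hidden panels would only have repeated a pair or plotted a feature against itself.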
Finally, we use KNN (k-nearest neighbors) to predict our output:
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
X_new = np.array([[0.1, 3.0, 3.5, 1.4]])
prediction = knn.predict(X_new)
print(prediction)
print(iris['target_names'][prediction])
y_pred = knn.predict(X_test)
print(knn.score(X_test, y_test))
Here we use the imported KNeighborsClassifier with the number of nearest neighbors set to 1. In an interactive session, fit returns the classifier itself and echoes its parameters, for example KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski', metric_params=None, n_jobs=1, n_neighbors=1, p=2, weights='uniform'). We then build a NumPy array of shape (1, 4), one sample with 4 features, and pass it to the predict method of knn,
which returns the predicted class for our test case. You can also run predict on the whole test set, as y_pred does, and represent the result in a graph.
The last line prints the accuracy of our model on the test set, about 0.97 out of 1.
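For readers curious about what happens inside a nearest-neighbors classifier, here is a from-scratch sketch in plain Python. It is a simplified illustration, not scikit-learn's implementation; the function names and the tiny data set are made up.

```python
import math
from collections import Counter


def knn_predict(X_train, y_train, x_new, k=1):
    """Predict the label of x_new by majority vote among its k nearest neighbors."""
    # Sort training indices by Euclidean distance to the new point.
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x_new))
    votes = Counter(y_train[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(X_train, y_train, X_test, y_test, k=1):
    """Fraction of test points whose predicted label matches the true label."""
    hits = sum(knn_predict(X_train, y_train, x, k) == y for x, y in zip(X_test, y_test))
    return hits / len(y_test)


# Tiny hypothetical data set with two features and two classes.
X_train = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1.1, 1.0], [5.1, 5.0]]
y_test = [0, 1]

print(knn_predict(X_train, y_train, [1.0, 0.9], k=3))
print(accuracy(X_train, y_train, X_test, y_test, k=1))
```

The accuracy function plays the same role as the score method used above: it compares predictions on the test set with the true labels and reports the fraction that match.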
Thanks for your attention.